The Principal ML Ops Engineer will lead the design and operationalization of ML systems on AI/ML platforms such as AWS SageMaker and H2O.ai. This role focuses on building scalable ML systems rather than individual models. It includes leadership responsibilities for mentoring and developing talent in global capability centers (GCC), with potential onshore leadership opportunities. The position also requires hands-on experience with GenAI, including building intelligent agents, and exposure to Agentic AI concepts.
Job Responsibilities:
Lead and mentor engineering teams, including GCC talent development and potential onshore leadership
Architect, design, and build ML engineering systems on the CFG ML Platform to accelerate ML pipeline delivery
Develop and enhance platform capabilities and frameworks to standardize and automate ML pipeline deployment
Implement capabilities such as feature stores, feature tracking, feature serving (real-time and batch), model performance monitoring, model lineage tracking, model health, and model serving and consumption (real-time, batch, event-triggered, near real-time using Kafka)
Define processes, research market trends, and implement best practices for ML pipeline development and deployment
Collaborate with business teams, data science teams, enterprise architects, and security to uphold ML engineering standards
Develop CI/CD pipelines for continuous integration and delivery of ML models
Identify and automate ML pipeline and model deployment patterns to streamline workflows
Troubleshoot and resolve issues related to ML system performance and deployment
Contribute to GenAI initiatives, including building intelligent agents and integrating them into ML Ops workflows
Demonstrate exposure to Agentic AI concepts and proofs of concept (POCs)
Requirements:
7+ years of experience with Python for scripting ML workflows
5+ years of experience deploying ML pipelines and systems using AWS SageMaker
3+ years of experience developing APIs with Flask, Django, or FastAPI
2+ years of experience with ML frameworks and tools such as scikit-learn, PyTorch, XGBoost, LightGBM, MLflow
Solid understanding of the ML lifecycle: model development, training, validation, deployment, and monitoring
Solid understanding of CI/CD pipelines for ML workflows using Bitbucket, Jenkins, Nexus
Experience with ETL processes for ML pipelines using Spark and Kafka
Exposure to GenAI and Agentic AI concepts, including building or contributing to POCs
Bachelor’s degree or an equivalent combination of education, training, and experience
Nice to have:
Experience with H2O.ai
Experience with containerization using Docker and orchestration using Kubernetes