Shape the Future of Intelligent Operations as a Site Reliability Engineer (AI Ops / ML Ops)
Are you passionate about deploying, monitoring, and scaling machine learning systems in production environments? Trimble's Construction Management Solutions (CMS) division is looking for a driven, early-career Site Reliability Engineer to join our high-performing team in Chennai. In this role, you will help build and manage robust AI infrastructure, bridging the gap between cutting-edge data science and resilient cloud operations.
Job Responsibilities
Assist in the deployment and maintenance of machine learning models in production under direct supervision, building skills in containerization and orchestration architectures
Support the development of robust continuous integration and deployment pipelines for ML workflows, including model versioning, automated testing, and release processes
Monitor production ML model performance, detect data drift, and track system health by implementing foundational logging, alerting, and metrics solutions
Contribute to infrastructure automation and configuration management for machine learning workloads, learning to treat infrastructure as software
Partner closely with ML engineers and data scientists to operationalize complex models, ensuring reliability, scale, and strict adherence to established operational patterns
Requirements
1 to 2 years of professional experience in a DevOps, MLOps, or systems engineering environment
Bachelor's degree in Computer Science, Engineering, Information Technology, or a closely related technical field
Direct experience with the Microsoft Azure cloud platform and its specialized ecosystem services (such as Azure ML and Azure DevOps)
Proficiency with Python or other scripting languages (Shell / Bash / PowerShell) for rapid system integration and task automation
Foundational understanding of containerization (Docker), basic orchestration concepts (Kubernetes fundamentals), and version control system workflows (Git)
Solid baseline knowledge of fundamental DevOps principles (CI/CD, system administration) and a basic understanding of the end-to-end machine learning model lifecycle
Nice to have
Familiarity with MLOps tracking tools and open-source frameworks (MLflow, Kubeflow, DVC, or similar)
Basic experience with monitoring software suites (Prometheus, Grafana, ELK stack)
Exposure to Infrastructure as Code (IaC) configuration tools like Terraform or Ansible
Knowledge of database systems, data pipeline technologies, or model serving frameworks (TensorFlow Serving, TorchServe, ONNX Runtime)
Experience with cross-platform (Windows/Linux) command-line administration and a basic understanding of cloud security best practices for AI workloads
What we offer
Structured environment to accelerate technical skills
Direct guidance from experienced engineering professionals
Projects that improve productivity, quality, safety, transparency and sustainability