The Platform Engineer – AIOps & Infrastructure will be responsible for designing, automating, and maintaining scalable infrastructure and platform services for AI/ML operations. This role combines Platform Engineering, DevOps, Cloud Infrastructure, and MLOps, ensuring high availability, observability, security, and operational excellence across production environments.
Job Responsibilities
Design and maintain scalable cloud-native infrastructure for AI/ML workloads
Manage Kubernetes environments, container orchestration, and platform services
Build and optimize CI/CD pipelines and Infrastructure-as-Code frameworks
Support MLOps and LLMOps workflows, including deployment, monitoring, and lifecycle management
Implement monitoring, logging, alerting, and observability solutions
Drive DevSecOps, automation, security, and reliability best practices
Collaborate with AI Engineers, Data Scientists, and Infrastructure teams to support production AI systems
Participate in troubleshooting, incident response, and platform optimization initiatives
Requirements
Bachelor’s degree in Computer Science, Engineering, Information Systems, or equivalent experience
5+ years of experience in Platform Engineering, DevOps, Cloud Infrastructure, SRE, MLOps, or related fields
Strong experience with AWS, Azure, or GCP
Hands-on expertise with Kubernetes, Docker, and Infrastructure-as-Code tools (Terraform, CloudFormation, or similar)
Experience building CI/CD pipelines and automation workflows
Strong scripting skills using Python, Bash, or similar languages
Experience with monitoring and observability platforms such as Grafana, Prometheus, Datadog, or ELK
Advanced English proficiency (B2–C1)
Comfortable working remotely with minimal supervision
Proactive, detail-oriented, and collaborative
Ability to thrive in a fast-paced, startup-like environment
Nice to have
Experience supporting enterprise-scale AI/ML or Generative AI platforms in production
Strong knowledge of MLOps and LLMOps ecosystems
Experience with MLflow, Kubeflow, Airflow, SageMaker, or similar tools
Familiarity with GPU workloads, distributed systems, and AI inference infrastructure
Experience implementing DevSecOps, governance, compliance, and security frameworks
Background in high-availability and scalable cloud environments