Our enterprise clients are moving from fragmented data foundations to AI-first data platforms capable of supporting large-scale, business-critical AI systems. Because AI performance is directly constrained by data quality, availability, governance, and latency, this role exists to build and operate the data backbone that enables reliable, scalable, and compliant AI at enterprise scale.
Job Responsibilities:
Design and operate compute platforms for AI workloads (CPU, GPU, accelerators)
Manage hybrid and cloud-based AI infrastructure
Ensure high availability, resilience, and performance of AI platforms
Plan and manage capacity for training and inference workloads
Operate containerized and virtualized environments supporting AI systems
Manage storage and networking optimized for data-intensive workloads
Implement observability, monitoring, and incident response for AI platforms
Ensure operational readiness and 24/7 reliability where required
Optimize infrastructure for cost, throughput, and latency
Implement FinOps practices for AI compute and storage
Balance performance requirements with budget and sustainability constraints
Support scaling strategies from proof of concept (POC) to enterprise-wide deployment
Requirements:
Strong background in infrastructure, cloud, or platform operations
Experience operating high-performance or data-intensive systems
Exposure to AI or ML workloads in production environments
Operations-driven engineering mindset
Strong sense of ownership and accountability
Comfortable operating under reliability and performance constraints
Continuous improvement approach to scalability and cost efficiency