We are looking for a Senior DevOps Engineer to design, build, and operate scalable, reliable cloud-native infrastructure that supports modern applications, including platforms running LLM and Generative AI workloads. You will work closely with software engineers, data scientists, ML/AI engineers, platform teams, and product managers to enable fast, secure, and resilient deployments and to operationalize AI/ML workflows, ensuring smooth transitions from experimentation to production. You will also define test strategies, automate validation pipelines, and champion the overall quality of Teradata’s AI/ML platform and analytic products.
Job Responsibilities:
Design, implement, and maintain Kubernetes-based platforms for production workloads
Build and manage containerized applications using Docker
Create, maintain, and version Helm charts for application and service deployments
Own CI/CD pipelines and deployment strategies (rolling, blue-green, canary, etc.)
Improve system reliability, scalability, observability, and operational excellence
Partner with AI/ML teams to support LLM and GenAI workloads in production environments
Troubleshoot complex infrastructure and deployment issues across environments
Establish best practices for infrastructure automation, security, and cost efficiency
Requirements:
5+ years of experience in DevOps, SRE, or Platform Engineering roles
Strong hands-on experience with Kubernetes in production environments
Proficient with Docker and container lifecycle management
Solid experience building and managing Helm charts
Deep understanding of deployment strategies and CI/CD pipelines
Experience operating cloud-native systems (AWS, Azure, or GCP)
Understanding of how LLM and Generative AI systems are deployed and operated
Familiarity with serving LLMs, inference workloads, or AI services in Kubernetes
Awareness of performance, scaling, and cost considerations for GenAI workloads
Nice to have:
Bachelor’s or Master’s in Computer Science, Artificial Intelligence, or a related field (or equivalent experience)
Experience supporting ML platforms, model serving, or vector databases
Knowledge of GPU scheduling, resource optimization, or high-performance workloads
Experience with observability tools (Prometheus, Grafana, OpenTelemetry, etc.)
Experience with Infrastructure as Code tools (Terraform, CloudFormation, etc.)