The Sr. Manager/Staff Engineer, AI Infrastructure & MLOps Engineering is a senior technical leader responsible for architecting, building, and scaling Pfizer’s AI infrastructure and developer platforms. This role leverages extensive experience in cloud engineering, DevOps, and MLOps to deliver robust, high-performance solutions supporting advanced AI/ML workloads in biotechnology, healthcare, and enterprise technology. The successful candidate will drive innovation in automation, reliability, and scalability, enabling scientists and engineers to rapidly develop, deploy, and monitor machine learning models in production environments.
Job Responsibilities:
Design, implement, and own large-scale cloud-based HPC and MLOps platforms supporting AI model training, genomic sequencing, and precision medicine
Lead the development of developer and cloud platforms, including internal engineering accelerators and reusable toolsets
Design, implement, and manage unified platform catalogs using Backstage, enhancing developer experience and application metadata management
Develop custom plugins and APIs for Backstage to support internal engineering workflows and documentation
Build and maintain Python-based automation frameworks, CI/CD pipelines, and Infrastructure-as-Code (Terraform, Helm, Pulumi, AWS CDK)
Operationalize containerized solutions using Docker and Kubernetes, integrating MLflow, Kubeflow, and other orchestration platforms
Implement robust automation for provisioning, configuring, and managing cloud resources across multiple environments
Lead the implementation of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and advanced observability (Prometheus, Grafana, PagerDuty)
Develop and maintain APIs and services for model management, feature stores, and inference pipelines
Operationalize ML model serving at scale using frameworks such as TensorFlow Serving, TorchServe, KServe, and Seldon Core
Ensure compliance with applicable regulations and standards (e.g., HIPAA, FDA requirements) for data protection and reliability
Mentor engineers and lead cross-functional teams to deliver integrated solutions
Champion engineering excellence through design documentation, code reviews, and testing automation
Present at industry summits, author technical proposals, and contribute to open-source projects (Kubernetes, Helm, Go, Envoy)
Drive agile delivery, sprint planning, and performance optimization
Lead incident response and disaster recovery initiatives for mission-critical platforms
Foster a culture of shared ownership, transparency, and innovation
Requirements:
8+ years of hands-on software engineering experience in cloud infrastructure, DevOps, and MLOps
Deep expertise in Python, Kubernetes, Terraform, Helm, and CI/CD pipeline development
Proven experience architecting and operating containerized solutions on AWS, GCP, and Azure
Strong knowledge of Infrastructure-as-Code, distributed systems, and production system reliability
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
Nice to have:
Expertise in AWS cloud services (EC2, S3, Lambda, EKS, SageMaker, API Gateway, CloudFormation, IAM, etc.)
Experience deploying and customizing Backstage as a unified catalog for teams, services, and technical documentation
Experience building and deploying microservices and REST/gRPC APIs for AI model delivery
Familiarity with MLflow, Kubeflow, and other MLOps orchestration platforms
Proficiency with model serving frameworks (TensorFlow Serving, TorchServe, KServe, Seldon Core, BentoML, etc.)