The AI Platform team is responsible for building the foundational infrastructure that powers intelligent experiences across Mozilla products. This includes model training pipelines, high-throughput inference services, GPU orchestration, and secure, privacy-respecting AI systems that operate reliably at global scale. We’re looking for a Machine Learning Engineer with a strong platform mindset to help design, build, and operate Mozilla’s AI platform.
Job Responsibilities
Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews
Requirements
Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
Hands-on experience working with GPU-based workloads and accelerated computing in production settings
Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
Ability to independently scope and drive technical initiatives while balancing product and operational priorities
Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
Nice to have
Experience implementing inference optimization strategies such as batching, quantization, compilation, model conversion, or hardware-specific tuning
Familiarity with containerization and orchestration systems (e.g., Docker, Kubernetes) in production environments
Experience designing observability systems for distributed services, including metrics strategy and performance profiling
Exposure to privacy-preserving ML techniques, security best practices, or responsible AI system design
Contributions to open-source ML infrastructure projects or leadership in building reusable internal ML tooling
What we offer
Generous performance-based bonus plans
Rich medical, dental, and vision coverage
Generous retirement contributions with 100% immediate vesting
Quarterly all-company wellness days
Country-specific holidays plus a day off for your birthday