As the leading delivery platform in the region, we have a unique responsibility and opportunity to positively impact millions of customers, restaurant partners, and riders. To achieve our mission, we must scale and continuously evolve our machine learning capabilities, including cutting-edge Generative AI (genAI) initiatives. This demands robust, efficient, and scalable ML platforms that empower our teams to rapidly develop, deploy, and operate intelligent systems.
As an ML Platform Engineer, your mission is to design, build, and enhance the infrastructure and tooling that accelerates the development, deployment, and monitoring of traditional ML and genAI models at scale. You’ll collaborate closely with data scientists, ML engineers, genAI specialists, and product teams to deliver seamless ML workflows, from experimentation to production serving, ensuring operational excellence across our ML and genAI systems.
Job Responsibilities:
Design, build, and maintain scalable, reusable, and reliable ML platforms and tooling that support the entire ML lifecycle, including data ingestion, model training, evaluation, deployment, and monitoring for both traditional and generative AI models
Develop standardized ML workflows and templates using MLflow and other platforms, enabling rapid experimentation and deployment cycles
Implement robust CI/CD pipelines, Docker containerization, model registries, and experiment tracking to support reproducibility, scalability, and governance in ML and genAI
Collaborate closely with genAI experts to integrate and optimize genAI technologies, including transformers, embeddings, vector databases (e.g., Pinecone, Redis, Weaviate), and real-time retrieval-augmented generation (RAG) systems
Automate and streamline ML and genAI model training, inference, deployment, and versioning workflows, ensuring consistency, reliability, and adherence to industry best practices
Ensure reliability, observability, and scalability of production ML and genAI workloads by implementing comprehensive monitoring, alerting, and continuous performance evaluation
Integrate infrastructure components such as real-time model serving frameworks (e.g., TensorFlow Serving, NVIDIA Triton, Seldon), Kubernetes orchestration, and cloud solutions (AWS/GCP) for robust production environments
Drive infrastructure optimization for generative AI use cases, including efficient inference techniques (batching, caching, quantization), fine-tuning, prompt management, and model updates at scale
Partner with data engineering, product, infrastructure, and genAI teams to align ML platform initiatives with broader company goals, infrastructure strategy, and innovation roadmap
Contribute actively to internal documentation, onboarding, and training programs, promoting platform adoption and continuous improvement
Requirements:
Strong software engineering background with experience in building distributed systems or platforms designed for machine learning and AI workloads
Expert-level proficiency in Python and familiarity with ML frameworks (TensorFlow, PyTorch), infrastructure tooling (MLflow, Kubeflow, Ray), and popular genAI APIs and libraries (Hugging Face, OpenAI, LangChain)
Experience implementing modern MLOps practices, including model lifecycle management, CI/CD, Docker, Kubernetes, model registries, and infrastructure-as-code tools (Terraform, Helm)
Demonstrated experience working with cloud infrastructure, ideally AWS or GCP, including Kubernetes clusters (GKE/EKS), serverless architectures, and managed ML services (e.g., Vertex AI, SageMaker)
Proven experience with generative AI technologies: transformers, embeddings, prompt engineering strategies, fine-tuning vs. prompt-tuning, vector databases, and retrieval-augmented generation (RAG) systems
Experience designing and maintaining real-time inference pipelines, including integrations with feature stores, streaming data platforms (Kafka, Kinesis), and observability platforms
Familiarity with SQL and data warehouse modeling; capable of managing complex data queries, joins, aggregations, and transformations
Solid understanding of ML monitoring and operations, including identifying model drift and decay, optimizing latency, managing cost, and efficiently scaling API-based genAI applications
Bachelor’s degree in Computer Science, Engineering, or a related field; an advanced degree is a plus
3+ years of experience in ML platform engineering, ML infrastructure, generative AI, or closely related roles
Proven track record of building and operating ML infrastructure at scale, ideally supporting generative AI use cases and complex inference scenarios
Strategic mindset with strong problem-solving skills and effective technical decision-making abilities
Excellent communication and collaboration skills, comfortable working cross-functionally across diverse teams and stakeholders
Strong sense of ownership, accountability, and pragmatism, with a proactive bias for action