We are looking for a proactive MLOps Engineer to work alongside our Staff Data Engineer and form a new squad. This role suits a forward-thinking engineer who wants to bridge the gap between high-throughput data engineering and machine learning infrastructure. You will work on our Python-based, AWS-hosted data streaming platform, owning the full data lifecycle for real-time event tracking to ensure scalability, reliability, and cost-effectiveness. While the core components are already built, you will drive a large-scale rollout of event tracking across teams, tackling significant data validation, data modelling, and scaling challenges. Crucially, the events you process will directly fuel our AI feature stores and models. You will collaborate closely with analysts, engineers, and product managers to enable accurate reporting on new product features and business KPIs, while laying the foundation for our ML lifecycle. As we expand our AI capabilities, you will introduce MLOps best practices to deploy and serve models, with future opportunities to shape our LLMOps architecture.
Job Responsibilities:
Build and operate a scalable data platform ingesting real-time events at high throughput
Collaborate with Data Scientists to transition ML models from experimentation to production
Build and maintain ML infrastructure for model serving (using FastAPI) and track model performance and lifecycle over time
Collaborate with analysts, engineers and product managers to understand user needs and take ownership of producing new event tracking functionality
Implement automation and data quality controls, ensure the integrity of ingested data, and create the necessary monitoring alerts
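For illustration, a data-quality check along the lines described above might look like the following minimal sketch; the field names and schema are hypothetical, not taken from the actual platform:

```python
import json

# Hypothetical minimal schema for an ingested event: each required
# field is mapped to the Python type it must carry.
REQUIRED_FIELDS = {"event_id": str, "event_type": str, "timestamp": int}

def validate_event(raw: bytes) -> dict:
    """Parse a raw event payload and enforce basic integrity checks,
    raising ValueError for missing or mistyped fields."""
    event = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(event[field], expected_type):
            raise ValueError(f"field {field} has wrong type")
    return event

# A well-formed event passes validation and is returned unchanged.
ok = validate_event(b'{"event_id": "e1", "event_type": "click", "timestamp": 1}')
```

In a production pipeline, checks like this would typically be enforced through a schema registry (Avro/Protobuf) rather than hand-rolled, with rejected events routed to a dead-letter queue and surfaced via monitoring alerts.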
Requirements:
Experience deploying and maintaining Python services in a major cloud environment (AWS, GCP, Azure)
Specific experience with the AWS stack, including Kinesis, Lambdas, Glue, Firehose, and Athena
Experience with MLOps frameworks and experiment tracking tools such as AWS SageMaker, MLflow, Databricks, or W&B
Basic knowledge of building ML inference REST APIs (e.g., with FastAPI)
Strong skills in operating and improving schema registries and data catalogs (Glue, Databricks, etc.)
Relevant CI/CD experience (e.g., GitHub Actions, GitLab Pipelines) automating tests, schema registry updates, and Lambda releases
Solid experience with SQL and data warehouse / data lake environments (e.g., Databricks, BigQuery, Redshift, S3)
Familiarity with cloud observability tools (e.g., Datadog, New Relic, or CloudWatch)
Nice to have:
Hands-on experience with a cloud event streaming platform such as Kinesis, Pub/Sub, or Kafka
Knowledge and practical experience with data modelling, Protobuf or Avro schemas, and managing schema evolution
Production experience with containerization (Docker), orchestration (Kubernetes, AWS ECS/Fargate, etc.), and IaC (Terraform, Crossplane)
Familiarity with LLMOps and frameworks for building LLM agents (e.g., LangGraph)
Large-scale data processing with PySpark and Flink