ML Platform Engineer

Duetto

Location:
United States

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a Machine Learning Engineer to help build and scale our machine learning infrastructure and workflows. At Duetto, you’ll take on the unique challenge of supporting the development, training, deployment, and monitoring of thousands of machine learning models, one for each hotel customer. You’ll work closely with data scientists, DevOps, and platform engineers to deliver robust, reusable tooling for the entire ML lifecycle—including training pipelines, inference APIs, feature workflows, and monitoring hooks—within our AWS-native environment. Your work will help us ensure that ML models are delivered quickly, reliably, and cost-effectively into production. This is an opportunity to build ML systems at scale, contribute to the design of modern ML infrastructure on top of AWS and Kubernetes, and shape the future of machine learning at Duetto.

Job Responsibility:

  • Develop, maintain, and scale machine learning pipelines for training, validation, and batch or real-time inference across thousands of hotel-specific models
  • Build reusable components to support model training, evaluation, deployment, and monitoring within a Kubernetes- and AWS-based environment
  • Partner with data scientists to translate notebooks and prototypes into production-grade, versioned training workflows
  • Implement and maintain feature engineering workflows, integrating with custom feature pipelines and supporting services
  • Collaborate with platform and DevOps teams to manage infrastructure-as-code (Terraform), automate deployment (CI/CD), and ensure reliability and security
  • Integrate model monitoring for performance metrics, drift detection, and alerting (using tools like Prometheus, CloudWatch, or Grafana)
  • Improve retraining, rollback, and model versioning strategies across different deployment contexts
  • Support experimentation infrastructure and A/B testing integrations for ML-based products

Requirements:

  • 3+ years of experience in ML engineering or a similar role building and deploying machine learning models in production
  • Strong experience with AWS services such as SageMaker, Lambda, EMR, and ECR for training, serving, and orchestrating model workflows
  • Hands-on experience with Kubernetes (e.g., EKS) for container orchestration and job execution at scale
  • Strong proficiency in Python, with exposure to ML/DL libraries such as TensorFlow, PyTorch, and scikit-learn
  • Experience working with feature stores, data pipelines, and model versioning tools (e.g., SageMaker Feature Store, Feast, MLflow)
  • Familiarity with infrastructure-as-code and deployment tools such as Terraform, GitHub Actions, or similar CI/CD systems
  • Experience with logging and monitoring stacks such as Prometheus, Grafana, CloudWatch, or similar
  • Experience working in cross-functional teams with data scientists and DevOps engineers to bring models from research to production
  • Strong communication skills and ability to operate effectively in a fast-paced, ambiguous environment with shifting priorities

Additional Information:

Job Posted:
December 08, 2025

Similar Jobs for ML Platform Engineer

Senior Platform Engineer, ML Data Systems

We’re looking for an ML Data Engineer to evolve our eval dataset tools to meet t...
Location:
United States, Mountain View
Salary:
137871.00 - 172339.00 USD / Year
Company:
Khan Academy
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of Software Engineering experience with 3+ of those years working with large ML datasets, especially those in open-source repositories such as Hugging Face
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
  • Experience with data versioning tools (e.g., DVC, LakeFS) and cloud storage systems
  • Familiarity with machine learning workflows — from training data preparation to evaluation
  • Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations
  • Attention to detail and an obsession with data quality and reproducibility
  • Motivated by the Khan Academy mission “to provide a free world-class education for anyone, anywhere.”
  • Proven cross-cultural competency skills demonstrating self-awareness, awareness of others, and the ability to adopt inclusive perspectives, attitudes, and behaviors to drive inclusion and belonging throughout the organization
Job Responsibility:
  • Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets
  • Clean, normalize, and enrich data while preserving semantic meaning and consistency
  • Prepare and format datasets for human labeling, and integrate results into ML datasets
  • Develop and maintain scalable ETL pipelines using Airflow, dbt, Go, and Python running on GCP
  • Implement automated tests and validation to detect data drift or labeling inconsistencies
  • Collaborate with AI engineers, platform developers, and product teams to define data strategies in support of continuously improving the quality of Khan’s AI-based tutoring
  • Contribute to shared tools and documentation for dataset management and AI evaluation
  • Inform our data governance strategies for proper data retention, PII controls/scrubbing, and isolation of particularly sensitive data such as offensive test imagery.
What we offer:
  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
  • Remote-first culture that caters to your time zone, with flexibility as needed
  • Generous parental leave
  • An exceptional team that trusts you and gives you the freedom to do your best
  • The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
  • Opportunities to connect through affinity, ally, and social groups
  • 401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life.
Contract Type:
Fulltime

Sr. Staff ML Platform Engineer

Machine learning is the crucial enabler for every financial service that EarnIn ...
Location:
United States, Mountain View
Salary:
360000.00 - 440000.00 USD / Year
Company:
EarnIn
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master’s degree in Computer Science, Engineering, or a related field
  • 8+ years of industry machine learning experience and excellent software engineering skills
  • Strong programming skills in Python, with familiarity in ML frameworks such as TensorFlow or PyTorch
  • Experience with ML cloud platforms such as AWS SageMaker, Databricks, or GCP Vertex AI
  • Familiarity with data pipelines and workflow management tools
  • Strong communication and collaboration skills
  • Passion for learning and staying updated with the latest industry trends in machine learning and platform engineering
Job Responsibility:
  • Design, build, and maintain a robust ML platform and tooling ecosystem that supports the entire machine learning lifecycle, from experimentation to production
  • Lead and mentor a team of ML engineers, deeply understanding their workflows to streamline model training, deployment, and monitoring, while ensuring reproducibility and consistency of results
  • Drive scalability, reliability, and cost efficiency of the ML platform, balancing performance with ease of use for scientists and engineers
  • Evaluate and adopt emerging technologies to continually advance the organization’s machine learning capabilities and maintain a competitive edge
  • Champion operational excellence, setting a high bar for engineering quality, reliability, and automation
  • Act as a catalyst for innovation, spearheading step-change improvements that unlock new opportunities for growth and efficiency
What we offer:
  • equity and benefits
Contract Type:
Fulltime

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location:
United States, Boston
Salary:
150000.00 - 210000.00 USD / Year
Company:
Whoop
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s Degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility:
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer:
  • equity
  • benefits
Contract Type:
Fulltime

Site Reliability Engineer (SRE) – ML Platform

Location:
United States, Sunnyvale
Salary:
Not provided
Company:
Thirdeye Data
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS
  • Good understanding of Apache Solr
  • Proficient with Linux administration
  • Knowledge of ML models and LLMs
  • Ability to understand tools used by data scientists and experience with software development and test automation
  • Ability to design and implement cloud solutions and ability to build MLOps pipelines on cloud solutions (AWS)
  • Experience working with cloud computing and database systems
  • Experience building custom integrations between cloud-based systems using APIs
  • Experience developing and maintaining ML systems built with open-source tools
  • Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, as well as Docker and Kubernetes
Job Responsibility:
  • Continuous deployment using GitHub Actions, Flux, and Kustomize
  • Design and implement cloud solutions and build MLOps pipelines on AWS
  • Containerize and deploy data science models using Docker, vLLM, and Kubernetes
  • Communicate with a team of data scientists, data engineers, and architects, and document the processes
  • Develop and deploy scalable tools and services for our clients to handle machine learning training and inference
  • Knowledge of ML models and LLMs
Contract Type:
Fulltime

ML Engineer

The IT company Andersen invites an experienced ML Engineer for a large-scale pro...
Location:
Poland, Kraków; Warszawa
Salary:
Not provided
Company:
Andersen
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience as an ML Engineer
  • Strong proficiency in Python, with deep knowledge of software development principles, architecture patterns, and ML model integration
  • Hands-on experience with TTS systems (e.g., Tacotron, FastSpeech, VITS) and an understanding of STT (speech-to-text) pipelines
  • Familiarity with real-time AI systems, including LLM integration and latency-sensitive applications
  • Experience tuning and maintaining ML models for performance, scalability, and quality in production
  • English level: Intermediate+ or higher
Job Responsibility:
  • Designing, integrating, and optimizing Text-to-Speech (TTS) systems within real-time conversational AI pipelines
  • Fine-tuning models based on user feedback, improving clarity, naturalness, and emotional expression in voice output
  • Contributing to customer-specific deployments with high adaptability and quick turnaround requirements
  • Collaborating with ML, product, and engineering teams to ensure seamless voice experiences across our platform
What we offer:
  • Experience working in teams with leaders in FinTech, Healthcare, Retail, Telecom, and others
  • The opportunity to switch projects and/or develop expertise in an interesting business domain
  • Guarantee of professional, financial, and career growth
  • The opportunity to earn up to an additional 1,000 EUR per month, depending on level of expertise, by participating in the company's activities (included in the annual bonus)
  • Access to the corporate training portal
  • Bright corporate life (parties / pizza days / PlayStation / fruits / coffee / snacks / movies)
  • Certification compensation (AWS, PMP, etc)
  • Referral program
  • English courses
  • Private health insurance and compensation for sports activities

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Company:
Plaid
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility:
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer:
  • medical, dental, vision, and 401(k)
Contract Type:
Fulltime

Platform Engineer

Motorica is at a breakthrough moment. We’ve built a generative AI animation plat...
Location:
Sweden, Stockholm
Salary:
Not provided
Company:
Motorica
Expiration Date:
Until further notice
Requirements:
  • Proven experience in Platform Engineering, SRE, or DevOps, ideally in high-growth or AI/ML-heavy environments
  • Strong grasp of CI/CD systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes)
  • Familiarity with observability, monitoring, and incident response best practices
  • Security mindset with hands-on experience in audits, compliance (ISO 27001, SOC2, etc.), and vulnerability management
  • Strong communication skills: you’ll be interfacing with developers daily and need to translate infrastructure into clarity, not complexity
  • A proactive, solution-oriented mindset: you anticipate friction before others feel it
Job Responsibility:
  • Provide common infrastructure guidance, reusable patterns, and automated tooling to engineering teams
  • Own the “paved road” for developers, reducing friction and cognitive load
  • Champion and implement security best practices across the entire platform
  • Play a key role in achieving ISO 27001 certification through technical implementation and evidence gathering
  • Build and operate a highly reliable and cost-efficient platform, with particular focus on optimizing GPU-heavy AI/ML workloads
  • Manage CI/CD systems (GitHub Actions, GitLab CI) and track key metrics like build times, deployment frequency, and failure rates
  • Oversee cloud environments (AWS, GCP), including health, security, and cost reporting
  • Lead security scans, audits, and vulnerability remediation
  • Maintain observability stack (Prometheus, Grafana, Datadog, GCP Logging), ensuring meaningful dashboards and alerts
  • Act as point-of-contact for ML Research team’s infra requests (GPU access, specialized pipelines)
What we offer:
  • Stock Options program
  • Retirement Plan
  • Health Benefits (5000 SEK/year)
  • Life Insurance / Health Insurance / Injury Insurance
  • Competitive compensation
Contract Type:
Fulltime