CrawlJobs Logo

ML Engineer, Cloud Platform

priorlabs.ai Logo

Prior Labs

Location Icon

Location:
Germany; United States , Berlin

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

You will have ownership over designing, building, and scaling the core infrastructure that brings Prior Labs' foundation models to the world. This is a unique opportunity to make fundamental architectural decisions, establish engineering best practices from the ground up, and profoundly shape the technical direction for serving state-of-the-art AI. You'll work directly with world-class AI researchers, translating cutting-edge models into reliable, scalable production systems. This role offers significant autonomy and impact, with clear paths to specialize in areas you're passionate about (like ML infrastructure or core backend systems) or grow into a technical leadership position as our team expands. You won't just be implementing features; you'll be building the backbone of our company.

Job Responsibility:

  • Architect & Design: Design robust, scalable, and secure backend systems and production-grade APIs for serving and finetuning our foundation models
  • Build & Implement: Develop high-quality, maintainable code (Python/FastAPI experience highly valued) for core backend services
  • Own Infrastructure: Design, deploy, and manage core infrastructure on cloud platforms, focusing on reliability, monitoring, observability, and cost-efficiency
  • Core MLOps Concepts: Understanding of the entire machine learning lifecycle (MLLC) from data ingestion and preparation to model deployment, monitoring, and retraining
  • Ensure Compliance & Security: Implement secure, GDPR-compliant systems, including data storage, access control, usage tracking, and quota management
  • Champion Best Practices: Drive high standards for testing, CI/CD, documentation, and security within the engineering team

Requirements:

  • 3+ years of professional experience in a cloud engineering, data platform, or SRE role, with a proven track record of managing production infrastructure
  • Proven experience building and maintaining data-intensive systems, with a strong understanding of data modeling, storage, and processing technologies
  • Strong, hands-on experience with Infrastructure as Code (IaC) using tools like Terraform
  • Significant experience with containerization and orchestration technologies (Docker, Kubernetes)
  • Proficiency in Python

Nice to have:

  • Experience building or managing infrastructure specifically for machine learning (MLOps, model serving frameworks, feature stores, data pipelines)
  • Hands-on experience with modern data warehousing and processing platforms like Databricks, Snowflake, or BigQuery
  • Experience in a technical leadership or mentorship role
  • Contributions to relevant open-source projects
What we offer:
  • Competitive compensation package with meaningful equity
  • 30 days of paid vacation + public holidays
  • Comprehensive benefits including healthcare, transportation, and fitness
  • Work with state-of-the-art ML architecture, substantial compute resources and with a world-class team

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Fulltime
Work Type:
On-site work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for ML Engineer, Cloud Platform

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location
Location
United States , Boston
Salary
Salary:
150000.00 - 210000.00 USD / Year
whoop.com Logo
Whoop
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s Degree in Computer Science, Engineering, or a related field
  • or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility
Job Responsibility
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer
What we offer
  • equity
  • benefits
  • Fulltime
Read More
Arrow Right

Sr. Staff ML Platform Engineer

Machine learning is the crucial enabler for every financial service that EarnIn ...
Location
Location
United States , Mountain View
Salary
Salary:
360000.00 - 440000.00 USD / Year
earnin.com Logo
EarnIn
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's or Master’s degree in Computer Science, Engineering, or a related field
  • 8+ years of industry machine learning experience and excellent software engineering skills
  • Strong programming skills in Python, with familiarity in ML frameworks such as TensorFlow or PyTorch
  • Experience with ML cloud platforms such as AWS Sagemaker, Databricks, or GCP Vertex AI
  • Familiarity with data pipelines and workflow management tools
  • Strong communication and collaboration skills
  • Passion for learning and staying updated with the latest industry trends in machine learning and platform engineering
Job Responsibility
Job Responsibility
  • Design, build, and maintain a robust ML platform and tooling ecosystem that supports the entire machine learning lifecycle, from experimentation to production
  • Lead and mentor a team of ML engineers, deeply understanding their workflows to streamline model training, deployment, and monitoring, while ensuring reproducibility and consistency of results
  • Drive scalability, reliability, and cost efficiency of the ML platform, balancing performance with ease of use for scientists and engineers
  • Evaluate and adopt emerging technologies to continually advance the organization’s machine learning capabilities and maintain a competitive edge
  • Champion operational excellence, setting a high bar for engineering quality, reliability, and automation
  • Act as a catalyst for innovation, spearheading step-change improvements that unlock new opportunities for growth and efficiency
What we offer
What we offer
  • equity and benefits
  • Fulltime
Read More
Arrow Right

Senior Platform Engineer, ML Data Systems

We’re looking for an ML Data Engineer to evolve our eval dataset tools to meet t...
Location
Location
United States , Mountain View
Salary
Salary:
137871.00 - 172339.00 USD / Year
khanacademy.org Logo
Khan Academy
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of Software Engineering experience with 3+ of those years working with large ML datasets, especially those in open-source repositories such as Hugging Face
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
  • Experience with data versioning tools (e.g., DVC, LakeFS) and cloud storage systems
  • Familiarity with machine learning workflows — from training data preparation to evaluation
  • Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations
  • Attention to detail and an obsession with data quality and reproducibility
  • Motivated by the Khan Academy mission “to provide a free world-class education for anyone, anywhere.”
  • Proven cross-cultural competency skills demonstrating self-awareness, awareness of other, and the ability to adopt inclusive perspectives, attitudes, and behaviors to drive inclusion and belonging throughout the organization.
Job Responsibility
Job Responsibility
  • Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets
  • Clean, normalize, and enrich data while preserving semantic meaning and consistency
  • Prepare and format datasets for human labeling, and integrate results into ML datasets
  • Develop and maintain scalable ETL pipelines using Airflow, DBT, Go, and Python running on GCP
  • Implement automated tests and validation to detect data drift or labeling inconsistencies
  • Collaborate with AI engineers, platform developers, and product teams to define data strategies in support of continuously improving the quality of Khan’s AI-based tutoring
  • Contribute to shared tools and documentation for dataset management and AI evaluation
  • Inform our data governance strategies for proper data retention, PII controls/scrubbing, and isolation of particularly sensitive data such as offensive test imagery.
What we offer
What we offer
  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
  • Remote-first culture - that caters to your time zone, with open flexibility as needed, at times
  • Generous parental leave
  • An exceptional team that trusts you and gives you the freedom to do your best
  • The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
  • Opportunities to connect through affinity, ally, and social groups
  • 401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life.
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer SRE – ML platform

Location
Location
United States , Sunnyvale
Salary
Salary:
Not provided
thirdeyedata.ai Logo
Thirdeye Data
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 6+ years of experience in ML Ops with strong knowledge in Kubernetes, Python, MongoDB and AWS
  • Good understanding of Apache SOLR
  • Proficient with Linux administration
  • Knowledge of ML models and LLM
  • Ability to understand tools used by data scientists and experience with software development and test automation
  • Ability to design and implement cloud solutions and ability to build MLOps pipelines on cloud solutions (AWS)
  • Experience working with cloud computing and database systems
  • Experience building custom integrations between cloud-based systems using APIs
  • Experience developing and maintaining ML systems built with open-source tools
  • Experience with MLOps Frameworks like Kubeflow, MLFlow, DataRobot, Airflow etc., experience with Docker and Kubernetes
Job Responsibility
Job Responsibility
  • Continuous Deployment using GitHub Actions, Flux, Kustomize
  • Design and implement cloud solutions, build MLOps on AWS cloud
  • Data science model containerization, deployment using Docker, VLLM, Kubernetes
  • Communicate with a team of data scientists, data engineers, and architects, and document the processes
  • Develop and deploy scalable tools and services for our clients to handle machine learning training and inference
  • Knowledge of ML models and LLM
  • Fulltime
Read More
Arrow Right

Staff Platform Engineer

Join our dynamic team as a Compute Platform Engineer and play a pivotal role in ...
Location
Location
Canada , Vancouver
Salary
Salary:
190000.00 - 240000.00 CAD / Year
inworld.ai Logo
Inworld AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7 years of experience in software engineering
  • 5 years of experience with infrastructure-as-code
  • Proficiency in managing Kubernetes clusters and applications, including creating Kustomize manifests/Helm charts for new applications
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.)
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud)
  • Proficient in at least one backend programming/scripting languages such as Golang, Python, and Bash
Job Responsibility
Job Responsibility
  • Work closely with backend and ML engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI engine and Studio
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment
  • Identify and implement opportunities to enhance engineering speed and efficiency
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence
  • Develop and share best practices to improve automation and efficiency across our engineering teams
What we offer
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime
Read More
Arrow Right

Staff Platform Engineer

Join our dynamic team as a Compute Platform Engineer and play a pivotal role in ...
Location
Location
United States , Mountain View, California
Salary
Salary:
180000.00 - 280000.00 USD / Year
inworld.ai Logo
Inworld AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7 years of experience in software engineering
  • 5 years of experience with infrastructure-as-code
  • Proficiency in managing Kubernetes clusters and applications, including creating Kustomize manifests/Helm charts for new applications
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.)
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud)
  • Proficient in at least one backend programming/scripting languages such as Golang, Python, and Bash
  • Candidates must be based in the SF Bay Area or willing to relocate (you will be working on-site in our South Bay office a few days a week)
Job Responsibility
Job Responsibility
  • Work closely with backend and ML engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI engine and Studio
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment
  • Identify and implement opportunities to enhance engineering speed and efficiency
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence
  • Develop and share best practices to improve automation and efficiency across our engineering teams
What we offer
What we offer
  • equity and benefits
  • Fulltime
Read More
Arrow Right

AI ML Engineer

We seeking a talented ML/AI Engineer to join our innovative team and drive the d...
Location
Location
India , hyderabad
Salary
Salary:
Not provided
genzeon.com Logo
Genzeon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong experience in building ML models with proven track record of successful deployments
  • Extensive experience in Generative AI including LLMs, diffusion models, and related technologies
  • Experience in Agentic AI and understanding of autonomous agent architectures
  • Proficiency with Model Control Protocol (MCP) for agent communication and control
  • Advanced Python programming with expertise in ML libraries (scikit-learn, TensorFlow, PyTorch, etc.)
  • Google Cloud Platform (GCP) experience with ML-focused services
  • Vertex AI hands-on experience for model lifecycle management
  • AutoML experience for automated machine learning workflows
  • Model Armour or similar model security and protection frameworks
  • 3+ years of experience in machine learning engineering or related field
Job Responsibility
Job Responsibility
  • Design, develop, and deploy robust machine learning models for various business applications
  • Build and optimize generative AI solutions using latest frameworks and techniques
  • Implement agentic AI systems that can autonomously perform complex tasks
  • Develop and maintain ML pipelines from data ingestion to model deployment
  • Leverage Google Cloud Platform (GCP) services for scalable ML infrastructure
  • Utilize Vertex AI for model training, deployment, and management
  • Implement AutoML solutions for rapid prototyping and model development
  • Ensure model security and compliance using Model Armour and related tools
  • Write clean, efficient Python code for ML applications and data processing
  • Optimize model performance, accuracy, and computational efficiency
Read More
Arrow Right

AI ML Engineer

We seeking a talented ML/AI Engineer to join our innovative team and drive the d...
Location
Location
India , hyderabad
Salary
Salary:
Not provided
genzeon.com Logo
Genzeon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong experience in building ML models with proven track record of successful deployments
  • Extensive experience in Generative AI including LLMs, diffusion models, and related technologies
  • Experience in Agentic AI and understanding of autonomous agent architectures
  • Proficiency with Model Control Protocol (MCP) for agent communication and control
  • Advanced Python programming with expertise in ML libraries (scikit-learn, TensorFlow, PyTorch, etc.)
  • Google Cloud Platform (GCP) experience with ML-focused services
  • Vertex AI hands-on experience for model lifecycle management
  • AutoML experience for automated machine learning workflows
  • Model Armour or similar model security and protection frameworks
  • 3+ years of experience in machine learning engineering or related field
Job Responsibility
Job Responsibility
  • Design, develop, and deploy robust machine learning models for various business applications
  • Build and optimize generative AI solutions using latest frameworks and techniques
  • Implement agentic AI systems that can autonomously perform complex tasks
  • Develop and maintain ML pipelines from data ingestion to model deployment
  • Leverage Google Cloud Platform (GCP) services for scalable ML infrastructure
  • Utilize Vertex AI for model training, deployment, and management
  • Implement AutoML solutions for rapid prototyping and model development
  • Ensure model security and compliance using Model Armour and related tools
  • Write clean, efficient Python code for ML applications and data processing
  • Optimize model performance, accuracy, and computational efficiency
Read More
Arrow Right