Machine Learning Engineer, Distributed Data Systems

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

295000.00 - 445000.00 USD / Year

Job Description:

As a Research Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal training and evaluation at OpenAI. You’ll manage distributed data pipelines, collaborate closely with researchers to translate requirements into robust systems, and harden pipelines that serve as the backbone for Sora’s rapid iteration cycles.

Job Responsibility:

  • Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security
  • Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient
  • Partner with researchers to deeply understand requirements and translate them into production-ready systems
  • Harden, optimize, and maintain critical data infrastructure systems that power multimodal training and evaluation

Requirements:

  • Strong experience with distributed systems and large-scale infrastructure
  • Detail-oriented, with the rigor to build and maintain reliable systems
  • Excellent software engineering fundamentals and organizational skills
  • Comfortable with ambiguity and rapid change
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided
  • Equity

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Machine Learning Engineer, Distributed Data Systems

Senior Machine Learning Systems Engineer

Our team is building the foundations to democratise Machine Learning for Atlassi...
Location:
India, Bengaluru
Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin)
  • Understanding and experience with Machine Learning project lifecycle and tools
  • Understanding of LLMs, best deployment practices and inference optimisation
  • Experience in building and implementing high-performance RESTful micro-services
  • Experience building and operating large-scale distributed systems using Amazon Web Services (SageMaker, S3, CloudFormation, AWS Security and Networking)
  • Experience with Continuous Delivery and Continuous Integration
Job Responsibility:
  • Build and scale the core infrastructure to allow software engineers, ML engineers & data scientists to develop, train, evaluate, deploy, and operate Machine Learning models and pipelines
  • Build systems for product teams like Jira & Confluence to provide access to curated LLMs
  • Use software development expertise to solve difficult problems, tackling infrastructure and architecture challenges
  • Lead engineers to drive involved projects from technical design to launch
  • Collaborate with other teams and internal customers to set expectations, gather input and communicate results
What we offer:
  • Health coverage
  • Paid volunteer days
  • Wellness resources
Employment Type:
Full-time

Senior Machine Learning Engineer, Personalization and Recommendations

As a Senior Machine Learning Engineer on the Personalization & Recommendations t...
Location:
United States, San Francisco
Salary:
183360.00 - 248000.00 USD / Year
EdTech Jobs
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in applied machine learning or ML-heavy software engineering, with a strong focus on personalization, ranking, or recommendation systems
  • Demonstrated impact improving key metrics such as CTR, retention, or engagement through recommender or search systems in production
  • Strong hands-on skills in Python and PyTorch, with expertise in data and feature engineering, distributed training and inference on GPUs, and familiarity with modern MLOps practices — including model registries, feature stores, monitoring, and drift detection
  • Deep understanding of retrieval and ranking architectures, such as Two-Tower models, deep cross networks, Transformers, or MMoE, and the ability to apply them to real-world problems
  • Experience with large-scale embedding models and vector search, including FAISS, ScaNN, or similar systems
  • Proficiency in experiment design and evaluation, connecting offline metrics (AUC, NDCG, calibration) with online A/B test outcomes to drive product decisions
  • Clear, effective communication, collaborating well with product managers, data scientists, engineers, and cross-functional partners
  • A growth and mentorship mindset, helping elevate team quality in modeling, experimentation, and reliability
  • Commitment to responsible and inclusive personalization, ensuring our systems respect learner privacy, fairness, and diverse goals
Job Responsibility:
  • Design and implement personalization models across candidate retrieval, ranking, and post-ranking layers, leveraging user embeddings, contextual signals and content features
  • Develop scalable retrieval and serving systems using architectures such as Two-Tower models, deep ranking networks, and ANN-based vector search for real-time personalization
  • Build and maintain model training, evaluation, and deployment pipelines, ensuring reliability, training–serving consistency, observability, and robust monitoring
  • Partner with Product and Data Science to translate learner objectives (engagement, retention, mastery) into measurable modeling goals and experiment designs
  • Advance evaluation methodologies, contributing to offline metric design (e.g., NDCG, CTR, calibration) and supporting rigorous A/B testing to measure learner and business impact
  • Collaborate with platform and infrastructure teams to optimize distributed training, inference latency, and serving cost in production environments
  • Stay informed on industry and research trends, evaluating opportunities to meaningfully apply them within Quizlet’s ecosystem
  • Mentor junior and mid-level engineers, supporting technical growth, experimentation rigor, and responsible ML practices
  • Champion collaboration, inclusion, curiosity, and data-driven problem solving, contributing to a healthy and productive team culture
What we offer:
  • 20 vacation days
  • Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
  • Employer-sponsored 401k plan with company match
  • Access to LinkedIn Learning and other resources to support professional growth
  • Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
  • 40 hours of annual paid time off to participate in volunteer programs of choice
Employment Type:
Full-time

Machine Learning Platform / Backend Engineer

We are seeking a Machine Learning Platform/Backend Engineer to design, build, an...
Location:
Serbia; Romania, Belgrade; Timișoara
Salary:
Not provided
Everseen
Expiration Date:
Until further notice
Requirements:
  • 4-5+ years of work experience in either ML infrastructure, MLOps, or Platform Engineering
  • Bachelor's degree or equivalent, preferably in computer science
  • Excellent communication and collaboration skills
  • Expert knowledge of Python
  • Experience with CI/CD tools (e.g., GitLab, Jenkins)
  • Hands-on experience with Kubernetes, Docker, and cloud services
  • Understanding of ML training pipelines, data lifecycle, and model serving concepts
  • Familiarity with workflow orchestration tools (e.g., Airflow, Kubeflow, Ray, Vertex AI, Azure ML)
  • A demonstrated understanding of the ML lifecycle, model versioning, and monitoring
  • Experience with ML frameworks (e.g., TensorFlow, PyTorch)
Job Responsibility:
  • Design, build, and maintain scalable infrastructure that empowers data scientists and machine learning engineers
  • Own the design and implementation of the internal ML platform, enabling end-to-end workflow orchestration, resource management, and automation using cloud-native technologies (GCP/Azure)
  • Design and manage Kubernetes-based infrastructure for multi-tenant GPU and CPU workloads with strong isolation, quota control, and monitoring
  • Integrate and extend orchestration tools (Airflow, Kubeflow, Ray, Vertex AI, Azure ML or custom schedulers) to automate data processing, training, and deployment pipelines
  • Develop shared services for model behavior/performance tracking, data/datasets versioning, and artifact management (MLflow, DVC, or custom registries)
  • Build out documentation in relation to architecture, policies and operations runbooks
  • Share skills, knowledge, and expertise with members of the data engineering team
  • Foster a culture of collaboration and continuous learning by organizing training sessions, workshops, and knowledge-sharing sessions
  • Collaborate and drive progress with cross-functional teams to design and develop new features and functionalities
  • Ensure that the developed solutions meet project objectives and enhance user experience
Employment Type:
Full-time

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...
Location:
United States, San Francisco
Salary:
216500.00 - 324500.00 USD / Year
GoFundMe
Expiration Date:
Until further notice
Requirements:
  • 9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
  • Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
  • Extensive experience designing, developing, and operating scalable backend systems
  • Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
  • Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
  • Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
  • Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
  • Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
  • Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)
Job Responsibility:
  • Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
  • Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
  • Design, build, and optimize scalable model training, fine tuning, and inference pipelines, ensuring robust integration with production systems
  • Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
  • Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
  • Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
  • Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
  • Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
  • Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
  • Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure
What we offer:
  • Competitive pay
  • Comprehensive healthcare benefits
  • Financial assistance for things like hybrid work, family planning
  • Generous parental leave
  • Flexible time-off policies
  • Mental health and wellness resources
  • Learning, development, and recognition programs
Employment Type:
Full-time

Senior Machine Learning Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location:
Spain, Madrid; Valencia
Salary:
Not provided
Groupon
Expiration Date:
Until further notice
Requirements:
  • 5–8+ years hands-on experience building and deploying ML models in production, ideally for recommender, ranking, or personalization systems
  • Expertise in Python (and optionally Java/Scala), ML frameworks (PyTorch, TensorFlow, XGBoost), feature engineering, and data transformation
  • Solid background in cloud (GCP strongly preferred), container orchestration (Docker, Kubernetes), and modern data/feature pipelines
  • Skilled at structuring ambiguous problems and navigating fast-changing priorities—ready to build with minimal legacy constraints
  • Comfortable communicating complex technical concepts in clear, remote team environments (professional English)
Job Responsibility:
  • Lead the full ML model lifecycle—feature engineering, model design, training, deployment, monitoring, and ongoing improvement
  • Architect and implement scalable ranking, retrieval, and personalization models using state-of-the-art ML frameworks (e.g., PyTorch, TensorFlow)
  • Build robust, production-ready ML data pipelines and infrastructure (Python, GCP, Docker/Kubernetes)
  • Integrate ML models into high-traffic distributed systems; ensure observability, CI/CD, and real-time performance
  • Collaborate closely with Product and Data Engineering to deeply understand business needs and translate them into measurable user impact
  • Set technical standards and mentor less-experienced colleagues as an emerging ML leader in our scale-up environment
  • Experiment with advanced techniques (embeddings, deep learning, reinforcement learning) and champion an evidence-driven, AI-first culture
What we offer:
  • Greenfield Impact: Architect the backbone of Groupon’s revitalized search and recommendations from the ground up—with your work seen by millions
  • AI-First Scale-Up Vibe: Join a driven, supportive team amid exciting transformation—where speed, ambition, and technical influence matter
  • Career Launchpad: Be the ML architect/leader you’ve always wanted to be, with clear pathways to technical or team leadership as we grow
  • Global Collaboration: Work cross-functionally with international colleagues and senior leadership. EMEA time zone overlap preferred for maximum impact

Machine Learning Engineer - Data Foundation and AI

You’ll be a machine learning engineer on the Data Foundation & AI team. In this ...
Location:
United States, San Francisco
Salary:
186000.00 - 236400.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements:
  • 1-3 years of experience training, deploying, and scaling ML/AI models in production environments
  • Strong experience with distributed systems and ML operations — from large-scale training to low-latency serving and monitoring
  • Proficiency in Python and modern ML frameworks (e.g., PyTorch), with the ability to implement and optimize complex models
  • Hands-on experience building or scaling ML/AI infrastructure, pipelines, or reusable platforms that support multiple teams
  • Curiosity and drive to experiment with advanced AI techniques (e.g., embeddings, retrieval, generative modeling) while staying grounded in production impact
  • Ability to thrive in a collaborative environment, working with both technical and non-technical partners to drive measurable outcomes
Job Responsibility:
  • Building and scaling advanced ML/AI systems that power core Plaid products and applications used by millions of consumers
  • Driving impact at scale by improving distributed training, serving, and ML operations to make Plaid’s AI capabilities faster, more reliable, and more widely available
  • Developing new AI applications that enable innovative product experiences across fintech
  • Tackling 0 to 1 problems where you explore new approaches, as well as scaling 1 to 10 systems for reliability and efficiency
  • Collaborating with some of the strongest MLEs at Plaid in a high-ownership, bottom-up driven team
  • Experimenting with cutting-edge ML and AI techniques while balancing practical productionization and measurable business impact
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
Employment Type:
Full-time

Machine Learning Engineer - Data Foundation and AI

You’ll be a machine learning engineer on the Data Foundation & AI team. In this ...
Location:
United States, New York
Salary:
186000.00 - 236400.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements:
  • 1-3 years of experience training, deploying, and scaling ML/AI models in production environments
  • Strong experience with distributed systems and ML operations — from large-scale training to low-latency serving and monitoring
  • Proficiency in Python and modern ML frameworks (e.g., PyTorch), with the ability to implement and optimize complex models
  • Hands-on experience building or scaling ML/AI infrastructure, pipelines, or reusable platforms that support multiple teams
  • Curiosity and drive to experiment with advanced AI techniques (e.g., embeddings, retrieval, generative modeling) while staying grounded in production impact
  • Ability to thrive in a collaborative environment, working with both technical and non-technical partners to drive measurable outcomes
Job Responsibility:
  • Building and scaling advanced ML/AI systems that power core Plaid products and applications used by millions of consumers
  • Driving impact at scale by improving distributed training, serving, and ML operations to make Plaid’s AI capabilities faster, more reliable, and more widely available
  • Developing new AI applications that enable innovative product experiences across fintech
  • Tackling 0 to 1 problems where you explore new approaches, as well as scaling 1 to 10 systems for reliability and efficiency
  • Collaborating with some of the strongest MLEs at Plaid in a high-ownership, bottom-up driven team
  • Experimenting with cutting-edge ML and AI techniques while balancing practical productionization and measurable business impact
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
Employment Type:
Full-time

Machine Learning Engineer

The Personalization (PZN) team makes deciding what to play next on Spotify easie...
Location:
United States, New York or Boston
Salary:
138250.00 - 197500.00 USD / Year
Spotify
Expiration Date:
Until further notice
Requirements:
  • An experienced ML practitioner motivated to work on complex real-world problems in a fast-paced and collaborative environment
  • Strong background in machine learning, natural language processing, and generative AI, with experience in applying theory to develop real-world applications
  • Hands-on expertise with implementing end-to-end production ML systems at scale. Experience with production LLM-based systems at scale is a plus
  • Experience with incorporating human feedback to improve LLM-based systems using techniques like DPO, KTO, and reinforcement fine-tuning
  • Experience with designing end-to-end tech specs and modular architectures for ML frameworks in complex problem spaces in collaboration with product teams
  • Experience with large scale, distributed data processing frameworks/tools like Apache Beam, Apache Spark, and cloud platforms like GCP or AWS
Job Responsibility:
  • Design, build, evaluate, and ship LLM based solutions that tell stories about our content and our users
  • Collaborate with cross functional teams spanning user research, design, data science, product management, and engineering to build new product features that advance our mission to connect artists and fans in personalized and useful ways
  • Prototype new approaches and productionize solutions at scale for our hundreds of millions of active users
  • Promote and role-model best practices of ML systems development, testing, evaluation, etc., both inside the team as well as throughout the organization
  • Be part of an active group of machine learning practitioners
What we offer:
  • health insurance
  • six month paid parental leave
  • 401(k) retirement plan
  • monthly meal allowance
  • 23 paid days off
  • 13 paid flexible holidays
  • paid sick leave
  • Extensive learning opportunities, through our dedicated team, GreenHouse
  • Flexible share incentives letting you choose how you share in our success
  • Global parental leave, six months off - for all new parents
Employment Type:
Full-time