Machine Learning Engineer, Distributed Data Systems Job at OpenAI (San Francisco)

Senior Machine Learning Systems Engineer

Our team is building the foundations to democratise Machine Learning for Atlassi...

Location

India, Bengaluru

Salary:

Not provided

Atlassian

Expiration Date

Until further notice

Requirements

Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin)
Understanding and experience with Machine Learning project lifecycle and tools
Understanding of LLMs, best deployment practices and inference optimisation
Experience in building and implementing high-performance RESTful micro-services
Experience building and operating large scale distributed systems using Amazon Web Services (Sagemaker, S3, Cloud Formation, AWS Security and Networking)
Experience with Continuous Delivery and Continuous Integration

Job Responsibility

Build and scale the core infrastructure to allow software engineers, ML engineers & data scientists to develop, train, evaluate, deploy, and operate Machine Learning models and pipelines
Build systems for product teams like Jira & Confluence to provide access to curated LLMs
Use software development expertise to solve difficult problems, tackling infrastructure and architecture challenges
Lead engineers to drive involved projects from technical design to launch
Collaborate with other teams and internal customers to set expectations, gather input and communicate results

What we offer

Health coverage
Paid volunteer days
Wellness resources

Fulltime

Senior Machine Learning Engineer, Personalization and Recommendations

As a Senior Machine Learning Engineer on the Personalization & Recommendations t...

Location

United States, San Francisco

Salary:

183360.00 - 248000.00 USD / Year

EdTech Jobs

Expiration Date

Until further notice

Requirements

5+ years of experience in applied machine learning or ML-heavy software engineering, with a strong focus on personalization, ranking, or recommendation systems
Demonstrated impact improving key metrics such as CTR, retention, or engagement through recommender or search systems in production
Strong hands-on skills in Python and PyTorch, with expertise in data and feature engineering, distributed training and inference on GPUs, and familiarity with modern MLOps practices — including model registries, feature stores, monitoring, and drift detection
Deep understanding of retrieval and ranking architectures, such as Two-Tower models, deep cross networks, Transformers, or MMoE, and the ability to apply them to real-world problems
Experience with large-scale embedding models and vector search, including FAISS, ScaNN, or similar systems
Proficiency in experiment design and evaluation, connecting offline metrics (AUC, NDCG, calibration) with online A/B test outcomes to drive product decisions
Clear, effective communication, collaborating well with product managers, data scientists, engineers, and cross-functional partners
A growth and mentorship mindset, helping elevate team quality in modeling, experimentation, and reliability
Commitment to responsible and inclusive personalization, ensuring our systems respect learner privacy, fairness, and diverse goals

Job Responsibility

Design and implement personalization models across candidate retrieval, ranking, and post-ranking layers, leveraging user embeddings, contextual signals and content features
Develop scalable retrieval and serving systems using architectures such as Two-Tower models, deep ranking networks, and ANN-based vector search for real-time personalization
Build and maintain model training, evaluation, and deployment pipelines, ensuring reliability, training–serving consistency, observability, and robust monitoring
Partner with Product and Data Science to translate learner objectives (engagement, retention, mastery) into measurable modeling goals and experiment designs
Advance evaluation methodologies, contributing to offline metric design (e.g., NDCG, CTR, calibration) and supporting rigorous A/B testing to measure learner and business impact
Collaborate with platform and infrastructure teams to optimize distributed training, inference latency, and serving cost in production environments
Stay informed on industry and research trends, evaluating opportunities to meaningfully apply them within Quizlet’s ecosystem
Mentor junior and mid-level engineers, supporting technical growth, experimentation rigor, and responsible ML practices
Champion collaboration, inclusion, curiosity, and data-driven problem solving, contributing to a healthy and productive team culture

What we offer

20 vacation days
Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
Employer-sponsored 401k plan with company match
Access to LinkedIn Learning and other resources to support professional growth
Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
40 hours of annual paid time off to participate in volunteer programs of choice

Fulltime

Machine Learning Platform / Backend Engineer

We are seeking a Machine Learning Platform/Backend Engineer to design, build, an...

Location

Serbia; Romania, Belgrade; Timișoara

Salary:

Not provided

Everseen

Expiration Date

Until further notice

Requirements

4-5+ years of work experience in ML infrastructure, MLOps, or Platform Engineering
Bachelor's degree or equivalent in computer science or a related field is preferred
Excellent communication and collaboration skills
Expert knowledge of Python
Experience with CI/CD tools (e.g., GitLab, Jenkins)
Hands-on experience with Kubernetes, Docker, and cloud services
Understanding of ML training pipelines, data lifecycle, and model serving concepts
Familiarity with workflow orchestration tools (e.g., Airflow, Kubeflow, Ray, Vertex AI, Azure ML)
A demonstrated understanding of the ML lifecycle, model versioning, and monitoring
Experience with ML frameworks (e.g., TensorFlow, PyTorch)

Job Responsibility

Design, build, and maintain scalable infrastructure that empowers data scientists and machine learning engineers
Own the design and implementation of the internal ML platform, enabling end-to-end workflow orchestration, resource management, and automation using cloud-native technologies (GCP/Azure)
Design and manage Kubernetes-based infrastructure for multi-tenant GPU and CPU workloads with strong isolation, quota control, and monitoring
Integrate and extend orchestration tools (Airflow, Kubeflow, Ray, Vertex AI, Azure ML or custom schedulers) to automate data processing, training, and deployment pipelines
Develop shared services for model behavior/performance tracking, data/datasets versioning, and artifact management (MLflow, DVC, or custom registries)
Build out documentation covering architecture, policies, and operations runbooks
Share skills, knowledge, and expertise with members of the data engineering team
Foster a culture of collaboration and continuous learning by organizing training sessions, workshops, and knowledge-sharing sessions
Collaborate and drive progress with cross-functional teams to design and develop new features and functionalities
Ensure that the developed solutions meet project objectives and enhance user experience

Fulltime

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...

Location

United States, San Francisco

Salary:

216500.00 - 324500.00 USD / Year

GoFundMe

Expiration Date

Until further notice

Requirements

9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
Extensive experience designing, developing, and operating scalable backend systems
Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)

Job Responsibility

Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
Design, build, and optimize scalable model training, fine tuning, and inference pipelines, ensuring robust integration with production systems
Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure

What we offer

Competitive pay
Comprehensive healthcare benefits
Financial assistance for things like hybrid work and family planning
Generous parental leave
Flexible time-off policies
Mental health and wellness resources
Learning, development, and recognition programs

Fulltime

Senior Machine Learning Engineer

Groupon is a marketplace where customers discover new experiences and services e...

Location

Spain, Madrid; Valencia

Salary:

Not provided

Groupon

Expiration Date

Until further notice

Requirements

5–8+ years hands-on experience building and deploying ML models in production, ideally for recommender, ranking, or personalization systems
Expertise in Python (and optionally Java/Scala), ML frameworks (PyTorch, TensorFlow, XGBoost), feature engineering, and data transformation
Solid background in cloud (GCP strongly preferred), container orchestration (Docker, Kubernetes), and modern data/feature pipelines
Skilled at structuring ambiguous problems and navigating fast-changing priorities—ready to build with minimal legacy constraints
Comfortable communicating complex technical concepts clearly in remote team environments (professional English)

Job Responsibility

Lead the full ML model lifecycle—feature engineering, model design, training, deployment, monitoring, and ongoing improvement
Architect and implement scalable ranking, retrieval, and personalization models using state-of-the-art ML frameworks (e.g., PyTorch, TensorFlow)
Build robust, production-ready ML data pipelines and infrastructure (Python, GCP, Docker/Kubernetes)
Integrate ML models into high-traffic distributed systems; ensure observability, CI/CD, and real-time performance
Collaborate closely with Product and Data Engineering to deeply understand business needs and translate them into measurable user impact
Set technical standards and mentor less-experienced colleagues as an emerging ML leader in our scale-up environment
Experiment with advanced techniques (embeddings, deep learning, reinforcement learning) and champion an evidence-driven, AI-first culture

What we offer

Greenfield Impact: Architect the backbone of Groupon’s revitalized search and recommendations from the ground up—with your work seen by millions
AI-First Scale-Up Vibe: Join a driven, supportive team amid exciting transformation—where speed, ambition, and technical influence matter
Career Launchpad: Be the ML architect/leader you’ve always wanted to be, with clear pathways to technical or team leadership as we grow
Global Collaboration: Work cross-functionally with international colleagues and senior leadership. EMEA time zone overlap preferred for maximum impact

Machine Learning Engineer - Data Foundation and AI

You’ll be a machine learning engineer on the Data Foundation & AI team. In this ...

Location

United States, San Francisco

Salary:

186000.00 - 236400.00 USD / Year

Plaid

Expiration Date

Until further notice

Requirements

1-3 years of experience training, deploying, and scaling ML/AI models in production environments
Strong experience with distributed systems and ML operations — from large-scale training to low-latency serving and monitoring
Proficiency in Python and modern ML frameworks (e.g., PyTorch), with the ability to implement and optimize complex models
Hands-on experience building or scaling ML/AI infrastructure, pipelines, or reusable platforms that support multiple teams
Curiosity and drive to experiment with advanced AI techniques (e.g., embeddings, retrieval, generative modeling) while staying grounded in production impact
Ability to thrive in a collaborative environment, working with both technical and non-technical partners to drive measurable outcomes

Job Responsibility

Building and scaling advanced ML/AI systems that power core Plaid products and applications used by millions of consumers
Driving impact at scale by improving distributed training, serving, and ML operations to make Plaid’s AI capabilities faster, more reliable, and more widely available
Developing new AI applications that enable innovative product experiences across fintech
Tackling 0 to 1 problems where you explore new approaches, as well as scaling 1 to 10 systems for reliability and efficiency
Collaborating with some of the strongest MLEs at Plaid in a high-ownership, bottom-up driven team
Experimenting with cutting-edge ML and AI techniques while balancing practical productionization and measurable business impact

What we offer

medical
dental
vision
401(k)
equity
commission

Fulltime

Machine Learning Engineer - Data Foundation and AI

You’ll be a machine learning engineer on the Data Foundation & AI team. In this ...

Location

United States, New York

Salary:

186000.00 - 236400.00 USD / Year

Plaid

Expiration Date

Until further notice

Requirements

1-3 years of experience training, deploying, and scaling ML/AI models in production environments
Strong experience with distributed systems and ML operations — from large-scale training to low-latency serving and monitoring
Proficiency in Python and modern ML frameworks (e.g., PyTorch), with the ability to implement and optimize complex models
Hands-on experience building or scaling ML/AI infrastructure, pipelines, or reusable platforms that support multiple teams
Curiosity and drive to experiment with advanced AI techniques (e.g., embeddings, retrieval, generative modeling) while staying grounded in production impact
Ability to thrive in a collaborative environment, working with both technical and non-technical partners to drive measurable outcomes

Job Responsibility

Building and scaling advanced ML/AI systems that power core Plaid products and applications used by millions of consumers
Driving impact at scale by improving distributed training, serving, and ML operations to make Plaid’s AI capabilities faster, more reliable, and more widely available
Developing new AI applications that enable innovative product experiences across fintech
Tackling 0 to 1 problems where you explore new approaches, as well as scaling 1 to 10 systems for reliability and efficiency
Collaborating with some of the strongest MLEs at Plaid in a high-ownership, bottom-up driven team
Experimenting with cutting-edge ML and AI techniques while balancing practical productionization and measurable business impact

What we offer

medical
dental
vision
401(k)
equity
commission

Fulltime