Senior Machine Learning Engineer, ML Training Platform

United States · Employment contract · 216700.00 - 303400.00 USD / Year · Job Posted May 26, 2026

Job Responsibility

  • Lead the building, testing, and maintenance of ML training infrastructure at Reddit
  • Play a pivotal role in designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows
  • Evolve the MLE experience, from provisioning interactive GPU environments through large-scale training, supporting on-demand and self-service workflows
  • Kubernetes Automation: Write custom Kubernetes Controllers and Operators to manage the lifecycle of interactive Jupyter workspaces and long-running ML training jobs, handle auto-idling, and ensure fault tolerance
  • GPU Orchestration: Work with the underlying compute team to ensure MLEs have efficient access to training hardware resources and handle resource contention gracefully
  • Developer Experience (DevX): Treat internal MLEs as your customers. Conduct user research, reduce friction in the 'Idea-to-Prototype' loop, and standardize software environments (Docker images, Python dependency management)

Requirements

  • 5+ years of software engineering experience, with a focus on Platform Engineering, ML Infrastructure, or Backend Systems
  • Deep Kubernetes Expertise: You know K8s beyond just 'deploying pods.' You understand CRDs, Controllers and the Operator pattern
  • Jupyter Ecosystem Knowledge: Experience customizing JupyterHub, JupyterLab extensions, or building similar interactive computing platforms
  • Strong Coding Skills: Proficiency in Python (for the ML ecosystem) and Go (for Kubernetes controllers/infrastructure tooling)
  • GPU Experience: Hands-on practice with CUDA environments, GPU virtualization/containerization, and doing it all within Kubernetes
  • Cloud Provider Experience: Familiarity with both managed ML offerings (Vertex AI, SageMaker, etc.) and building custom ML components in AWS and/or GCP
  • Experience working with distributed training frameworks, including Ray and Kubernetes
  • Comfortable with distributed systems, big data (Petabyte scale) and data-intensive systems
  • Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle
  • Strong organizational & communication skills

What we offer

  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k Match
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Reddit Global Days off
  • Generous paid Parental Leave
  • Paid Volunteer time off

Similar Jobs for Senior Machine Learning Engineer, ML Training Platform

8 matching positions

Senior AI ML Engineer

We are seeking a highly skilled and experienced Assistant Vice President (AVP), ...
Location: India, Pune
Salary: Not provided
Citi
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Statistics, or a related quantitative field
  • Minimum of 6 years of professional experience in Data Science, Machine Learning Engineering, or a similar role, with a strong track record of deploying ML models to production
  • Proven experience in a lead or senior technical role
  • Expert-level proficiency in Python programming, including experience with relevant data science libraries (e.g., Pandas, NumPy, Scikit-learn) and deep learning frameworks (e.g., TensorFlow, PyTorch)
  • Strong hands-on experience designing, developing, and deploying RESTful APIs using FastAPI
  • Solid understanding and practical experience with CI/CD tools and methodologies (e.g., Jenkins, GitLab CI, GitHub Actions, Azure DevOps) for MLOps
  • Experience with MLOps platforms, model monitoring, and model versioning
  • Experience with at least one major cloud provider (e.g., AWS, Azure, GCP) for deploying and managing ML workloads
  • Proficiency in SQL and experience working with relational and/or NoSQL databases
  • Deep understanding of machine learning algorithms, statistical modeling, and data mining techniques
Job Responsibility
  • Design, develop, and implement advanced machine learning models (e.g., predictive, prescriptive, generative AI) to solve complex business problems, from initial data exploration and feature engineering to model training and evaluation
  • Lead the deployment of AI/ML models into production environments, ensuring scalability, reliability, and performance
  • Build and maintain robust, high-performance APIs (using frameworks like FastAPI) to serve machine learning models and integrate them with existing applications and systems
  • Establish and manage continuous integration and continuous deployment (CI/CD) pipelines for ML code and model deployments, promoting automation and efficiency
  • Collaborate with data engineers to ensure optimal data pipelines and data quality for model development and deployment
  • Conduct rigorous experimentation, A/B testing, and model performance monitoring to continuously improve and optimize AI/ML solutions
  • Promote and enforce best practices in software development, including clean code, unit testing, documentation, and version control
  • Mentor junior team members, contribute to technical discussions, and drive the adoption of new technologies and methodologies within the team
  • Effectively communicate complex technical concepts and model results to both technical and non-technical stakeholders.
What we offer
  • Not explicitly stated.
  • Full-time

Senior Machine Learning Engineer

At Cresta, we are dedicated to building state-of-the-art Machine Learning system...
Location: United States; Canada
Salary: Not provided
Cresta
Expiration Date: Until further notice
Requirements
  • Master’s or Ph.D. in Computer Science, Machine Learning, AI, or a related field
  • 5+ years of hands-on experience building and deploying ML models in production
  • Strong knowledge of ML frameworks and NLP libraries (e.g., PyTorch, TensorFlow, Hugging Face, spaCy, NLTK)
  • Solid experience with modern ML techniques including transformer architectures, embeddings, retrieval systems, and large-scale model training
  • Experience designing and deploying Retrieval-Augmented Generation pipelines for enterprise use cases
  • Familiarity with evaluation and benchmarking frameworks for ML/LLM models
  • Strong passion for AI-driven innovation, with a proven ability to deliver impactful, production-grade solutions
Job Responsibility
  • Build and optimize agentic AI workflows that enable users to dynamically interact with and refine outputs from ML systems
  • Research and implement advanced ML and NLP techniques, including transformer-based models, embeddings, and retrieval-augmented generation
  • Develop evaluation frameworks to assess accuracy, robustness, and usability of ML/LLM models in production environments
  • Design, train, and deploy machine learning models for tasks such as classification, entity identification, information extraction, retrieval, topic discovery, and structured insight generation
  • Architect and optimize RAG pipelines for grounding LLMs with enterprise data to ensure accuracy and reliability
  • Collaborate with engineers, UX designers, and product managers to integrate AI-driven capabilities into Cresta’s platform
  • Optimize ML pipelines and data processing systems to operate efficiently at scale
What we offer
  • We offer Cresta employees a variety of medical, dental, and vision plans, designed to fit your and your family’s needs
  • Paid parental leave to support you and your family
  • Monthly Health & Wellness allowance
  • Work from home office stipend to help you succeed in a remote environment
  • Lunch reimbursement for in-office employees
  • PTO: 3 weeks in Canada
  • Compensation for this position includes a base salary, equity, and a variety of benefits
  • Full-time

Senior Machine Learning Engineer

As a Fintech company where Machine Learning (ML) is a key feature, our operation...
Location: India, Bengaluru
Salary: Not provided
EarnIn
Expiration Date: Until further notice
Requirements
  • MS degree or PhD degree in Computer Science or a related technical field
  • 4+ years of experience in ML engineering
  • Strong programming skills in Python and data engineering skills
  • Extensive knowledge of machine learning algorithms
  • Hands-on experience with architectural patterns for large-scale software applications
  • Industry experience building and productionizing machine learning systems
  • Strong oral and written communication skills
Job Responsibility
  • Design, build, and launch efficient and reliable machine learning (ML) models to drive business impact
  • Train and validate state-of-the-art multi-modal, multi-task deep learning models as well as statistical models, considering use-case, complexity, performance, and robustness
  • Demonstrate end-to-end understanding of applications and develop a deep understanding of the “why” behind our models & systems
  • Partner with product managers, tech leads, and stakeholders to analyze business problems, clarify requirements, and define the scope of the systems needed
  • Work closely with data platform teams to enable robust, scalable batch and real-time data pipelines
  • Drive high ML and engineering standards on the team through mentoring and knowledge sharing. Drive engineering best practices around code reviews, automated testing and monitoring
What we offer
  • healthcare
  • internet/cell phone reimbursement
  • a learning and development stipend
  • opportunities to collaborate with and travel to our Palo Alto HQ and Bangkok Site

Senior Machine Learning Engineer

As a Fintech company where Machine Learning (ML) is one of the key features, our...
Location: India, Bengaluru
Salary: Not provided
EarnIn
Expiration Date: Until further notice
Requirements
  • MS degree or PhD degree in Computer Science or a related technical field
  • 4+ years of experience in ML engineering
  • Strong programming skills in Python and data engineering skills
  • Extensive knowledge of machine learning algorithms
  • Hands-on experience with architectural patterns for large-scale software applications
  • Industry experience building and productionizing machine learning systems
  • Strong oral and written communication skills
Job Responsibility
  • Design, build, and launch efficient and reliable machine learning (ML) models to drive business impact
  • Train and validate state-of-the-art multi-modal, multi-task deep learning models as well as statistical models, considering use-case, complexity, performance, and robustness
  • Demonstrate end-to-end understanding of applications and develop a deep understanding of the “why” behind our models & systems
  • Partner with product managers, tech leads, and stakeholders to analyze business problems, clarify requirements, and define the scope of the systems needed
  • Work closely with data platform teams to enable robust, scalable batch and real-time data pipelines
  • Drive high ML and engineering standards on the team through mentoring and knowledge sharing. Drive engineering best practices around code reviews, automated testing and monitoring
What we offer
  • healthcare
  • internet/cell phone reimbursement
  • a learning and development stipend
  • opportunities to collaborate with and travel to our Palo Alto HQ and Bangkok Site

Senior Platform Machine Learning Engineer

Machine learning is the crucial enabler for every financial service EarnIn provi...
Location: United States, Mountain View
Salary: 232200.00 - 283800.00 USD / Year
EarnIn
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master’s degree in Computer Science, Engineering, or a related field, or relevant equivalent experience
  • 4+ years of industry machine learning experience and excellent software engineering skills
  • Strong programming skills in Python, with familiarity in ML frameworks such as TensorFlow or PyTorch
  • Experience with ML cloud platforms like AWS SageMaker, Databricks, or GCP Vertex AI
  • Experience with LLM Ops, foundation model APIs, and AI engineering
  • Familiarity with data pipeline and workflow management tools
  • Strong communication and collaboration skills
  • Passion for learning and staying updated with the latest machine learning and platform engineering industry trends
Job Responsibility
  • Design, build, and maintain the ML and AI platform and tools to support the end-to-end machine learning lifecycle
  • Work closely with other machine learning engineers to understand their workflows, optimize model training and deployment processes, and ensure the reproducibility of results
  • Ensure scalability, reliability, cost efficiency, and ease of use of the machine learning platform
  • Contribute to evaluating and adopting new technologies and tools to enhance our machine-learning capabilities
  • Set examples of outstanding operational excellence. Be the catalyst for step-jump changes
What we offer
  • equity and benefits
  • Full-time

Senior Machine Learning Engineer, Personalization and Recommendations

As a Senior Machine Learning Engineer on the Personalization & Recommendations t...
Location: United States, San Francisco
Salary: 183360.00 - 248000.00 USD / Year
EdTech Jobs
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in applied machine learning or ML-heavy software engineering, with a strong focus on personalization, ranking, or recommendation systems
  • Demonstrated impact improving key metrics such as CTR, retention, or engagement through recommender or search systems in production
  • Strong hands-on skills in Python and PyTorch, with expertise in data and feature engineering, distributed training and inference on GPUs, and familiarity with modern MLOps practices — including model registries, feature stores, monitoring, and drift detection
  • Deep understanding of retrieval and ranking architectures, such as Two-Tower models, deep cross networks, Transformers, or MMoE, and the ability to apply them to real-world problems
  • Experience with large-scale embedding models and vector search, including FAISS, ScaNN, or similar systems
  • Proficiency in experiment design and evaluation, connecting offline metrics (AUC, NDCG, calibration) with online A/B test outcomes to drive product decisions
  • Clear, effective communication, collaborating well with product managers, data scientists, engineers, and cross-functional partners
  • A growth and mentorship mindset, helping elevate team quality in modeling, experimentation, and reliability
  • Commitment to responsible and inclusive personalization, ensuring our systems respect learner privacy, fairness, and diverse goals
Job Responsibility
  • Design and implement personalization models across candidate retrieval, ranking, and post-ranking layers, leveraging user embeddings, contextual signals and content features
  • Develop scalable retrieval and serving systems using architectures such as Two-Tower models, deep ranking networks, and ANN-based vector search for real-time personalization
  • Build and maintain model training, evaluation, and deployment pipelines, ensuring reliability, training–serving consistency, observability, and robust monitoring
  • Partner with Product and Data Science to translate learner objectives (engagement, retention, mastery) into measurable modeling goals and experiment designs
  • Advance evaluation methodologies, contributing to offline metric design (e.g., NDCG, CTR, calibration) and supporting rigorous A/B testing to measure learner and business impact
  • Collaborate with platform and infrastructure teams to optimize distributed training, inference latency, and serving cost in production environments
  • Stay informed on industry and research trends, evaluating opportunities to meaningfully apply them within Quizlet’s ecosystem
  • Mentor junior and mid-level engineers, supporting technical growth, experimentation rigor, and responsible ML practices
  • Champion collaboration, inclusion, curiosity, and data-driven problem solving, contributing to a healthy and productive team culture
What we offer
  • 20 vacation days
  • Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
  • Employer-sponsored 401k plan with company match
  • Access to LinkedIn Learning and other resources to support professional growth
  • Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
  • 40 hours of annual paid time off to participate in volunteer programs of choice
  • Full-time

Senior ML Engineer

We take pride in our commitment to excellence and our dedication to providing ta...
Location: Not provided
Salary: Not provided
Ennova Research
Expiration Date: Until further notice
Requirements
  • At least 5 years of production experience working in Data Science and Software Engineering
  • Understanding of data structures, data modeling and software architecture
  • Fluent in at least two mainstream programming languages (Python, Scala, Java, C++)
  • Experience developing/deploying ML solutions in one of the public cloud platforms (Google preferred)
  • Willingness to develop software applications exploiting the latest Generative AI technologies
  • Experience with deployment including knowledge of CI/CD, containerization, and related concepts
  • Ability to train junior team members in multiple Machine Learning and Deep Learning concepts
What we offer
  • Competitive salary package
  • Professional growth opportunities
  • Training program
  • Teamwork and participation in team building activities
  • A dynamic, young, creative and stimulating environment
  • Open-minded and multicultural work environment

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...
Location: United States, San Francisco
Salary: 216500.00 - 324500.00 USD / Year
GoFundMe
Expiration Date: Until further notice
Requirements
  • 9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
  • Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
  • Extensive experience designing, developing, and operating scalable backend systems
  • Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
  • Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
  • Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
  • Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
  • Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
  • Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)
Job Responsibility
  • Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
  • Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
  • Design, build, and optimize scalable model training, fine tuning, and inference pipelines, ensuring robust integration with production systems
  • Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
  • Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
  • Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
  • Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
  • Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
  • Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
  • Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure
What we offer
  • Competitive pay
  • Comprehensive healthcare benefits
  • Financial assistance for things like hybrid work and family planning
  • Generous parental leave
  • Flexible time-off policies
  • Mental health and wellness resources
  • Learning, development, and recognition programs
  • Full-time