ML Research Engineer, ML Systems

Scale

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

218400.00 - 273000.00 USD / Year

Job Description:

Scale’s ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform enables ML engineers, researchers, data scientists, and operators to train and evaluate LLMs, and to evaluate data quality, quickly and automatically. Scale is uniquely positioned at the heart of the AI field as an indispensable provider of training and evaluation data and end-to-end solutions for the ML lifecycle. You will work closely with Scale’s ML teams and researchers to build the foundation platform that supports all our ML research and development, and you will build and optimize the platform to enable our next generation of LLM training, inference, and data curation.

Job Responsibility:

  • Build, profile and optimize our training and inference framework
  • Collaborate with ML teams to accelerate their research and development and enable them to develop the next generation of models and data curation
  • Research and integrate state-of-the-art technologies to optimize our ML system

Requirements:

  • Strong excitement about system optimization
  • Experience with multi-node LLM training and inference
  • Experience with developing large-scale distributed ML systems
  • Strong software engineering skills and proficiency with frameworks and tools such as CUDA, PyTorch, transformers, and FlashAttention
  • Strong written and verbal communication skills and the ability to operate in a cross-functional team environment

Nice to have:

Demonstrated expertise in post-training methods and/or next-generation use cases for large language models, including instruction tuning, RLHF, tool use, reasoning, agents, and multimodal models.

What we offer:
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Additional benefits such as a commuter stipend
  • Equity-based compensation

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time

Similar Jobs for ML Research Engineer, ML Systems

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location:
United States
Salary:
210000.00 - 309000.00 USD / Year
Assembly
Expiration Date:
Until further notice
Requirements:
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility:
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer:
  • Competitive equity grants
  • 100% employer-paid benefits
  • Flexibility of being fully remote
Employment Type:
Full-time

AI Research Engineer

We're seeking a Research Engineer to conduct innovative research in key AI areas...
Location:
United Kingdom
Salary:
Not provided
Prolific
Expiration Date:
Until further notice
Requirements:
  • 5+ years of engineering experience with significant AI/ML focus
  • Demonstrated research experience through publications, open-source contributions, or impactful projects
  • Strong engineering fundamentals and experience implementing AI systems in production environments
  • Deep knowledge of LLM evaluation methodologies, alignment techniques, and model optimization approaches
  • Experience with model fine-tuning, adapters, quantization, and distillation frameworks
  • Self-motivation and ability to define and pursue research directions independently
  • Excellent understanding of current challenges in AI safety, reliability, and alignment
  • Strong communication skills and ability to explain complex research concepts clearly
  • Passion for staying current with the rapidly evolving AI research landscape
Job Responsibility:
  • Lead independent research projects in AI evaluation methodologies, alignment techniques, and synthetic data generation
  • Design and implement novel evaluation frameworks for LLMs and agent systems that are grounded in human data
  • Contribute to the academic AI community through publications and open-source contributions
  • Stay at the forefront of AI research and pioneer innovative approaches to tackle pressing open challenges in the field
  • Design and conduct rigorous experiments to study AI models and systems with sound methodological approaches
  • Develop scalable frameworks for systematic evaluation of model behaviours and capabilities
  • Create tools and frameworks that transform research insights into practical applications
  • Build infrastructure to support large-scale research experiments when needed
  • Apply knowledge of model fine-tuning, optimization techniques, distillation, and other ML engineering practices to support research goals
  • Work closely with ML engineers, data scientists, and product teams to translate research insights into practical applications
What we offer:
  • Competitive salary
  • Benefits
  • Remote working
  • Impactful, mission-driven culture

Machine Learning/AI Research Engineer

Machine Learning/AI Research Engineer position focusing on advancing renewable o...
Location:
Ireland, Galway; Dublin
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • PhD degree (or Master's with equivalent research and innovation experience) in a relevant discipline (e.g., computer science, software engineering, electrical engineering, math, physics, statistics, mechanical engineering, etc.)
  • Proven record of innovation with Deep Learning or with scientific and engineering computation involving the application of Machine Learning
  • Demonstrated experience with innovative solution development, developing proofs-of-concept, first-of-a-kind solutions, and technology transfer
  • Strong software development skills in Python and PyTorch
  • Strong application experience of Machine Learning with physical systems
  • Good understanding of digital twins, use of ML with Digital Twins, applications to sustainability
  • A strong science or engineering background with aptitude for system level analysis and modeling
  • Deep understanding of the relevant environment, ecosystem, trends, and literature
  • Excellent research and development skills
  • Ability to innovate, make research contributions, and bring ideas to reality in compelling ways
Job Responsibility:
  • Develop and program integrated software algorithms to structure, analyze and leverage structured and unstructured data in product and systems applications
  • Work with large scale computing frameworks, data analysis systems, and modeling environments
  • Use machine learning and statistical modeling techniques to improve product/system performance
  • Formulate descriptive, diagnostic, predictive and prescriptive insights/algorithms and translate technical specifications into code
  • Apply, optimize and scale deep learning technologies and algorithms to give computers the capability to visualize, learn and respond to complex situations
  • Document procedures for installation and maintenance, complete programming, perform testing and debugging, define and monitor performance metrics
  • Provide thought leadership and technical influence internally and externally
  • Take innovative ideas and make them real – contributing along the full range from conception, to design, development, implementation, evaluation, and technology transfer
  • Collaborate with Hewlett Packard Labs' research teams and external partners
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment Type:
Full-time

Research Engineer, GenAI

You will be part of Kiddom’s Data Science team, building the foundation of our s...
Location:
United States, San Francisco; New York
Salary:
175000.00 - 250000.00 USD / Year
Kiddom
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience applying machine learning to solve real-world problems with large, complex datasets
  • 1–2 years in a technical leadership role
  • Proven track record designing, evaluating, and deploying ML/AI systems in production environments that drive measurable business impact, ideally in recommendation, personalization, search, or workflow optimization
  • Strong programming skills in Python
  • Fluency in data manipulation (SQL, Pandas) and common ML toolkits (scikit-learn, XGBoost, TensorFlow/PyTorch)
  • Strong analytical skills and ability to break down complex problems into measurable hypotheses and experiments
  • Excellent communication skills with a history of cross-functional collaboration with product, design, and engineering stakeholders
Job Responsibility:
  • Architect and scale machine learning systems for search, personalization, and recommendations that power Kiddom’s teacher helper and insight engine
  • Develop evaluation-first development workflows to measure how models improve teaching efficiency, lesson planning, and student learning outcomes
  • Fine-tune machine learning models with feedback signals from teachers and students to align outputs with instructional goals and classroom needs
  • Design intelligent discovery pipelines that combine semantic retrieval, curriculum alignment, and real-time personalization
  • Build agentic assistants that help teachers plan lessons, adapt instruction, and reduce repetitive tasks
  • Collaborate closely with product managers, designers, and curriculum experts to translate high-level educational goals into scalable ML-powered systems
  • Coach and mentor junior ML engineers and data scientists, fostering technical and professional growth
What we offer:
  • Meaningful equity
  • Health insurance benefits: medical (various PPO/HMO/HSA plans), dental, vision, disability and life insurance
  • One Medical membership (in participating locations)
  • Flexible vacation time policy (subject to internal approval); average use is 4 weeks off per year
  • 10 paid sick days per year (prorated depending on start date)
  • Paid holidays
  • Paid bereavement leave
  • Paid family leave after birth/adoption: a minimum of 16 paid weeks for birthing parents and 10 weeks for caretaker parents, meant to supplement benefits offered by the state
  • Commuter and FSA plans
Employment Type:
Full-time

Machine Learning Research Engineer

You will be part of Kiddom’s Data Science team, building the foundation of our s...
Location:
United States, San Francisco; New York
Salary:
175000.00 - 250000.00 USD / Year
Kiddom
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience applying machine learning to solve real-world problems with large, complex datasets, including 1–2 years in a technical leadership role
  • Proven track record designing, evaluating, and deploying ML/AI systems in production environments that drive measurable business impact, ideally in recommendation, personalization, search, or workflow optimization
  • Strong programming skills in Python and fluency in data manipulation (SQL, Pandas) and common ML toolkits (scikit-learn, XGBoost, TensorFlow/PyTorch)
  • Strong analytical skills and ability to break down complex problems into measurable hypotheses and experiments
  • Excellent communication skills with a history of cross-functional collaboration with product, design, and engineering stakeholders
Job Responsibility:
  • Architect and scale machine learning systems for search, personalization, and recommendations that power Kiddom’s teacher helper and insight engine
  • Develop evaluation-first development workflows to measure how models improve teaching efficiency, lesson planning, and student learning outcomes
  • Fine-tune machine learning models with feedback signals from teachers and students to align outputs with instructional goals and classroom needs
  • Design intelligent discovery pipelines that combine semantic retrieval, curriculum alignment, and real-time personalization
  • Build agentic assistants that help teachers plan lessons, adapt instruction, and reduce repetitive tasks
  • Collaborate closely with product managers, designers, and curriculum experts to translate high-level educational goals into scalable ML-powered systems
  • Coach and mentor junior ML engineers and data scientists, fostering technical and professional growth
What we offer:
  • Competitive salary
  • Meaningful equity
  • Health insurance benefits: medical (various PPO/HMO/HSA plans), dental, vision, disability and life insurance
  • One Medical membership (in participating locations)
  • Flexible vacation time policy (subject to internal approval); average use is 4 weeks off per year
  • 10 paid sick days per year (prorated depending on start date)
  • Paid holidays
  • Paid bereavement leave
  • Paid family leave after birth/adoption: a minimum of 16 paid weeks for birthing parents and 10 weeks for caretaker parents, meant to supplement benefits offered by the state
  • Commuter and FSA plans
Employment Type:
Full-time

AI Researcher, Core ML

As an AI Researcher, you will be pushing the frontier of foundation model resear...
Location:
United States, San Francisco
Salary:
160000.00 - 230000.00 USD / Year
Together AI
Expiration Date:
Until further notice
Requirements:
  • Strong background in Machine Learning
  • Experience in building state-of-the-art models at large scale
  • Experience in developing algorithms in areas such as optimization, model architecture, and data-centric optimizations
  • Passion for contributing to the open model ecosystem and pushing the frontier of open models
  • Excellent problem-solving and analytical skills
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or a related field
Job Responsibility:
  • Develop novel architectures, system optimizations, optimization algorithms, and data-centric optimizations that significantly improve over the state of the art
  • Take advantage of the computational infrastructure of Together to create the best open models in their class
  • Understand and improve the full lifecycle of building open models
  • Release and publish your insights (blogs, academic papers, etc.)
  • Collaborate with cross-functional teams to deploy your models and make them available to a wider community and customer base
  • Stay up-to-date with the latest advancements in machine learning
What we offer:
  • Competitive compensation
  • Startup equity
  • Health insurance
  • Other competitive benefits
Employment Type:
Full-time

AI / ML Engineer III

Leads R&D efforts, develops scalable AI systems, and contributes to strategic pr...
Location:
Serbia, Belgrade
Salary:
Not provided
Everseen
Expiration Date:
Until further notice
Requirements:
  • 3+ years of programming experience in Python
  • Familiarity with some computer vision, machine learning, and deep learning libraries (e.g., OpenCV, TensorFlow, Keras, PyTorch, scikit-learn…)
  • Deep understanding of ML/AI architectures and applications
  • Strong in experimentation, scalability, and real-time systems
  • Advanced problem-solving and engineering judgment
  • Understands the latest innovations in machine learning and computer vision
  • Ability to design complex experiments and to communicate them clearly to other team members
  • Ability to design and develop components addressing the needs their research identified
  • Implement algorithms and solutions presented in papers
  • Advance projects to production (from prototype/research phase to production phase)
Job Responsibility:
  • Owns and maintains major components and complex features of a project
  • Drives research initiatives and develops innovative algorithms
  • Implements advanced algorithms with awareness of tradeoffs
  • Coaches team members on ML concepts, best practices, and procedures
  • Occasionally collaborates cross-functionally on high-impact projects to integrate AI technologies into products and services
  • Demonstrates a strong understanding of how components and features impact the business, and ensures work is aligned with overall project outcomes
  • Stay updated on the latest AI trends
  • Write and review research reports and experiment results
Employment Type:
Full-time

ML Platform Engineer

We are seeking a Machine Learning Engineer to help build and scale our machine l...
Location:
United States
Salary:
Not provided
Duetto
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in ML engineering or a similar role building and deploying machine learning models in production
  • Strong experience with AWS ML services (SageMaker, Lambda, EMR, ECR) for training, serving, and orchestrating model workflows
  • Hands-on experience with Kubernetes (e.g., EKS) for container orchestration and job execution at scale
  • Strong proficiency in Python, with exposure to ML/DL libraries such as TensorFlow, PyTorch, scikit-learn
  • Experience working with feature stores, data pipelines, and model versioning tools (e.g., SageMaker Feature Store, Feast, MLflow)
  • Familiarity with infrastructure-as-code and deployment tools such as Terraform, GitHub Actions, or similar CI/CD systems
  • Experience with logging and monitoring stacks such as Prometheus, Grafana, CloudWatch, or similar
  • Experience working in cross-functional teams with data scientists and DevOps engineers to bring models from research to production
  • Strong communication skills and ability to operate effectively in a fast-paced, ambiguous environment with shifting priorities
Job Responsibility:
  • Develop, maintain, and scale machine learning pipelines for training, validation, and batch or real-time inference across thousands of hotel-specific models
  • Build reusable components to support model training, evaluation, deployment, and monitoring within a Kubernetes- and AWS-based environment
  • Partner with data scientists to translate notebooks and prototypes into production-grade, versioned training workflows
  • Implement and maintain feature engineering workflows, integrating with custom feature pipelines and supporting services
  • Collaborate with platform and DevOps teams to manage infrastructure-as-code (Terraform), automate deployment (CI/CD), and ensure reliability and security
  • Integrate model monitoring for performance metrics, drift detection, and alerting (using tools like Prometheus, CloudWatch, or Grafana)
  • Improve retraining, rollback, and model versioning strategies across different deployment contexts
  • Support experimentation infrastructure and A/B testing integrations for ML-based products