
Research Engineer, Scaling

United States, Palo Alto · 180000.00 - 300000.00 USD / Year · Job Posted December 01, 2025

Job Description

As a Research Engineer, Scaling, you will design and build infrastructure to support training, evaluation, and deployment at scale across 1X’s fleet of robots. You will take experimental and prototype systems and transform them into production‑grade systems capable of large‑scale training runs, reliable inference, and efficient edge deployment. Your work will directly impact throughput, latency, and model performance across both datacenter and on‑device environments.

Job Responsibility

  • Own and lead scaling of both distributed training and inference systems
  • Ensure compute resources are sufficient so that data, not hardware, is the limiter
  • Enable massive training at scale (1000+ GPUs) on robot data, handling fault tolerance, experiment tracking, distributed operations, and large datasets
  • Optimize inference throughput in datacenter contexts (e.g., for world models and diffusion engines)
  • Reduce latency and optimize performance for on‑device robot policies through techniques like quantization, scheduling, distillation, etc.

Requirements

  • Strong programming experience in Python and/or C++
  • Deep intuitive understanding of what affects training or inference speed: from bottlenecks to scaling laws
  • A mindset aligned with extremely high scaling: belief that scale is foundational to enabling humanoid robotics
  • Degree in Computer Science or a related field
  • Hands‑on experience with distributed training frameworks (e.g., TorchTitan, DeepSpeed, FSDP/ZeRO), multi‑node debugging, experiment management
  • Proven skills optimizing inference performance: graph compilers, batching/scheduling, serving systems (e.g., using TensorRT or equivalents)
  • Familiarity with quantization strategies (PTQ, QAT, INT8/FP8) and tools like TensorRT, bitsandbytes, etc.
  • Experience writing or tuning CUDA or Triton kernels, with an understanding of hardware features like vectorization, tensor cores, and memory hierarchies

What we offer

  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays


Similar Jobs for Research Engineer, Scaling

8 matching positions

AI Research Engineer, Scaling

As a Research Engineer focused on Scaling, you will design and build robust infr...
Location: United States, Palo Alto
Salary: 180000.00 - 300000.00 USD / Year
Company: 1X Technologies
Expiration Date: Until further notice
Requirements
  • Strong programming experience in Python and/or C++
  • Deep intuitive understanding of training and inference speed bottlenecks and scaling laws
  • A mindset aligned with extremely high scaling: belief that scale is foundational to enabling humanoid robotics
  • Degree in Computer Science or a related field
  • Experience with distributed training frameworks (e.g., TorchTitan, DeepSpeed, FSDP/ZeRO), multi-node debugging, and experiment management
  • Proven skills in optimizing inference performance using graph compilers, batching/scheduling, and serving systems like TensorRT or equivalents
  • Familiarity with quantization strategies (PTQ, QAT, INT8/FP8) and tools such as TensorRT and bitsandbytes
  • Experience developing or tuning CUDA or Triton kernels with understanding of hardware-level optimization (vectorization, tensor cores, memory hierarchies)
Job Responsibility
  • Own and lead scaling of distributed training and inference systems
  • Ensure compute resources are optimized to make data the primary constraint
  • Enable massive training runs (1000+ GPUs) using robot data, with robust fault tolerance, experiment tracking, and distributed operations
  • Optimize inference throughput for datacenter use cases such as world models and diffusion engines
  • Reduce latency and enhance performance for on-device robot policies using techniques such as quantization, scheduling, and distillation
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime

Research Scientist / Engineer – Pre-training / Scaling

At Luma, the Pre-Training / Scaling team is responsible for building the core mu...
Location: United States, Palo Alto
Salary: 187500.00 - 395000.00 USD / Year
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Expertise in Python and PyTorch with experience building ML models from scratch
  • Deep understanding of multimodal generative models and deep learning architectures
  • (Preferred) Strong research track record in generative AI with published work in top-tier venues
  • (Preferred) Experience with large-scale distributed training systems
Job Responsibility
  • Lead cutting-edge research in multimodal foundation models spanning video, image, text, and audio
  • Design and implement novel algorithms, architectures, and techniques for large-scale generative AI models
  • Develop training methodologies for foundation models across thousands of GPUs
  • Research and implement state-of-the-art techniques in Autoregressive LLMs, Vision Language Models, and / or Diffusion Models
  • Collaborate with cross-functional teams to transition research into production systems
  • Fulltime

Research Engineer / Research Scientist - Foundations Retrieval Lead

The Foundations Research team works on high-risk, high-reward ideas that could s...
Location: United States, San Francisco
Salary: 445000.00 - 555000.00 USD / Year
Company: OpenAI
Expiration Date: Until further notice
Requirements
  • Proven experience leading high-performance teams of researchers or engineers in ML infrastructure or foundational research
  • Deep technical expertise in representation learning, embedding models, or vector retrieval systems
  • Familiarity with transformer-based LLMs and how embedding spaces can interact with language model objectives
  • Research experience in areas such as contrastive learning, supervised or unsupervised embedding learning, or metric learning
  • A track record of building or scaling large machine learning systems, particularly embedding pipelines in production or research contexts
  • A first-principles mindset for challenging assumptions about how retrieval and memory should work for large models
Job Responsibility
  • Lead research into embedding models and retrieval systems optimized for grounding, relevance, and adaptive reasoning
  • Manage a team of researchers and engineers building end-to-end infrastructure for training, evaluating, and integrating embeddings into frontier models
  • Drive innovation in dense, sparse, and hybrid representation techniques, metric learning, and learning-to-retrieve systems
  • Collaborate closely with Pretraining, Inference, and other Research teams to integrate retrieval throughout the model lifecycle
  • Contribute to OpenAI’s long-term vision of AI systems with memory and knowledge access capabilities rooted in learned representations
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Fulltime

Research Engineer / Software Engineer (platform/core infrastructure)

Build the future of offensive security with XBOW. Attackers are already using AI...
Location: United States
Salary: 150000.00 - 350000.00 USD / Year
Company: Xbow
Expiration Date: Until further notice
Requirements
  • Strong experience building and operating scalable, distributed systems on cloud infrastructure such as AWS or similar
  • Comfortable working with infrastructure as code (e.g., Terraform, CDK)
  • A track record of performance tuning across cloud services, databases, and compute layers
  • Eager to learn new tools, languages, and technologies as needed
  • A thoughtful communicator who values clarity and simplicity and is comfortable working in a fast-paced startup and navigating ambiguity
  • Strong problem-solving skills and the ability to work with incomplete information
  • Curious, practical, and eager to work across layers of the stack when needed
  • You think proactively about failure modes and bring experience implementing disaster recovery and business continuity plans that keep critical systems running
Job Responsibility
  • Design and implement infrastructure systems that scale reliably and securely, and can be deployed across multiple cloud environments (AWS, Azure, OCI etc.) and contexts (SaaS, on prem)
  • Tune and optimize cloud services across compute, storage, networking, and observability to drive performance, reliability and maintainability of core services
  • Develop our core services, written in TypeScript, Kotlin and Go
  • Support large-scale systems with event driven architectures
  • Own problems end-to-end—from design through deployment to production support
  • Navigate ambiguity and help define how we build as much as what we build
  • Partner closely with other engineers, AI researchers and Security researchers to enable high-quality, high-velocity product development
  • Design for resilience by implementing disaster recovery and business continuity strategies that ensure uptime, even when things break
  • Improve how we build, deploy, and monitor services at scale
What we offer
  • Competitive salary and a generous equity package
  • Career Growth: Shape your role, lead the function, and grow with the company
  • Meaningful Work: You will tackle technically complex challenges and play a pivotal role in the growth of our business
  • Fulltime

Research Engineer / Software Engineer (backend)

Build the future of offensive security with XBOW. Attackers are already using AI...
Location: United States
Salary: 150000.00 - 350000.00 USD / Year
Company: Xbow
Expiration Date: Until further notice
Requirements
  • Experience building and operating scalable, distributed systems
  • Comfort working in a fast-moving, early-stage environment
  • Strong problem-solving skills and the ability to work with incomplete information
  • Familiarity with AWS or similar cloud platforms
  • Comfort working with infrastructure as code (e.g., Terraform or CDK)
  • Eager to learn new tools, languages, and technologies as needed
  • A thoughtful communicator who values clarity and simplicity
Job Responsibility
  • Design and build distributed backend systems that scale reliably and securely
  • Work in TypeScript, Kotlin and Go
  • Deploy and operate services in AWS and other cloud providers
  • Own problems end-to-end—from design through deployment to production support
  • Navigate ambiguity and help define how we build as much as what we build
  • Collaborate closely with teammates across the stack, including AI researchers, Security researchers and frontend engineers
What we offer
  • Competitive salary and a generous equity package
  • Career growth
  • Meaningful work
  • Remote work with support to travel to collaborate with colleagues in person
  • Fulltime

Machine Learning Research Scientist / Research Engineer, Post-Training

Scale works with the industry’s leading AI labs to provide high quality data and...
Location: United States, San Francisco; Seattle; New York
Salary: 252000.00 - 315000.00 USD / Year
Company: Scale
Expiration Date: Until further notice
Requirements
  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field
  • Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning
  • Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning
  • Excellent written and verbal communication skills
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals
  • Previous experience in a customer facing role
Job Responsibility
  • Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal modalities
  • Design and experiment with new approaches to preference optimization
  • Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness
  • Publish research findings in top-tier AI conferences
What we offer
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • equity based compensation
  • commuter stipend
  • Fulltime

Research Engineer, Text Data Research - MSL FAIR

Meta is seeking AI research engineers to help us build the data foundation for M...
Location: United States, Menlo Park
Salary: 257000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 2+ years of industry research experience in LLM/NLP or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross-functional impact, and/or influencing strategy across multiple teams
  • Practical experience with pre-training or mid-training data curation for large foundational models and experience working with organic, synthetic, agentic, or reasoning data for LLMs
  • Demonstrated data infrastructure and software background, and experience building data tooling and services
  • Published research in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in agentic data, synthetic data, reasoning data, web parser, coding data, data scaling laws, or datamix optimization
  • Lead complex technical projects end-to-end
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

Research Engineer, Media Data Research - MSL FAIR

Meta is seeking AI research engineers to help us build the data foundation for M...
Location: United States, Menlo Park
Salary: 217000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 1+ year of industry research experience in LLM/LMM, computer vision, or related AI/ML models
  • Experience owning and/or driving complex technical projects from end-to-end
  • Practical experience with multimodal pre-training or mid-training data curation for large media perception or generation models
  • Demonstrated data infrastructure and software background, and experience building data tooling and services
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, data scaling laws, or data mixing
  • Lead complex technical projects end-to-end
What we offer
  • bonus
  • equity
  • benefits