Researcher, Synthetic RL

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

295000.00 - 445000.00 USD / Year

Job Description:

The Synthetic RL team develops reinforcement learning methods that leverage synthetic data, environments, and feedback to train and evaluate frontier AI models. The team explores approaches such as self-play, simulators, and other synthetic evaluations to push model capability, generalization, and alignment beyond what is possible with the current prevailing methodology. As a Research Scientist on the Synthetic RL team, you will develop novel reinforcement learning techniques that use synthetic environments and feedback to improve large-scale models. You’ll work closely with other researchers to design experiments, analyze learning dynamics, and translate research insights into training approaches used in production systems. We’re looking for researchers who enjoy working on open-ended problems, value fast iteration, and want their work to directly shape how frontier models are trained.

Job Responsibility:

  • Research and develop reinforcement learning algorithms
  • Design and run experiments to study training dynamics and model behavior at scale
  • Collaborate with engineers and researchers to integrate successful approaches into model training pipelines

Requirements:

  • Strong background in reinforcement learning, machine learning research, or related fields
  • Strong engineering and statistical analysis skills

Nice to have:

  • Enjoy exploring new problem spaces where data, objectives, and evaluation are imperfect or evolving
  • Motivated by seeing research ideas influence real-world AI systems

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided
  • Offers Equity
  • Performance-related bonus(es) for eligible employees

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Work Type:
Hybrid work

Similar Jobs for Researcher, Synthetic RL

Research Engineer - Reinforcement Learning

Building Open Superintelligence Infrastructure. Prime Intellect is building the ...
Location:
United States, San Francisco
Salary:
Not provided
Prime Intellect
Expiration Date:
Until further notice
Requirements:
  • Strong background in AI/ML engineering, with extensive experience in designing and implementing end-to-end pipelines for the inference or training of large-scale AI models
  • Deep expertise in distributed inference techniques and frameworks (e.g. vLLM, SGLang) for optimizing the performance and scalability of AI workloads
  • Solid understanding of MLOps best practices, including model versioning, experiment tracking, and continuous integration/deployment (CI/CD) pipelines
  • Passion for advancing the state-of-the-art in reasoning and democratizing access to AI capabilities for researchers, developers, and businesses worldwide
Job Responsibility:
  • Lead and participate in novel research to build a massive-scale synthetic data generation pipeline and orchestration solution
  • Optimize the performance, cost, and resource utilization of AI inference workloads by leveraging the most recent advances in compute and memory optimization techniques
  • Contribute to the development of our open-source libraries and frameworks for synthetic data generation and distributed RL
  • Publish research at top-tier AI conferences such as ICML & NeurIPS
  • Distill highly technical project outcomes into approachable technical blog posts for our customers and developers
  • Stay up-to-date with the latest advancements in AI/ML infrastructure and tools and in synthetic data generation research, and proactively identify opportunities to enhance our platform's capabilities and user experience
What we offer:
  • Competitive compensation, including equity incentives, aligning your success with the growth and impact of Prime Intellect
  • Flexible work arrangements, with the option to work remotely or in-person at our offices in San Francisco
  • Visa sponsorship and relocation assistance for international candidates
  • Quarterly team off-sites, hackathons, conferences and learning opportunities
  • Opportunity to work with a talented, hard-working and mission-driven team, united by a shared passion for leveraging technology to accelerate science and AI
  • Full-time

Research Scientist Intern, Reinforcement Learning

We’re looking for a curious and motivated Reinforcement Learning Intern to help ...
Location:
United Kingdom, London
Salary:
Not provided
Wayve
Expiration Date:
Until further notice
Requirements:
  • Currently pursuing a PhD or Masters in Computer Science, Robotics, Electrical Engineering, or a related field, with a focus on Machine Learning, AI, or Computer Vision
  • Experience in research in Reinforcement Learning
  • Interest in one or more of: synthetic data, representation learning, and offline RL
  • Comfortable working in Python and libraries like PyTorch, NumPy, and Pandas
  • A principled mindset: you enjoy brainstorming, making assumptions, building, testing, and iterating on ideas to see what works
Job Responsibility:
  • Help advance the next generation of decision-making systems for autonomous driving
  • Work embedded in a research team to develop scalable RL algorithms that enable vehicles to learn complex behaviors directly from experience — both in simulation and the real world
What we offer:
  • Competitive compensation and benefits
  • A dynamic and fast-paced work environment in which you will grow every day, learning on the job from a diverse team of the brightest researchers and engineers in this space
  • A culture that is ego-free, respectful and welcoming
  • Potential to publish your research work at a top-flight conference
  • The chance to be part of a truly mission-driven organisation and an opportunity to shape the future of autonomous driving

Research Scientist: Post-Training

Pretraining gives us a general model. Post-training makes it useful, controllabl...
Location:
United States, San Mateo; Somerville
Salary:
200000.00 - 350000.00 USD / Year
Generalist AI
Expiration Date:
Until further notice
Requirements:
  • Experience with fine-tuning large models for downstream tasks (RLHF, IL, RL, distillation, domain adaptation, etc.)
  • Worked on embodied AI, robotics, or real-world ML systems
  • Care deeply about evaluation, benchmarking, and failure analysis
  • Comfortable debugging across the ML stack — from loss curves to robot behavior
  • Enjoy rapid iteration with real-world feedback loops
  • Want to bridge the gap between foundation models and physical deployment
Job Responsibility:
  • Designing fine-tuning and adaptation strategies for downstream robotic tasks and embodiments
  • Developing methods for improving reliability, robustness, and controllability
  • Building evaluation frameworks that measure real-world robot performance, not just offline metrics
  • Improving inference-time performance (latency, stability, memory footprint) in collaboration with ML infrastructure
  • Leveraging techniques such as imitation learning, RL, distillation, synthetic data, and curriculum learning
  • Closing the loop between model outputs and physical-world outcomes
What we offer:
  • Offers Equity
  • Full-time

Member of Technical Staff, Agents Modeling

We’re looking for an experienced machine learning researcher / engineer who can ...
Location:
United States, New York
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Have a PhD in computer science or related field or similar industry research experience
  • Strong software engineering skills
  • Proficiency in Python and experience with ML-related code (e.g., PyTorch, NumPy, etc.)
  • Experience with LLMs and agentic frameworks
  • Experience with post-training LLMs (SFT, PEFT, or RL*)
  • Experience with building synthetic data generation pipelines
Job Responsibility:
  • Design and develop novel agentic solutions
  • Improve upon SOTA on hard agentic tasks
  • Research the next generation of online learning-from-experience self-improvement
  • Work with partner teams (Reasoning, Post-training, Pre-training, etc.) to improve performance of agentic systems
  • Work with an amazing team of researchers and engineers pushing the boundaries
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Full-time

Member of Technical Staff, Next Generation Agents

Agentic LLM systems are being deployed widely across enterprise companies includ...
Location:
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering skills
  • Proficiency in Python and some experience with ML-related code (e.g., PyTorch, NumPy, etc.)
  • Experience with LLMs and agentic frameworks
  • Experience with post-training LLMs (SFT, PEFT, or RL*)
  • Experience with building synthetic data generation pipelines
Job Responsibility:
  • Design and develop novel agentic solutions
  • Improve upon SOTA on hard agentic tasks
  • Research the next generation of online learning-from-experience self-improvement
  • Work with partner teams (Reasoning, Post-training, Pre-training, etc.) to improve performance of agentic systems
  • Work with an amazing team of researchers and engineers pushing the boundaries
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Full-time

AI Applied Scientist

Figma is growing our team of passionate creatives and builders on a mission to m...
Location:
United States, San Francisco; New York
Salary:
149000.00 - 350000.00 USD / Year
Figma
Expiration Date:
Until further notice
Requirements:
  • Extensive experience in building generative AI features through prompt engineering and fine-tuning models in production environments
  • Experience working with deep learning and generative AI frameworks like PyTorch, JAX, Hugging Face, etc.
  • Experience training LLMs with Reinforcement Learning techniques such as preference-based RL (DPO, PPO) and/or RL with verifiable rewards (RLVR) such as GRPO/DAPO
  • 4+ years in Generative AI, and 6+ years of experience in one or more of the following areas: machine learning, natural language processing/understanding, computer vision
  • Strong software engineering skills with 5+ years of experience in programming languages (Python, C++, Java or R)
  • Experience communicating and working across functions to drive solutions
Job Responsibility:
  • Driving fundamental and applied research in AI
  • Combining industry best practices and a first-principles approach to build cutting-edge Generative AI models, using techniques like Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), prompt improvements and synthetic data generation
  • Work in concert with product and infrastructure engineers to improve Figma’s products through AI-powered features
  • Collaborate closely with product managers and engineers to transform user feedback into requirements for AI systems
  • Build evaluation systems to measure and improve quality of AI features in Figma products
What we offer:
  • Equity
  • Health, dental & vision
  • Retirement with company contribution
  • Parental leave & reproductive or family planning support
  • Mental health & wellness benefits
  • Generous PTO
  • Company recharge days
  • Learning & development stipend
  • Work from home stipend
  • Cell phone reimbursement
  • Full-time

Research Engineer, Atlas Physics Simulation for RL

Are you passionate about using physics simulation, reinforcement learning and hu...
Location:
United States, Waltham
Salary:
Not provided
Boston Dynamics
Expiration Date:
Until further notice
Requirements:
  • MS with 3 years of industry experience or PhD in Computer Science, Machine Learning, Robotics, or a related field
  • Detailed understanding of physics simulation, including contact solvers and mesh representations
  • Extensive experience with physics simulators including MuJoCo, Isaac Sim and Warp
  • Experience with rendering pipelines to create photorealistic synthetic images
  • Strong foundation in Python, C++ and modern numerical frameworks (e.g., PyTorch and JAX)
  • Experience in algorithm design, debugging, and performance optimization
Job Responsibility:
  • Develop large-scale physics simulations that can efficiently train reinforcement learning agents
  • Evaluate and benchmark different physics simulators, mesh representations and rendering pipelines
  • Scale physics simulations to generate millions of samples per second
  • Build synthetic rendering pipelines to create photorealistic images
  • Extend existing physics simulators to improve simulation quality
What we offer:
  • Direct access to cutting-edge robots and the infrastructure to run large-scale experiments
  • A collaborative, mission-driven team where your ideas have real impact
  • The chance to help define what’s possible in real-world robotics
  • Full-time

Machine Learning Research Engineer, Agent Data Foundation - Enterprise GenAI

Join the team shaping the future of AI at Scale. The Enterprise ML Research Lab ...
Location:
United States, San Francisco; New York
Salary:
218400.00 - 273000.00 USD / Year
Scale
Expiration Date:
Until further notice
Requirements:
  • 3+ years of building with LLMs in a production environment
  • Clear experience constructing high-quality data to improve an LLM/agent
  • Publications in top conferences such as NeurIPS, ICLR, or ICML within the last two years
  • PhD or Masters in Computer Science or a related field
Job Responsibility:
  • Build synthetic data pipelines to generate enterprise environments to use for RL post-training
  • Create agents to convert traces from production into actionable insights to use to improve agents
  • Contribute to our agent building product which can construct other agents using coding agents + proprietary algorithms
  • Train state-of-the-art models, developed both internally and by the community, to deploy to our enterprise customers
What we offer:
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Full-time