Research Engineer, Frontier Evals & Environments

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

205,000 - 380,000 USD / Year

Job Description:

The Frontier Evals & Environments team builds north-star model environments to drive progress toward safe AGI/ASI. The team builds ambitious environments to measure and steer our models, and creates self-improvement loops to steer our training, safety, and launch decisions.

Job Responsibility:

  • Create ambitious RL environments to push our models to their limits
  • Work on measuring frontier model capabilities, skills, and behaviors
  • Develop new methodologies for automatically exploring the behavior of these models
  • Help steer training for our largest training runs, and see the future first
  • Design scalable systems and processes to support continuous evaluation
  • Build self-improvement loops to automate model understanding

Requirements:

  • Passionate and knowledgeable about AGI/ASI measurement
  • Strong engineering and statistical analysis skills
  • Able to think outside the box and have a robust “red-teaming mindset”
  • Experienced in ML research engineering, stochastic systems, observability and monitoring, LLM-enabled applications, and/or another technical domain applicable to AI evaluations
  • Able to operate effectively in a dynamic and extremely fast-paced research environment as well as scope and deliver projects end-to-end

Nice to have:

  • First-hand experience red-teaming systems, whether computer systems or otherwise
  • An ability to work cross-functionally
  • Excellent communication skills

What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided
  • Equity
  • Performance-related bonus(es) for eligible employees

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Similar Jobs for Research Engineer, Frontier Evals & Environments

Research Engineer, Frontier Evals & Environments - Finance

The Frontier Evals team builds north star model evaluations to drive progress to...
Location:
United States, San Francisco
Salary:
205,000 - 380,000 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • Strong engineering and statistical analysis skills (with at least 2-3 years of full-time technical experience)
  • Passionate about evals for real-world applications and knowledge work
  • Detail-oriented and thorough
  • Team player / willing to do a variety of tasks to move the team forward
  • Passionate and knowledgeable about AGI/ASI measurement
  • Able to operate effectively in a dynamic and extremely fast-paced research environment as well as scope and deliver projects end-to-end
Job Responsibility:
  • Identify important model capabilities, skills, and behaviors that are crucial to financial workflows, and design methods to quantify performance in these areas
  • Own and pursue a research agenda to identify an important model capability (especially as it relates to financial reasoning) and build evals to measure it
  • Continuously refine evaluations of frontier AI models to assess the extent of frontier capabilities
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

AI Architect

We’re hiring an AI Architect to sit at the intersection of frontier AI research,...
Location:
United States, San Francisco; New York
Salary:
201,600 - 241,920 USD / Year
Scale
Expiration Date:
Until further notice
Requirements:
  • Deep technical background in applied AI/ML: 5–10+ years in research, engineering, solutions engineering, or technical product roles working on LLMs or multimodal systems, ideally in high-stakes, customer-facing environments
  • Hands-on experience with model improvement workflows: demonstrated experience with post-training techniques, evaluation design, benchmarking, and model quality iteration
  • Ability to work on hard, ambiguous technical problems: proven track record of partnering directly with advanced customers or research teams to scope, reason through, and execute on deep technical challenges involving frontier models
  • Strong technical fluency: you can read papers, interrogate metrics, write or review complex Python/SQL for analysis, and reason about model-data trade-offs
  • Executive presence with world-class researchers and enterprise leaders
  • Excellent writing and storytelling
  • Bias to action: you ship, learn, and iterate.
Job Responsibility:
  • Translate research → product: work with client-side researchers on post-training, evals, and safety/alignment, and build the primitives, data, and tooling they need
  • Partner deeply with core customers and frontier labs: work hands-on with leading AI teams and frontier research labs to tackle hard, open-ended technical problems related to frontier model improvement, performance, and deployment
  • Shape and propose model improvement work: translate customer and research objectives into clear, technically rigorous proposals—scoping post-training, evaluation, and safety work into well-defined statements of work and execution plans
  • Translate research into production impact: collaborate with customer-side researchers on post-training, evaluations, and alignment, and help design the data, primitives, and tooling required to improve frontier models in practice
  • Own the end-to-end lifecycle: lead discovery, write crisp PRDs and technical specs, prioritize trade-offs, run experiments, ship initial solutions, and scale successful pilots into durable, repeatable offerings
  • Lead complex, high-stakes engagements: independently run technical working sessions with senior customer stakeholders, define success metrics, surface risks early, and drive programs to measurable outcomes
  • Partner across Scale: collaborate closely with research (agents, browser/SWE agents), platform, operations, security, and finance to deliver reliable, production-grade results for demanding customers
What we offer:
  • Comprehensive health, dental, and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity-based compensation
Employment Type:
Full-time

Researcher, Preparedness

The Preparedness team helps us prepare for the development of increasingly capab...
Location:
United States, San Francisco
Salary:
295,000 - 445,000 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • Passionate and knowledgeable about short-term and long-term AI safety risks
  • Ability to think outside the box and have a robust 'red-teaming mindset'
  • Experience in ML research engineering, ML observability and monitoring, creating large language model-enabled applications, and/or another technical domain applicable to AI risk
  • Able to operate effectively in a dynamic and extremely fast-paced research environment as well as scope and deliver projects end-to-end
Job Responsibility:
  • Own the scientific validity of frontier preparedness capability evaluations: design new evals grounded in real threat models (including high-consequence domains like CBRN as well as cyber and other frontier-risk areas), and maintain existing evals so they don't go stale or silently regress
  • Define datasets, graders, rubrics, and threshold guidance, and produce auditable artifacts (evaluation cards, capability reports, system-card inputs) that leadership can trust during high-stakes launches
  • Work on identifying emerging AI safety risks and new methodologies for exploring the impact of these risks
  • Build (and then continuously refine) evaluations of frontier AI models that assess the extent of identified risks
  • Design and build scalable systems and processes that can support these kinds of evaluations
  • Contribute to the refinement of risk management and the overall development of 'best practice' guidelines for AI safety evaluations
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Research Engineering Manager - Model Training

Perplexity is seeking a Research Engineering Manager to lead the team of all-sta...
Location:
United States, San Francisco
Salary:
300,000 - 470,000 USD / Year
Perplexity
Expiration Date:
Until further notice
Requirements:
  • Proven experience with large-scale LLMs and Deep Learning systems
  • Strong Python and PyTorch skills
  • Experience leading or managing research or engineering teams working on large-scale AI model development, including driving complex projects from idea to production
  • Self‑starter with a willingness to take ownership of tasks and navigate ambiguity in a fast‑moving environment
  • Passion for tackling challenging problems in AI model quality, speed, safety, and reliability
  • 10+ years of technical experience, with at least 2 of those years as a manager and at least 4 of those years working on large-scale AI model development
Job Responsibility:
  • Lead a team of researchers and engineers focused on training SotA models for Perplexity-relevant use cases, leveraging the latest supervised and reinforcement learning techniques
  • Drive research and engineering efforts to develop production models through advanced model training and alignment techniques, including RL, SFT, and other approaches
  • Become deeply familiar with the team’s technical stack, leading from the front through hands-on technical contributions
  • Own the data, training, and eval pipelines required to train and continuously improve LLMs
  • Design and iterate on model training and finetuning algorithms (e.g., preference‑based methods, reinforcement learning from human or AI feedback) through an approach that balances scientific rigor and iteration velocity
  • Design evaluations and improve the production model training pipeline to reliably deliver models that lie on the Pareto frontier of speed and quality
  • Work closely with engineering teams to integrate in-house models into our product and rapidly iterate based on real‑world usage
  • Manage day‑to‑day execution, project planning, and prioritization for the model training team to hit ambitious quality and performance goals
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
Employment Type:
Full-time

Product Manager, Central Products

Meta Product Managers work with cross-functional teams of engineers, designers, ...
Location:
United States, Menlo Park
Salary:
205,000 - 277,000 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience in product management and/or product design
  • 10+ years of experience working collaboratively with engineering, design and user research teams
  • Experience navigating through the full product life-cycle, integrating customer feedback into product requirements, driving prioritization, and pre- and post-launch execution
  • Critical thinking and analytical leadership experience
  • Demonstrated proficiency using AI-enabled tools to build product artifacts at scale
  • Experience developing and championing AI-native strategies across organizations
  • Experience presenting to executive audiences
  • BA/BS in Computer Science or related field
Job Responsibility:
  • Be the primary driver for identifying significant near- and long-term opportunities in a large product area, and drive product mission, strategies, and roadmaps in the context of broader organizational strategies and goals
  • Generate buy-in and drive consensus across organizations. Bring clarity and structure to ambiguous opportunities. Consistently demonstrate initiative and execute with limited oversight
  • Critically evaluate when AI is (and isn't) the optimal solution at portfolio level, setting the standard for rigorous tradeoff analysis
  • Translate AI capabilities into compelling, differentiated product visions that define market categories
  • Champion AI-native strategies including comprehensive evals and data strategies that enable org-wide continuous improvement
  • Drive product development with teams of engineers and designers, while maintaining team health
  • Work closely with cross-functional teams to drive product mission, define product requirements, coordinate resources from other groups (design, legal, etc.), develop roadmaps, and guide the team through key milestones
  • Reimagine workflows, responsibly using AI tools to transform team velocity and capability at organizational scale
  • Foster a culture of rapid experimentation and learning that becomes a competitive advantage
  • Scale AI best practices (including responsible AI use), workflows, and artifacts across the organization so capability compounds exponentially
What we offer:
  • Bonus
  • Equity
  • Benefits
Employment Type:
Full-time

Ab Initio ETL Developer

The primary purpose of this role is to work on the development work for the Prod...
Location:
India, Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 4+ years of software development experience
  • 3+ years of Oracle PL/SQL or SQL experience
  • 3+ years of ETL experience (Ab Initio or Informatica), including hands-on experience developing complex ETL applications
  • Ab Initio experience is required
  • Proficiency with Oracle PL/SQL or SQL, SQL tuning, and writing packages, triggers, functions, and procedures
  • UNIX knowledge
  • Experience with data conversion/migration
  • Excellent troubleshooting and debugging skills
  • Experience working in an onsite-offshore model
  • Strong analytic skills
Job Responsibility:
  • Understand business and functional requirements provided by Business Analysts, convert them into technical design documents, and lead the development team to deliver on those requirements
  • Lead a technical team in Pune supporting GTPL in Product Processor Departments
  • Ensure project plans are created and PTS documentation is up to date
  • Work closely with cross-functional teams (e.g., Business Analysis, Product Assurance, Platforms and Infrastructure, Business Office, Controls, and Production Support); prepare handover documents and manage SIT with oversight of UAT
  • Identify and proactively resolve issues that could impact system performance, reliability, and usability
  • Demonstrate an in-depth understanding of how the development function integrates within the overall business/technology to achieve objectives, which requires a good understanding of the industry
  • Work proactively and independently to address development requirements, and articulate issues and challenges with enough lead time to address risks
  • Understand complex data problems, analyze them, and provide generic solutions compatible with existing infrastructure
  • Design, implement, integrate, and test new features
Employment Type:
Full-time

Senior Technical Program Manager

Join Microsoft AI, where we’re helping build Humanist Superintelligence—AI that ...
Location:
United States, Redmond
Salary:
119,800 - 234,700 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND 4+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
  • Bachelor's Degree AND 8+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
  • 6+ years of experience managing cross-functional and/or cross-team projects
  • 2+ year(s) of experience managing cross-functional and/or cross-team projects
  • 2+ year(s) of experience leading AI-focused products or programs with direct involvement in model evaluation strategies (Evals), optimization through loss function analysis (Losses), or overseeing data collection pipelines to support scalable training and validation workflows (Data Collection)
  • 2+ year(s) of experience reading and/or writing code (e.g., sample documentation, product demos)
  • Prior exposure to Large Language Models (LLMs), model evaluation, or human-in-the-loop systems
  • Experience working in a startup-like environment
Job Responsibility:
  • Cross-Team Coordination: Partner with Research, Engineering, Data, and Product teams to align on goals, identify opportunities for enhancing model performance, share learnings, and drive execution at scale
  • Quality Frameworks: Build and run systems to assess and enhance model responses, such as annotation platforms and evaluation pipelines. Review performance metrics to find areas for improvement
  • Data-Driven Decision Making: Collaborate with analytics teams to define success metrics and correlate improvements to user impact
  • Operational Excellence: Manage all components of project planning, execution, and delivery by developing consistent processes that accommodate product complexity and organizational growth, ensuring coordination with overall product strategy and deadlines
  • Strategic Program Architecture: Partner with other TPgMs and leadership to design scalable processes and frameworks that extend beyond model response quality, supporting evolving priorities across Copilot. Maintain a deep understanding of the latest advancements in AI and Machine Learning
Employment Type:
Full-time
