Research Scientist, Agent Robustness

United States, San Francisco · 197400.00 - 246750.00 USD / Year · Job Posted March 22, 2026

Job Description

As a Research Scientist focused on Agent Robustness, you will work on the fundamental challenges of building AI agents that are safe and aligned with humans.

Job Responsibility

  • Research the science of AI agent capabilities with a focus on safety, risk factors, and benchmarking methodologies
  • Design and build harnesses to test AI agents’ tendency to take harmful actions
  • Design and build exploits and mitigations for new failure modes
  • Characterize and design mitigations for potential failure modes of systems involving multiple interacting AI agents

Requirements

  • Commitment to the mission of promoting safe, secure, and trustworthy AI deployments
  • Practical experience conducting technical research collaboratively
  • Experience building and leveraging agent scaffolding, designing evaluation harnesses, and quickly turning new ideas into working prototypes
  • Experience with post-training and RL techniques such as RLHF, DPO, and GRPO
  • A track record of published research in machine learning, particularly in generative AI
  • At least three years of experience addressing sophisticated ML problems
  • Strong written and verbal communication skills

Nice to have

  • Hands-on experience with agent evaluation frameworks such as SWE-bench, WebArena, OSWorld, or Inspect
  • Experience with red-teaming, prompt injection, or adversarial testing of AI systems

What we offer

  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • Learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity grant

Similar Jobs for Research Scientist, Agent Robustness

8 matching positions

Research Scientist / Engineer — Multimodal Agent

This is a rare and foundational opportunity to define the future of multimodal A...
Location: United States, Palo Alto
Salary: 250000.00 - 450000.00 USD / Year
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Strong foundation in machine learning, foundation models and agentic systems
  • Deep understanding of agentic systems and approaches in LLM/VLM reasoning, coding models, and LLM/VLM tool calling
  • Hands-on experience with PyTorch and large-scale training (distributed, mixed precision, large datasets)
Job Responsibility
  • Architect large-scale multimodal agentic models that use reasoning, planning, coding, and tool calling to achieve complex, multi-step multimodal work
  • Hill-climb on existing tasks and formulate new tasks through data
  • Design, implement, and run robust data pipelines for constructing, enriching, and filtering massive pixel datasets
  • Train large-scale multimodal models on massive datasets and GPU clusters
  • Define and build novel evaluation frameworks to measure multimodal agents
Employment type: Full-time

Research Scientist - Large Language Model

This is a rare opportunity to help define the future of large-scale language mod...
Location: United States, Palo Alto
Salary: 250000.00 - 450000.00 USD / Year
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Strong foundation in machine learning and large language models
  • Deep understanding of autoregressive transformers and large-scale training dynamics
  • Experience with pre-training large models and/or post-training techniques such as instruction tuning, RLHF, preference optimization, or distillation
  • Hands-on experience with PyTorch and distributed training at scale
  • Comfortable operating across research and production environments
Job Responsibility
  • Architect and scale large autoregressive language models
  • Design improved pre-training objectives to enhance reasoning, knowledge retention, and compositional generalization
  • Develop mid-training strategies such as continued pre-training, domain adaptation, curriculum learning, and synthetic data integration
  • Advance post-training techniques, including instruction tuning, preference optimization, reinforcement learning, distillation, and inference-time compute scaling
  • Study and improve long-context modeling, planning depth, and multi-step reasoning behavior
  • Curate and construct massive, high-quality text corpora for pre-training
  • Design synthetic data pipelines for reasoning, tool use, mathematics, coding, and structured problem solving
  • Develop filtering, mixture weighting, and curriculum strategies that shape emergent capabilities
  • Formulate new tasks that improve coherence, logical consistency, factual grounding, and robustness
  • Train frontier-scale language models across large GPU clusters
Employment type: Full-time

AI Research Scientist, Video Generation and Post-Training, FAIR

Meta is seeking a Research Scientist to join the Fundamental AI Research (FAIR) ...
Location: United States, Menlo Park
Salary: 154000.00 - 217000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • PhD or equivalent experience in Computer Science, Electrical Engineering, or a related field
  • Demonstrated expertise in video generation, computer vision, or multimodal AI
  • Experience with large-scale model training, post-training optimization techniques, and data curation
  • Publication record in relevant fields
Job Responsibility
  • Conduct fundamental and applied research in video generation, including generative models, video synthesis, and multimodal learning
  • Develop and optimize post-training paradigms for large-scale video and multimodal models, improving their performance, robustness, and generalization
  • Collaborate with teams across Meta to build perceptual foundations for real-time embodied agents and conversational AI
  • Contribute to the development and deployment of frontier models (e.g., Llama, LMMs) and push the boundaries of video and media generation
What we offer
  • Bonus
  • Equity
  • Benefits
Employment type: Full-time

Staff Applied Research Scientist - Martech AI

We are seeking a highly experienced and strategic Staff Applied Research Scienti...
Location: United States, New York City; Palo Alto; Chevy Chase; Dallas; Seattle; Chicago; Austin
Salary: 130000.00 - 260000.00 USD / Year
Company: GEICO
Expiration Date: Until further notice
Requirements
  • BS in Computer Science, Machine Learning, Statistics, Mathematics, or a related quantitative field; PhD preferred
  • 6+ years of professional experience delivering applied AI/ML solutions in production, including leading cross-functional initiatives with enterprise impact
  • Demonstrated ability to leverage LLMs and agentic AI systems to develop and deploy personalized marketing solutions, including individualized content generation, campaign targeting, and optimization of customer engagement/retention/conversion strategies
  • Strong proficiency in Python and SQL, and deep experience with ML frameworks such as PyTorch, TensorFlow, and scikit-learn
  • Demonstrated experience establishing KPI frameworks, experimentation, and causal analysis to quantify model impact and inform prioritization
  • Excellent stakeholder management skills with a track record of driving alignment and adoption across product, engineering, and business teams
  • Strong written and verbal communication skills, with the ability to set the right context and explain complex technical topics to varied audiences, including executives
Job Responsibility
Job Responsibility
  • Identify High-Impact Opportunities: Proactively surface and shape high-value AI/ML initiatives by engaging with product, engineering, and operations to align technical roadmaps with strategic business goals
  • Architecture & Technical Direction: Provide architectural leadership for AI/ML solutions impacting multiple stakeholders. Establish standards for scalability, reliability, observability, compliance, and cost efficiency across online and batch systems
  • Development & Productionization: Lead end-to-end delivery of AI/ML solutions, including model design, data pipelines, feature stores, evaluation, deployment, A/B testing, and monitoring in real-time and batch environments. Ensure clear plans, milestones, and on-time delivery
  • ROI Measurement & Experimentation: Establish robust mechanisms to quantify business impact, including KPI definition, experimentation frameworks, and causal inference approaches to guide decision-making and prioritize investments
  • Innovation & Research Integration: Stay current with cutting-edge research in ML, GenAI, and optimization. Prototype and harden novel techniques that push the boundaries of innovation within GEICO’s insurance ecosystem
  • Set technical direction for multi-quarter research initiatives; build evaluation frameworks, ensure reproducibility and responsible AI, and drive cross-functional adoption; shepherd patents
  • Cross-Functional Collaboration: Champion collaboration across Product, Engineering, Data Platform, Governance, Legal, and Operations to ensure responsible, compliant, and effective adoption of AI systems
  • Mentorship & Capability Building: Mentor junior and senior scientists, elevate technical standards (coding, testing, documentation, reproducibility), and foster a culture of scientific rigor and engineering excellence
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment type: Full-time

Research Scientist / Engineer — Foundation Model

This is a rare and foundational opportunity to define the future of multimodal A...
Location: United States, Palo Alto
Salary: Not provided
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Strong foundation in machine learning, foundation models and agentic systems
  • Deep understanding of agentic systems and approaches in LLM/VLM reasoning, coding models, and LLM/VLM tool calling
  • Hands-on experience with PyTorch and large-scale training (distributed, mixed precision, large datasets)
  • Able to commit to a continuous six-month internship
Job Responsibility
  • Architect large-scale multimodal agentic models that use reasoning, planning, coding, and tool calling to achieve complex, multi-step multimodal work
  • Design, implement, and run robust data pipelines for constructing, enriching, and filtering agentic datasets
  • Train large-scale multimodal agents on massive datasets and GPU clusters
  • Define and build novel evaluation frameworks to measure multimodal agents
Employment type: Full-time

Research Scientist II

Microsoft is a company where passionate innovators come to collaborate, envision...
Location: India, Hyderabad
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master's Degree in relevant field AND 1+ year(s) related research experience OR Bachelor's Degree in relevant field AND 2+ years related research experience OR equivalent experience
  • Proven ability to communicate complex technical concepts to diverse stakeholders
  • Ability to meet Microsoft, customer and/or government security screening requirements, including the Microsoft Cloud Background Check
  • Doctorate in relevant field OR Master's Degree in relevant field AND 3+ years related research experience OR Bachelor's Degree in relevant field AND 5+ years related research experience OR equivalent experience
  • Experience publishing academic papers as a lead author or essential contributor
  • Experience participating in a top conference in relevant research domain
  • Strong track record in cross-functional product development and delivering measurable impact through data-driven iteration
Job Responsibility
  • Build and expand collaborative partnerships across product, engineering, and research groups inside and outside Microsoft
  • Provide growing expertise that accelerates technology transfer, strengthens data security practices, and advances internal tools, benchmarking efforts, patent filings, and whitepaper development
  • Contribute to cutting-edge research by collaborating with peers and engineering teams to advance existing projects, develop new ideas, and publish high-quality papers
  • Coauthor or lead publications for top-tier conferences and journals with impact on par with postdoctoral research output
  • Drive research projects to completion, delivering novel algorithms, prototypes, theories, datasets, tools, or insights that meaningfully advance one or more open research problems
  • Uphold Microsoft’s commitments to security, ethics, and privacy by incorporating responsible research practices into data collection, experimentation, and system design
  • Support the development of trustworthy, robust, privacy-preserving, and ethically aligned technologies
  • Help define clear research problems and goals, contributing to the formulation of compelling problem statements and feasible research plans with measurable impact
  • Develop deep understanding of the state of the art, tracking new methods, tools, and breakthroughs in the research community
  • Contribute domain expertise in multiple specialized techniques to guide project planning, scoping, and execution
Employment type: Full-time

Data Scientist - Applied AI Research

The Data Scientist - Applied AI Research will ideate, design, and develop NLP ac...
Location: United States, Westlake
Salary: Not provided
Company: Fidelity Investments
Expiration Date: Until further notice
Requirements
  • 1+ years of data science experience, specializing in NLP
  • 1+ years of generative AI experience, including LLMs, agents, MCPs, etc.
  • 1+ years of experience working in an Agile environment
  • 1+ years of experience with AWS products in a Linux environment
  • Experience developing and optimizing solutions from transformer-based, fastText-based, or ensemble-based models
  • Experience with PyTorch
  • Understanding of text representation techniques and classification algorithms
  • Deep understanding of experiment design and documentation
  • Statistical acumen and experience applying statistical concepts to data science experiments
  • Deep knowledge of machine learning algorithms, with the ability to choose the optimal algorithm for a given problem
Job Responsibility
  • Work closely with the Agile team members to bring ML solutions into the product
  • Benchmark and optimize the performance of existing ML solutions (e.g., model footprint or latency)
  • Deliver reports on a sprint cadence
  • Peer review code and reports written by teammates
  • Bring good ideas during brainstorming sessions
Employment type: Full-time

Principal/Senior Applied Scientist Security Models Training Team - Next-Gen frontier research

The Security Models Training team is expanding to drive the development of a new...
Location: Israel, Tel Aviv, Herzliya
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • M.Sc. / Ph.D. in Computer Science, Information Systems, Electrical or Computer Engineering or Data Science (Ph.D. strongly preferred)
  • Candidates with M.Sc. / Ph.D. in related fields with proven industry experience or a strong publication record in the areas of LLM, Information Retrieval, Machine Learning, Natural Language Processing, Time Series Forecasting and Deep Learning are considered as well
  • Proven hands-on experience of at least 5 years (including post-grad work) in building and deploying Machine Learning products
  • Key areas of expertise include Natural Language Processing and Large Language Models, along with an understanding of concepts such as Privacy and Responsible AI
  • Candidates are expected to demonstrate a strong history of successfully translating applied research into production-ready solutions, along with a proven track record of delivering projects within large-scale production environments
  • Proven expertise in the LLM and/or time-series forecasting domain, demonstrating comprehensive knowledge of relevant concepts in the domain
  • Ideal applicants should be proficient in areas such as LLM pre- and post-training (including CPT, SFT, and RL), LLM benchmarking, agentic flows, and model alignment
  • Hands-on experience in building neural model architectures at the 100M+ scale and the proficiency to adapt them at all abstraction levels down to the individual block (e.g., changing the inner workings of an attention block, introducing new blocks, or changing the routings)
  • Demonstrated proficiency in problem-solving and data analysis, with substantial expertise in evaluating the performance of large language models (LLMs) and/or time-series forecasting models, developing benchmarks tailored to practical scenarios
Job Responsibility
  • Technical Leadership & Ownership: set technical direction for major security domain initiatives; lead security model programs spanning pre-training, task tuning, reinforcement learning, and evaluation; and translate cutting-edge research into production-ready capabilities
  • Advanced Model Design – Build and customize deep learning model architectures (e.g., modifying transformer blocks or attention/memory modules) at the SLM/LLM scale, making principled architectural tradeoffs to improve reliability, robustness, and security-specific behavior
  • Advanced Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and other modalities, including time-series
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks; define objective evaluation frameworks and quality gates; and run ablation studies to measure impact and optimize data and training effectiveness to support confident product decisions
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets, with attention to privacy, governance, and long‑term reuse across security scenarios
Employment type: Full-time