Applied AI Researcher, Benchmarking

Distyl AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

130000.00 - 250000.00 USD / Year

Job Description:

The Benchmarking team defines how progress is measured. Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact. They construct benchmarks that reflect real-world complexity. Their systems become the standard by which new architectures, techniques, and releases are judged. Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment. They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability. Their insights drive both Distyl’s internal research priorities and industry-wide standards.

Job Responsibility:

  • Design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact
  • Construct benchmarks that reflect real-world complexity
  • Explore new paradigms for evaluating intelligent systems (adversarial robustness testing, longitudinal performance tracking, human-in-the-loop assessment)
  • Investigate how metrics shape model behavior
  • Establish rigorous methodologies for quantifying emergent capability

Requirements:

  • Experience designing and running evaluations (built or maintained benchmarks, test suites, or experimental frameworks)
  • Statistical and analytical rigor (design fair, reproducible experiments)
  • Experience building with models, not just building models (expertise in compound AI systems, agentic collaboration, ensembling, ReAct, graph-of-thoughts)
  • Proven track record of research results (published in top journals or posted work online)
  • Uses AI every day (tools like ChatGPT, Cursor, Perplexity)
  • Strong programming and data analysis skills
  • Biases towards showing rather than telling

What we offer:

  • 100% covered medical, dental, and vision for employees and dependents
  • 401(k) with additional perks (commuter benefits, in-office lunch)
  • Access to state-of-the-art models
  • Generous usage of modern AI tools
  • Ownership of high-impact projects across top enterprises
  • Meaningful equity

Additional Information:

Job Posted:
March 08, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Applied AI Researcher, Benchmarking

Research Engineer

As a Research Engineer at Mercor, you’ll work at the intersection of engineering...
Location:
United States, San Francisco
Salary:
130000.00 - 500000.00 USD / Year
Mercor
Expiration Date
Until further notice
Requirements
  • Strong applied research background, with a focus on post-training and/or model evaluation
  • Strong coding proficiency and hands-on experience working with machine learning models
  • Strong understanding of data structures, algorithms, backend systems, and core engineering fundamentals
  • Familiarity with APIs, SQL/NoSQL databases, and cloud platforms
  • Ability to reason deeply about model behavior, experimental results, and data quality
  • Excitement to work in person in San Francisco, five days a week (with optional remote Saturdays), and thrive in a high-intensity, high-ownership environment
Job Responsibility
  • Work on post-training and RLVR pipelines to understand how datasets, rewards, and training strategies impact model performance
  • Design and run reward-shaping experiments and algorithmic improvements (e.g., GRPO, DAPO) to improve LLM tool-use, agentic behavior, and real-world reasoning
  • Quantify data usability, quality, and performance uplift on key benchmarks
  • Build and maintain data generation and augmentation pipelines that scale with training needs
  • Create and refine rubrics, evaluators, and scoring frameworks that guide training and evaluation decisions
  • Build and operate LLM evaluation systems, benchmarks, and metrics at scale
  • Collaborate closely with AI researchers, applied AI teams, and experts producing training data
  • Operate in a fast-paced, experimental research environment with rapid iteration cycles and high ownership
What we offer
  • Generous equity grant vested over 4 years
  • A $20K relocation bonus (if moving to the Bay Area)
  • A $10K housing bonus (if you live within 0.5 miles of our office)
  • A $1K monthly stipend for meals
  • Free Equinox membership
  • Health insurance
  • Fulltime

AI Research Engineer

We are looking for an AI Research Engineer to join the PAIR team and play a cent...
Location:
France, Hem
Salary:
Not provided
Hornetsecurity
Expiration Date
Until further notice
Requirements
  • Applied AI research engineer with at least 3 years of experience in backend development or AI in production
  • Strong command of Go (microservices, REST/gRPC, high performance) and Python for AI/ML
  • Solid experience with cloud-native architectures: Docker, Kubernetes, CI/CD, observability, distributed systems, and real-time services
  • AI/ML/NLP skills: LLMs, embeddings, classification, text generation, model evaluation
  • Proven ability to design, optimize, and deploy scalable AI services in production
  • Scientific curiosity, autonomy, rigor, and strong teamwork skills
  • Excellent communication skills, documentation abilities, and the capacity to simplify complex topics
  • Professional fluency in English, both written and spoken
Job Responsibility
  • Design, develop, and maintain AI services from prototype to production
  • Ensure robustness, performance, scalability, and operational reliability of solutions in industrial settings
  • Methodically test and benchmark AI models (standards, metrics, comparisons)
  • Document results and propose innovative solutions tailored to cybersecurity challenges
  • Maintain active and structured monitoring of advances in AI/ML, LLMs, agents, NLP, as well as DevOps and MLOps best practices
  • Anticipate technological developments and contribute to the technical roadmap
  • Be a key contributor to technical quality, knowledge sharing, and internal communication
  • Produce clear documentation and provide technical support to teams
What we offer
  • Flexible hybrid work arrangement
  • Meal vouchers: €10 per voucher (including €5.92 contribution from Hornetsecurity)
  • 100% coverage of public transportation costs
  • Health insurance & supplementary pension plan (Axa)
  • Sports and wellness benefits (subsidy provided)
  • International exchange program
  • Fulltime

AI Research Engineer

We are looking for an AI Research Engineer to join the PAIR team and play a cent...
Location:
Germany, Hannover
Salary:
Not provided
Hornetsecurity
Expiration Date
Until further notice
Requirements
  • Applied AI research engineer with at least 3 years of experience in backend development or AI in production
  • Strong command of Go (microservices, REST/gRPC, high performance) and Python for AI/ML
  • Solid experience with cloud-native architectures: Docker, Kubernetes, CI/CD, observability, distributed systems, and real-time services
  • AI/ML/NLP skills: LLMs, embeddings, classification, text generation, model evaluation
  • Proven ability to design, optimize, and deploy scalable AI services in production
  • Scientific curiosity, autonomy, rigor, and strong teamwork skills
  • Excellent communication skills, documentation abilities, and the capacity to simplify complex topics
  • Professional fluency in English, both written and spoken
Job Responsibility
  • Design, develop, and maintain AI services from prototype to production
  • Ensure robustness, performance, scalability, and operational reliability of solutions in industrial settings
  • Methodically test and benchmark AI models (standards, metrics, comparisons)
  • Document results and propose innovative solutions tailored to cybersecurity challenges
  • Maintain active and structured monitoring of advances in AI/ML, LLMs, agents, NLP, as well as DevOps and MLOps best practices
  • Anticipate technological developments and contribute to the technical roadmap
  • Be a key contributor to technical quality, knowledge sharing, and internal communication
  • Produce clear documentation and provide technical support to teams
What we offer
  • Hybrid home-office options and flexible, trust-based working hours
  • Be part of a growing global company in one of the most dynamic industries — cybersecurity
  • Short decision-making paths and flat hierarchies within an open and collaborative work environment
  • Opportunities for personal and professional development
  • Be-Active Bonus — financial support for fitness and sports club memberships
  • Temporary Employee Exchange Program — opportunities to work at international office locations (e.g. Malta, Madrid, Montreal, Washington, D.C.)
  • Referral Bonus — €1,500 for each successful referral

Research Engineer

We’re looking for a Research Engineer to build the intelligent systems that powe...
Location:
United States, New York
Salary:
200000.00 - 300000.00 USD / Year
Antimetal
Expiration Date
Until further notice
Requirements
  • 4+ years of experience in applied ML or research engineering, preferably at a company shipping production AI systems
  • Production experience contributing to agentic/LLM systems, including multi-step reasoning, reinforcement learning, fine-tuning, and orchestration
  • Proven experience bringing work from prototype to production, using data and experimentation to drive product and architectural decisions
  • Strong on ML fundamentals: statistical modeling, probabilistic methods, time-series analysis, evaluation methodology
  • Real-world expertise in one area of applied ML: search, statistical modeling, NLP, etc.
  • Experience constructing and running end-to-end evaluation pipelines with real-world data
  • Proficient in Python and TypeScript, with experience using common ML libraries and data engineering tools
  • Strong problem-solving skills, with a focus on creating highly maintainable, scalable code
  • Comfortable with ambiguity and iterative development, prototyping, and adapting quickly to feedback
Job Responsibility
  • Experiment, Evaluate, Iterate, Ship: Run experiments across our research areas, analyze results, validate what works, and take successful approaches to production
  • Build Evaluation Infrastructure: Partner with platform on live and offline evaluation pipelines, benchmarks, and synthetic data generation
  • Explore Research Directions: Apply and develop techniques from best-in-class AI Agents, ML, and SRE research to our problem domain
  • Collaborate Across Teams: Work with platform and product to integrate capabilities and productionize prototypes into scalable and reliable services
What we offer
  • Pay & ownership — Competitive salary with generous equity grants
  • Full coverage + retirement — Fully covered health, dental, and vision, plus retirement benefits
  • Unlimited PTO — Take the time you need to recharge
  • Dinner on late nights — Working late? Dinner is on us
  • Fitness stipend — Monthly support for your health and wellness
  • Tools of the trade — Any equipment you need to do your best work
  • Commute perks — Citi Bike + train benefits
  • Fulltime

AI Research Scientist, Robotics

The ideal Research Scientist candidate will use their skills in system design an...
Location:
United States, Redmond
Salary:
154000.00 - 217000.00 USD / Year
Meta
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Currently has or is in the process of obtaining a PhD degree in the field of Artificial Intelligence, Robotics, Computer Vision, Machine Learning, Language, a related field, or equivalent practical experience
  • Experience with any of the following research areas: robotics, motion planning, embodied AI, human-robot interaction, sim-to-real transfer, learning from demonstration, reinforcement learning, dexterous manipulation, digital agents, vision language models, computer vision, egocentric perception, and/or LLMs
  • Experience in relevant robotics-related research areas, such as: VLM, robot learning, reinforcement learning, imitation learning, action-conditioned world models, task and motion planning, sim-to-real transfer, robotic control, manipulation, navigation, or generally embodied AI
Job Responsibility
  • Perform fundamental and applied research to push the scientific and technological frontiers of embodied artificial intelligence
  • Invent/improve novel data-driven paradigms for robotics, leveraging a variety of modalities (images, video, text, audio, tactile, etc.)
  • Investigate paradigms that can deliver a spectrum of embodied behaviors - from simulated characters to real robots, and from short-horizon, low-level to long-horizon, high-level intelligence
  • Develop algorithms based on state-of-the-art machine learning and neural network methodologies
  • Define, build and benchmark new functionalities needed for the next generation of AI
  • Conduct research towards long-term product goals while identifying intermediate milestones
  • Plan and execute novel research based on long-term objectives of the organization
What we offer
  • bonus
  • equity
  • benefits

User Experience Researcher - AI

We are looking for a researcher to help shape next-gen compliance systems, exper...
Location:
United States, Menlo Park
Salary:
266000.00 USD / Year
Meta
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree with 13+ years of relevant experience in user experience, applied research, and/or product research and development; or a Master’s degree with 11+ years of relevant experience; or a PhD with 8+ years of relevant experience
  • Experience driving research direction and serving as a thought partner to leadership
  • Experience in working with Design, Data, Product, and Engineering to empower product development through foundational research and AI evaluation
  • Experience running or designing AI model evals from a UX perspective
  • Experience with code/scripting to prototype research tools or analyze data programmatically
Job Responsibility
  • Thought partner to Risk organizational leadership on research direction
  • Determine foundational questions for the organization with a holistic view
  • Navigate ambiguity—shape direction in AI pods with minimal context
  • Act as direction lead or "unblocker" in fast-moving environments
  • Run evals and measurement (large scale benchmarking, evals, risk flywheel measurement)
  • Work with engineering and cross functional partners on evals to inform how we can improve models
  • Inform designing and architecting AI products from the start—not just evaluating after the fact
  • Develop new methods and innovate new approaches (stimuli generation, upper funnel work, etc.)
  • Apply systems thinking to connect the dots holistically across the organization
What we offer
  • bonus
  • equity
  • benefits

AI Research Scientist, Robotics

At Meta, we’re building the future of human connection and the technology that e...
Location:
United States, Redmond
Salary:
219000.00 - 301000.00 USD / Year
Meta
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • PhD degree in the field of Artificial Intelligence, Robotics, Computer Vision, Machine Learning, Language, a related field, or equivalent practical experience
  • Experience with any of the following research areas: robotics, motion planning, embodied AI, human-robot interaction, sim-to-real transfer, learning from demonstration, reinforcement learning, dexterous manipulation, digital agents, vision language models, computer vision, egocentric perception, and/or Large Language Models
  • 5+ years of industry experience in relevant robotics-related research areas, such as: Vision Language Models, robot learning, reinforcement learning, imitation learning, action-conditioned world models, task and motion planning, sim-to-real transfer, robotic control, manipulation, navigation, or generally embodied AI
Job Responsibility
  • Perform fundamental and applied research to push the scientific and technological frontiers of embodied artificial intelligence
  • Invent/improve novel data-driven paradigms for robotics, leveraging a variety of modalities (images, video, text, audio, tactile, etc.)
  • Investigate paradigms that can deliver a spectrum of embodied behaviors - from simulated characters to real robots, and from short-horizon, low-level to long-horizon, high-level intelligence
  • Develop algorithms based on state-of-the-art machine learning and neural network methodologies
  • Define, build and benchmark new functionality needed for the next generation of AI
  • Conduct research towards long-term product goals while identifying intermediate milestones
  • Lead, plan, and execute novel research based on long-term objectives of the organization
What we offer
  • bonus
  • equity
  • benefits

Researcher in Agentic AI Systems & Infrastructure

At MSR Cambridge we are shaping the future of AI infrastructure by tackling ambi...
Location:
United Kingdom, Cambridge
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • PhD (or near completion) in Computer Science, Machine Learning, Electrical Engineering, or a related field
  • Strong background in ML-systems co-design, AI inference systems, or machine learning systems
  • Demonstrated ability to conduct independent, high-impact research, evidenced by publications, open-source systems, or deployed artifacts
  • Ability to work effectively in collaborative, cross-disciplinary research teams
Job Responsibility
  • Conduct original research on the design, architecture, and optimization of agentic AI systems, focusing on memory, communication, and orchestration
  • Prototype new components for multiagent inference with system-level optimizations (e.g. shared latent memory/KV-cache, agent-level parallelism) using relevant framework tools and inference backends like vLLM and SGLang
  • Explore ML and systems co-design opportunities, such as aligning model capabilities with systems constraints, hardware characteristics, and orchestration strategies, using PyTorch and other relevant tools for LLM fine-tuning on GPU clusters
  • Evaluate proposed ideas through real-system experiments, large-scale benchmark evaluation, and empirical studies on real workloads
  • Work closely with a multidisciplinary team to address both fundamental and applied research challenges
  • Communicate results clearly, sharing insights with the wider team and partner groups
  • Contribute to an open, multidisciplinary research environment
  • Fulltime