
Applied AI Researcher, Benchmarking

United States, San Francisco · 130000.00 - 250000.00 USD / Year · Job Posted March 08, 2026

Job Description

The Benchmarking team defines how progress is measured. Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact. They construct benchmarks that reflect real-world complexity. Their systems become the standard by which new architectures, techniques, and releases are judged. Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment. They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability. Their insights drive both Distyl’s internal research priorities and industry-wide standards.

Job Responsibility

  • Design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact
  • Construct benchmarks that reflect real-world complexity
  • Explore new paradigms for evaluating intelligent systems (adversarial robustness testing, longitudinal performance tracking, human-in-the-loop assessment)
  • Investigate how metrics shape model behavior
  • Establish rigorous methodologies for quantifying emergent capability

Requirements

  • Experience designing and running evaluations (built or maintained benchmarks, test suites, or experimental frameworks)
  • Statistical and analytical rigor (design fair, reproducible experiments)
  • Experience building with models, not just building models (expertise in compound AI systems, agentic collaboration, ensembling, ReAct, graph-of-thoughts)
  • Proven track record of research results (published in top journals or posted work online)
  • Uses AI every day (tools like ChatGPT, Cursor, Perplexity)
  • Strong programming and data analysis skills
  • Bias toward showing rather than telling

What we offer

  • 100% covered medical, dental, and vision for employees and dependents
  • 401(k) with additional perks (commuter benefits, in-office lunch)
  • Access to state-of-the-art models
  • Generous usage of modern AI tools
  • Ownership of high-impact projects across top enterprises
  • Meaningful equity


Similar Jobs for Applied AI Researcher, Benchmarking

8 matching positions

Data Scientist - Applied AI Research

The Data Scientist - Applied AI Research will ideate, design, and develop NLP ac...
Location: United States, Westlake
Salary: Not provided
Company: Fidelity Investments
Expiration Date: Until further notice
Requirements
  • 1+ years of Data Science experience, specializing in NLP
  • 1+ years of generative AI experience, including LLMs, Agents, MCPs, etc.
  • 1+ years of experience working in an Agile environment
  • 1+ years of experience with AWS products in a Linux environment
  • Experience developing and optimizing solutions from transformer-based, fastText-based, or ensemble-based models
  • Experience with PyTorch
  • Understanding of text representation techniques and classification algorithms
  • Deep understanding of experiment design and documentation
  • Statistical acumen and experience applying statistical concepts to data science experiments
  • Deep knowledge of machine learning algorithms, with the ability to choose the optimal algorithm for a given problem
Job Responsibility
  • Work closely with the Agile team members to bring ML solutions into the product
  • Benchmark and optimize the performance of existing ML solutions (e.g., model footprint or latency)
  • Deliver reports on a sprint cadence
  • Peer review code and reports written by teammates
  • Bring good ideas during brainstorming sessions
Employment Type: Full-time

Applied AI Engineer

Security represents the most critical priorities for our customers in a world aw...
Location: United States, Multiple Locations
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 2+ years related experience (e.g., statistics, predictive analytics, research) OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research) OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field OR equivalent experience
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft background and Microsoft Cloud background checks upon hire/transfer and every two years thereafter
  • 3+ years technical engineering experience with coding in languages including C#, Java AND Python
  • 2+ years of experience with LLMs and open-source GenAI frameworks, such as LangChain, LlamaIndex, Haystack, or equivalents (e.g., Transformers, AutoGen, DSPy), including agent-based orchestration, prompt engineering, retrieval-augmented generation (RAG), and fine-tuning and evaluation
  • 2+ years of experience shipping at least 2 large-scale ML/AI-based services or applications on cloud platforms (Azure, AWS, GCP, etc.)
  • Proficiency in writing production-quality software code in one or more modern programming languages (Python, C#)
  • 2+ years of experience developing software systems end-to-end, from design to implementation
Job Responsibility
  • Design, develop, and deploy end-to-end AI/ML systems, including data ingestion, model training, evaluation, and integration into production environments
  • Build and optimize applications leveraging LLMs and open-source GenAI frameworks such as LangChain, LlamaIndex, Haystack, Transformers, AutoGen, and DSPy
  • Implement advanced GenAI techniques including agent-based orchestration, prompt engineering, retrieval-augmented generation (RAG), and model fine-tuning
  • Write production-grade software in Python and C# or Java, ensuring maintainability, scalability, and performance
  • Collaborate with cross-functional teams to translate business requirements into technical solutions
  • Ship and maintain large-scale AI applications, with a focus on performance monitoring and continuous improvement
  • Conduct rigorous evaluation of AI models using appropriate metrics and benchmarks
  • Optimize models for latency, throughput, and accuracy in real-world scenarios
  • Work closely with data scientists, product managers, and other engineers to drive AI initiatives
  • Stay current with the latest advancements in GenAI, LLMs, and AI frameworks
Employment Type: Full-time

User Experience Researcher - AI

We are looking for a researcher to help shape next-gen compliance systems, exper...
Location: United States, Menlo Park
Salary: 266000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree with 13+ years of relevant experience in user experience, applied research and/or product research and development or a Master’s degree and 11+ years relevant experience, or PhD and 8+ years relevant experience
  • Experience driving research direction and serving as a thought partner to leadership
  • Experience in working with Design, Data, Product, and Engineering to empower product development through foundational research and AI evaluation
  • Experience running or designing AI model evals from a UX perspective
  • Experience with code/scripting to prototype research tools or analyze data programmatically
Job Responsibility
  • Thought partner to Risk organizational leadership on research direction
  • Determine foundational questions for the organization with a holistic view
  • Navigate ambiguity—shape direction in AI pods with minimal context
  • Act as direction lead or "unblocker" in fast-moving environments
  • Run evals and measurement (large scale benchmarking, evals, risk flywheel measurement)
  • Work with engineering and cross functional partners on evals to inform how we can improve models
  • Inform designing and architecting AI products from the start—not just evaluating after the fact
  • Develop new methods and innovate new approaches (stimuli generation, upper funnel work, etc.)
  • Apply systems thinking to connect the dots holistically across the organization
What we offer
  • bonus
  • equity
  • benefits

Software Engineer, Applied AI

As an Applied AI Engineer at Norm Ai, you will design, build, and iterate on ent...
Location: United States, New York City
Salary: 220000.00 - 285000.00 USD / Year
Company: Norm AI
Expiration Date: Until further notice
Requirements
  • 4+ years of experience as a software engineer, ML engineer, or data scientist
  • Strong programming skills in Python, proficiency in Docker and data storage technologies such as Postgres and Redis
  • Willingness to work with frontend technologies (TypeScript, React, etc.) to bridge the gap between research ideas and the productization of new initiatives
  • Experience in AI/ML infrastructure and tooling
  • Interest in becoming a legal and compliance domain expert and working closely with professionals in the field
  • Strong reading, writing, and quantitative skills, as well as a data-driven approach to problem solving
Job Responsibility
  • Design, build, and iterate on enterprise-grade AI agents for end-to-end completion of high-stakes legal and compliance workflows
  • Regularly contribute performant, high-quality code to our production codebase
  • Oversee data collection, experimentation, and data analysis to continuously improve the platform
  • Build comprehensive benchmarks and evaluation suites to measure agent performance
  • Develop new AI workflows and systems that make cutting-edge AI available across our platform
  • Architect the system of underlying prompts that power our agents
  • Enable continuous learning in our agents through memory, feedback, and client-specific preferences
  • Develop legal and compliance domain expertise
What we offer
  • Employee equity
  • 401(k) plan with an employer match
  • Top-tier insurance coverage, encompassing health, dental, hospital, accident, and vision plans
  • Relocation reimbursement for candidates needing to relocate to NYC
  • Fast-paced learning environment where professional growth is constant
Employment Type: Full-time

AI Researcher

Silicon Valley’s top AI companies work with Mercor to find domain experts who ca...
Location: United States, San Francisco
Salary: 200000.00 - 300000.00 USD / Year
Company: Mercor
Expiration Date: Until further notice
Requirements
  • PhD or M.S. and 2+ years of work experience in a computer science, electrical engineering, econometrics, or another STEM field that provides a solid understanding of ML and model evaluation
  • Strong publication record in AI research, ideally in LLM evaluation. Dataset and evaluation papers are preferred
  • Strong understanding of LLMs and the data on which they are trained and evaluated
  • Strong communication skills and ability to present findings clearly and concisely
  • Familiarity with data annotation workflows
  • Good understanding of statistics
  • Willingness to work 6 days a week, with Monday through Friday in person in San Francisco
Job Responsibility
  • Build benchmarks that measure real world value of AI models
  • Publish LLM evaluation papers in top conferences with the support of the Mercor Applied AI and Operations teams
  • Push the frontier of understanding data ROI in model development including multi-modality, code, tool-use, and more
  • Design and validate novel data collection and annotation offerings for the leading industry labs and big tech companies
What we offer
  • Offers Equity
  • A $20K relocation bonus (if moving to the Bay Area)
  • A $10K housing bonus (if you live within 0.5 miles of our office)
  • A $1K monthly stipend for meals
  • Free Equinox membership
  • Health insurance
Employment Type: Full-time

Applied AI Software Engineer

As an applied AI software engineer on our team, you'll be working on researching...
Location: Canada, Toronto
Salary: 120000.00 - 200000.00 CAD / Year
Company: FOSSA
Expiration Date: Until further notice
Requirements
  • Experience working on greenfield projects as part of small teams
  • Experience building and deploying software in high-velocity environments
  • Experience building and supporting scalable SaaS products and features
  • Experience with relational databases and writing performant SQL queries
  • Experience interacting with LLM APIs as a service
  • Enjoy combining the latest technologies and tools with best practices to accelerate development
  • Passionate about getting creative with AI to work around its shortcomings in creating automated solutions to complex problems
  • Interest in the art of programming languages (design, compilers, syntax parsing, etc.)
  • Comfortable navigating complex domains and building intuitive software for them
  • Thrive in an environment that prefers prototypes over proposals
Job Responsibility
  • Work closely with our R&D business unit to research, plan, design, prototype, and build the future of FOSSA
  • Own user-impacting functionality from conception to completion
  • Build and scale the most valuable prototypes into production
  • Have significant ownership in our technical architecture and R&D roadmap
  • Primarily work in TypeScript, but touch a variety of codebases in other languages, such as Rust, Haskell, or Go
  • Contribute to structured analysis of various languages (Java, Python, others) our customers use
  • Experiment with LLMs and context engineering and benchmark the results to ensure we accomplish maximal value and consistency from them
What we offer
  • Amazing team culture and environment
  • Named by Built In as Best Start-up to work for 2024, 2025 and Forbes 2022
  • Competitive salary and equity package
  • Unlimited PTO
Employment Type: Full-time

Applied AI Engineer

You'll build the AI agent capabilities that power Sapien's autonomous finance op...
Location: United States, NYC
Salary: Not provided
Company: Sapien
Expiration Date: Until further notice
Requirements
  • Strong algorithmic thinking
  • Experience with modern agent frameworks, LLMs, and AI systems: fine-tuning, retrieval augmentation, tool use, or agentic architectures
  • Comfort working end-to-end: from implementing research ideas and prototyping architectures to deploying production systems and iterating on real customer feedback
Job Responsibility
  • Design and implement agent architectures that enable observability, human-in-the-loop verification, and precise context control across complex financial workflows
  • Build library learning systems that reduce LLM dependencies by learning reusable patterns for planning, code generation, and data localization from customer interactions
  • Create graph-based company representations and develop efficient search methods using embeddings, semantic clustering, and custom retrieval strategies
  • Build multi-modal parsers that unify diverse financial data sources (Excel, ERPs, CRMs) into coherent, queryable schemas that agents can reason over
  • Design benchmarking and evaluation suites that quantify Sapien's accuracy, reliability, and business impact across different customer workflows
Employment Type: Full-time

Applied Legal Researcher

At Harvey, we’re transforming how legal and professional services operate — not ...
Location: United States, San Francisco
Salary: 180000.00 - 220000.00 USD / Year
Company: Harvey
Expiration Date: Until further notice
Requirements
  • 3-7 years of experience at a law firm, financial institution, or equivalent professional service provider
  • Demonstrated ability to deliver high quality work product on demanding deadlines
  • Ability to communicate effectively with a variety of internal and external stakeholders and translate complex problems between them
  • Ability to define positive outcomes in situations with underspecified success criteria
  • Deep intellectual curiosity and eagerness to learn across domains
  • Willingness and desire to do work in the trenches - e.g., grading hundreds of answers, breaking down thousands of documents
Job Responsibility
  • Develop and deliver subject-matter expertise to support AI research
  • Work closely with our engineering, product, and design teams to define and develop AI systems
  • Build and improve AI systems through prompt engineering, fine tuning, and other techniques
  • Build proprietary benchmarks and datasets to evaluate models and model systems
  • Partner directly with clients to understand their workflows, identify pain points, and translate complex business and legal requirements into technical solutions
What we offer
  • Offers Equity
  • Comprehensive health, dental and vision coverage
  • Retirement benefits (401(k) match up to 4%)
  • Flexible PTO
Employment Type: Full-time