Applied AI Researcher, Benchmarking

Distyl AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

130000.00 - 250000.00 USD / Year

Job Description:

The Benchmarking team defines how progress is measured. Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact. They construct benchmarks that reflect real-world complexity. Their systems become the standard by which new architectures, techniques, and releases are judged. Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment. They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability. Their insights drive both Distyl’s internal research priorities and industry-wide standards.

Job Responsibilities:

  • Design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact
  • Construct benchmarks that reflect real-world complexity
  • Explore new paradigms for evaluating intelligent systems (adversarial robustness testing, longitudinal performance tracking, human-in-the-loop assessment)
  • Investigate how metrics shape model behavior
  • Establish rigorous methodologies for quantifying emergent capability

Requirements:

  • Experience designing and running evaluations (built or maintained benchmarks, test suites, or experimental frameworks)
  • Statistical and analytical rigor (design fair, reproducible experiments)
  • Experience building with models, not just building models (expertise in compound AI systems, agentic collaboration, ensembling, ReAct, graph-of-thoughts)
  • Proven track record of research results (work published in top journals or posted online)
  • Daily use of AI tools (e.g., ChatGPT, Cursor, Perplexity)
  • Strong programming and data analysis skills
  • Bias toward showing rather than telling

What we offer:
  • 100% covered medical, dental, and vision for employees and dependents
  • 401(k) with additional perks (commuter benefits, in-office lunch)
  • Access to state-of-the-art models
  • Generous usage of modern AI tools
  • Ownership of high-impact projects across top enterprises
  • Meaningful equity

Additional Information:

Job Posted:
March 08, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Applied AI Researcher, Benchmarking

AI Research Engineer

We are looking for an AI Research Engineer to join the PAIR team and play a cent...
Location:
North Macedonia, Skopje
Salary:
Not provided
Hornetsecurity
Expiration Date:
Until further notice
Requirements:
  • Applied AI research engineer with at least 3 years of experience in backend development or AI in production
  • Strong command of Go (microservices, REST/gRPC, high performance) and Python for AI/ML
  • Solid experience with cloud-native architectures: Docker, Kubernetes, CI/CD, observability, distributed systems, and real-time services
  • AI/ML/NLP skills: LLMs, embeddings, classification, text generation, model evaluation
  • Proven ability to design, optimize, and deploy scalable AI services in production
  • Scientific curiosity, autonomy, rigor, and strong teamwork skills
  • Excellent communication skills, documentation abilities, and the capacity to simplify complex topics
  • Professional fluency in English, both written and spoken
Job Responsibilities:
  • End-to-End Ownership of AI Solutions: Design, develop, and maintain AI services from prototype to production
  • Ensure robustness, performance, scalability, and operational reliability of solutions in industrial settings
  • Rigorous Experimentation & Applied Research: Methodically test and benchmark AI models (standards, metrics, comparisons)
  • Document results and propose innovative solutions tailored to cybersecurity challenges
  • Innovation & Technology Watch: Maintain active and structured monitoring of advances in AI/ML, LLMs, agents, NLP, as well as DevOps and MLOps best practices
  • Anticipate technological developments and contribute to the technical roadmap
  • Technical Leadership, Documentation & Collaboration: Be a key contributor to technical quality, knowledge sharing, and internal communication
  • Produce clear documentation and provide technical support to teams
What we offer:
  • Free space for innovation and independent action in a fast-growing international company
  • Short decision paths and flat hierarchies in an open work atmosphere
  • Extensive onboarding with a welcome kit, 2-day Onboarding Bootcamp, a Mentoring Program, and regular feedback meetings
  • Temporary Employee Exchange Program: the opportunity to work at our global office locations and explore the world (e.g., Berlin, Madrid, Malta, Montréal)
  • Home-office option (in a hybrid setting) and flexible working hours
  • Team events like Laser Tag, Office Movie Nights, Foodie Fridays and much more
  • Fit Kit subscription and private insurance for your health
  • Referral Bonus – 1500€ for each successful referral
Employment Type:
Full-time

Research Program Manager

We are seeking a Research Program Manager to build and scale global research pro...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Industrial Design, Product Design, Human Computer Interaction, User Experience, Interaction Design, or related field AND 4+ years of experience working in product or service design
  • OR Bachelor's Degree in Industrial Design, Product Design, Human Computer Interaction, User Experience, Interaction Design, or related field AND 5+ years of experience working in product or service design
  • OR equivalent experience (e.g., demonstrated experience working in product or service design or using design thinking to solve problems)
  • 3+ years of experience in research operations, user research, or related operational roles
  • Proven track record of building and scaling operational systems and processes
  • Solid project management skills with ability to manage multiple initiatives simultaneously
  • Experience with research tools and platforms (e.g., UserTesting, Qualtrics, participant recruitment tools), especially their AI features
  • Experience leveraging data to inform decisions and drive change
  • 4+ years of experience shipping products, services, or games and/or delivering to customers as a result of an end-to-end design process
Job Responsibilities:
  • Scale research programs and maximize impact across the organization
  • Design and launch programs that reduce process friction and enhance researcher impact and satisfaction
  • Design and manage operational systems for Experience Reviews, Experience Audits, and Customer Programs (Orion)
  • Create scheduling protocols and operational procedures to ensure research activities run smoothly and efficiently
  • Track research deliverables and team capacity to optimize resource allocation and workload management
  • Establish metrics and reporting mechanisms to demonstrate research program value and impact
  • Build and maintain research governance frameworks, including standard operating procedures around legal, procurement, and data privacy considerations
  • Establish project operations management processes and guidelines to ensure consistency and compliance
  • Partner with legal, procurement, and security teams to create streamlined approval processes for research activities
  • Develop and maintain documentation of research standards and best practices
Employment Type:
Full-time

Research Engineer

As a Research Engineer at Mercor, you’ll work at the intersection of engineering...
Location:
United States, San Francisco
Salary:
130000.00 - 500000.00 USD / Year
Mercor
Expiration Date:
Until further notice
Requirements:
  • Strong applied research background, with a focus on post-training and/or model evaluation
  • Strong coding proficiency and hands-on experience working with machine learning models
  • Strong understanding of data structures, algorithms, backend systems, and core engineering fundamentals
  • Familiarity with APIs, SQL/NoSQL databases, and cloud platforms
  • Ability to reason deeply about model behavior, experimental results, and data quality
  • Excitement to work in person in San Francisco, five days a week (with optional remote Saturdays), and thrive in a high-intensity, high-ownership environment
Job Responsibilities:
  • Work on post-training and RLVR pipelines to understand how datasets, rewards, and training strategies impact model performance
  • Design and run reward-shaping experiments and algorithmic improvements (e.g., GRPO, DAPO) to improve LLM tool-use, agentic behavior, and real-world reasoning
  • Quantify data usability, quality, and performance uplift on key benchmarks
  • Build and maintain data generation and augmentation pipelines that scale with training needs
  • Create and refine rubrics, evaluators, and scoring frameworks that guide training and evaluation decisions
  • Build and operate LLM evaluation systems, benchmarks, and metrics at scale
  • Collaborate closely with AI researchers, applied AI teams, and experts producing training data
  • Operate in a fast-paced, experimental research environment with rapid iteration cycles and high ownership
What we offer:
  • Generous equity grant vested over 4 years
  • A $20K relocation bonus (if moving to the Bay Area)
  • A $10K housing bonus (if you live within 0.5 miles of our office)
  • A $1K monthly stipend for meals
  • Free Equinox membership
  • Health insurance
Employment Type:
Full-time

AI Research Engineer

We are looking for an AI Research Engineer to join the PAIR team and play a cent...
Location:
France, Hem
Salary:
Not provided
Hornetsecurity
Expiration Date:
Until further notice
Requirements:
  • Applied AI research engineer with at least 3 years of experience in backend development or AI in production
  • Strong command of Go (microservices, REST/gRPC, high performance) and Python for AI/ML
  • Solid experience with cloud-native architectures: Docker, Kubernetes, CI/CD, observability, distributed systems, and real-time services
  • AI/ML/NLP skills: LLMs, embeddings, classification, text generation, model evaluation
  • Proven ability to design, optimize, and deploy scalable AI services in production
  • Scientific curiosity, autonomy, rigor, and strong teamwork skills
  • Excellent communication skills, documentation abilities, and the capacity to simplify complex topics
  • Professional fluency in English, both written and spoken
Job Responsibilities:
  • Design, develop, and maintain AI services from prototype to production
  • Ensure robustness, performance, scalability, and operational reliability of solutions in industrial settings
  • Methodically test and benchmark AI models (standards, metrics, comparisons)
  • Document results and propose innovative solutions tailored to cybersecurity challenges
  • Maintain active and structured monitoring of advances in AI/ML, LLMs, agents, NLP, as well as DevOps and MLOps best practices
  • Anticipate technological developments and contribute to the technical roadmap
  • Be a key contributor to technical quality, knowledge sharing, and internal communication
  • Produce clear documentation and provide technical support to teams
What we offer:
  • Flexible hybrid work arrangement
  • Meal vouchers: €10 per voucher (including €5.92 contribution from Hornetsecurity)
  • 100% coverage of public transportation costs
  • Health insurance & supplementary pension plan (Axa)
  • Sports and wellness benefits (subsidy provided)
  • International exchange program
Employment Type:
Full-time

AI Research Engineer

We are looking for an AI Research Engineer to join the PAIR team and play a cent...
Location:
Germany, Hannover
Salary:
Not provided
Hornetsecurity
Expiration Date:
Until further notice
Requirements:
  • Applied AI research engineer with at least 3 years of experience in backend development or AI in production
  • Strong command of Go (microservices, REST/gRPC, high performance) and Python for AI/ML
  • Solid experience with cloud-native architectures: Docker, Kubernetes, CI/CD, observability, distributed systems, and real-time services
  • AI/ML/NLP skills: LLMs, embeddings, classification, text generation, model evaluation
  • Proven ability to design, optimize, and deploy scalable AI services in production
  • Scientific curiosity, autonomy, rigor, and strong teamwork skills
  • Excellent communication skills, documentation abilities, and the capacity to simplify complex topics
  • Professional fluency in English, both written and spoken
Job Responsibilities:
  • Design, develop, and maintain AI services from prototype to production
  • Ensure robustness, performance, scalability, and operational reliability of solutions in industrial settings
  • Methodically test and benchmark AI models (standards, metrics, comparisons)
  • Document results and propose innovative solutions tailored to cybersecurity challenges
  • Maintain active and structured monitoring of advances in AI/ML, LLMs, agents, NLP, as well as DevOps and MLOps best practices
  • Anticipate technological developments and contribute to the technical roadmap
  • Be a key contributor to technical quality, knowledge sharing, and internal communication
  • Produce clear documentation and provide technical support to teams
What we offer:
  • Hybrid home-office options and flexible, trust-based working hours
  • Be part of a growing global company in one of the most dynamic industries — cybersecurity
  • Short decision-making paths and flat hierarchies within an open and collaborative work environment
  • Opportunities for personal and professional development
  • Be-Active Bonus — financial support for fitness and sports club memberships
  • Temporary Employee Exchange Program — opportunities to work at international office locations (e.g. Malta, Madrid, Montreal, Washington, D.C.)
  • Referral Bonus — €1,500 for each successful referral

Research Engineer

We’re looking for a Research Engineer to build the intelligent systems that powe...
Location:
United States, New York
Salary:
200000.00 - 300000.00 USD / Year
Antimetal
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience in applied ML or research engineering, preferably at a company shipping production AI systems
  • Production experience contributing to agentic/LLM systems, including multi-step reasoning, reinforcement learning, fine-tuning, and orchestration
  • Proven experience bringing work from prototype to production, using data and experimentation to drive product and architectural decisions
  • Strong on ML fundamentals: statistical modeling, probabilistic methods, time-series analysis, evaluation methodology
  • Real-world expertise in one area of applied ML (e.g., search, statistical modeling, NLP)
  • Experience constructing and running end-to-end evaluation pipelines with real-world data
  • Proficient in Python and TypeScript, with experience using common ML libraries and data engineering tools
  • Strong problem-solving skills, with a focus on creating highly maintainable, scalable code
  • Comfortable with ambiguity and iterative development, prototyping, and adapting quickly to feedback
Job Responsibilities:
  • Experiment, Evaluate, Iterate, Ship: Run experiments across our research areas, analyze results, validate what works, and take successful approaches to production
  • Build Evaluation Infrastructure: Partner with platform on live and offline evaluation pipelines, benchmarks, and synthetic data generation
  • Explore Research Directions: Apply and develop techniques from best-in-class AI Agents, ML, and SRE research to our problem domain
  • Collaborate Across Teams: Work with platform and product to integrate capabilities and productionize prototypes into scalable and reliable services
What we offer:
  • Pay & ownership — Competitive salary with generous equity grants
  • Full coverage + retirement — Fully covered health, dental, and vision, plus retirement benefits
  • Unlimited PTO — Take the time you need to recharge
  • Dinner on late nights — Working late? Dinner is on us
  • Fitness stipend — Monthly support for your health and wellness
  • Tools of the trade — Any equipment you need to do your best work
  • Commute perks — Citi Bike + train benefits
Employment Type:
Full-time

PhD Student Long-term Personalization - Agentic AI

As part of your PhD, your focus will be on long-term personalization for vehicle...
Location:
Germany, Wolfsburg
Salary:
Not provided
Volkswagen AG
Expiration Date:
Until further notice
Requirements:
  • Good to very good university degree qualifying for doctoral studies in mathematics, physics, computer science, data science, electrical engineering or a related field
  • Advanced proficiency in coding, including experience with data processing, optimization techniques, and software engineering best practices
  • Deep understanding of AI/ML concepts, including generative models, agentic architectures, and multi‑modal systems
  • Experience with applied AI research, such as model training/fine‑tuning, benchmarking, or developing experimental prototypes
  • Strong analytical and problem‑solving abilities, ideally demonstrated through prior thesis work, publications, or research projects
  • German language level A2 and English language level C1
Job Responsibilities:
  • Develop, analyze, and evaluate long‑term personalization mechanisms that adapt to changing customer behavior, vehicle context, and environmental factors
  • Investigate methods for integrating heterogeneous input types (including behavior, car settings, explicit preferences, and direct customer requests) into cohesive and adaptive personalization strategies
  • Design user‑transparent and user‑controllable personalization interfaces, enabling customers to review, adjust, or delete personalization data in a trustworthy and predictable manner
  • Research long‑term stability and consistency, including validation methodologies and mechanisms for maintaining reliable personalization over extended vehicle usage
  • Identify and evaluate responsible‑AI guardrails to ensure safety, predictability, and compliance when handling long‑term personalized data
What we offer:
  • Attractive salary and 30 vacation days (plus December 24 and December 31 off)
  • 35-hour week, flexible working hours, remote work
  • Special conditions for the purchase and leasing of vehicles
  • Free seminars on scientific work and interdisciplinary qualifications
  • Participation in the doctoral network for scientific exchange with science representatives and other doctoral candidates within the Volkswagen Group
Employment Type:
Full-time

AI Research Scientist, Robotics

The ideal Research Scientist candidate will use their skills in system design an...
Location:
United States, Redmond
Salary:
154000.00 - 217000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Currently has, or is in the process of obtaining, a PhD in Artificial Intelligence, Robotics, Computer Vision, Machine Learning, Language, a related field, or equivalent practical experience
  • Experience with any of the following research areas: robotics, motion planning, embodied AI, human-robot interaction, sim-to-real transfer, learning from demonstration, reinforcement learning, dexterous manipulation, digital agents, vision language models, computer vision, egocentric perception, and/or LLMs
  • Experience in relevant robotics-related research areas, such as VLMs, robot learning, reinforcement learning, imitation learning, action-conditioned world models, task and motion planning, sim-to-real transfer, robotic control, manipulation, navigation, or embodied AI generally
Job Responsibilities:
  • Perform fundamental and applied research to push the scientific and technological frontiers of embodied artificial intelligence
  • Invent or improve novel data-driven paradigms for robotics, leveraging a variety of modalities (images, video, text, audio, tactile, etc.)
  • Investigate paradigms that can deliver a spectrum of embodied behaviors, from simulated characters to real robots and from short-horizon, low-level to long-horizon, high-level intelligence
  • Develop algorithms based on state-of-the-art machine learning and neural network methodologies
  • Define, build and benchmark new functionalities needed for the next generation of AI
  • Conduct research towards long-term product goals while identifying intermediate milestones
  • Plan and execute novel research based on long-term objectives of the organization
What we offer:
  • Bonus
  • Equity
  • Benefits