CrawlJobs Logo

Senior Research Engineer, Model Evaluation

United States; Canada; United Kingdom, Toronto · Job Posted February 20, 2026
Apply Position
Job Link Share

Job Description

Evaluation is critical to making progress in scaling intelligence. As models continue to become superhuman in many real-world use cases, we must continue to develop new techniques to accurately measure our models' performance on frontier capabilities. In this role, you are responsible for creating next-generation evaluation methods and scalable infrastructure to measure LLM progress.

Job Responsibility

  • Develop evaluation benchmarks, datasets, and environments for measuring the bleeding edge of model capabilities
  • Conduct research to push the state-of-the-art in LLM evaluation methods, including training LLM judges
  • improving evaluation efficiency
  • and scalably building high-quality datasets
  • Build scalable tools for investigating and understanding evaluation results that are used by all members of technical staff at Cohere, as well as leadership and our CEO
  • Learn from and work with the best researchers and engineers in the field

Requirements

  • You enjoy pushing the limits of what LLMs are capable of, and you have built high-quality evaluation resources to measure those capabilities (datasets, simulators, environments, etc.)
  • You have a track record of developing new methods and/or data to evaluate LLMs, e.g. publications at top-tier conferences, popular benchmarks, etc.
  • You have deep experience building with and around LLMs, and you have built tools for analyzing and understanding their performance
  • You have strong software engineering skills

What we offer

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Looking for more opportunities?

Search for other job offers that match your skills and interests.

Similar Jobs for

Senior Research Engineer, Model Evaluation

8 matching positions

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI is building the fastest, most capable open-source-aligned LLMs and i...
Location
Location
United States , San Francisco
Salary
Salary:
220000.00 - 270000.00 USD / Year
together.ai Logo
Together AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong engineering skills with Python, evaluation tooling, and distributed workflows
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals
  • Understanding of function calling and structured output formats
  • Familiarity with GPU or distributed compute environments
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
  • Experience with multi-turn or multi-step reasoning tasks
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures
Job Responsibility
Job Responsibility
  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
  • Develop specialized evaluation suites for: Function calling — argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
  • Agentic workflows — task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
  • Tool-augmented interactions — search, retrieval, code execution, API-driven actions
  • Create CI/CD automated pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers
What we offer
What we offer
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
  • Fulltime
Read More
Arrow Right

Senior Research Scientist, Model Evaluation

Evaluation is critical to making progress in scaling intelligence. As models con...
Location
Location
United States; Canada; United Kingdom , Toronto; New York; Seattle; San Francisco; London; Paris
Salary
Salary:
Not provided
cohere.com Logo
Cohere
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Enjoy rapidly building prototypes that demonstrate the boundaries of what LLMs are capable of
  • Have developed resources to measure LLM capabilities
  • Have spent dozens of hours reviewing complex data and LLM outputs to ensure high data quality
  • Obsessive about rigorously measuring AI capabilities and ensuring measurements align with the capabilities you care about
  • Have strong software engineering skills
Job Responsibility
Job Responsibility
  • Create ambitious new evaluation benchmarks that push the limits of what our models can accomplish
  • Work on highly cross-functional teams to translate model feedback into trustworthy, repeatable evaluations
  • Conduct research to advance the state-of-the-art in LLM evaluation methods, including training LLM judges
  • refining LLM-based data synthesis pipelines
  • and improving evaluation efficiency
  • Build scalable and reusable tools for digging into model performance
What we offer
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime
Read More
Arrow Right

Senior Research Engineer

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission...
Location
Location
United States , Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Proficiency in Python and at least one deep learning framework such as PyTorch, JAX, or TensorFlow
  • Experience deploying Fine Tuned LLMs or multimodal models in live production environments
  • Experience shipping and maintaining production AI systems
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents)
  • convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work
  • identify high-potential methods and adapt them to Microsoft problem spaces
  • ML Design & Architecture: Own end-to-end pipeline from data prep, training, evaluation, deployment, and feedback loops
  • Identify and resolve model quality gaps, latency issues, and scale bottlenecks using PyTorch, or TensorFlow
  • Operate CI/CD and MLOps workflows including model versioning, retraining, evaluation, and monitoring
  • Fulltime
Read More
Arrow Right

Senior Research Engineer

As a Senior Research Engineer at Microsoft, you will help advance Microsoft’s mi...
Location
Location
United States , Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Proficiency in Python and at least one deep learning framework such as PyTorch, JAX, or TensorFlow
  • Experience deploying Fine Tuned LLMs or multimodal models in live production environments
  • Experience shipping and maintaining production AI systems
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Build AI-First Contact Center Experiences
  • Bringing State-of-the-Art Research to Products
  • Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents)
  • convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work
  • identify high-potential methods and adapt them to Microsoft problem spaces
  • Partner with product teams to improve customer and agent outcomes
  • Fulltime
Read More
Arrow Right

Senior Research Engineer - Dynamics 365 Contact Center

We are looking for a Senior Research Engineer to join our team. As a Senior Rese...
Location
Location
Czech Republic , Prague
Salary
Salary:
Not provided
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND related experience (e.g., statistics predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Experience developing and deploying live production systems, as part of a product team
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Build collaborative relationships with product and business groups to deliver AI-driven impact
  • Research and implement state-of-the-art using foundation models, prompt engineering, RAG, graphs, multi-agent architectures, as well as classical machine learning techniques
  • Fine-tune foundation models using domain-specific datasets
  • Evaluate model behavior on relevance, bias, hallucination, and response quality via offline evaluations, shadow experiments, online experiments, and ROI analysis
  • Build rapid AI solution prototypes, contribute to production deployment of these solutions, debug production code, support MLOps/AIOps
  • Contribute to papers, patents, and conference presentations
  • Translate research into production-ready solutions and measure their impact through A/B testing and telemetry that address customer needs
  • Ability to use data to identify gaps in AI quality, uncover insights and implement PoCs to show proof of concepts
  • Demonstrate deep expertise in AI subfields (e.g., deep learning, Generative AI, NLP, muti-modal models) to translate cutting-edge research into practical, real-world solutions that drive product innovation and business impact
  • Share insights on industry trends and applied technologies with engineering and product teams
  • Fulltime
Read More
Arrow Right

Senior Research Engineer

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission...
Location
Location
United States , Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in Computer Science, Engineering, Mathematics, Statistics, Physics, or a related field and 4 or more years in applied ML or AI research and product engineering
  • OR Master’s degree and 3 or more years in applied ML or AI research and product engineering
  • OR PhD in a relevant field and 2 or more years with generative AI, LLMs, or related ML algorithms
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Bringing State-of-the-Art Research to Products
  • Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents)
  • convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work
  • identify high-potential methods and adapt them to Microsoft problem spaces
  • End-to-End System Development
  • ML Design & Architecture: Own end-to-end pipeline from data prep, training, evaluation, deployment, and feedback loops
  • Fulltime
Read More
Arrow Right

Senior Research Engineer

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission...
Location
Location
United States , Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 3+ years of experience using Python and at least one deep learning framework such as PyTorch, JAX, or TensorFlow
  • Experience deploying Fine Tuned LLMs or multimodal models in live production environments
  • Experience shipping and maintaining production AI systems
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Bringing State-of-the-Art Research to Products: Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents)
  • convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work
  • identify high-potential methods and adapt them to Microsoft problem spaces
  • End-to-End System Development: ML Design & Architecture: Own end-to-end pipeline from data prep, training, evaluation, deployment, and feedback loops
  • Identify and resolve model quality gaps, latency issues, and scale bottlenecks using PyTorch, or TensorFlow
  • Operate CI/CD and MLOps workflows including model versioning, retraining, evaluation, and monitoring
  • Fulltime
Read More
Arrow Right

Senior Research Engineer

As a Research Engineer at Microsoft, you will set the technical vision and lead ...
Location
Location
United States , Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Proven track record leading large-scale AI systems and cross-org initiatives that shipped
  • Solid software engineering foundations and hands-on depth in Python plus deep-learning frameworks (PyTorch/ TensorFlow) and modern MLOps/tooling
  • Experience shipping and maintaining production AI systems
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Architect and deliver complex AI systems across model development, data, infra, evaluation, and deployment spanning multiple product lines
  • Set technical direction for large programs
  • drive alignment across Research, Engineering, and Product
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Integrate LLMs, multimodal models, multi-agent architectures, and RAG into Microsoft’s ecosystem
  • Establish best practices for MLOps, governance, and Responsible AI, compliant with Microsoft principles and industry standards
  • Drive original research and thought leadership (whitepapers, internal notes, patents)
  • convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work
  • identify high-potential methods and adapt them to Microsoft problem spaces
  • Fulltime
Read More
Arrow Right