Senior Research Scientist, Model Evaluation

Cohere

Location:
United States; Canada; United Kingdom, Toronto

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Evaluation is critical to making progress in scaling intelligence. As models continue to become superhuman in many real-world use cases, we must continue to develop new evaluation techniques that accurately reflect what models are already capable of, as well as set the agenda for what future models should be capable of. In this role, you are responsible for creating these next-generation evaluation methods and infrastructure to measure LLM progress.

Job Responsibility:

  • Create ambitious new evaluation benchmarks that push the limits of what our models can accomplish
  • Work on highly cross-functional teams to translate model feedback into trustworthy, repeatable evaluations
  • Conduct research to advance the state of the art in LLM evaluation methods, including training LLM judges, refining LLM-based data synthesis pipelines, and improving evaluation efficiency
  • Build scalable and reusable tools for digging into model performance

Requirements:

  • Enjoy rapidly building prototypes that demonstrate the boundaries of what LLMs are capable of
  • Have developed resources to measure LLM capabilities
  • Have spent dozens of hours reviewing complex data and LLM outputs to ensure high data quality
  • Are obsessive about rigorously measuring AI capabilities and ensuring measurements align with the capabilities you care about
  • Have strong software engineering skills
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work
Similar Jobs for Senior Research Scientist, Model Evaluation

Senior Research Scientist, Intelligent Talent Acquisition - Lead Generation & Detection Services

Do you want a role with deep meaning and the ability to make a major impact? As ...
Location:
United Kingdom, Edinburgh
Salary:
Not provided
Amazon Pforzheim GmbH
Expiration Date:
Until further notice
Requirements:
  • Master's degree, or a PhD and experience in quantitative field research
  • Experience investigating the feasibility of applying scientific principles and concepts to business problems and products
  • 5+ years of experience in applied selection research, job analysis, test development, and validation
  • Foundational skills in conducting experimental research studies and data analysis
  • Proficiency in scripting for data analysis (e.g., R, Python)
Job Responsibility:
  • Partner on design and development of AI-powered systems to scale job analyses enterprise-wide
  • Match potential candidates to the jobs they’ll be most successful in
  • Conduct validation research for top-of-funnel AI-based evaluation tools
  • Develop and implement novel research strategies using the latest technology
  • Build solutions while experiencing Amazon’s customer-focused culture
  • Work with diverse groups of people and inter-disciplinary cross-functional teams to solve complex business problems

Senior Data Scientist - AI Modeling

As a Senior Data Scientist - AI Modeling at Baxter, you will work on creating an...
Location:
United States
Salary:
104000.00 - 143000.00 USD / Year
Baxter
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s Degree in STEM (science, technology, engineering, math) related field or a similar quantitative analytics field
  • 4+ years of professional experience with a variety of data products / data science model / algorithm development and implementing in production
  • Software development experience
  • Experience with healthcare data and working in a HIPAA regulated environment preferred
  • Experience with varying database structures and large datasets preferred
  • Experience with modern data science tools, such as Spark, Scala, Python, Databricks
  • Experience in Microsoft Azure cloud environment is preferred
  • Proficiency with developing data visualization technology and capabilities (e.g., Power BI, Tableau)
  • Brings a drive for creatively applying pragmatic and scalable approaches to Machine Learning to tackle difficult problems affecting patients and providers
  • Passionate about working on a high-performance team toward a multi-year vision with incremental deliverables
Job Responsibility:
  • Responsible for the development and implementation of predictive modeling algorithms and techniques to address unmet needs, customer/business problems and optimize user experiences
  • Conduct in-depth research to stay at the forefront of AI advancements, exploring opportunities to integrate predictive and generative AI models into our products and services
  • Predictive and generative AI Modeling
  • Formulate problem statements and hypotheses for diverse business challenges (clinical, operational and business process optimization problems)
  • Create Spark & Python code in Databricks to retrieve data from across disparate data sources and create new innovative actionable insights
  • Prepare data for effective model training
  • Develop, train, and evaluate predictive AI models tailored to specific problems
  • Continuously refine and optimize models for performance, scalability, and efficiency
  • Deploy models into production environments and supervise their performance
  • Identify opportunities where generative AI models can add value
What we offer:
  • Comprehensive medical and dental coverage starting on day one
  • Insurance coverage for basic life, accident, short-term and long-term disability, and business travel accident
  • Employee Stock Purchase Plan (ESPP) with discount
  • 401(k) Retirement Savings Plan with employee contributions and company matching
  • Flexible Spending Accounts
  • Educational assistance programs
  • Paid holidays
  • Paid time off ranging from 20 to 35 days based on length of service
  • Family and medical leaves of absence
  • Paid parental leave
Employment Type:
Fulltime

Senior Data Scientist - AI Modeling

As a Senior Data Scientist specializing in AI Modeling, you will develop and imp...
Location:
United States
Salary:
104000.00 - 143000.00 USD / Year
Baxter
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s Degree in STEM or a similar quantitative analytics field
  • 4+ years of professional experience with a variety of data products/data science model/algorithm development and implementing in production
  • Software development experience
  • Experience with healthcare data and working in a HIPAA regulated environment preferred
  • Experience with varying database structures and large datasets preferred
  • Experience with modern data science tools including Spark, Scala, Python, Databricks, and more
  • Experience in Microsoft Azure cloud environment preferred
  • Proficiency with developing data visualization technology and capabilities such as Power BI and Tableau
  • Entrepreneurial self-starter, curious about both technology and business, and driven by delivering end-user value and impact.
Job Responsibility:
  • Development and implementation of predictive modeling algorithms and techniques to address unmet needs, customer/business problems, and optimize user experiences
  • Conduct research to stay at the forefront of AI advancements
  • Predictive and generative AI Modeling
  • Formulate problem statements and hypotheses for diverse business challenges
  • Create Spark & Python code in Databricks to retrieve data and create actionable insights
  • Prepare data for effective model training
  • Develop, train, and evaluate predictive AI models
  • Continuously refine and optimize models
  • Deploy models into production environments and supervise their performance
  • Identify opportunities for generative AI models
What we offer:
  • Comprehensive compensation and benefits packages
  • Health and dental coverage starting on day one
  • Basic life, accident, short-term and long-term disability insurance
  • Business travel accident insurance
  • Employee Stock Purchase Plan (ESPP) with discounted company stock
  • 401(k) Retirement Savings Plan with employee contributions and company matching
  • Flexible Spending Accounts
  • Educational assistance programs
  • Paid holidays
  • Paid time off ranging from 20 to 35 days
Employment Type:
Fulltime

Senior People Scientist

The Sr People Scientist is responsible for contributing to the development of an en...
Location:
United States, Bellevue
Salary:
127700.00 - 230300.00 USD / Year
T-Mobile
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in a quantitative subject area (math, statistics, economics, computer science, physics, engineering)
  • Master's/advanced degree in a quantitative subject area (I-O Psychology, Behavioral Economics, or Applied Social Psychology with emphases on research science and advanced statistics)
  • Doctorate in a quantitative discipline (I-O Psychology, Behavioral Economics, or Applied Social Psychology with emphases on research science and advanced statistics)
  • 7-10 years of research science or related experience
  • Proven experience with Gen AI, foundation models, and LLMs, and with applying them to analytics
  • 4-7 years of combined deep technical skills and business savvy to interface with and influence all levels and fields
Job Responsibility:
  • Support the vision and research science roadmap in collaboration with the HR leadership team and senior leadership partners
  • Collaborate in identifying and addressing large-scale, sophisticated business problems related to employee experience, talent, and organizational capability
  • Drive the development and integration of diverse and complex data sources for advanced and sophisticated qualitative and quantitative modeling
  • Contribute to maintaining high standards in research science, including supporting the mentoring and development of team members
  • Develop and implement network analytics, AI/ML, and Deep Learning models to analyze sophisticated datasets and support innovation in people science
  • Build and run true A/B and quasi-experimental designs to assess the impact of mechanisms, programs, and various tested solutions that align to the overall T-Mobile people strategy
  • Evaluate research initiatives to provide bottom-line value, return on investment, and improvements
  • Translate technical research findings into clear, concise, and engaging reports that support decisions and applications across the employee lifecycle
  • Collaborate with multiple teams and account teams to influence, build consensus, and drive significant T-Mobile wide changes related to applying research science proposals and recommendations, including changes to programs, engineering and system needs, and people strategy roadmaps
What we offer:
  • medical, dental and vision insurance
  • flexible spending account
  • 401(k)
  • employee stock grants
  • employee stock purchase plan
  • paid time off
  • up to 12 paid holidays
  • paid parental and family leave
  • family building benefits
  • back-up care
Employment Type:
Fulltime

Senior Data Scientist

We’re looking for an experienced Data Scientist with deep learning experience to ...
Location:
France, Lyon
Salary:
Not provided
HawkCell
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience in deep learning applied to imaging
  • Proven track record of bringing models to production
  • Strong coding standards and experience with Git/GitHub collaboration tools
  • Proficiency in packaging tools like Poetry or uv
  • Published research papers and eagerness to continue publishing
  • Comfortable working in a fast-paced, collaborative environment
  • Ability to work in an international and English-first environment
Job Responsibility:
  • Lead development of models for denoising, synthetic contrast generation, and classification
  • Design algorithms for segmentation, localization, and characterization of pathologies
  • Collaborate with other DS team members and mentor junior profiles
  • Ensure clean, maintainable code and scalable model packaging (Poetry/uv)
  • Put models into production and help design robust evaluation pipelines
  • Publish your work in peer-reviewed venues and represent HawkCell in the AI community
  • Contribute to IT topics and help bridge R&D and productionization
What we offer:
  • A mission to revolutionize the animal healthcare industry
  • A great and ambitious team to grow with
  • An international culture with 10+ nationalities in the team
  • An amazing office for you to share with other Hawkstars in Lyon, France

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Germany, Berlin
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience: familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary, annual bonus and equity
  • Regular compensation reviews
  • Generous paid time off above statutory minimum
  • Hybrid working
  • MacBooks are our standard, but we also offer Windows for certain roles when needed
  • Fun events for Intercomrades, friends, and family
Employment Type:
Fulltime

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Ireland; United Kingdom, Dublin; London
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience: familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews - we reward great work
  • Peace of mind with life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Open vacation policy and flexible holidays so you can take time off when you need it
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers, to let you spend valuable time with your loved ones
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done

Senior Data Scientist - Inference, Global Markets

Partner closely with product managers, designers, engineers and operations acros...
Location:
China
Salary:
Not provided
Airbnb
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience in a fast-paced tech environment with a BS/Master’s in a technical field related to mathematics, computer science, statistics, economics, or machine learning, or 2+ years of relevant experience and a PhD in similar fields
  • Strong knowledge of causal inference and experimentation
  • Expertise in SQL, Python or R
  • Ability to solve business problems using appropriate methods and models
  • Strong stakeholder communication skills and the ability to translate complex analyses into compelling narratives and business actions
Job Responsibility:
  • Work closely with cross-functional stakeholders to define product scopes, evaluate impact, and set roadmap priorities
  • Architect and implement rigorous measurement plans, using A/B tests and quasi-experimental methods to assess product success and inform strategic bets
  • Interpret unexpected outcomes, identify bias in experiments, and drive solutions to ensure measurement quality
  • Conduct in-depth research into customer behaviors and preferences across markets to unlock opportunities for international business growth
  • Develop scalable frameworks, models and systems that enable product features to be more refined and tailored to the needs of local markets
  • Actively present data findings and ideas to stakeholders at different levels, and turn proposals into actions and tangible results for the business and our customers