Senior Research Engineer, Model Evaluation

Cohere

Location:
United States; Canada; United Kingdom, Toronto

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Evaluation is critical to making progress in scaling intelligence. As models continue to become superhuman in many real-world use cases, we must continue to develop new techniques to accurately measure our models' performance on frontier capabilities. In this role, you are responsible for creating next-generation evaluation methods and scalable infrastructure to measure LLM progress.

Job Responsibility:

  • Develop evaluation benchmarks, datasets, and environments for measuring the bleeding edge of model capabilities
  • Conduct research to push the state of the art in LLM evaluation methods, including training LLM judges, improving evaluation efficiency, and scalably building high-quality datasets
  • Build scalable tools for investigating and understanding evaluation results that are used by all members of technical staff at Cohere, as well as leadership and our CEO
  • Learn from and work with the best researchers and engineers in the field

Requirements:

  • You enjoy pushing the limits of what LLMs are capable of, and you have built high-quality evaluation resources to measure those capabilities (datasets, simulators, environments, etc.)
  • You have a track record of developing new methods and/or data to evaluate LLMs, e.g. publications at top-tier conferences, popular benchmarks, etc.
  • You have deep experience building with and around LLMs, and you have built tools for analyzing and understanding their performance
  • You have strong software engineering skills

What we offer:

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior Research Engineer, Model Evaluation

Senior Research Engineer

iProov has continued to scale rapidly this year and is currently looking for a S...
Location:
United Kingdom, London
Salary:
Not provided
iProov
Expiration Date:
Until further notice
Requirements:
  • At least 2-3 years experience delivering machine learning or computer vision systems in an industry environment
  • A Masters or PhD in a numerate discipline such as Computer Science, Engineering, Computational Neuroscience
  • Proven experience of high-quality delivery in machine learning and computer vision
  • Strong mathematical ability
  • Proficiency in Python and modern deep learning frameworks
  • Strong problem-solving skills and out-of-box thinking
Job Responsibility:
  • Lead research projects in the areas of computer vision, machine learning and biometrics
  • Develop and train deep learning models for face verification, liveness and attack detection
  • Take the lead on shaping data and evaluation strategies, including synthetic and adversarial data
  • Investigate failure modes and improve generalisation, robustness and fairness
  • Work with platform and optimisation engineers to turn models into production-quality services
  • Communicate results and recommendations clearly with stakeholders across the business
  • Contribute to the broader research strategy of the company
What we offer:
  • 25 days Annual Leave, plus 8 Bank Holidays
  • Growth Shares allocated after passing probation
  • Salary sacrifice schemes including: Pension, Cycle To Work and Electric Car Scheme
  • Nursery Sacrifice Scheme
  • Work Overseas Perk - Work globally for up to 2 weeks
  • Life Assurance
  • SmartHealth Access to private GP, Psychologist, Nutritionist along with tailored fitness plans for both you and your family
  • Benefit from personalized 1:1 career coaching with our in-house Occupational Psychologist
  • Award winning L&D platform with personal allocated training budgets
  • Enhanced paid family leave
Employment Type:
Full-time

Senior Sustainability Engineer

The Senior Engineer Expert in Sustainability will be directly involved in the Eu...
Location:
Spain, Valencia
Salary:
Not provided
LOMARTOV
Expiration Date:
Until further notice
Requirements:
  • Over 5 years of professional experience in research and innovation
  • Strong engineering knowledge basis
  • Specific knowledge in Eco-design and Life Cycle Assessment methodology
  • Ability to support the generation of valuable insights on sustainability related topics, and in preparing clear and compelling reports and presentations of results (mainly in English)
  • Bachelor’s and master’s degrees in a scientific field (e.g. Biology, Biotechnology, Chemistry or Engineering, among others)
  • PhD or postdoctoral activity in the field of Sustainable Development, LCA and Eco-Design, showing a strong understanding and analytical skills towards sustainability topics
  • Excellent level of Spanish and English (C1/C2 level in both languages)
  • Strong written and verbal communication skills
  • Strong ability to work in a multidisciplinary environment and adaptive capabilities to different research sectors
  • Ready to face new challenges, able to work under pressure and used to EU deadlines, ability to work effectively as a member of a team, flexibility, and high motivation
Job Responsibility:
  • Developing and carrying out research activities strongly related with the sustainable evaluation and validation of novel processes and technologies
  • Conducting Life Cycle Assessment (LCA), Life Cycle Cost studies (LCC), Social LCA (sLCA) and Eco-design activities both for international projects and private contracts
  • Circular Economy modelling
  • Carbon and water footprint calculation
  • Material Flow Analysis (MFA)
  • Modelling and simulation of specific processes
  • Levelized Cost Analysis
  • Proposition of Material Circularity Indicators (MCI)
  • Risk Assessments in terms of Health and Safety criteria of the process/product considered
  • Recyclability assessments
What we offer:
  • Positive work environment, with a professional team in a dynamic developing company
Employment Type:
Full-time

Sr. Staff Thermal Design and Modeling Engineer

The Enphase Energy Storage and Systems Innovation team in the office of the CTO ...
Location:
United States, Austin
Salary:
110000.00 - 167000.00 USD / Year
Enphase Energy
Expiration Date:
Until further notice
Requirements:
  • BS/MS/PhD Mechanical Engineering or closely related discipline
  • Minimum experience: BS and 5+ years / MS and 3+ years / PhD with thermal design experience for a Senior Engineer
  • BS and 8+ years / MS and 6+ years / PhD and 3+ years thermal design experience for a Staff Engineer
  • BS and 12+ years / MS and 8+ years / PhD and 5+ years thermal design experience for a Sr. Staff Engineer
  • Significant experience designing active and passive thermal management solutions within a product and with key thermal components such as heat sinks, air flow systems, coolant systems, multi-phase cooling, phase change materials, and thermal interface materials
  • Excellent understanding of conduction, convection, and radiation
  • Experience with analytical approaches and simulation tools for thermal modeling (e.g., Solidworks Flow Sim, Fluent, Icepak, COMSOL, etc.)
  • High proficiency with an enterprise CAD package (Solidworks preferred)
  • Experience designing test plans, building experimental setups, collecting data, and analyzing results for thermal performance
  • Familiarity with design for manufacturability and reliability
Job Responsibility:
  • Design, integrate, and validate thermal management solutions for Enphase energy storage and systems products
  • Build, update, and analyze computational models including heat transfer, combustion, and air flow to simulate thermal performance in energy storage and systems products
  • Specify key functional and performance requirements; design and execute verification test plans
  • Create high-quality mechanical and electromechanical part and assembly designs efficiently and with good modeling practices
  • Produce 2D drawings and 3D CAD models, evaluate parts, and collaborate with a multi-disciplinary design team, suppliers, and contract manufacturing partners to develop product prototypes
  • Research, analyze, implement, and test advanced thermal management technologies to support innovative energy storage and system product designs
  • Complete documentation for design, testing, and analysis
  • Hands-on fabrication of electromechanical assemblies and prototypes
  • Work with a multi-disciplinary team to test and troubleshoot prototype designs
What we offer:
  • bonus
  • equity
  • benefits
Employment Type:
Full-time

Senior Machine Learning Engineer, Agentic

Join us in building the future of finance. Our mission is to democratize finance...
Location:
United States, Bellevue; Menlo Park; New York; Washington; Denver; Westlake; Chicago; Lake Mary; Clearwater; Gainesville
Salary:
146000.00 - 220000.00 USD / Year
Robinhood
Expiration Date:
Until further notice
Requirements:
  • Strong technical expertise in software development, with understanding of agentic workflows—including reasoning loops, tool invocation, memory, and orchestration of autonomous AI agents
  • Hands-on experience using Large Language Models, including prompt engineering, fine-tuning, model distillation, and deploying optimized models (e.g. via DPO, PPO) into production environments
  • Proven ability to build and scale ML/AI systems, from experimentation to deployment—owning dataset generation, evaluation pipelines, A/B testing, and performance monitoring
  • Leadership and mentorship capabilities, with a track record of guiding complex technical projects and supporting the growth of teammates through code/design reviews and technical direction
  • Excellent communication and collaboration skills, with the ability to translate technical ideas into actionable plans and work effectively with cross-functional partners, including product and infrastructure teams
  • Innovation mindset and commitment to continuous learning and a bias toward action, staying at the forefront of ML/AI trends, agentic systems research, and best practices in tooling, safety, and evaluation
Job Responsibility:
  • Design and create tools and workflows for agent development that support rapid prototyping—define agents, compose toolchains, and construct reasoning loops with minimal overhead
  • Build platform solutions to support scalable experimentation, synthetic dataset generation, and multi-agent evaluation across diverse tasks and domains
  • Develop feedback and optimization pipelines that incorporate both automated metrics and human-in-the-loop evaluation signals to fine-tune agent behavior
  • Implement and scale optimization techniques such as Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and reward modeling to improve agent performance
  • Launch and support fine-tuned models in production environments with robust evaluation, rollback strategies, and performance monitoring
  • Collaborate closely with applied AI/ML teams to translate state-of-the-art research in agentic reasoning, planning, and tool use into reliable, production-ready systems
What we offer:
  • Market competitive and pay equity-focused compensation structure
  • 100% paid health insurance for employees with 90% coverage for dependents
  • Annual lifestyle wallet for personal wellness, learning and development, and more
  • Lifetime maximum benefit for family forming and fertility benefits
  • Dedicated mental health support for employees and eligible dependents
  • Generous time away including company holidays, paid time off, sick time, parental leave, and more
  • Lively office environment with catered meals, fully stocked kitchens, and geo-specific commuter benefits
  • Bonus opportunities
  • Equity
Employment Type:
Full-time

Senior Generative AI Engineer

The Citi Innovation Lab is a leader in creating new ideas, innovative technology...
Location:
Israel, Tel Aviv
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Hands-on experience with transformer-based models and their applications
  • Strong understanding of LLMs, including model selection, benchmarking, and optimization
  • Experience with RAG systems and vector databases
  • Proficiency in developing and deploying AI agents
  • Knowledge of open-source models and methods, including benchmarks for evaluating AI performance
  • Knowledge of security risks and mitigation strategies for autonomous AI agents, including OWASP guidelines
  • Proficiency in Python and experience with libraries such as Pandas, Tabula, and TensorFlow/PyTorch
  • Strong problem-solving skills and attention to detail
  • Excellent communication and documentation skills
Job Responsibility:
  • Develop and implement enterprise-scale, cutting-edge models such as visual document understanding and text2code
  • Implement and optimize vector-based retrieval systems for RAG, covering embedding models, ANN indexing, hybrid search, and re-ranking
  • Implement autonomous AI agents to perform adaptive, error-resistant data extraction and content validation tasks
  • Develop and deploy enterprise software applications using state-of-the-art practices such as microservices and modular code, and write unit and integration tests to ensure the accuracy and reliability of the AI applications
  • Ensure data privacy and security in all AI-driven processes, adhering to OWASP guidelines and Citi’s stringent authentication and authorization policies
  • Collaborate with cross-functional teams to integrate AI solutions into existing workflows
  • Document the development process and create comprehensive technical specifications
  • Manage and maintain AI applications, ensuring best practices in model management and versioning
  • Deploy the resulting AI applications using industrial-strength frameworks and processes, including Kubernetes and OpenShift, for scalable and efficient on-premises operations
  • Research, develop, and use transformer-based models for enhanced application performance
Employment Type:
Full-time

Senior Software Engineer, Controls

As a Senior Software Engineer on our controls team, you will deliver mission-cri...
Location:
United States, Santa Clara
Salary:
150000.00 - 200000.00 USD / Year
PlusAI
Expiration Date:
Until further notice
Requirements:
  • Master's or PhD degree in Mechanical Engineering, Robotics, Aerospace Engineering, Computer Science, or related field
  • 2+ years of MLE experience or industry experience designing and developing for robotics applications
  • Strong foundation in motion control and modern neural network architectures, with expertise in at least one application area, such as IL/RL, time-series analysis, or dynamic system modeling
  • Skilled in debugging robotic systems within Linux environments, with strong programming expertise in Python and C++
  • Experience in model development & training with modern frameworks (e.g. PyTorch)
  • Hands-on familiarity with data ingestion and processing pipelines
Job Responsibility:
  • Design, implement, and enhance control algorithms by developing frameworks that integrate MPC with learning based approaches (DL/RL/IL)
  • Own the conceptual design and implementation of data-driven controllers, working cross-functionally with domain experts and other stakeholders to specify meaningful insights for solving trajectory tracking problems
  • Develop tools and infrastructure for dataset generation, training, and evaluation to drive advancements in online control optimization
  • Ensure all model development keeps a real-time focus and operates efficiently in compute-constrained environments
  • Take a lead role in the planning and execution of vehicle testing in the offline simulation environment and on public roads to systematically improve performance, as well as performing root cause analysis and debugging to address any issues
  • Track and incorporate the latest multidisciplinary research advancements in a fast-moving field
  • Ensure that your work is performed in accordance with the company’s Quality Management System (QMS) requirements and contribute to continuous improvement efforts
  • Ensure team compliance with QMS, monitor quality, and drive process improvements
What we offer:
  • Work, learn and grow in a highly future-oriented, innovative and dynamic field
  • Wide range of opportunities for personal and professional development
  • Catered free lunch, unlimited snacks and beverages
  • Highly competitive salary and benefits package, including 401(k) plan
Employment Type:
Full-time

Senior Analytics Engineer

The Analytics Engineering team provides all parts of Intercom’s business with th...
Location:
United States, San Francisco
Salary:
192400.00 - 223600.00 USD / Year
Intercom
Expiration Date:
Until further notice
Requirements:
  • You write advanced SQL with a preference for well-architected data models, optimized query performance, and clearly documented code
  • You’re familiar with the modern data stack; dbt and Snowflake experience are a big plus
  • You have a growth mindset and an eagerness to learn
  • You exhibit great judgment and sharp business and product instincts that allow you to differentiate essential versus nice-to-have and to make good choices about trade-offs
  • You practice excellent communication skills, and you tailor explanations of technical concepts to a variety of audiences
Job Responsibility:
  • Data Platform Development: Design, build, and manage scalable data pipelines and ELT processes to support a robust, analytics-ready data platform.
  • Cross-functional Collaboration: Partner with engineering, analytics, and business teams to understand data needs and ensure accurate, insightful data solutions.
  • Data Strategy & Governance: Lead initiatives in data model development, data quality ownership, warehouse management, and production support for critical workflows.
  • Advanced Analytics & Insights: Conduct in-depth data analysis and build custom models to support strategic business decisions and performance measurement.
  • Automation & Optimization: Streamline data collection and reporting processes to reduce manual effort and improve efficiency.
  • Innovation in Data Infrastructure: Create scalable solutions like unified data pipelines and access control systems to meet evolving organizational needs.
  • Impact Measurement: Develop a strategy and technical pipeline for evaluating research output impact using citation metadata.
What we offer:
  • Competitive salary and meaningful equity
  • Comprehensive medical, dental, and vision coverage
  • Regular compensation reviews - great work is rewarded!
  • Flexible paid time off policy
  • Paid Parental Leave Program
  • 401k plan & match
  • In-office bicycle storage
  • Fun events for Intercomrades, friends, and family!
Employment Type:
Full-time

Senior Machine Learning Engineer

Machine Learning is a cornerstone at Taskrabbit, and we're looking for a seasone...
Location:
United States, New York; San Francisco
Salary:
148000.00 - 200000.00 USD / Year
Taskrabbit
Expiration Date:
Until further notice
Requirements:
  • BS, MS, or PhD in Computer Science, Statistics, Operations Research, or a related quantitative field
  • 3+ years of industry experience building and deploying high-quality, production-grade machine learning models and systems
  • Strong theoretical knowledge and hands-on experience in machine learning, particularly in areas like search, ranking, recommender systems, or NLP
  • Solid software engineering skills with proficiency in one or more programming languages, including Python
  • Experience with popular ML libraries like scikit-learn, LightGBM, XGBoost, TensorFlow, PyTorch, etc.
  • Proficiency in SQL is also required for writing complex queries and transforming data
  • Experience building REST API-based services
  • Experience with modern data and ML technologies, such as Docker, Kubernetes, Kafka, Airflow, data warehouses (e.g., Snowflake, Redshift, or BigQuery), and data lakes
  • Excellent communication skills, with the ability to present complex findings and recommendations clearly to both technical and non-technical audiences
  • A passion for quickly learning new technologies and a drive to solve challenging problems
Job Responsibility:
  • Model Development & Research: Research, design, and implement machine learning models to solve key business problems in areas like search ranking, recommendations, and content discovery
  • End-to-End ML Lifecycle: Own the entire lifecycle of ML models, including feature engineering, training, evaluation, deployment, and monitoring
  • Infrastructure & Scalability: Build scalable and reliable ML infrastructure and data pipelines that support reproducible feature engineering and machine learning model deployment in real-time, near real-time, and batch processes
  • Performance & Quality: Build monitoring services to understand data quality and model performance of complex systems, and collaborate with engineering and science teams to optimize existing algorithms for training and evaluation
  • Software Engineering Excellence: Independently solve complex problems, write clean, efficient, and sustainable code, and actively participate in code reviews, documentation, and the full software engineering lifecycle
What we offer:
  • Taskrabbit is a Hybrid Company
  • The People
  • The Diverse Culture
  • Taskrabbit offers employer-paid health insurance and a 401k match with immediate vesting for our US-based employees
  • We offer all of our global employees generous and flexible time off with 2 company-wide closure weeks, Taskrabbit product stipends, wellness + productivity + education stipends, IKEA discounts, reproductive health support, and more
Employment Type:
Full-time