
Member of Technical Staff, LLM Evaluation

Location: United States, Mountain View
Salary: 139900.00 - 274800.00 USD / Year
Job Posted: February 10, 2026

Job Description

As a Member of Technical Staff, LLM Evaluation, you will develop and implement cutting-edge methodologies to help us evaluate how well Copilot performs in real-world usage scenarios. Users turn to Copilot for all types of endeavors, making it critical that we ensure our AI systems effectively help them meet their needs. Our vision for meeting user needs is expansive and includes not only task completion, but also affective aspects of the experience. You will be responsible for developing new methods to evaluate LLMs, training classifiers, experimenting with data collection techniques, and implementing methodologies to provide real-time signals on Copilot performance. We're looking for outstanding individuals with experience in the social sciences, machine learning, and analysis of natural language. The right candidate is a creative problem solver who will work closely with user researchers and product leaders to build automated evaluation frameworks that help us drive improvements in Copilot.

Job Responsibility

  • Leverage expertise to measure the performance of Copilot, identify failure modes and novel mitigation strategies, including data mining, prompt engineering, LLM as a judge, and classifier training
  • Solve problems creatively, navigating complexity with clarity, independently shaping direction, and delivering results even when the path isn't obvious
  • Create and implement comprehensive evaluation frameworks across diverse scenarios, edge cases, and potential failure modes
  • Build automated testing systems, generalize solutions into repeatable frameworks, and write efficient code for model pipelines and intervention systems
  • Maintain a user-oriented perspective by understanding user needs, validating approaches through user research, and serving as a trusted advisor on AI matters
  • Track advances in research, identify relevant state-of-the-art techniques, and adapt algorithms to drive innovation in production systems serving millions of users

Requirements

  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 7+ years data-science experience
  • OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 10+ years data-science experience
  • OR equivalent experience
  • Experience prompting and working with large language models
  • Experience writing production-quality Python code
  • Demonstrated interest in Responsible AI

Nice to have

  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 8+ years data-science experience
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 10+ years data-science experience
  • OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 12+ years data-science experience
  • OR equivalent experience

Similar Jobs for Member of Technical Staff, LLM Evaluation

8 matching positions

Member of Technical Staff, Data Analysis and Evaluation

As a Member of Technical Staff in Data Analysis and Evaluation, you will play a ...
Salary: Not provided
Cohere
Expiration Date: Until further notice
Requirements
  • Extremely strong software engineering skills
  • Strong expertise in designing and conducting data collection tasks, including working with human annotators
  • Strong statistical skills and experience evaluating scientific experiments related to data collection and model performance
  • Experience analysing datasets with respect to their quality, biases, and suitability for training ML models
  • Hands-on experience training large language models (LLMs) on distributed training infrastructures
  • Familiarity with evaluating and improving the generalisability and robustness of ML systems
  • Proficiency in programming languages such as Python and ML frameworks (e.g., PyTorch, TensorFlow, JAX)
  • Excellent communication skills to collaborate effectively with cross-functional teams and present findings
  • One or more papers at top-tier venues (such as NeurIPS, ICML, ICLR, AIStats, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP)
Job Responsibility
  • Design and oversee data collection tasks, including supporting human annotators and ensuring data quality
  • Develop and apply statistical methods to evaluate the quality and reliability of datasets
  • Analyse and assess the generalisability and robustness of ML systems across diverse use cases
  • Collaborate with teams to improve dataset quality and model performance
  • Train and fine-tune large language models (LLMs) on distributed training infrastructures
  • Conduct experiments to evaluate model performance and identify areas for improvement
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Member of Technical Staff

The Microsoft AI Super Intelligence Post-Training team is dedicated to advancing...
Location: India, Bangalore
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of professional experience, including 2+ years with Python and ML frameworks such as PyTorch or TensorFlow
  • Hands-on experience with training or fine-tuning LLMs or multimodal models
  • Familiarity with production ML systems and concepts like model serving, caching, batching, and monitoring
  • Understanding of distributed systems and cloud-based infrastructure
Job Responsibility
  • Implement large-scale model training, especially with LLMs, SLMs, multimodal, or code-specific models
  • Develop robust evaluation frameworks to assess model performance, conduct systematic benchmarking, and address identified weaknesses while ensuring compliance with customer standards
  • Write efficient, production-quality code and debug complex distributed systems
  • Build and maintain internal tools to streamline training and evaluation workflows and automate repetitive tasks within secure development environments
  • Fulltime

Member of Technical Staff - Post Training - MAI Superintelligence Team

At Microsoft AI, we are on a mission to develop the most cutting-edge algorithms...
Location: United States, Mountain View
Salary: 119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science, Machine Learning, Mathematics, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Experience with reward modeling, RL, or other post-training techniques
Job Responsibility
  • Develop data collection, evaluation, and post-training methods for models
  • Design hypotheses and experiment plans for rapidly iterating on model performance
  • Fulltime

Member of Technical Staff, Software Engineer

Help build the infrastructure that powers training, evaluation, and data platfor...
Location: Switzerland, Zürich
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Strong software engineering background building reliable, scalable production systems (Python preferred)
  • Hands‑on experience supporting large‑scale ML / LLM training, evaluation, or experimentation infrastructure
  • Operating GPU‑heavy workloads in cloud environments using Docker and Kubernetes (scheduling, utilization, isolation)
  • Designing and running data / compute pipelines and orchestration (e.g., Airflow, Argo) with object storage (Azure Blob / S3)
  • Platform reliability and operability: observability, metrics, logging, tracing, alerting (Prometheus, Grafana, OpenTelemetry)
Job Responsibility
  • Design and build core platform services for scalable training and evaluation, including cluster orchestration, job scheduling, data and compute pipelines, and artifact management
  • Standardize containerized workflows by maintaining Docker images, CI/CD, and runtime configurations; advocate for best practices in security, reproducibility, and cost efficiency
  • Implement end-to-end observability and operations through metrics, tracing, logging, dashboard development, monitoring, and automated alerts for model training and platform health (using Prometheus, Grafana, OpenTelemetry)
  • Architect and operate services on Azure cloud platforms, managing infrastructure-as-code (Terraform/Helm), secrets, networking, and storage
  • Enhance developer experience by creating tools, CLIs, and portals that simplify job submission, metrics analysis, and experiment management for generalist software engineering and research teams
  • Enforce security and compliance policies for data access, container hardening, and supply-chain integrity, and partner with security and privacy teams to maintain robust practices in multi-tenant environments and secret management
  • Collaborate cross-functionally with data, model, and product teams to align infrastructure roadmaps with training needs, evaluation protocols, and Copilot product goals
  • Fulltime

Member of Technical Staff - Machine Learning

As a Member of Technical Staff - Machine Learning (AI Team), you will work to cr...
Location: United States, Mountain View
Salary: 119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Doctorate in Computer Science, Machine Learning, Human-Centered AI or related field AND 2+ year(s) experience (e.g., finetuning models with supervision or reinforcement learning, understanding and fixing data quality and curation, working with collaborators on creating new products)
  • OR Master's Degree in Computer Science, Machine Learning, or related field AND 5+ years experience (e.g., managing structured and unstructured data, developing and debugging models, creating infrastructure for AI-powered products)
  • OR Bachelor's Degree in Computer Science, Mathematics, Machine Learning, Physics, or related field AND 7+ years data-science experience (e.g., managing structured and unstructured data, applying machine learning techniques and driving product direction)
  • Demonstrated engineering experience or research experience (e.g. creating or leading the creation of a feature in a different company, complex graduate work, research papers, or other experience)
  • 4+ years of data science experience (e.g., managing structured and unstructured data, applying machine learning techniques and driving product direction)
  • Experience prompting, evaluating, and working with large language models
  • Experience writing production-quality Python code
Job Responsibility
  • Leverage subject matter expertise to improve model quality for interactive and agentive experiences
  • Oversee data acquisition or generation efforts, ensuring that the data meets the model needs
  • Generalize machine learning (ML) solutions into repeatable frameworks
  • Lead evaluation efforts of models, including those deployed within Microsoft products and the Cloud API
  • Track advances in industry and academia, identify relevant state-of-the-art research, and adapt algorithms and/or techniques to drive innovation and develop new solutions
  • Independently write efficient, readable, extensible code and model pipelines
  • Commit to a customer-oriented focus by acknowledging customer needs and perspectives and building AI products that delight customers
  • Fulltime

Member of Technical Staff, Principal Engineering Manager

As Microsoft continues to push the boundaries of AI, we are on the lookout for s...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Demonstrated track record of building and scaling engineering organizations (hiring teams from scratch, structuring orgs, growing managers)
  • Experience delivering large-scale software systems in AI, machine learning, or related fields
  • Experience managing organizations of 30+ engineers across multiple teams and workstreams
  • Deep expertise in LLM evaluation, AI quality measurement, or ML infrastructure at scale
  • Track record of partnering with senior leadership (VP/CVP level) to set strategy and drive cross-organizational programs
  • Experience recruiting and developing senior engineering talent (principal engineers, engineering managers) in a competitive market
  • Proven ability to operate effectively in fast-paced, ambiguous environments — comfortable making decisions with incomplete information and course-correcting quickly
  • Strong technical judgment: ability to evaluate architectural tradeoffs, assess technical risk, and guide teams toward sound engineering decisions without needing to write the code yourself
  • Experience leading distributed or multi-site engineering teams.
Job Responsibility
  • Build and lead a multi-team engineering organization (30+ engineers across multiple teams), including hiring and developing engineering managers who lead their own teams
  • Set the technical and organizational strategy for Copilot AI Evaluation and response quality, aligning with MAI's broader product and engineering vision
  • Partner with senior Eng and Product leadership (Partner+ level) to define priorities, influence roadmaps, and drive cross-organizational initiatives
  • Own end-to-end delivery of evaluation platforms, novel evaluation techniques, and agentic solutions for measuring and improving Copilot quality at scale
  • Recruit, develop, and retain world-class engineering talent — building a culture of technical excellence, accountability, and continuous learning
  • Drive operational rigor: establish engineering processes, quality bars, and delivery cadences that enable predictable, high-quality execution across multiple concurrent workstreams
  • Navigate ambiguity and make high-judgment tradeoff decisions on technology, staffing, and investment priorities in a fast-moving AI landscape
  • Foster a diverse, inclusive team culture where engineers at all levels can do their best work and grow their careers
  • Embody our Culture and Values.
  • Fulltime

Member of Technical Staff

The Microsoft AI Superintelligence Post Training team is dedicated to advancing ...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate in relevant field AND 3+ years related research experience OR Master's Degree in relevant field AND 4+ years related research experience OR Bachelor's Degree in relevant field AND 6+ years related research experience OR equivalent experience
  • 5+ years of coding experience in Python and experience with ML frameworks such as PyTorch and Triton
  • 3+ years of experience in data curation and synthesis, creating and refining datasets to optimize training outcomes
  • 3+ years of proven ability to design and scale training infrastructure and pipelines in production environments
  • 3+ years of large-scale model training - especially with LLMs, SLMs, multimodal, or code-specific models
  • Prior research publication record with over 3000 citations
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Perform large-scale model training, especially with LLMs, SLMs, multimodal, or code-specific models
  • Perform data curation and synthesis, creating and refining datasets to optimize training outcomes
  • Hands-on coding: write efficient, production-quality code and debug complex training jobs
  • Work on both proprietary and open-source frameworks, with demonstrated proficiency in training pipelines and architecture
  • Full-stack modeling responsibility, from data ingestion and training to evaluation and inference management
  • Contribute to or build on existing innovations, such as the technical reports of well-known models
  • Develop novel AI solutions that bridge language, vision, and code understanding
  • Help develop models powering tools like GitHub Copilot, Cursor, and VS Code suggestions
  • Embody our Culture and Values
  • Fulltime

Member of Technical Staff - Machine Learning

As a Member of Technical Staff - Machine Learning, you will work to create LLM m...
Location: United States, Mountain View
Salary: 163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Demonstrated engineering experience or research experience (e.g. creating or leading the creation of a feature in a different company, complex graduate work, research papers, or other experience)
  • Experience prompting, evaluating, and working with large language models
  • Experience writing production-quality Python code
Job Responsibility
  • Own and pursue a research agenda to improve model capability and performance for agentive applications
  • Collaborate closely with the other research and product teams, from pretraining to model hosting, to unlock new model capabilities
  • Build robust evaluations for tracking modeling improvements
  • Design, implement, test, and debug code across our research stack
  • Fulltime