Research Scientist Intern, LLM Evaluation

Meta

Location:
United States, Bellevue

Contract Type:
Not provided

Salary:

7650.00 - 12134.00 USD / Month

Job Description:

Meta is seeking LLM Evaluation Scientists to join our Meta Superintelligence Lab, focusing on the evaluation and benchmarking of large language models (LLMs) across language and multimodal domains. We are committed to advancing the field of artificial intelligence by developing rigorous methodologies and tools to assess and improve the capabilities, safety, and reliability of cutting-edge AI systems. We are looking for individuals passionate about LLM evaluation, benchmarking, prompt engineering, data analysis, and the development of robust evaluation frameworks. As an LLM Evaluation Scientist, you will have the opportunity to shape the future of AI by ensuring our models meet the highest standards of performance and safety at scale.

Job Responsibility:

  • Design, implement, and maintain comprehensive evaluation protocols for large language models, including both automated and human-in-the-loop assessments
  • Develop and curate high-quality datasets and benchmarks to measure model performance, safety, fairness, and robustness across a variety of tasks and modalities
  • Analyze model outputs to identify strengths, weaknesses, and failure modes, and provide actionable insights to research and engineering teams
  • Collaborate with researchers, engineers, and cross-functional partners to define evaluation goals, communicate findings, and drive improvements in model quality
  • Develop tools and infrastructure to streamline and scale evaluation processes, including dashboards, annotation platforms, and reporting systems
  • Stay up-to-date with the latest research in LLM evaluation, benchmarking, and responsible AI, and incorporate best practices into Meta’s workflows
  • Disseminate evaluation results through internal reports, presentations, and, when appropriate, external publications
  • Contribute to the development of evaluation methodologies that can be applied to Meta product development and deployment

Requirements:

  • Currently has or is in the process of obtaining a Ph.D. degree in Computer Science, Artificial Intelligence, Generative AI, or a relevant technical field
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • Experience with Python, C++, C, Java or other related languages
  • Experience building systems based on machine learning and/or deep learning methods

Nice to have:

  • Intent to return to the degree program after the completion of the internship/co-op
  • Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as first-authored publications at leading workshops or conferences such as NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, ICCV, ECCV, or similar
  • Experience working and communicating cross-functionally in a team environment
  • Experience in advancing AI techniques, including core contributions to open source libraries and frameworks in computer vision
  • Publications or experience in machine learning, AI, computer vision, optimization, computer science, statistics, applied mathematics, or data science
  • Experience solving analytical problems using quantitative approaches
  • Experience setting up ML experiments and analyzing their results
  • Experience manipulating and analyzing complex, large scale, high-dimensionality data from varying sources
  • Experience in utilizing theoretical and empirical research to solve problems
  • Experience with deep learning frameworks

Additional Information:

Job Posted:
January 26, 2026

Employment Type:
Full-time
Work Type:
On-site work
Similar Jobs for Research Scientist Intern, LLM Evaluation

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
Canada
Salary:
55.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
United States, Seattle
Salary:
49.00 - 75.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Degree completion date cannot be earlier than September 2026 - June 2027
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
  • Exhibit a solid grasp of algorithms and data structures
  • Demonstrate proficiency in Python programming and ability to write clean, efficient, and well-documented code
  • Experience working with large-scale datasets, including data preprocessing, augmentation, and scaling techniques
  • Expertise in managing data with Python libraries such as NumPy, Pandas, and Matplotlib, experience leveraging models from Hugging Face, and practical knowledge of applied machine learning and deep learning frameworks such as PyTorch
  • Demonstrated exposure to natural language processing (NLP) and Computer Vision (CV)
  • Familiarity with state-of-the-art research in machine learning and AI, as evidenced by relevant coursework, publications, or projects
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Tech Lead Manager Machine Learning Research Scientist LLM Evals

As the Tech Lead Manager of the LLM Evals Research team, you will lead a talente...
Location:
United States, San Francisco; Seattle; New York
Salary:
280000.00 - 380000.00 USD / Year
Scale
Expiration Date:
Until further notice
Requirements:
  • 5+ years of hands-on experience with large language models, NLP, and Transformer modeling, in both research and engineering settings
  • Experience and a track record of landing major research impact in a fast-paced environment
  • Experience supporting and leading a team of research scientists and research engineers
  • Excellent written and verbal communication skills
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals
  • Previous experience in a customer-facing role
Job Responsibility:
  • Lead a team of highly effective research scientists and research engineers on LLM evals
  • Conduct research on the effectiveness and limitations of existing LLM evaluation techniques
  • Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness
  • Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects
  • Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols
  • Implement scalable and reproducible evaluation pipelines using modern ML frameworks
  • Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives
  • Remain up-to-date on ongoing research in the team, help work through technical challenges, and be involved in design decisions
  • Remain deeply involved in the research community, both understanding trends and setting them
  • Thrive in a high-energy, fast-paced startup environment and be ready to dedicate the time and effort needed to drive impactful results
What we offer:
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
Employment Type:
Full-time

Staff Machine Learning Research Scientist, LLM Evals

As a Staff Machine Learning Research Scientist on the LLM Evals team, you will l...
Location:
United States, San Francisco; Seattle; New York
Salary:
280000.00 - 380000.00 USD / Year
Scale
Expiration Date:
Until further notice
Requirements:
  • 5+ years of hands-on experience with large language models, NLP, and Transformer modeling, in both research and engineering settings
  • Experience and a track record of landing major research impact in a fast-paced environment
  • Experience as tech lead for a team of research scientists and research engineers
  • Excellent written and verbal communication skills
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals
  • Previous experience in a customer-facing role
Job Responsibility:
  • Drive research on the effectiveness and limitations of existing LLM evaluation techniques
  • Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness
  • Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects
  • Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols
  • Implement scalable and reproducible evaluation pipelines using modern ML frameworks
  • Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives
  • Mentor and guide research scientists and engineers, providing technical leadership across cross-functional projects
  • Stay deeply engaged with the ML research community, tracking emerging work and contributing to the advancement of LLM evaluation science
  • Thrive in a high-energy, fast-paced startup environment and be ready to dedicate the time and effort needed to drive impactful results
What we offer:
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • equity based compensation
  • commuter stipend (may be eligible).
Employment Type:
Full-time

Senior/Staff Machine Learning Engineer - Health Evaluation - AI Teams

At Doctolib, we're on a mission to transform how healthcare is delivered by harn...
Location:
France, Paris
Salary:
Not provided
Doctolib
Expiration Date:
Until further notice
Requirements:
  • MSc or PhD in Computer Science, Machine Learning, Data Science, or related field
  • 7+ years of hands-on experience working with large language models (e.g., GPT, Claude, Llama, or BERT-like architectures)
  • Proven experience in evaluating agentic or reasoning systems (e.g., autonomous agents, tool-using LLMs, dialogue systems, or task-oriented assistants)
  • Strong track record in experiment design, metric definition, and evaluation automation
  • Ability to bridge research and production, influencing modeling and product decisions
  • Excellent communication skills and a collaborative mindset
Job Responsibility:
  • Define and own the evaluation strategy for our AI agentic system - metrics, protocols, datasets, and tooling
  • Implement and maintain automated evaluation pipelines to monitor model quality, safety, and alignment across iterations
  • Run systematic experiments to assess reasoning, factuality, robustness, and user experience
  • Collaborate closely with model developers and research scientists to provide insights and drive iterative improvement
  • Contribute to research and internal knowledge sharing on LLM evaluation methodologies and best practices
What we offer:
  • Free health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Work Council subsidy to refund part of sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Germany, Berlin
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience: familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary, annual bonus and equity
  • Regular compensation reviews
  • Generous paid time off above statutory minimum
  • Hybrid working
  • MacBooks are our standard, but we also offer Windows for certain roles when needed
  • Fun events for Intercomrades, friends, and family
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Ireland; United Kingdom, Dublin; London
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience: familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews - we reward great work
  • Peace of mind with life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Open vacation policy and flexible holidays so you can take time off when you need it
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers, to let you spend valuable time with your loved ones
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done

Data & Applied Scientist II

Are you a skilled Data Scientist with a passion for AI? Would you like to build ...
Location:
Ireland, Dublin
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • A degree (Bachelor’s, Master’s, or Doctorate) in a relevant quantitative field, or equivalent experience, along with the required level of data science experience
  • Extensive experience with coding in SQL and R/Python/Spark to implement statistical models, machine learning, and analysis on big data
  • Deep knowledge of LLM/GPT fundamentals, and experience with prompt engineering, evaluation of LLM output, and agents
  • Outstanding written and oral communication, exemplified through experience in collaborative problem-solving, and presenting findings to technical and non-technical audiences
  • Extensive experience translating research or business problems into analytical, machine learning, and AI‑driven solutions
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility:
  • Collaborate with cross‑functional partners to understand customer and product goals and contribute to growth in Microsoft 365 Copilot through the application of best practices in data science.
  • Translate business and customer problems into analytical, machine learning, causal modeling, and AI‑driven solutions by selecting and applying appropriate methodologies in Python and SQL.
  • Design, execute, and analyse A/B experiments by forming hypotheses, building scorecards, calculating new metrics, and interpreting results to inform product decisions.
  • Write efficient, readable, and maintainable production‑quality code and collaborate with engineering partners to integrate data models and analyses into Azure‑based systems.
  • Engage in AI research and development activities involving LLM fundamentals, prompt engineering, evaluation of LLM output, agents, and construction of LLM powered applications.
  • Evaluate model and analysis performance against business objectives by testing on real or production data, incorporating stakeholder feedback, and contributing to reviews of assumptions, risks, and limitations.
  • Learn and apply current data science, AI, privacy, security, and compliance best practices; engage with internal research and senior peers to share knowledge and contribute to scalable, responsible data‑driven solutions across Microsoft.
Employment Type:
Full-time