Research Scientist Intern, LLM Evaluation

Meta

Location:
United States, Bellevue

Contract Type:
Not provided

Salary:
7650.00 - 12134.00 USD / Month

Job Description:

Meta is seeking LLM Evaluation Scientists to join our Meta Superintelligence Lab, focusing on the evaluation and benchmarking of large language models (LLMs) across language and multimodal domains. We are committed to advancing the field of artificial intelligence by developing rigorous methodologies and tools to assess and improve the capabilities, safety, and reliability of cutting-edge AI systems. We are looking for individuals passionate about LLM evaluation, benchmarking, prompt engineering, data analysis, and the development of robust evaluation frameworks. As an LLM Evaluation Scientist, you will have the opportunity to shape the future of AI by ensuring our models meet the highest standards of performance and safety at scale.

Job Responsibility:

  • Design, implement, and maintain comprehensive evaluation protocols for large language models, including both automated and human-in-the-loop assessments
  • Develop and curate high-quality datasets and benchmarks to measure model performance, safety, fairness, and robustness across a variety of tasks and modalities
  • Analyze model outputs to identify strengths, weaknesses, and failure modes, and provide actionable insights to research and engineering teams
  • Collaborate with researchers, engineers, and cross-functional partners to define evaluation goals, communicate findings, and drive improvements in model quality
  • Develop tools and infrastructure to streamline and scale evaluation processes, including dashboards, annotation platforms, and reporting systems
  • Stay up to date with the latest research in LLM evaluation, benchmarking, and responsible AI, and incorporate best practices into Meta’s workflows
  • Disseminate evaluation results through internal reports, presentations, and, when appropriate, external publications
  • Contribute to the development of evaluation methodologies that can be applied to Meta product development and deployment

Requirements:

  • Currently has or is in the process of obtaining a Ph.D. degree in Computer Science, Artificial Intelligence, Generative AI, or a relevant technical field
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • Experience with Python, C++, C, Java or other related languages
  • Experience building systems based on machine learning and/or deep learning methods

Nice to have:

  • Intent to return to the degree program after the completion of the internship/co-op
  • Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as first-authored publications at leading workshops or conferences such as NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, ICCV, ECCV, or similar
  • Experience working and communicating cross-functionally in a team environment
  • Experience in advancing AI techniques, including core contributions to open source libraries and frameworks in computer vision
  • Publications or experience in machine learning, AI, computer vision, optimization, computer science, statistics, applied mathematics, or data science
  • Experience solving analytical problems using quantitative approaches
  • Experience setting up ML experiments and analyzing their results
  • Experience manipulating and analyzing complex, large scale, high-dimensionality data from varying sources
  • Experience in utilizing theoretical and empirical research to solve problems
  • Experience with deep learning frameworks

Additional Information:

Job Posted:
January 26, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Research Scientist Intern, LLM Evaluation

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
Canada
Salary:
55.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
United States, Seattle
Salary:
49.00 - 75.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Degree completion date cannot be earlier than September 2026 - June 2027
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
  • Exhibit a solid grasp of algorithms and data structures
  • Demonstrate proficiency in Python programming and ability to write clean, efficient, and well-documented code
  • Experience working with large-scale datasets, including data preprocessing, augmentation, and scaling techniques
  • Expertise in managing data with Python libraries such as NumPy, Pandas, and Matplotlib, experience leveraging models from Hugging Face, and practical knowledge of applied machine learning and deep learning frameworks such as PyTorch
  • Demonstrated exposure to natural language processing (NLP) and Computer Vision (CV)
  • Familiarity with state-of-the-art research in machine learning and AI, as evidenced by relevant coursework, publications, or projects
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Senior/Staff Machine Learning Engineer - Health Evaluation - AI Teams

At Doctolib, we're on a mission to transform how healthcare is delivered by harn...
Location:
France, Paris
Salary:
Not provided
Doctolib
Expiration Date:
Until further notice
Requirements:
  • MSc or PhD in Computer Science, Machine Learning, Data Science, or related field
  • 7+ years of hands-on experience working with large language models (e.g., GPT, Claude, Llama, or BERT-like architectures)
  • Proven experience in evaluating agentic or reasoning systems (e.g., autonomous agents, tool-using LLMs, dialogue systems, or task-oriented assistants)
  • Strong track record in experiment design, metric definition, and evaluation automation
  • Ability to bridge research and production, influencing modeling and product decisions
  • Excellent communication skills and a collaborative mindset
Job Responsibility:
  • Define and own the evaluation strategy for our AI agentic system - metrics, protocols, datasets, and tooling
  • Implement and maintain automated evaluation pipelines to monitor model quality, safety, and alignment across iterations
  • Run systematic experiments to assess reasoning, factuality, robustness, and user experience
  • Collaborate closely with model developers and research scientists to provide insights and drive iterative improvement
  • Contribute to research and internal knowledge sharing on LLM evaluation methodologies and best practices
What we offer:
  • Free health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Work Council subsidy to refund part of sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Germany, Berlin
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary, annual bonus and equity
  • Regular compensation reviews
  • Generous paid time off above statutory minimum
  • Hybrid working
  • MacBooks are our standard, but we also offer Windows for certain roles when needed
  • Fun events for Intercomrades, friends, and family
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Ireland; United Kingdom, Dublin; London
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews - we reward great work
  • Peace of mind with life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Open vacation policy and flexible holidays so you can take time off when you need it
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers, to let you spend valuable time with your loved ones
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done

Manager, Machine Learning - Community Support Engineering

The Community Support Platform (CSP) at Airbnb is a critical system that drives ...
Location:
United States
Salary:
204000.00 - 255000.00 USD / Year
Airbnb
Expiration Date:
Until further notice
Requirements:
  • Expertise in various machine learning and AI methodologies, including LLMs and non-LLMs, tailored for user-facing products
  • Proven experience in leading teams that develop large-scale ML models and systems to improve online user experiences
  • Strong leadership skills with a track record of nurturing an innovative and collaborative team environment
  • Exceptional verbal and written communication abilities, with a keen eye for detail
  • Demonstrated capability to work effectively with stakeholders at all organizational levels, both internally and externally
  • Skilled in navigating and resolving ambiguous challenges through proactive and strategic approaches
  • PhD, or Master's degree in Computer Science, Mathematics, Statistics, or related technical field
  • 10+ years of experience in building and shipping AI models and products, including 2+ years of experience with LLMs
  • 5+ years managing machine learning teams that deliver large impact
  • Expert knowledge of machine learning algorithms and techniques
Job Responsibility:
  • Lead and mentor a dynamic team of highly skilled applied scientists and machine learning engineers in the research, design and optimization of AI models and services
  • Develop and refine the overarching strategy for the ML and AI aspects of our community support products, focusing on scalability, quality, safety, performance, and reliability
  • Foster rapid development cycles without sacrificing quality, collaborating closely with platform, backend, and frontend engineers to engineer robust ML models and systems that enhance community support initiatives
  • Evaluate technical trade-offs in key decisions, ensuring optimal outcomes through data-backed strategies
  • Conduct thorough design and architecture reviews to continually elevate our standards of technical excellence
What we offer:
  • bonus
  • equity
  • benefits
  • Employee Travel Credits
Employment Type:
Full-time

Research Scientist Intern, Computer Vision - Video Intelligence

The Video Intelligence team is seeking highly motivated Research Interns to join...
Location:
United States, Menlo Park
Salary:
7650.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has or is in the process of obtaining a Ph.D. degree in Computer Science, Electrical Engineering, or related field with a focus on generative modeling, computer vision, or machine learning
  • Programming experience in Python and hands-on experience in deep learning frameworks such as PyTorch
  • Experience working on generative models (e.g., diffusion models, LLMs, autoregressive transformers)
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility:
  • Conduct research on advanced topics in video generation and understanding, including but not limited to text-to-video generative models, image-to-video generative models, video understanding models, and unified native video generative models
  • Design, implement, and evaluate novel algorithms and model architectures
  • Collaborate closely with researchers and engineers across the Video Intelligence group and broader GenAI teams
  • Contribute to publications in top-tier conferences and journals in AI, computer vision, and machine learning
  • Present findings and share insights that help shape both ongoing research and future directions

Research Scientist Intern, Audio

We are seeking a highly motivated and talented Audio Research Scientist Intern t...
Location:
United States, Redmond
Salary:
7313.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has, or is in the process of obtaining, a PhD in Computer Science, Electrical Engineering, Auditory Neuroscience, Audio Signal Processing, or a related field
  • Experience in building deep learning models
  • Experience with LLMs
  • 2+ years experience with Python and PyTorch
  • Understanding of audio processing concepts
  • Proven communication and collaboration skills
  • Demonstrated skill in learning and applying new concepts, techniques, and tools to solve complex problems
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
Job Responsibility:
  • Train or fine-tune audio models in PyTorch
  • Process and analyze speech and audio data, including binaural data simulation, data cleaning, feature extraction, and visualization
  • Collaborate with other researchers to collect data through listening experiments
  • Design and conduct experiments to evaluate the performance of these models and interpret results
  • Communicate findings through written reports and presentations