Research Intern - AI Evaluation and Alignment

Microsoft Corporation (https://www.microsoft.com/)

Location:
United States, Redmond

Contract Type:
Employment contract

Salary:
6710.00 - 13270.00 USD / Month

Job Description:

Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment. The Microsoft Research and Copilot Studio teams are seeking Research Interns to help advance the quality, reliability, and evaluation of Large Language Model (LLM)-based systems. Research Interns will collaborate with applied scientists and engineers to explore new machine learning methods that improve how Artificial Intelligence (AI) systems assess and align with human expectations.

Job Responsibility:

  • Co-developing a research project in collaboration with the supervisor and research mentors
  • Designing and implementing machine learning approaches, including training and fine-tuning using real-world datasets
  • Developing evaluation frameworks and benchmarking methods to assess model quality, robustness, and generalization
  • Presenting and communicating research findings

Requirements:

  • Currently enrolled in a PhD program in Statistics, Computer Science, Physics, Operations Research, or a related technical field
  • At least 1 year of hands-on experience working on LLM-related projects (e.g., prompt engineering, building and evaluating LLM-based systems, reward modeling, etc.)
  • At least 1 year of experience coding in Python
  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship
  • Submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples
  • Submit a list of projects you worked on in the last 2 years with the following information: start and end dates of the project, a brief overview of what the project is about, what you did on the project, and what technologies you used for the project

Nice to have:

  • Prior experience in reward models for large language models or LLM-as-a-Judge
  • Strong experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with software engineering best practices (e.g., Git)
  • Experience with LLM post-training and evaluation or LLM-based judge systems
  • Research experience demonstrated through publications or projects
  • Ability to work independently in ambiguous or rapidly evolving situations and collaborate effectively across disciplines

Additional Information:

Job Posted:
February 04, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Research Intern - AI Evaluation and Alignment

AI Research Intern

Join the cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
United States, San Francisco
Salary:
55.00 - 66.00 USD / Hour
Atlassian (https://www.atlassian.com)
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field with an anticipated degree completion date between September 2025 and June 2026
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
Job Responsibility:
  • Collaborate with Research Scientists and Machine Learning Engineers
  • Contribute to the design and execution of research experiments
  • Curate, preprocess, and manage large datasets for training and evaluation
  • Execute continued training and alignment of LLMs
  • Evaluate advanced ML algorithms
  • Publish research in internal and external industry workshops and academic journals.
What we offer:
  • health coverage
  • paid volunteer days
  • wellness resources
Employment Type:
Full-time

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
Canada
Salary:
55.00 USD / Hour
Atlassian (https://www.atlassian.com)
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
United States, Seattle
Salary:
49.00 - 75.00 USD / Hour
Atlassian (https://www.atlassian.com)
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Degree completion date cannot be earlier than September 2026 - June 2027
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
  • Exhibit a solid grasp of algorithms and data structures
  • Demonstrate proficiency in Python programming and ability to write clean, efficient, and well-documented code
  • Experience working with large-scale datasets, including data preprocessing, augmentation, and scaling techniques
  • Has expertise in managing data using Python libraries such as NumPy, Pandas, Matplotlib, in addition to leveraging models from Hugging Face and has practical knowledge of applied machine learning and deep learning frameworks, like PyTorch
  • Demonstrated exposure to natural language processing (NLP) and Computer Vision (CV)
  • Familiarity with state-of-the-art research in machine learning and AI, as evidenced by relevant coursework, publications, or projects
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Research Associate

The role involves accelerating research towards new applications, core methodolo...
Location:
United States, Milpitas
Salary:
43.27 - 93.15 USD / Hour
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date:
May 26, 2026
Requirements:
  • Pursuing a PhD degree (or other degree with significant research and innovation experience) in a relevant discipline (e.g. electrical engineering, computer science, machine learning, applied physics, mathematics, statistics, etc.)
  • Track record of world-class innovative contributions and ideas in AI/ML and/or related areas
  • Experience with innovative solution development, such as developing proofs-of-concept, first-of-a-kind solutions, and/or technology transfer
  • Experience with in-memory computing accelerators
  • Experience with micro architecture design for custom accelerators
  • Experience in deep learning research, algorithms, and data structures
  • Experience in design and test of custom integrated circuit IP for AI/ML applications
  • Experience with emerging analog memory devices for computing applications such as memristor, ReRAM, PCM, and others
  • Experience in system software, GPU acceleration, DL model execution and performance optimization
  • Self-motivated, proactive, with leadership qualities
Job Responsibility:
  • Investigation of high-performance accelerators which combine CMOS and emerging ReRAM device technologies (or memristors) for computing applications including machine learning, neuromorphic computing, network security, finite automata, and other novel computational models
  • Design of prototype systems and/or integrated circuits
  • Invention of new architectures, circuits, and/or devices to take advantage of physical hardware systems for acceleration of target computations
  • Operation of existing hardware platforms
  • Performance evaluations with competing systems
  • Provide thought leadership and technical influence both internally and externally to HPE
  • Contribute along the full range from initial novel ideas to design, development, implementation, evaluation, and technology transfer
  • Collaborate with HPE Labs research teams as well as with external partners
  • Work in alignment with HPE's broader innovation community
What we offer:
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Programs catered to helping employees reach career goals
  • Unconditional inclusion in work environment
Employment Type:
Full-time

Director, Digital Ecosystem Applications

This position is responsible for the Software Platforms group at the Innovation ...
Location:
United States, Belmont
Salary:
240000.00 - 285000.00 USD / Year
Volkswagen AG (https://www.volkswagen-group.com)
Expiration Date:
Until further notice
Requirements:
  • 10+ years with 2+ years in a technical leadership role
  • CS, EE, M.S. Engineering (or equivalent) REQUIRED
  • M.S. Engineering (or equivalent) or PhD PREFERRED
  • Analytical and conceptual thinking – using logic and reason, creative and strategic
  • Communication skills – interpersonal, presentation and written
  • Managing interdisciplinary teams on individual projects
  • Integration – joining people, processes or systems
  • Influencing and negotiation skills
  • Problem solving
  • Resource management
Job Responsibility:
  • Define the technical mission, architecture strategy, and long‑term platform vision for the In‑Vehicle Computing & Digital Ecosystem Applications team, spanning Android Automotive OS (AAOS), in‑vehicle compute platforms, Software‑Defined Vehicle (SDV) architecture, and AI‑driven cockpit intelligence
  • Provide technical leadership across the full software stack, including Android Framework, System Services, HAL layers, middleware, connectivity stacks, media/audio frameworks, HMI toolchains, and cloud‑connected AI runtimes within an SDV‑aligned architecture
  • Lead and mentor engineering teams in platform bring‑up, system integration, performance optimization, and development of AI‑agentic features, multimodal interaction models, and next‑generation speech technologies
  • Manage multi‑year budgets for platform development, AI integration, SDV‑aligned compute evolution, SoC evaluations, cloud services, and prototype programs
  • Deliver executive‑level technical reporting on architecture decisions, platform readiness, SDV integration milestones, AI progress, risks, and strategic recommendations
  • Drive strategic planning for ICC’s infotainment and cockpit portfolio, including AAOS evolution, hybrid cloud/edge AI pipelines, intelligent mobile agent technologies, and SDV‑centric software and compute roadmaps
  • Align technical roadmaps with global VW Group Innovation teams across infotainment, connectivity, AI/ML, vehicle architecture, cloud services, and SDV platform strategy, ensuring cross‑platform consistency and shared component reuse
  • Build strategic relationships with SoC vendors, Tier‑1 suppliers, cloud providers, and AI technology partners to influence cockpit compute and SDV platform evolution
  • Maintain partnerships with Silicon Valley companies specializing in AI runtimes, LLMs, speech, multimodal interaction, and automotive‑grade SDV‑compatible software frameworks
  • Collaborate with academic and research institutions on AI‑agentic systems, embedded ML, HMI, and in‑vehicle compute architectures aligned with SDV principles
What we offer:
  • Eligibility for annual performance bonus
  • Healthcare benefits
  • 401(k), with company match
  • Defined contribution retirement program
  • Tuition reimbursement
  • Company lease car program
  • Paid time off
Employment Type:
Full-time

Senior/Staff Machine Learning Engineer - Health Evaluation - AI Teams

At Doctolib, we're on a mission to transform how healthcare is delivered by harn...
Location:
France, Paris
Salary:
Not provided
Doctolib (doctolib.fr)
Expiration Date:
Until further notice
Requirements:
  • MSc or PhD in Computer Science, Machine Learning, Data Science, or related field
  • 7+ years of hands-on experience working with large language models (e.g., GPT, Claude, Llama, or BERT-like architectures)
  • Proven experience in evaluating agentic or reasoning systems (e.g., autonomous agents, tool-using LLMs, dialogue systems, or task-oriented assistants)
  • Strong track record in experiment design, metric definition, and evaluation automation
  • Ability to bridge research and production, influencing modeling and product decisions
  • Excellent communication skills and a collaborative mindset
Job Responsibility:
  • Define and own the evaluation strategy for our AI agentic system - metrics, protocols, datasets, and tooling
  • Implement and maintain automated evaluation pipelines to monitor model quality, safety, and alignment across iterations
  • Run systematic experiments to assess reasoning, factuality, robustness, and user experience
  • Collaborate closely with model developers and research scientists to provide insights and drive iterative improvement
  • Contribute to research and internal knowledge sharing on LLM evaluation methodologies and best practices
What we offer:
  • Free health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Work Council subsidy to refund part of sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
Employment Type:
Full-time

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...
Location:
Canada; United States
Salary:
300000.00 - 450000.00 USD / Year
Apollo.io (apollo.io)
Expiration Date:
Until further notice
Requirements:
  • 10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
  • Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
  • Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering
  • Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
  • Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
  • Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
  • Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
  • Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
  • Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
  • Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency
Job Responsibility:
  • Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
  • Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
  • Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
  • Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
  • Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
  • Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
  • Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
  • Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
  • Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
  • Ensure robust observability, monitoring, and safety guardrails for all AI systems in production
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA
Employment Type:
Full-time

Delivery Quality Engineer, AI Business

As a Delivery Quality Engineer within Prolific AI Data Services, you will be the...
Location:
Mexico, Mexico City
Salary:
Not provided
Prolific (prolific.com)
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in quality engineering, data or annotation quality, analytics engineering, trust and integrity, or ML/LLM evaluation operations
  • Strong proficiency in Python and SQL, with comfort applying statistical concepts such as sampling strategies, confidence levels, and agreement metrics
  • A proven track record of turning ambiguous or messy quality problems into clear metrics, automated checks, and durable process improvements
  • Strong quality systems thinking, with the ability to translate complex edge cases into clear rules, tests, rubrics, and governance mechanisms
  • Hands-on experience instrumenting workflows and implementing pragmatic automation that catches quality and integrity issues early
  • Demonstrated ability to influence cross-functional teams (Product, Engineering, Operations, Client teams) and drive change without direct authority
  • Strong customer empathy, with a clear understanding of what “useful, trustworthy data” means for research, AI training, and evaluation use cases
Job Responsibility:
  • Own end-to-end quality design for Prolific managed service studies, including rubrics, acceptance criteria, defect taxonomies, severity models, and clear definitions of done
  • Define, implement, and maintain quality measurement systems, including sampling plans, golden sets, calibration protocols, agreement targets, adjudication workflows, and drift detection
  • Build and deploy automated quality checks and launch gates using Python and SQL, such as schema and format validation, completeness checks, anomaly detection, consistency testing, and label distribution monitoring
  • Design and run launch readiness processes, including pre-launch checks, pilot calibration, ramp criteria, full-launch thresholds, and pause/rollback mechanisms
  • Partner with Product and Engineering to embed in-study quality controls and authenticity checks into workflows, tooling, and escalation paths
  • Write and continuously improve guidelines and training materials to keep participants, reviewers, and internal teams aligned on evolving quality standards
  • Investigate quality and integrity issues end to end, running root-cause analysis across guidelines, UX, screening, training, and operations, and driving corrective and preventive actions (CAPAs)
  • Build dashboards and operating cadences to track defect rates, rework, throughput versus quality trade-offs, integrity events, and SLA adherence
  • Lead calibration sessions and coach QA leads and reviewers to improve decision consistency, rubric application, and overall quality judgement
  • Translate one-off quality fixes into repeatable, scalable playbooks across customers, programs, and study types
What we offer:
  • competitive salary
  • benefits
  • remote working
  • impactful, mission-driven culture
  • equity
  • opportunity to earn a cash variable element, such as a bonus or commission
Employment Type:
Full-time