Research Internships at Microsoft provide a dynamic environment for research careers, with a network of world-class research labs led by globally recognized scientists and engineers who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment. Microsoft Research and the Copilot Studio team are seeking Research Interns to help advance the quality, reliability, and evaluation of Large Language Model (LLM)-based systems. Research Interns will collaborate with applied scientists and engineers to explore new machine learning methods that improve how Artificial Intelligence (AI) systems assess and align with human expectations.
Job Responsibilities:
Co-developing a research project in collaboration with the supervisor and research mentors
Designing and implementing machine learning approaches, including training and fine-tuning using real-world datasets
Developing evaluation frameworks and benchmarking methods to assess model quality, robustness, and generalization
Presenting and communicating research findings
Requirements:
Currently enrolled in a PhD program in Statistics, Computer Science, Physics, Operations Research, or a related technical field
At least 1 year of hands-on experience working on LLM-related projects (e.g., prompt engineering, building and evaluating LLM-based systems, reward modeling)
At least 1 year of experience coding in Python
Research Interns are expected to be physically located at their manager’s Microsoft worksite for the duration of their internship
Submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples
Submit a list of projects you worked on in the last 2 years, with the following information for each: start and end dates, a brief overview of the project, what you did on the project, and the technologies you used
Nice to have:
Prior experience in reward models for large language models or LLM-as-a-Judge
Strong experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with software engineering best practices (e.g., version control with Git)
Experience with LLM post-training and evaluation or LLM-based judge systems
Research experience demonstrated through publications or projects
Ability to work independently in ambiguous or rapidly evolving situations and collaborate effectively across disciplines