Research Internships at Microsoft provide a dynamic environment for research careers, with a network of world-class research labs led by globally recognized scientists and engineers. These teams pursue innovation across a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment. The Interactive Multimodal Futures (IMF) group at Microsoft Research seeks a PhD-level Research Intern to work on a project at the intersection of situated interaction, affective computing, and human-centered AI systems. The project will include elements of multimodal sensing (physiology, speech, gaze, gestures, olfaction/gas, etc.), signal processing, and real-time interaction.
Job Responsibilities:
Design and implement research prototypes for real-time situated and adaptive interaction
Explore the use of the latest generative AI techniques related to interpreting multimodal interaction, conversation, and behavioral signals
Conduct user studies and analyze multimodal data
Contribute to publications and share findings with the research community
Requirements:
Currently enrolled in a PhD or equivalent program in HCI, HRI, Computer Science, Cognitive Science, Robotics, Electrical Engineering, Psychology, or a related STEM field
At least 2 years of research experience using human-centered approaches in HCI, HRI, ML, CV, or Affective Computing
Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship
You’ll need to submit a minimum of two reference letters for this position, as well as a cover letter and any relevant work or research samples
Nice to have:
Experience writing peer-reviewed publications
Experience with generative AI techniques, ML frameworks (e.g., PyTorch), and real-time interactive systems
Strong collaboration and communication skills
Experience conducting human-subjects research
Experience implementing research prototypes (frontend, backend, or both)
Experience using human-centered design and research methods
Familiarity with reinforcement learning or time-series signal processing
Experience working with large datasets (e.g., text, vision, physiology, behavioral)
Background in affective computing, behavioral, or physiological sensing
Experience collecting data with wearable devices
Hardware prototyping or wearable device experience
Healthcare/wellbeing experience, e.g., pain assessment and modeling (nociceptive signals, stress/anxiety detection) or collaboration with clinical partners
Olfaction and gas sensing, e.g., experience with electronic noses, breath analysis, or volatile organic compound sensing
Demonstrated experience in programming multimodal systems that interact with real human users, e.g., robots or virtual agents, particularly by integrating multiple machine-learned components such as computer vision, speech recognition, dialogue handling, natural language generation, etc.
Demonstrated experience in conducting research outside of a controlled lab environment, e.g., field research, ethnography, in-the-wild studies, etc.
Experience prototyping real-time conversational AI agents using tools such as the OpenAI Realtime API, including function calling, and supporting interactions via text, voice, and other interfaces
Experience designing, implementing, and evaluating different personality styles for AI agents, varying factors such as communication style, voice characteristics, and emotional tone to study their impact on user experience, engagement, and trust
Experience incorporating multimodal sensory inputs (e.g., text, audio, contextual signals) to enhance interaction quality and make agent responses more adaptive and context-aware
Experience leveraging the latest AI techniques to perform sensing in the real world