We’re looking for an AI Researcher to join our core AI team and help push the frontier of multimodal conversational intelligence. If you thrive in fast-paced environments, love turning abstract ideas into running code, and get energy from exploring the edge of what’s possible, then this is the perfect role for you.
Job Responsibilities:
Conduct research on Foundational Multimodal Models in the context of Conversational Avatars (e.g., Neural Avatars, Talking-Heads)
Model video, audio, and language sequences using autoregressive, predictive (e.g., V-JEPA), and/or diffusion paradigms, with an emphasis on temporal and sequential data rather than static images
Collaborate with the Applied ML team to bring your work to life in production systems
Stay at the cutting edge of multimodal learning and help us define what “cutting edge” means next
Requirements:
A PhD (or near completion) in a relevant field, or equivalent hands-on research experience
Experience modeling human behavior and generation (facial expressions, affect, or speech), ideally in conversational or interactive settings
Deep understanding of sequence modeling in video/audio/language domains
Familiarity with large model training, especially LLMs or VLMs
Strong background in Deep Learning (from Transformers to Diffusion Models) and how to make them work in practice
Excellent programming skills, especially in PyTorch
Nice to have:
Publications in top-tier conferences like CVPR, ICCV, NeurIPS, ECCV, or ACMMM
Broader understanding of generative AI and multimodal architectures
Familiarity with software engineering best practices
Curiosity and a flexible mindset — you like building and experimenting