Tavus is a research lab pioneering human computing. We're building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today's systems. Our real-time human simulation models let machines see, hear, respond, and even look real, enabling meaningful, face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We're a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners. Be part of shaping a future where humans and machines truly understand each other.
Job Responsibilities:
Research and develop audio-visual generation models for conversational agents (e.g., neural avatars, talking heads)
Focus on models that are tightly coupled with conversation flow, ensuring verbal and non-verbal signals work seamlessly together
Experiment with diffusion models (DDPMs, LDMs, etc.), long-video generation, and audio generation
Collaborate with the Applied ML team to bring your research into real-world production
Stay ahead of the latest advancements in multimodal generation, and help shape the next wave
Requirements:
A PhD (or near completion) in a relevant field, or equivalent hands-on research experience
Experience applying image/video generation models in practice
Strong foundations in generative modeling and rapid prototyping
Deep familiarity with diffusion models, including recent advances in efficiency
Good understanding of video-language models and multimodal generation
Proficiency in PyTorch and GPU-based inference
Nice to have:
Experience with long-video or audio generation
Skills in 3D graphics, Gaussian splatting, or large-scale training setups
Broader exposure to generative models and rendering
Familiarity with software engineering best practices
Publications in top-tier or respected venues (CVPR, NeurIPS, BMVC, ICASSP, etc.)