This is a rare and foundational opportunity to define the future of creative AI. You will be at the forefront of building and training large-scale multimodal generative models, directly impacting how users create and interact with video and audio. This role offers the chance to bridge cutting-edge research with magical, shipped products, working end-to-end on novel problems with no existing playbook.
Job Responsibilities:
Architect large-scale video and audio generative models, focusing on strong temporal coherence and high perceptual quality
Design, implement, and run robust data pipelines for curating, filtering, and captioning massive video and audio datasets
Train large-scale video and audio generative models on massive datasets and GPU clusters
Define and build novel evaluation frameworks to measure realism, temporal consistency, controllability, and human-aligned creative quality
Requirements:
Strong foundation in machine learning and generative modeling, with experience in video, audio, or multimodal domains
Deep understanding of autoregressive, diffusion/flow-based, or hybrid generative models, and their tradeoffs for long-horizon generation
Hands-on experience with PyTorch and large-scale training (distributed, mixed precision, large datasets)
Nice to have:
Experience with data, modeling, or evaluation work on text-to-video/audio models