This is a high-impact opportunity to define the future of what our models can do. As a first-principles researcher, you will tackle the most ambitious questions at the heart of our mission: how can the fusion of vision, audio, and language unlock entirely new, magical behaviors in AI? You will not just be improving existing systems; you will be charting the course for the next generation of model capabilities, designing the core experiments that will shape the future of our technology and products.
Job Responsibilities:
Research and Define the next frontier of multimodal capabilities, identifying key gaps in our current models and designing the experiments to close them
Design and Execute novel experiments, datasets, and methodologies to systematically improve model performance across vision, audio, and language
Develop and Pioneer new evaluation frameworks and benchmarking approaches to precisely measure novel multimodal behaviors and capabilities
Collaborate Deeply with other research teams to translate your findings into our core training recipes and unlock new product experiences
Build and Prototype compelling demonstrations that showcase the groundbreaking multimodal capabilities you have unlocked
Requirements:
PhD or equivalent research experience in a field related to AI, Machine Learning, or Computer Science
Strong programming skills in Python and deep, hands-on experience with PyTorch
Proven track record of working with multimodal data pipelines and curating large-scale datasets for research
Deep, fundamental understanding of at least one of the core modalities: computer vision, audio processing, or natural language processing
A drive to tackle the most ambitious, open-ended research challenges in a fast-paced, collaborative environment
Nice to have:
Direct expertise working with complex, interleaved multimodal data (video, audio, text)
Hands-on experience training from scratch or fine-tuning Vision Language Models (VLMs), Audio Language Models, or large-scale generative video models
A strong publication record in top-tier AI conferences (e.g., NeurIPS, ICML, CVPR, ICLR)
Experience leading ambitious, open-ended research projects from ideation to tangible results