Research Scientist / Engineer – Multimodal Capabilities

United States, Palo Alto · 187500.00 - 395000.00 USD / Year · Job Posted January 13, 2026

Job Description

This is a high-impact opportunity to define the future of what our models can do. As a first-principles researcher, you will tackle the most ambitious questions at the heart of our mission: how can the fusion of vision, audio, and language unlock entirely new, magical behaviors in AI? You will not just be improving existing systems; you will be charting the course for the next generation of model capabilities and designing the core experiments that will shape the future of our technology and products.

Job Responsibility

  • Research and define the next frontier of multimodal capabilities, identifying key gaps in our current models and designing the experiments to solve them
  • Design and execute novel experiments, datasets, and methodologies to systematically improve model performance across vision, audio, and language
  • Develop and pioneer new evaluation frameworks and benchmarking approaches to precisely measure novel multimodal behaviors and capabilities
  • Collaborate deeply with other research teams to translate your findings into our core training recipes and unlock new product experiences
  • Build and prototype compelling demonstrations that showcase the groundbreaking multimodal capabilities you have unlocked

Requirements

  • PhD or equivalent research experience in a field related to AI, Machine Learning, or Computer Science
  • Strong programming skills in Python and deep, hands-on experience with PyTorch
  • Proven track record of working with multimodal data pipelines and curating large-scale datasets for research
  • Deep, fundamental understanding of at least one of the core modalities: computer vision, audio processing, or natural language processing
  • Ability to thrive on the most ambitious, open-ended research challenges in a fast-paced, collaborative environment

Nice to have

  • Direct expertise working with complex, interleaved multimodal data (video, audio, text)
  • Hands-on experience training or fine-tuning Vision Language Models (VLMs), Audio Language Models, or large-scale generative video models from scratch
  • A strong publication record in top-tier AI conferences (e.g., NeurIPS, ICML, CVPR, ICLR)
  • Experience leading ambitious, open-ended research projects from ideation to tangible results

Similar Jobs for Research Scientist / Engineer – Multimodal Capabilities

8 matching positions

Machine Learning Research Scientist / Research Engineer, Post-Training

Scale works with the industry’s leading AI labs to provide high quality data and...
Location: United States, San Francisco; Seattle; New York
Salary: 252000.00 - 315000.00 USD / Year
Company: Scale
Expiration Date: Until further notice
Requirements
  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field
  • Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning
  • Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning
  • Excellent written and verbal communication skills
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals
  • Previous experience in a customer facing role
Job Responsibility
  • Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal modalities
  • Design and experiment with new approaches to preference optimization
  • Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness
  • Publish research findings in top-tier AI conferences
What we offer
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • equity based compensation
  • commuter stipend
Employment Type: Full-time

Research Scientist / Engineer – Realtime Interactive

At Luma, the Realtime Interactive team is responsible for building an entirely n...
Location: United States, Palo Alto
Salary: 187500.00 - 395000.00 USD / Year
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Experience with fine-tuning large-scale generative models
  • Proficiency in PyTorch and distributed training frameworks
  • (Preferred) Strong background in methods for optimizing model inference (distillation, quantization, sparsity, compression, etc.)
  • (Preferred) Experience in gathering, processing, and annotating datasets
Job Responsibility
  • Work on top of pretrained multimodal generative models to fine-tune and optimize them for realtime generation
  • Design novel algorithms and techniques to solve problems with autoregressive visual generation, long-range temporal consistency, and long-term memory
  • Develop interactive applications with tight latency constraints
  • Process data to develop advanced interactive capabilities and controls for World Modeling, such as controlling character and camera movement, audio, and more
Employment Type: Full-time

Senior Machine Learning Engineer - Fraud (Research Scientist)

The Data team within Plaid’s Fraud organization builds the machine learning syst...
Location: United States, San Francisco
Salary: 225600.00 - 337200.00 USD / Year
Company: Plaid
Expiration Date: Until further notice
Requirements
  • PhD strongly preferred; we will consider equivalent research experience with a strong publication/innovation track record
  • 3+ years of experience as a Machine Learning Engineer or Research Scientist
  • Strong scientific rigor and communication
  • Strong Python skills + ability to build high-quality research prototypes
Job Responsibility
  • Build next-generation fraud detection capabilities by researching and prototyping state-of-the-art methods across graph ML, sequential modeling, and multimodal learning
  • Own a research roadmap that ships: move from papers/prototypes to measurable product impact
  • Publish applied research and collaborate with a high-caliber team across Data, Product, and Engineering
  • Work with one of the largest financial datasets to generate insights that help hundreds of millions of consumers achieve greater financial freedom
Employment Type: Full-time

Senior Machine Learning Engineer - Fraud (Research Scientist)

The Data team within Plaid’s Fraud organization builds the machine learning syst...
Location: United States
Salary: 225600.00 - 337200.00 USD / Year
Company: Plaid
Expiration Date: Until further notice
Requirements
  • PhD strongly preferred; we will consider equivalent research experience with a strong publication/innovation track record
  • 3+ years of experience as a Machine Learning Engineer or Research Scientist
  • Strong scientific rigor and communication
  • Strong Python skills + ability to build high-quality research prototypes
Job Responsibility
  • Build next-generation fraud detection capabilities by researching and prototyping state-of-the-art methods across graph ML, sequential modeling, and multimodal learning
  • Own a research roadmap that ships: move from papers/prototypes to measurable product impact
  • Publish applied research and collaborate with a high-caliber team across Data, Product, and Engineering
  • Work with one of the largest financial datasets to generate insights that help hundreds of millions of consumers achieve greater financial freedom
Employment Type: Full-time

AI Research Scientist (Technical Leadership), Multimodal - Monetization GenAI

The Monetization GenAI Video Gen & Visual Search group, part of the Ads pillar, ...
Location: United States, Menlo Park, CA
Salary: 219000.00 - 301000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Has obtained a PhD in Computer Science, AI/ML, or a relevant technical field
  • Experience as a technical lead, driving major technical initiatives with cross-functional impact and influencing strategy across multiple teams
  • 4+ years of experience training large language and/or vision models, with extensive and recent experience training multimodal LLMs
  • Research expertise in video generation/understanding, multimodal learning, or diffusion models
  • Demonstrated significant industry influence in the field of AI and/or recently published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV)
Job Responsibility
  • Lead end-to-end AI research and model development for video-centric generative AI across Meta's advertising surfaces
  • Drive advancements in video generation & enhancement
  • Develop video-to-video & audio generation capabilities
  • Advance video & visual understanding through novel research
  • Conduct foundation model research to support generative AI innovation
  • Define research agendas and pioneer new directions in video/audio generation and multimodal understanding
What we offer
  • bonus
  • equity
  • benefits
Employment Type: Full-time

Research Scientist Intern, Real-Time Multimodal AI

Reality Labs is building the future of connection through world-class AR/VR hard...
Location: United States, Burlingame
Salary: 7650.00 - 12134.00 USD / Month
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, Electrical Engineering, or a related field
  • 2+ years of research experience in one or more of the following areas: multimodal learning, vision-language models, large language models, or foundation model fine-tuning
  • Hands-on experience fine-tuning large foundation models (e.g., LLaVA, InternVL, Qwen-VL, LLaMA, or similar)
  • Strong programming skills in Python
  • Experience with deep learning frameworks such as PyTorch
  • Excellent communication skills and ability to work independently
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Research and develop novel approaches for fine-tuning large multimodal foundation models (vision-language, audio-visual) for real-time applications
  • Design and implement efficient inference pipelines for deploying fine-tuned models in real-time communication scenarios
  • Explore agentic architectures that leverage fine-tuned models as tools within larger AI systems
  • Collaborate with cross-functional teams to integrate models into prototype experiences
  • Document and present research progress with the goal of publishing findings at top-tier ML/CV conferences
  • Contribute to building working prototypes that demonstrate the capabilities of fine-tuned multimodal models

Research Scientist Intern, Audio Quality with AI (PhD)

The Meta Reality Labs Research Team brings together a world-class team of resear...
Location: United States, Redmond
Salary: 7650.00 - 12134.00 USD / Month
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in the field of Speech and Hearing Science, Auditory Neuroscience, Computational Neuroscience, Computer Science, Artificial Intelligence, Generative AI, Transformer Models, Machine Learning, Signal Processing or Computer vision
  • 3+ years of experience with Python, MATLAB, or similar
  • 3+ years of experience with machine learning software platforms such as PyTorch, TensorFlow, etc.
  • Background in speech perception, psychoacoustics, or acoustic phonetics
  • Experience deploying novel audio computational models and LLMs
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Investigate systematic phonemic errors as causal factors in perceived speech quality degradation, and link them to human perceptual judgments
  • Build and curate datasets and benchmarks of speech for phoneme-level analysis
  • Explore and compare the capabilities of audio and video (multimodal) LLMs as tools to support this analysis
  • Relate findings to human perceptual data (quality preference and intelligibility) and translate them into actionable insights for research and engineering teams
  • Where appropriate, adapt multimodal models to the task in a supporting capacity
  • Collaborate with researchers, engineers, and cross-functional partners to define goals, communicate findings, and drive improvements in speech quality
  • Develop tools and infrastructure to streamline and scale the analysis
  • Stay current with research in speech perception and audio quality and intelligibility assessment, and incorporate best practices into Meta's workflows
  • Disseminate results through internal reports and presentations, and, when appropriate, external publications
What we offer
  • benefits
Employment Type: Full-time

AI Research Engineer - Social Products (Technical Leadership)

We're hiring Research Engineers to join teams across Meta working at the interse...
Location: United States, Bellevue
Salary: 219000.00 - 301000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience with large scale model training, implementing algorithms, and evaluating speech-based systems
  • 5+ years of experience as an Applied AI Research Scientist or Applied AI Research Engineer
Job Responsibility
  • Contribute to the training of next-generation multimodal foundation models, advance their capabilities in understanding, generation, and grounding, and enable them for downstream product use-cases
  • Support creative data sourcing, high-quality pre/mid/post-training data curation, and scale and optimize data pipelines for multimodal large language models (LLMs)
  • Lead, collaborate, and execute on research that pushes forward the state of the art in multimodal reasoning and generation research, and prioritize research that can be directly applied to Meta's product development
What we offer
  • bonus
  • equity
Employment Type: Full-time