AI Researcher (Multimodal Perception Models)

United States, San Francisco · Job Posted February 18, 2026

Job Description

We’re looking for an AI Researcher to join our core AI team and help push the frontier of multimodal conversational intelligence. If you thrive in fast-paced environments, love turning abstract ideas into running code, and get energy from exploring the edge of what’s possible, then this is the perfect role for you.

Job Responsibility

  • Conduct research on Foundational Multimodal Models in the context of Conversational Avatars (e.g., Neural Avatars, Talking-Heads)
  • Model video, audio, and language sequences using Autoregressive, Predictive Architectures (e.g., V-JEPA), and/or Diffusion paradigms with an emphasis on temporal and sequential data rather than static images
  • Collaborate with the Applied ML team to bring your work to life in production systems
  • Stay at the cutting edge of multimodal learning and help us define what “cutting edge” means next

Requirements

  • A PhD (or near completion) in a relevant field, or equivalent hands-on research experience
  • Experience modeling human behavior and generation (facial expressions, affect, or speech), ideally in conversational or interactive settings
  • Deep understanding of sequence modeling in video/audio/language domains
  • Familiarity with large model training, especially LLMs or VLMs
  • Strong background in Deep Learning (from Transformers to Diffusion Models) and in making these models work in practice
  • Excellent programming skills, especially in PyTorch

Nice to have

  • Publications in top-tier conferences like CVPR, ICCV, NeurIPS, ECCV, or ACM MM
  • Broader understanding of generative AI and multimodal architectures
  • Familiarity with software engineering best practices
  • Curiosity and a flexible mindset — you like building and experimenting

What we offer

  • flexible work schedule
  • unlimited PTO
  • competitive healthcare
  • gear stipends

Similar Jobs for AI Researcher (Multimodal Perception Models)

8 matching positions

Senior AI Researcher - AV

GM Israel (Herzliya) takes a significant part in introducing sophisticated softw...
Location: Israel, Herzliya
Salary: Not provided
Company: General Motors
Expiration Date: Until further notice
Requirements
  • PhD in Computer Science, Electrical Engineering, Robotics, or a related field (Excellent M.Sc. graduates will be considered)
  • Over 3 years of research experience in computer vision, machine learning, autonomous perception, or related areas
  • Strong publication record at top-tier AI/ML conferences and journals
  • Excellent coding skills and familiarity with modern AI frameworks
  • Hands-on experience with large-scale training, 3D data, multimodal perception, or foundation models is highly desirable
Job Responsibility
  • Drive downstream KPI lift for the autonomous driving agent
  • Participate in AI research projects in the areas of VLMs / world modeling, computer vision, 3D perception, multimodal sensor fusion, and others
  • Design, build, train, and evaluate foundation models and large-scale deep learning architectures designed for autonomous driving
  • Collaborate with engineering teams to translate state-of-the-art research into scalable production solutions
  • Work towards external publications in top-tier conferences / journals
  • Track emerging trends in your field
  • Incubate cutting-edge technologies aimed at impacting our L3 autonomous driving technology
  • Build and maintain collaborations with top universities, research labs, and industry experts
Employment Type: Full-time

Research Scientist Intern, Multimodal Generative AI and Robotics

The research intern will work on cutting edge research problems to innovate nove...
Location: United States, Redmond
Salary: 7650.00 - 12134.00 USD / Month
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in the domain of computer vision, computer graphics, 3D machine perception, or deep learning
  • Knowledge in deep learning, computer vision, graphics, generative modeling, LLMs and VLMs
  • Hands-on experience with implementing deep learning algorithms, large-scale training, benchmark and evaluation
  • Experience working within Python environments such as PyTorch
  • Experience working in a Unix environment
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Plan and execute cutting-edge research and development to advance the state-of-the-art in machine learning and large-scale training
  • Collaborate with other researchers and engineers across machine perception teams at Meta to develop experiments, prototypes, and concepts that advance the state-of-the-art contextual AI and robotic systems
  • Work with the team to help design, set up, and run practical experiments and prototype systems related to large-scale high-quality sensing and machine reasoning
Employment Type: Full-time

Research Scientist – World Models, Robotics & Embodied AI

Meta Reality Labs Research (RL Research) brings together a team of researchers a...
Location: United Kingdom, London
Salary: Not provided
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has or is in the process of obtaining a PhD in the field of Computer Vision, Robotics, AI, Computer Science, a related field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • Research experience involving 3D Computer Vision, Deep Learning, or Robotics—specifically related to multimodal or 3D generative modeling, predictive world models, scene understanding, or learning-based robotic control
  • Experience with real-world system building and data collection, including design, coding, and evaluation with modern ML methods
  • Experience communicating research for public audiences of peers
  • Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and Python
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
Job Responsibility
  • Drive fundamental and applied research at the intersection of multi-modal generative AI, predictive world modeling, embodied reasoning, and robotic manipulation
  • Investigate or invent architectures that deliver a spectrum of embodied behaviors from simulated environments to real robots, and from tactile-driven motor control to high-level, long-horizon intelligence
  • Design research methodologies and lead empirical evaluations, authoring well-tested code for physical hardware and simulators
  • Build prototype systems that drive multi-step, long-horizon robotic perception, reasoning, and action
  • Contribute to and lead high-impact publications and open-sourcing efforts
  • Identify long-term research goals while executing intermediate milestones
  • Collaborate with a wide-ranging set of scientists and engineers across teams
Employment Type: Full-time

Research Scientist Intern, AI Research - World Models

Meta is seeking Research Interns to join the SAM team in the Multimedia Percepti...
Location: United States, Menlo Park
Salary: 7650.00 - 12134.00 USD / Month
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in Computer Vision, Machine Learning, Artificial Intelligence, or relevant technical field
  • Research and/or work experience in Generative Modeling and Computer Vision. In particular: video generation, 3D/4D reconstruction, video and image understanding, vision-language foundation models, representation learning, and related areas
  • Research and/or work experience in Machine Learning or Deep Learning with applications to perception
  • Experience in Python, C++, or other related languages
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
Job Responsibility
  • Perform research to advance the science and technology of generative AI
  • Perform research that enables learning to predict and condition on multimodal data (video, 3D structures, primarily images, text, and other modalities like audio)
  • Brainstorm with research mentors, review literature and existing solutions of a challenging real-world research problem
  • Develop novel solutions, implement prototypes, and perform extensive experiments to test the proposed solutions in meaningful benchmarks and metrics, analyze the results and verify the conclusions
  • Contribute to ongoing research projects and impactful technology releases
  • Draft and polish research publications
  • Present research outcomes to internal and/or external audiences

Research Scientist Intern, AI Research - CoreML - World Models

Meta is seeking Research Interns to join the SAM team in the Multimedia Percepti...
Location: United States, Menlo Park
Salary: 7650.00 - 12134.00 USD / Month
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in Computer Vision, Machine Learning, Artificial Intelligence, or relevant technical field
  • Research and/or work experience in Generative Modeling and Computer Vision. In particular: video generation, 3D/4D reconstruction, video and image understanding, vision-language foundation models, representation learning, and related areas
  • Research and/or work experience in Machine Learning or Deep Learning with applications to perception
  • Experience in Python, C++, or other related languages
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
Job Responsibility
  • Perform research to advance the science and technology of generative AI
  • Perform research that enables learning to predict and condition on multimodal data (video, 3D structures, primarily images, text, and other modalities like audio)
  • Brainstorm with research mentors, review literature and existing solutions of a challenging real-world research problem
  • Develop novel solutions, implement prototypes, and perform extensive experiments to test the proposed solutions in meaningful benchmarks and metrics, analyze the results and verify the conclusions
  • Contribute to ongoing research projects and impactful technology releases
  • Draft and polish research publications
  • Present research outcomes to internal and/or external audiences

Research Scientist Intern, Audio Quality with AI (PhD)

The Meta Reality Labs Research Team brings together a world-class team of resear...
Location: United States, Redmond
Salary: 7650.00 - 12134.00 USD / Month
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in the field of Speech and Hearing Science, Auditory Neuroscience, Computational Neuroscience, Computer Science, Artificial Intelligence, Generative AI, Transformer Models, Machine Learning, Signal Processing or Computer vision
  • 3+ years of experience with Python, MATLAB, or similar
  • 3+ years of experience with machine learning software platforms such as PyTorch or TensorFlow
  • Background in speech perception, psychoacoustics, or acoustic phonetics
  • Experience deploying novel audio computational models and LLMs
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Investigate systematic phonemic errors as causal factors in perceived speech quality degradation, and link them to human perceptual judgments
  • Build and curate datasets and benchmarks of speech for phoneme-level analysis
  • Explore and compare the capabilities of audio and video (multimodal) LLMs as tools to support this analysis
  • Relate findings to human perceptual data (quality preference and intelligibility) and translate them into actionable insights for research and engineering teams
  • Where appropriate, adapt multimodal models to the task in a supporting capacity
  • Collaborate with researchers, engineers, and cross-functional partners to define goals, communicate findings, and drive improvements in speech quality
  • Develop tools and infrastructure to streamline and scale the analysis
  • Stay current with research in speech perception and audio quality and intelligibility assessment, and incorporate best practices into Meta's workflows
  • Disseminate results through internal reports and presentations, and, when appropriate, external publications
What we offer
  • benefits
Employment Type: Full-time

Senior AI/ML Engineer

At General Motors, our product teams are redefining mobility. Through a human-ce...
Location: United States, Mountain View
Salary: 170600.00 - 261300.00 USD / Year
Company: General Motors
Expiration Date: Until further notice
Requirements
  • Ph.D. in Machine Learning, Robotics, Computer Science, Electrical Engineering, or a related technical field
  • 2+ years of experience in AI/ML research and applied development
  • Expertise in modern ML architectures (transformers, generative AI, multimodal systems)
  • Strong programming skills in Python
  • Strong communication, collaboration, and mentoring abilities
Job Responsibility
  • Research, design, and implement advanced Vision-Language Models and Vision-Language-Action models to enhance the autonomous vehicle's semantic understanding and decision-making capabilities
  • Develop and execute techniques for onboard model optimization, including quantization, distillation, and architecture search, to ensure large foundational models run efficiently on vehicle edge hardware
  • Partner closely with downstream engineering teams (perception, planning, and control) to integrate foundational VLM/VLA outputs into the active vehicle software stack
  • Contribute to the technical roadmap by identifying high-impact research areas and translating strategic machine learning priorities into concrete, actionable prototypes
  • Provide technical mentorship to junior researchers and engineers, fostering a culture of excellence and collaborative innovation
  • Secure intellectual property through patents and represent the company externally by publishing peer-reviewed research at top-tier machine learning conferences
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
Employment Type: Full-time

Staff Research Scientist - VLM / VLA

At General Motors, our product teams are redefining mobility. Through a human-ce...
Location: United States, Mountain View
Salary: 218800.00 - 335300.00 USD / Year
Company: General Motors
Expiration Date: Until further notice
Requirements
  • Ph.D. in Machine Learning, Robotics, Computer Science, Electrical Engineering, or a related technical field
  • 5+ years of experience in AI/ML research and applied development
  • Deep expertise in modern ML architectures (transformers, generative AI, multimodal systems)
  • Strong programming skills in Python
  • Excellent communication, collaboration, and mentoring abilities, comfortable influencing technical strategy and guiding ML excellence across the organization
Job Responsibility
  • Research, design, and prototype advanced Vision-Language Models and Vision-Language-Action foundational models tailored for real-time semantic understanding and behavioral prediction in autonomous driving
  • Drive the technical strategy for onboard model optimization, leading initiatives in model quantization, pruning, knowledge distillation, and compilation to ensure high-parameter models execute with ultra-low latency on vehicle edge hardware
  • Advance multimodal alignment techniques, ensuring seamless integration of camera, radar, LiDAR, and textual/logical prompts into unified foundational architectures
  • Influence technical roadmaps and shape strategic machine learning priorities that align with safety requirements, core product milestones, and next-generation vehicle launches
  • Provide technical mentorship and long-term vision to a multidisciplinary group of machine learning engineers, software developers, and hardware specialists
  • Foster internal innovation by collaborating closely with perception, planning, and infrastructure teams to integrate foundational models into the core autonomous software stack
  • Represent the company externally to the global scientific community by publishing original research, securing patents, and presenting at top-tier artificial intelligence and robotics conferences
What we offer
  • Medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
Employment Type: Full-time