Research Scientist: Pretraining Job at Generalist AI (San Mateo)

Sr. Applied Research Scientist

We are building AI to simulate the world through merging art and science. We bel...

Location

United States

Salary:

280000.00 - 380000.00 USD / Year

Runway

Expiration Date

Until further notice

Requirements

4+ years of relevant ML engineering or research experience
Very strong programming skills and ability to write clean and maintainable research code
Deep interest in building human-in-the-loop systems for creativity
Passion for seeing research through from initial conception to eventual application
Experience mentoring and teaching other researchers
Strong communication, collaboration, and documentation skills

Job Responsibility

Lead efforts in pretraining the next generation of Runway’s multimodal models

Fulltime

Research Scientist Intern, AI Research - Multimodal Pretraining

Meta is seeking Research Scientist Interns in the multimodal pretraining team in...

Location

United States , Menlo Park

Salary:

7650.00 - 12134.00 USD / Month

New

Junior Research Infrastructure Engineer

We are seeking a Product-Minded Junior Research Infrastructure Engineer to join ...

Location

United States , Sunnyvale

Salary:

Not provided

Meshy LLC

Expiration Date

Until further notice

Requirements

2+ years of experience in software engineering, backend development, or distributed systems
Strong programming skills in Python (plus Scala/Java/C++ a plus)
Familiarity with distributed frameworks (Spark, Dask, Ray) and cloud platforms (AWS/GCP/Azure)
Experience with workflow orchestration tools (Temporal, Celery, or Airflow)
Proficiency with Infrastructure as Code (Terraform) and CI/CD tools (GitHub Actions)
Experience building web applications or internal tools using React or Next.js
A 'product-first' mindset: an interest in how users interact with infrastructure and a desire to build clean, functional interfaces

Job Responsibility

Participate in the design and implementation of distributed task orchestration systems using Temporal or Celery
Architect pipelines across cloud object storage (S3, GCS), data lakes, and metadata catalogs
Implement partitioning, sharding, and caching strategies to ensure data processing pipelines are resilient, highly available, and consistent
Design, implement, and maintain distributed ingestion pipelines for structured and unstructured data (images, 3D/2D assets, binaries)
Build scalable ETL/ELT workflows to transform, validate, and enrich datasets for AI/ML model training and analytics
Support preprocessing of unstructured assets (e.g., images, 3D/2D models, video) for training pipelines, including format conversion, normalization, augmentation, and metadata extraction
Implement validation and quality checks to ensure datasets meet ML training requirements
Collaborate with ML researchers to quickly adapt pipelines to evolving pretraining and evaluation needs
Use infrastructure-as-code (Terraform, Kubernetes, etc.) to manage scalable and reproducible environments
Manage data assets using Databricks Asset Bundles (DABs) and build rigorous CI/CD pipelines (GitHub Actions)

What we offer

Competitive salary, equity, and benefits package
Opportunity to work with a talented and passionate team at the forefront of AI and 3D technology
Flexible work environment, with options for remote and on-site work
Opportunities for fast professional growth and development
An inclusive culture that values creativity, innovation, and collaboration
Unlimited, flexible time off
Stock options available for core team members
401(k) plan for employees
Comprehensive health, dental, and vision insurance
The latest and best office equipment

Fulltime

Tech Lead - Pretraining Team, Wayve Foundation Model

This is a rare opportunity to lead foundational work at the intersection of larg...

Location

United States , Sunnyvale

Salary:

Not provided

Wayve

Expiration Date

Until further notice

Requirements

Leadership in data-centric AI: Experience leading research or engineering teams focused on dataset curation, filtering, or enrichment at scale, particularly for large-scale model pretraining.
Contributions to data benchmarks or tools: Involvement in projects like DataComp, LAION, DINO, MOLMO, or equivalent initiatives that define or evaluate pretraining dataset quality.
Deep understanding of distributed data processing: Strong working knowledge of frameworks such as Ray, Spark, Dask, or equivalent, and designing scalable, fault-tolerant data pipelines.
Hands-on deep learning expertise: Strong proficiency in PyTorch and a solid grasp of how data quality, distribution, and structure impact training dynamics and model generalisation.
Experimental mindset: Demonstrated ability to run and interpret data-centric experiments (e.g., small-scale trials, ablations) to inform large-scale model training.
Collaboration with research: Experience working closely with ML researchers and contributing to experimental design, pretraining strategies, or evaluation design.
Minimum 5 years of relevant industry experience: Including at least several years in data-heavy, model-driven environments involving deep learning at scale.

Job Responsibility

Lead data curation, enrichment, and filtering efforts for large-scale pretraining of embodied models
Build and manage distributed data processing and ingestion pipelines across modalities
Partner with research teams to run data-centric experiments and influence model training strategy
Identify, integrate, and leverage third-party datasets to enhance pretraining and evaluation
Manage and mentor a team of engineers and data scientists to deliver scientific and technical impact

What we offer

Attractive compensation with salary and equity
Immersion in a team of world-class researchers, engineers and entrepreneurs
A unique position to shape the future of autonomy and tackle the biggest challenge of our time
Bespoke learning and development opportunities
Relocation support with visa sponsorship
Flexible working hours - we trust you to do your job well, at times that suit you and your time
Benefits such as an onsite chef, workplace nursery scheme, private health insurance, therapy, daily yoga, onsite bar, large social budgets, unlimited L&D requests, enhanced parental leave, and more!

Fulltime

Foundational AI Research Scientist - FAIR

Meta is seeking Research Scientists to join its Fundamental AI Research (FAIR) o...

Location

United States , Bellevue

Salary:

154000.00 - 217000.00 USD / Year

Research Scientist Intern, Embodied Foundation Models

Our team is seeking a talented Applied Scientist Intern to join us for 3-6 month...

Location

United States , Sunnyvale

Salary:

Not provided

Wayve

Expiration Date

Until further notice

Requirements

Currently pursuing a graduate degree in Computer Science, Machine Learning, Robotics, or related technical field
Proficient in at least one backend/systems programming language (e.g. Python, Ruby, Java, etc)
Previous experience in vision-language models, large language models, natural language processing, especially around reasoning
Solid software engineering fundamentals, especially in Python
Previously used PyTorch or a similar library for deep learning (e.g. Tensorflow, JAX)
Experience with multi-node distributed training of large models
Interested in using large-scale multimodal (vision, language, etc.) datasets to improve embodied AI
Previous publications in conferences (e.g., CVPR, ICCV, CoRL, NeurIPS, CoLM, RSS, ICRA, among others)

Job Responsibility

Work on foundation models for embodied AI, including large-scale pretraining, post-training, leveraging language, or improving reasoning capabilities
Train models on large-scale multimodal (vision, language, etc.) data efficiently in a multi-node distributed system, and evaluate their performance on open (and closed) datasets/benchmarks
Lead a high-impact research work and publish at a top tier conference

Research Scientist Intern, Embodied Foundation Models (Evaluation)

Our team is seeking a talented Applied Scientist Intern to join us for 3-6 month...

Location

United States , Sunnyvale

Salary:

Not provided

Wayve

Expiration Date

Until further notice

Requirements

You are currently pursuing a graduate degree in a Computer Science, Machine Learning, Robotics, or related technical field
You are proficient in at least one backend/systems programming language (e.g. Python, Ruby, Java, etc)
You have previous experience in vision-language models, large language models, natural language processing, especially around reasoning
You have solid software engineering fundamentals, especially in Python
You have previously used PyTorch or a similar library for deep learning (e.g. Tensorflow, JAX)
Experience with multi-node distributed training of large models
You are interested in using large-scale multimodal (vision, language, etc.) datasets to improve embodied AI
You have previous publications in the following conferences (e.g., CVPR, ICCV, CoRL, NeurIPS, CoLM, RSS, ICRA, among others)

Job Responsibility

Work on foundation models for embodied AI, including large-scale pretraining, post-training, leveraging language, or improving reasoning capabilities
Train models on large-scale multimodal (vision, language, etc.) data efficiently in a multi-node distributed system, and evaluate their performance on open (and closed) datasets/benchmarks
Lead a high-impact research work and publish at a top tier conference (e.g., CVPR, ICCV, CoRL, NeurIPS, CoLM, RSS, ICRA, among others)

Research Scientist / Engineer – Realtime Interactive

At Luma, the Realtime Interactive team is responsible for building an entirely n...

Location

United States , Palo Alto

Salary:

187500.00 - 395000.00 USD / Year

Luma AI

Expiration Date

Until further notice

Requirements

Experience with fine-tuning large-scale generative models
Proficiency in PyTorch and distributed training frameworks
(Preferred) Strong background in methods for optimizing model inference (distillation, quantization, sparsity, compression, etc.)
(Preferred) Experience in gathering, processing, and annotating datasets

Job Responsibility

Work on top of pretrained multimodal generative models to fine-tune and optimize them for realtime generation
Design novel algorithms and techniques to solve problems with autoregressive visual generation, long-range temporal consistency, and long-term memory
Develop interactive applications with tight latency constraints
Process data to develop advanced interactive capabilities and controls for World Modeling, such as controlling character and camera movement, audio, and more

Fulltime

Research Scientist: Pretraining

Generalist AI

Location:
United States , San Mateo ▼
Somerville

Category:
Research and Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:
February 18, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Research Scientist: Pretraining

Sr. Applied Research Scientist

Research Scientist Intern, AI Research - Multimodal Pretraining

Junior Research Infrastructure Engineer

Tech Lead - Pretraining Team, Wayve Foundation Model

Foundational AI Research Scientist - FAIR

Research Scientist Intern, Embodied Foundation Models

Research Scientist Intern, Embodied Foundation Models (Evaluation)

Research Scientist / Engineer – Realtime Interactive

Research Scientist: Pretraining

Generalist AI

Location:United States , San Mateo ▼Somerville

Category:Research and Development

Contract Type:Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:February 18, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Research Scientist: Pretraining

Sr. Applied Research Scientist

Research Scientist Intern, AI Research - Multimodal Pretraining

Junior Research Infrastructure Engineer

Tech Lead - Pretraining Team, Wayve Foundation Model

Foundational AI Research Scientist - FAIR

Research Scientist Intern, Embodied Foundation Models

Research Scientist Intern, Embodied Foundation Models (Evaluation)

Research Scientist / Engineer – Realtime Interactive

Location:
United States , San Mateo ▼
Somerville

Category:
Research and Development

Contract Type:
Not provided

Job Posted:
February 18, 2026