CrawlJobs Logo

Research Scientist / Engineer – Pre-training / Scaling

lumalabs.ai Logo

Luma AI

Location Icon

Location:
United States , Palo Alto

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

187500.00 - 395000.00 USD / Year

Job Description:

At Luma, the Pre-Training / Scaling team is responsible for building the core multimodal AI systems that power our entire platform. Working at the forefront of generative AI research, this team develops the fundamental architectures and training methodologies that enable our models to see, hear, understand, and interact with the world across video, image, text, and audio modalities.

Job Responsibility:

  • Lead cutting-edge research in multimodal foundation models spanning video, image, text, and audio
  • Design and implement novel algorithms, architectures, and techniques for large-scale generative AI models
  • Develop training methodologies for foundation models across thousands of GPUs
  • Research and implement state-of-the-art techniques in Autoregressive LLMs, Vision Language Models, and / or Diffusion Models
  • Collaborate with cross-functional teams to transition research into production systems

Requirements:

  • Expertise in Python and PyTorch with experience building ML models from scratch
  • Deep understanding of multimodal generative models and deep learning architectures
  • (Preferred) Strong research track record in generative AI with published work in top-tier venues preferred
  • (Preferred) Experience with large-scale distributed training systems

Additional Information:

Job Posted:
January 13, 2026

Employment Type:
Fulltime
Work Type:
Remote work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Research Scientist / Engineer – Pre-training / Scaling

Distinguished Applied Researcher

At Capital One, we are creating trustworthy and reliable AI systems, changing ba...
Location
Location
United States , McLean; San Francisco; New York; Cambridge; San Jose
Salary
Salary:
278400.00 - 381300.00 USD / Year
capitalone.com Logo
Capital One
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • PhD in Electrical Engineering, Computer Engineering, Computer Science, AI, Mathematics, or related fields plus 4 years of experience in Applied Research or M.S. in Electrical Engineering, Computer Engineering, Computer Science, AI, Mathematics, or related fields plus 6 years of experience in Applied Research
  • PhD in Computer Science, Machine Learning, Computer Engineering, Applied Mathematics, Electrical Engineering or related fields
  • LLM
  • PhD focus on NLP or Masters with 10 years of industrial NLP research experience
  • Core contributor to team that has trained a large language model from scratch (10B + parameters, 500B+ tokens) or through continued pre-training, post training pipeline for alignment and reasoning, LLM optimizations, complex reasoning with multi-agentic LLMs
  • Numerous publications at ACL, NAACL and EMNLP, Neurips, ICML or ICLR on topics related to the pre-training of large language models (e.g. technical reports of pre-trained LLMs, SSL techniques, model pre-training optimization)
  • Has worked on an LLM (open source or commercial) that is currently available for use
  • Demonstrated ability to guide the technical direction of a large-scale model training team
  • Experience with common training optimization frameworks (deep speed, nemo)
  • Experience contributing to the team that has trained a large language model from scratch (10B + parameters, 500B+ tokens) or through continued pre-training, post training pipeline for alignment and reasoning, LLM optimizations, complex reasoning with multi-agentic LLMs
Job Responsibility
Job Responsibility
  • Partner with a cross-functional team of data scientists, software engineers, machine learning engineers and product managers to deliver AI-powered products that change how customers interact with their money
  • Leverage a broad stack of technologies — Pytorch, AWS Ultraclusters, Huggingface, Lightning, VectorDBs, and more — to reveal the insights hidden within huge volumes of numeric and textual data
  • Build AI foundation models through all phases of development, from design through training, evaluation, validation, and implementation
  • Engage in high impact applied research to take the latest AI developments and push them into the next generation of customer experiences
  • Flex your interpersonal skills to translate the complexity of your work into tangible business goals
  • Partner with a cross-functional team of scientists, machine learning engineers, software engineers, and product managers to deliver AI-powered platforms and solutions that change how customers interact with their money
What we offer
What we offer
  • comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • Fulltime
Read More
Arrow Right

Principal Software Engineer

CoreAI is at the forefront of Microsoft’s mission to redefine how software is bu...
Location
Location
India , Bangalore
Salary
Salary:
Not provided
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or master’s degree in computer science or a related field
  • 10+ years designing, developing, and shipping high quality software
  • 4+ years of experience with distributed systems and cloud based infrastructure
  • 2+ year of experience with DevOps practices (CI/CD, automated testing, deployment, etc.)
  • Passionate and self-motivated
  • Strong ability in self-learning, entering new domain, managing through uncertainty in an innovative team environment
  • 10+ years of software development experience in C#, C++, Python, or similar languages
  • 6+ years of experience with containerization tools (e.g., Docker, Kubernetes)
  • Knowledge and hands on experience with production ML systems, large-scale training infrastructure, NCCL, CUDA libraries and tools
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role
Job Responsibility
Job Responsibility
  • Architect, design, and develop core AI Infrastructure services developed in Go, Rust, Python, C++, and C# deployed on large-scale Kubernetes clusters to support pre-training and post-training of state-of-the-art LLMs, SLMs, multimodal, and code-specific models
  • Design, build, and manage compute, storage and networking sub-system on large-scale GPU clusters to support LLM training, customization, and inference workloads
  • Enhance systems and applications to deliver high stability, low latency, strong security, and maintainability in large-scale complex training environments in Azure and in partner clouds
  • Provide operational support, technical leadership, and vision while contributing to the deployment, monitoring, and continuous improvement of engineering systems and practices
  • Support development and troubleshooting from the frontline, resolving complex issues impacting large-scale services
  • Collaborate closely with engineers, data scientists within the team, internal Microsoft Research teams and external enterprises to build better solutions together
  • Provide vision, expertise, and technical leadership to other team members
  • Help to grow talent in these areas
  • Fulltime
Read More
Arrow Right

AI Research Scientist, Media Data Research

Meta is seeking AI research scientists to help us build the data foundation for ...
Location
Location
United States , Menlo Park
Salary
Salary:
184000.00 - 257000.00 USD / Year
meta.com Logo
Meta
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • PhD in Computer Science or a related technical field
  • 2+ years of industry research experience in LLM/NLP, computer vision, or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross-functional impact, and/or influencing strategy across multiple teams
  • Practical experience with multimodal pre-training or mid-training data curation for large media perception or generation models
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, data scaling laws, or data mixing
  • Lead complex technical projects end-to-end
What we offer
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime
Read More
Arrow Right

AI Research Scientist, Media Data Research - MSL FAIR

Meta is seeking AI research scientists to help us build the data foundation for ...
Location
Location
United States , Menlo Park
Salary
Salary:
154000.00 - 217000.00 USD / Year
meta.com Logo
Meta
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • PhD in Computer Science or a related technical field
  • 1+ year of industry research experience in LLM/LMM, computer vision, or related AI/ML models
  • Experience owning and/or driving complex technical projects from end-to-end
  • Practical experience with multimodal pre-training or mid-training data curation for large media perception or generation models
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
  • Fundamentally improve our data velocity across workflows and projects by contributing to quality in data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, data scaling laws, or data mixing
  • Lead complex technical projects end-to-end
What we offer
What we offer
  • bonus
  • equity
  • benefits
Read More
Arrow Right

AI Research Scientist (Technical Leadership), Data Research - MSL FAIR

Meta is seeking research scientists to help us build the data foundation for Met...
Location
Location
United States , Menlo Park
Salary
Salary:
219000.00 - 301000.00 USD / Year
meta.com Logo
Meta
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • PhD in Computer Science or a related technical field
  • 4+ years of industry research experience in NLP or CV
  • 4+ years as a formal technical lead experience
  • Experience leading major technical initiatives with cross-functional impact and influencing strategy across multiple teams
  • Practical experience with multimodal pre-training or mid-training data curation for large language models, media perception, or media generation models
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, agentic data, synthetic data, reasoning data, web parser, coding data, data scaling laws, or datamix optimization
  • Lead complex technical projects end-to-end
What we offer
What we offer
  • bonus
  • equity
  • benefits
Read More
Arrow Right

AI Research Scientist, Text Data Research - MSL FAIR

Meta is seeking AI research scientists to help us build the data foundation for ...
Location
Location
United States , Menlo Park
Salary
Salary:
154000.00 - 217000.00 USD / Year
meta.com Logo
Meta
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • PhD in Computer Science or a related technical field
  • 1+ year of industry research experience in LLM/NLP or related AI/ML models
  • Experience owning and/or driving complex technical projects from end-to-end
  • Practical experience with pre-training or mid-training data curation for large foundational models and experience working with organic, synthetic, agentic, or reasoning data for LLMs
  • Published research in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in agentic data, synthetic data, reasoning data, web parser, coding data, data scaling laws, or datamix optimization
  • Lead complex technical projects end-to-end
What we offer
What we offer
  • bonus
  • equity
  • benefits
Read More
Arrow Right

AI Research Scientist, Text Data Research - MSL FAIR

Meta is seeking AI research scientists to help us build the data foundation for ...
Location
Location
United States , Menlo Park
Salary
Salary:
184000.00 - 257000.00 USD / Year
meta.com Logo
Meta
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • PhD in Computer Science or a related technical field
  • 2+ years of industry research experience in LLM/NLP or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross-functional impact, and/or influencing strategy across multiple teams
  • Practical experience with pre-training or mid-training data curation for large foundational models and experience working with organic, synthetic, agentic, or reasoning data for LLMs
  • Published research in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Architect efficient and scalable data curation systems and pipelines
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in agentic data, synthetic data, reasoning data, web parser, coding data, data scaling laws, or datamix optimization
  • Lead complex technical projects end-to-end
What we offer
What we offer
  • bonus
  • equity
  • benefits
Read More
Arrow Right

AI Research Scientist, Media Data Research - MSL FAIR

Meta is seeking AI research scientists to help us build the data foundation for ...
Location
Location
United States , Bellevue
Salary
Salary:
122000.00 - 181000.00 USD / Year
meta.com Logo
Meta
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Currently has, or is in the process of obtaining a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • PhD in Computer Science or a related technical field, plus 1+ years of industry research experience in LLM/LMM, computer vision, or related AI/ML models
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
  • Experience owning and/or driving complex technical projects from end-to-end
  • Practical experience with multimodal pre-training or mid-training data curation for large media perception or generation models
Job Responsibility
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
  • Fundamentally improve our data velocity across workflows and projects by contributing to innovation in data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, data scaling laws, or data mixing
  • Lead complex technical projects end-to-end
What we offer
What we offer
  • bonus
  • equity
  • benefits
Read More
Arrow Right