Research Engineer, Language Model Pre-Training

United States, Palo Alto · Job Posted January 13, 2026

Job Description

As a Research Engineer, Language Model Pre-training, you'll shape our language model roadmap through end-to-end pretraining development. You will work extremely closely with our pretraining team, who will integrate your insights into our next-generation models.

Job Responsibility

  • Shape our language model roadmap through end-to-end pretraining development
  • Work across:
      • Large-scale training runs and model parallelization
      • Performance optimization of our pretraining stack
      • Dataset collection, processing, and evaluation
      • Architecture and methodology research, including optimizer ablations

Requirements

  • Strong engineering aptitude for rapidly implementing reliable and robust systems
  • Ability to rapidly learn new fields and enthusiasm for implementing new ideas
  • Excellent communication and collaboration skills, and the ability to work effectively on both research and engineering implementation at scale
  • Deep expertise and intuition for solving machine learning problems and training models
  • Experience with training on large-scale (multi-node) GPU clusters
  • Deep understanding of model training pipelines – including model/data parallelism, distributed optimizers, etc.
  • Strong grasp of proper experimental methodology for running rigorous ablations and other hypothesis testing
  • Understanding of large-scale, highly parallel data processing pipelines
  • High proficiency with PyTorch and Python
  • Strong ability to dive into large pre-existing codebases and rapidly get up to speed
  • Postgraduate degree in a scientific subject (Computer Science, EE/EECS, Math, Physics)

Nice to have

  • Published machine learning research in well-respected venues

What we offer

  • Comprehensive medical, dental, vision, and FSA plans
  • Competitive compensation and 401(k)
  • Relocation and immigration support on a case-by-case basis
  • On-site meals prepared by a dedicated culinary team
  • Thursday Happy Hours
  • In-person team in Palo Alto, CA, with a collaborative, high-energy environment

Similar Jobs for Research Engineer, Language Model Pre-Training

8 matching positions

Research Scientist / Engineer – Pre-training / Scaling

At Luma, the Pre-Training / Scaling team is responsible for building the core mu...
Location: United States, Palo Alto
Salary: 187,500 - 395,000 USD / Year
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Expertise in Python and PyTorch with experience building ML models from scratch
  • Deep understanding of multimodal generative models and deep learning architectures
  • (Preferred) Strong research track record in generative AI with published work in top-tier venues
  • (Preferred) Experience with large-scale distributed training systems
Job Responsibility
  • Lead cutting-edge research in multimodal foundation models spanning video, image, text, and audio
  • Design and implement novel algorithms, architectures, and techniques for large-scale generative AI models
  • Develop training methodologies for foundation models across thousands of GPUs
  • Research and implement state-of-the-art techniques in Autoregressive LLMs, Vision Language Models, and / or Diffusion Models
  • Collaborate with cross-functional teams to transition research into production systems
Employment type: Full-time

Member of Technical Staff, Pre-Training Data

As a Machine Learning Engineer specializing in pretraining data, you will play a...
Salary: Not provided
Company: Cohere
Expiration Date: Until further notice
Requirements
  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with curriculum learning, data mixing and data attribution
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora
  • Knowledge of data quality assessment techniques and experimentation with data mixtures
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
Job Responsibility
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment type: Full-time

Research Engineer (Technical Leadership), FAIR Data - Meta Superintelligence Labs

Meta is seeking Research Engineers to help us build the data foundation for Meta...
Location: United States, Menlo Park
Salary: 219,000 - 301,000 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 4+ years of industry research experience with pre/mid/post-training data curation for large language or large media models
  • 4+ years of formal technical lead experience
  • Experience leading major technical initiatives with cross-functional impact and influencing strategy across multiple teams
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image perception or generation, OCR, agentic data, synthetic data, multilingual data, reasoning data, web parser, coding data, data scaling laws, or datamix optimization
  • Lead complex technical projects end-to-end
What we offer
  • bonus
  • equity
  • benefits
Employment type: Full-time

Principal Research Engineer - Generative AI - AI Frontiers

The AI Frontiers lab at Microsoft Research is chartered with ambitious research go...
Location: United States, Redmond
Salary: 139,900 - 274,800 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or relevant field AND 6+ years related experience
  • OR Master's Degree in Computer Science or related field AND 4+ years related experience
  • OR Doctorate in Computer Science or related field AND 3+ years related experience
  • OR equivalent experience
  • 1+ year(s) experience developing with Python and PyTorch/JAX
  • Familiarity with architecture and optimizations for large language models
  • Hands-on work in debugging and profiling PyTorch distributed code
  • Basic understanding of how CUDA kernels work
  • Familiarity with pre-training, mid-training and/or post-training pipelines for language and/or multimodal models
  • Foundational understanding of reinforcement learning and key challenges in the field
Job Responsibility
  • Design, develop, execute, and implement technology research projects in collaboration with other researchers, engineers, and product groups
  • Be a part of research breakthroughs in the field and play a crucial role in developing, improving, and exploring the capabilities of Large Language Models (LLMs), reasoning and agentic AI
  • Embody our culture and values
Employment type: Full-time

Research Engineer, VLA Models

As a Research Engineer, Vision-Language Action (VLA) Models, you will train the ...
Location: United States, Palo Alto
Salary: 180,000 - 300,000 USD / Year
Company: 1X Technologies
Expiration Date: Until further notice
Requirements
  • Strong programming experience in Python (and familiarity with tools like Bazel)
  • Experience with frameworks like PyTorch
  • Experience with simulation environments (e.g., Isaac Sim, MuJoCo)
  • Deep understanding of how autonomous systems generalize to new environments
  • Experience designing evaluation metrics and validating models in real or simulated settings
  • Ability to coordinate with cross‑functional teams (controls, QA, data) to bring models into production
Job Responsibility
  • Take extreme ownership over autonomous capabilities: reviewing data, designing model architectures, shipping models, and maintaining performance across the fleet
  • Train NEO for whole‑body manipulation and navigation tasks in unseen environments
  • Design robust evaluation metrics to support scaling of model pre‑training
  • Experiment with state‑of‑the‑art techniques from vision–language models and generative model literature to predict actions
  • Collaborate with controls, QA, and data collection teams to deploy reinforcement learning policies to the production fleet
What we offer
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
Employment type: Full-time

AI Research Engineer, VLA Models

As a Research Engineer on the Vision-Language Action (VLA) team, you will be res...
Location: United States, Palo Alto
Salary: 180,000 - 300,000 USD / Year
Company: 1X Technologies
Expiration Date: Until further notice
Requirements
  • Strong programming skills in Python and familiarity with build systems like Bazel
  • Experience using deep learning frameworks such as PyTorch
  • Proficiency in simulation environments like Isaac Sim or MuJoCo
  • Deep understanding of generalization in autonomous systems
  • Experience designing and validating evaluation metrics in real or simulated environments
  • Ability to work cross-functionally with controls, QA, and data teams to operationalize models
Job Responsibility
  • Take end-to-end ownership of autonomous capability development: data review, model design, deployment, and fleet performance monitoring
  • Train NEO to perform whole-body manipulation and navigation tasks in unfamiliar environments
  • Design robust evaluation metrics to support scalable model pre-training
  • Experiment with cutting-edge vision-language and generative model techniques to predict robot actions
  • Collaborate with controls, QA, and data teams to deploy reinforcement learning policies to the production fleet
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
Employment type: Full-time

Principal Research Engineer - Multimodal AI

Microsoft Research (MSR) AI Frontiers lab is seeking applications for the positi...
Location: United States, Redmond; New York City
Salary: 139,900 - 274,800 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • 2+ year(s) experience developing with Python and PyTorch/JAX
  • Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR Master's or Doctorate in Computer Science or relevant field AND 10+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Experience with architecture and optimizations for large language models
  • Hands-on work in debugging and profiling PyTorch distributed code
  • Understanding of how CUDA kernels work
  • Experience with pre-training, mid-training and/or post-training pipelines for language and/or multimodal models
Job Responsibility
  • Design, develop, execute, and implement technology research projects in collaboration with other researchers, engineers, and product groups
  • Be a part of research breakthroughs in the field and play a crucial role in developing, improving, and exploring the capabilities of Large Language Models (LLMs), reasoning and agentic AI
Employment type: Full-time

Principal/Senior Applied Scientist Security Models Training Team - Next-Gen frontier research

The Security Models Training team is expanding to drive the development of a new...
Location: Israel, Tel Aviv, Herzliya
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • M.Sc. / Ph.D. in Computer Science, Information Systems, Electrical or Computer Engineering or Data Science (Ph.D. strongly preferred)
  • Candidates with M.Sc. / Ph.D. in related fields with proven industry experience or a strong publication record in the areas of LLM, Information Retrieval, Machine Learning, Natural Language Processing, Time Series Forecasting and Deep Learning are considered as well
  • Proven hands-on experience of at least 5 years (including post-grad work) in building and deploying Machine Learning products
  • Key areas of expertise include Natural Language Processing and Large Language Models, along with an understanding of concepts such as Privacy and Responsible AI
  • Candidates are expected to demonstrate a strong history of successfully translating applied research into production-ready solutions, along with a proven track record of delivering projects within large-scale production environments
  • Proven expertise in the LLM and/or time-series forecasting domain, demonstrating comprehensive knowledge of relevant concepts in the domain
  • Ideal applicants should be proficient in areas such as LLM pre- and post-training, including CPT, SFT and RL, LLM benchmarking, agentic flows, and model alignment
  • Hands-on experience in building neural model architectures at the 100M+ scale and the proficiency to adapt them at all abstraction levels down to the individual block (e.g. changing the inner workings of an attention block, introducing new blocks, or changing the routings)
  • Demonstrated proficiency in problem-solving and data analysis, with substantial expertise in evaluating the performance of large language models (LLMs) and/or time-series forecasting models, developing benchmarks tailored to practical scenarios
Job Responsibility
  • Technical Leadership & Ownership: set technical direction for major security domain initiatives; lead security model programs spanning pre‑training, task tuning, reinforcement learning, and evaluation; translate cutting‑edge research into production‑ready capabilities
  • Advanced Model Design – Build and customize deep learning model architectures (e.g., modifying transformer blocks, attention/memory modules, etc.) at the SLM/LLM scale; make principled architectural tradeoffs to improve reliability, robustness, and security‑specific behavior
  • Advanced Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and other modalities, including time-series
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks; define objective evaluation frameworks and quality gates; run ablation studies to measure impact and optimize data and training effectiveness to support confident product decisions
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets, with attention to privacy, governance, and long‑term reuse across security scenarios
Employment type: Full-time