CrawlJobs Logo

Research Engineering Manager - Model Training

perplexity.ai Logo

Perplexity

Location Icon

Location:
United States , San Francisco

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

300000.00 - 470000.00 USD / Year

Job Description:

Perplexity is seeking a Research Engineering Manager to lead the team of all-star AI researchers and engineers responsible for developing the models that drive our products. Our team has developed some of the most advanced models for agentic research, query understanding, and other domains that require accuracy and depth. As we expand our userbase and portfolio of product surfaces, our in-house models are increasingly critical to providing a premium, high-taste experience for the world’s most sophisticated users. You will dive into our rich datasets of conversational and agentic queries, leveraging cutting‑edge training techniques to scale AI model performance. Through hands-on technical and organizational leadership, you will empower your team to develop SotA models for the use cases that matter most to our business and our users.

Job Responsibility:

  • Lead a team of researchers and engineers focused on training SotA models for Perplexity-relevant use cases, leveraging the latest supervised and reinforcement learning techniques
  • Drive research and engineering efforts to develop production models through advanced model training and alignment techniques, including RL, SFT, and other approaches
  • Become deeply familiar with the team’s technical stack, leading from the front through hands-on technical contributions
  • Own the data, training, and eval pipelines required to train and continuously improve LLM models
  • Design and iterate on model training and finetuning algorithms (e.g., preference‑based methods, reinforcement learning from human or AI feedback) through an approach that balances scientific rigor and iteration velocity
  • Design evaluations and improve the production model training pipeline to reliably deliver models that lie on the Pareto frontier of speed and quality
  • Work closely with engineering teams to integrate in-house models into our product and rapidly iterate based on real‑world usage
  • Manage day‑to‑day execution, project planning, and prioritization for the model training team to hit ambitious quality and performance goals

Requirements:

  • Proven experience with large-scale LLMs and Deep Learning systems
  • Strong Python and PyTorch skills
  • Experience leading or managing research or engineering teams working on large-scale AI model development, including driving complex projects from idea to production
  • Self‑starter with a willingness to take ownership of tasks and navigate ambiguity in a fast‑moving environment
  • Passion for tackling challenging problems in AI model quality, speed, safety, and reliability
  • 10+ years of technical experience, with at least 2 of those years as a manager and at least 4 of those years working on large-scale AI model development

Nice to have:

  • PhD in Machine Learning or related areas
  • Experience training very large Transformer-based models with techniques such as SFT, DPO, GRPO, RLHF‑style methods, or related preference‑based optimization approaches
  • Prior experience designing evaluations and production training pipelines for large‑scale models in a high‑growth environment
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Fulltime
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Research Engineering Manager - Model Training

Engineering Manager - Machine Learning

We’re looking for an experienced Engineering Manager to lead the ML Soundtrack t...
Location
Location
Sweden , Stockholm
Salary
Salary:
Not provided
epidemicsound.com Logo
Epidemic Sound
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Deep ML engineering background with hands-on experience in generative diffusion models for audio/music (including PyTorch and modern training stacks)
  • Proven experience deploying ML systems into production at scale, with a focus on latency, stability, and cost
  • Strong ML system design and architecture skills across the full machine learning lifecycle
  • Track record of managing engineering teams
  • Demonstrated ability to set clear goals, manage performance, and grow engineers through mentorship and feedback
Job Responsibility
Job Responsibility
  • Own the technical roadmap and model strategy for generative music, including diffusion and transformer-based approaches
  • Lead the full lifecycle from research to production, championing training, evaluation, and deployment for real-time inference
  • Drive the productionisation of inference through model optimisation (distillation, quantisation), caching, and cost controls
  • Build and maintain team health through effective rituals, 1:1s, and fostering a psychologically safe, high-ownership culture
  • Manage cross-team dependencies and delivery with data, MLOps, and product engineering teams
  • Fulltime
Read More
Arrow Right

AI Research Engineer, Scaling

As a Research Engineer focused on Scaling, you will design and build robust infr...
Location
Location
United States , Palo Alto
Salary
Salary:
180000.00 - 300000.00 USD / Year
1x.tech Logo
1X Technologies
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong programming experience in Python and/or C++
  • Deep intuitive understanding of training and inference speed bottlenecks and scaling laws
  • A mindset aligned with extremely high scaling: belief that scale is foundational to enabling humanoid robotics
  • Degree in Computer Science or a related field
  • Experience with distributed training frameworks (e.g., TorchTitan, DeepSpeed, FSDP/ZeRO), multi-node debugging, and experiment management
  • Proven skills in optimizing inference performance using graph compilers, batching/scheduling, and serving systems like TensorRT or equivalents
  • Familiarity with quantization strategies (PTQ, QAT, INT8/FP8) and tools such as TensorRT and bitsandbytes
  • Experience developing or tuning CUDA or Triton kernels with understanding of hardware-level optimization (vectorization, tensor cores, memory hierarchies)
Job Responsibility
Job Responsibility
  • Own and lead scaling of distributed training and inference systems
  • Ensure compute resources are optimized to make data the primary constraint
  • Enable massive training runs (1000+ GPUs) using robot data, with robust fault tolerance, experiment tracking, and distributed operations
  • Optimize inference throughput for datacenter use cases such as world models and diffusion engines
  • Reduce latency and enhance performance for on-device robot policies using techniques such as quantization, scheduling, and distillation
What we offer
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime
Read More
Arrow Right

Engineering Manager (Python + Machine Learning)

We are seeking a hands-on Machine Learning Engineering Manager to lead cross-fun...
Location
Location
India , Noida
Salary
Salary:
Not provided
aqusag.com Logo
AquSag Technologies
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 9+ yrs of strong background in Machine Learning, NLP, and modern deep learning architectures (Transformers, LLMs)
  • Hands-on experience with frameworks such as PyTorch, TensorFlow, Hugging Face, or DeepSpeed
  • 2+ yrs of proven experience managing teams delivering ML/LLM models in production environments
  • Knowledge of distributed training, GPU/TPU optimization, and cloud platforms (AWS, GCP, Azure)
  • Familiarity with MLOps tools like MLflow, Kubeflow, or Vertex AI for scalable ML pipelines
  • Excellent leadership, communication, and cross-functional collaboration skills
  • Bachelor’s or Master’s in Computer Science, Engineering, or related field
Job Responsibility
Job Responsibility
  • Lead and mentor a cross-functional team of ML engineers, data scientists, and MLOps professionals
  • Oversee the full lifecycle of LLM and ML projects — from data collection to training, evaluation, and deployment
  • Collaborate with Research, Product, and Infrastructure teams to define goals, milestones, and success metrics
  • Provide technical direction on large-scale model training, fine-tuning, and distributed systems design
  • Implement best practices in MLOps, model governance, experiment tracking, and CI/CD for ML
  • Manage compute resources, budgets, and ensure compliance with data security and responsible AI standards
  • Communicate progress, risks, and results to stakeholders and executives effectively
Read More
Arrow Right

Research Engineer

We are looking for a Research Engineer to join the research team at ElevenLabs. ...
Location
Location
Poland
Salary
Salary:
Not provided
elevenlabs.io Logo
ElevenLabs
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years industry experience as a Machine Learning Engineer, with a key emphasis on constructing data pipelines, as well as developing and implementing machine learning models
  • Demonstrating the capacity to autonomously evaluate novel concepts or enhance current machine learning projects, with the potential outcome of contributing to published works
  • Extensive background in conducting exploratory research to enhance the excellence of gathered data, particularly within the realm of audio and text-to-speech domains
Job Responsibility
Job Responsibility
  • Creating and upholding a reliable and expandable data management system specialized for text-to-speech projects. This includes establishing guidelines for versioning and ensuring data quality
  • Establishing a streamlined process for autonomously training, assessing, and launching text-to-speech models. This encompasses implementing procedures for dynamic learning, as well as routines for fine-tuning and refreshing validation data
  • Investigating cutting-edge approaches and strategies in machine learning, deep learning, and algorithms pertaining to text-to-speech technology
What we offer
What we offer
  • Innovative culture
  • Growth paths
  • Learning & development: ElevenLabs proactively supports professional development through an annual discretionary stipend
  • Social travel: We also provide an annual discretionary stipend to meet up with colleagues each year, however you choose
  • Annual company offsite
  • Co-working: If you’re not located near one of our main hubs, we offer a monthly co-working stipend
  • Fulltime
Read More
Arrow Right

Research Engineer, Scaling

As a Research Engineer, Scaling, you will design and build infrastructure to sup...
Location
Location
United States , Palo Alto
Salary
Salary:
180000.00 - 300000.00 USD / Year
1x.tech Logo
1X Technologies
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong programming experience in Python and/or C++
  • Deep intuitive understanding of what affects training or inference speed: from bottlenecks to scaling laws
  • A mindset aligned with extremely high scaling: belief that scale is foundational to enabling humanoid robotics
  • Degree in Computer Science or a related field
  • Hands‑on experience with distributed training frameworks (e.g., TorchTitan, DeepSpeed, FSDP/ZeRO), multi‑node debugging, experiment management
  • Proven skills optimizing inference performance: graph compilers, batching/scheduling, serving systems (e.g., using TensorRT or equivalents)
  • Familiarity with quantization strategies: PTQ, QAT, INT8/FP8
  • tools like TensorRT, bitsandbytes, etc.
  • Experience writing or tuning CUDA or Triton kernels
  • understanding of hardware features like vectorization, tensor cores, and memory hierarchies
Job Responsibility
Job Responsibility
  • Own and lead scaling of both distributed training and inference systems
  • Ensure compute resources are sufficient so that data, not hardware, is the limiter
  • Enable massive training at scale (1000+ GPUs) on robot data, handling fault tolerance, experiment tracking, distributed operations, and large datasets
  • Optimize inference throughput in datacenter contexts (e.g., for world models and diffusion engines)
  • Reduce latency and optimize performance for on‑device robot policies through techniques like quantization, scheduling, distillation, etc.
What we offer
What we offer
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime
Read More
Arrow Right

Research Engineer AI

The role involves conducting high-quality research in AI and HPC, shaping future...
Location
Location
United Kingdom , Bristol
Salary
Salary:
Not provided
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • A good working knowledge of AI/ML frameworks, at least TensorFlow and PyTorch, as well as the data preparation, handling, and lineage control, as well as model deployment, in particular in a distributed environment
  • At least a B.Sc. equivalent in a Science, Technology, Engineering or Mathematical discipline
  • Development experience in compiled languages such as C, C++ or Fortran and experience with interpreted environments such as Python
  • Parallel programming experience, with relevant programming models such as OpenMP, MPI, CUDA, OpenACC, HIP, PGAS languages is highly desirable
Job Responsibility
Job Responsibility
  • Perform world-class research while also shaping products of the future
  • Enable high performance AI software stacks on supercomputers
  • Provide new environments/abstractions to support application developers to build, deploy, and run AI applications taking advantage of leading-edge hardware at scale
  • Manage modern data-intensive AI training and inference workloads
  • Port and optimize workloads of key research centers like the AI safety institute
  • Support onboarding and scaling of domain-specific applications
  • Foster collaboration with the UK and European research community
What we offer
What we offer
  • Health & Wellbeing benefits that support physical, financial and emotional wellbeing
  • Career development programs catered to achieving career goals
  • Unconditional inclusion in the workplace
  • Flexibility to manage work and personal needs
  • Fulltime
Read More
Arrow Right

Research Engineer, Data Infrastructure

As a Research Engineer in Data Infrastructure, you will design and implement a “...
Location
Location
United States , Palo Alto
Salary
Salary:
180000.00 - 250000.00 USD / Year
1x.tech Logo
1X Technologies
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong experience in building data pipelines and ETL systems
  • Ability to design and implement systems that collect, upload, and manage data from robotic fleets
  • Familiarity with architectures combining on‑robot components, on‑premises clusters, and cloud systems
  • Experience with data labeling tools or building tooling for dataset visualization and annotation
  • Skills in creating or applying machine learning models for dataset organization / automated labeling
Job Responsibility
Job Responsibility
  • Optimize operational efficiency of data collection on the NEO fleet
  • Design triggers on the robot to determine if and when data should be uploaded
  • Automate ETL pipelines so fleet‑wide data is easily queryable and available for training
  • Work with external dataset providers to prepare diverse multi-modal pre-training datasets
  • Build frontend tools for visualizing and automating labeling of very large datasets
  • Develop machine learning models to automatically label and organize datasets
What we offer
What we offer
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime
Read More
Arrow Right

AI Research Engineer, Data Infrastructure

As a Research Engineer in Infrastructure, you will design and implement a robust...
Location
Location
United States , Palo Alto
Salary
Salary:
180000.00 - 250000.00 USD / Year
1x.tech Logo
1X Technologies
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong experience in building data pipelines and ETL systems
  • Ability to design and implement systems for data collection and management from robotic fleets
  • Familiarity with architectures that span on-robot components, on-premise clusters, and cloud infrastructure
  • Experience with data labeling tools or building dataset visualization and annotation tooling
  • Proficiency in creating or applying machine learning models for dataset organization and automated labeling
Job Responsibility
Job Responsibility
  • Optimize operational efficiency of data collection across the NEO robot fleet
  • Design intelligent triggers to determine when and what data should be uploaded from the robots
  • Automate ETL pipelines to make fleet-wide data easily queryable and training-ready
  • Collaborate with external dataset providers to prepare diverse multi-modal pre-training datasets
  • Build frontend tools for visualizing and automating the labeling of large datasets
  • Develop machine learning models for automatic dataset labeling and organization
What we offer
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime
Read More
Arrow Right