
Researcher, Post Training

United Kingdom, London · Job Posted May 28, 2026

Job Description

Lovable lets over 2 million people build software using plain language, and the models behind it need to be exceptional. We're hiring an engineer who has gotten their hands dirty with post-training at scale and wants to do it again for one of the fastest-growing AI products in the world. You'll own our full post-training pipeline: translating the latest research into production training recipes, adapting them for code generation and agent workloads, and putting improved models in front of users fast. The goal is to get promising research into production within days or weeks, not months. This isn't an academic research position - you'll spend as much time in production infrastructure as in training configs, and your success is measured by what ships.

Job Responsibility

  • Own the full lifecycle of Lovable's post-training pipeline - from data curation and training runs through evaluation and deployment
  • Apply and adapt reinforcement learning, preference optimization, and supervised fine-tuning methods to make our models better at generating code, reasoning about user intent, and acting as reliable agents
  • Build the evaluation and experimentation infrastructure that tells us whether a model change actually helps users - covering helpfulness, safety, latency, and reliability
  • Develop and operate the production systems that run training jobs at scale, including GPU orchestration and data pipelines
  • Work across team boundaries with our agent, product, and infrastructure engineers to turn model gains into product improvements users can feel
  • Investigate and resolve failures end-to-end - whether the root cause is in a training recipe, a data issue, or a serving regression
  • Read papers, run experiments, and move fast: the goal is to get promising research into production within days or weeks, not months

Requirements

  • You've personally run post-training jobs on large language models - RFT/RLVR, preference optimization, or similar. Not just called APIs or written prompts, but actually trained and iterated on models
  • You can write solid production code. The systems you build need to run reliably, not just produce interesting research artifacts
  • You're fluent in at least one major ML framework (PyTorch, JAX) and comfortable working with distributed training setups and GPU clusters
  • You understand the math behind preference optimization, reward modeling, and alignment techniques - and can reason about when each approach fits
  • You've built or significantly contributed to evaluation systems that capture real-world quality, not just benchmark scores
  • You can trace a model quality regression from user-facing symptoms back through serving, inference, and training - and you enjoy doing it
  • You want to ship. Research taste matters, but at Lovable the question is always 'how fast can we get this to users?'

Nice to have

  • You've worked on code generation or agentic use cases specifically
  • You've put post-trained models into the hands of real users and seen how they hold up at scale
  • You've owned the full loop: curating data, running training, evaluating results, deploying, and monitoring in production
  • You have a habit of reading a paper on Monday and having a prototype running by Friday
  • You've experimented with speculative decoding or similar techniques to improve model efficiency
  • You have strong views on evaluation methodology and have built evals that actually predict user satisfaction
  • You've published or contributed meaningfully to the open-source ML ecosystem


Similar Jobs for Researcher, Post Training

8 matching positions

Applied AI Researcher, Post-Training

The Post-Training team focuses on adapting foundation models to real-world perfo...
Location: United States, San Francisco; New York
Salary: 130000.00 - 250000.00 USD / Year
Distyl AI
Expiration Date: Until further notice
Requirements
  • Deep Understanding of Post-training Techniques: Familiarity with supervised fine-tuning, preference optimization (RLHF/DPO), LoRA/PEFT, and instruction-tuning pipelines
  • Experience Adapting Frontier Models: You’ve tuned or adapted LLMs/SLMs to specialized domains or behaviors through data curation, reward modeling, or continual pretraining
  • Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.)
  • Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you've done
  • Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow
  • Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to an F500 Head of AI
  • Bias Towards Showing vs Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize
Job Responsibility
  • Researchers develop and evaluate techniques such as supervised fine-tuning, preference optimization (DPO, RLHF, RLAIF), and continual adaptation to align models with Distyl’s enterprise systems
  • Researchers in Post-Training investigate new methods for aligning large models with human and system-level objectives. They explore trade-offs between generalization and specialization, data efficiency and robustness, capability and controllability
What we offer
  • 100% covered medical, dental, and vision for employees and dependents
  • 401(k) with additional perks (e.g., commuter benefits, in‑office lunch)
  • Access to state‑of‑the‑art models, generous usage of modern AI tools, and real‑world business problems
  • Ownership of high‑impact projects across top enterprises
  • A mission‑driven, fast‑moving culture that prizes curiosity, pragmatism, and excellence
  • Meaningful equity
  • Full-time

Machine Learning Research Scientist / Research Engineer, Post-Training

Scale works with the industry’s leading AI labs to provide high quality data and...
Location: United States, San Francisco; Seattle; New York
Salary: 252000.00 - 315000.00 USD / Year
Scale
Expiration Date: Until further notice
Requirements
  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field
  • Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning
  • Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning
  • Excellent written and verbal communication skills
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals
  • Previous experience in a customer-facing role
Job Responsibility
  • Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal modalities
  • Design and experiment with new approaches to preference optimization
  • Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness
  • Publish research findings in top-tier AI conferences
What we offer
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Equity-based compensation
  • Commuter stipend
  • Full-time

AI Architect

We’re hiring an AI Architect to sit at the intersection of frontier AI research,...
Location: United States, San Francisco; New York
Salary: 201600.00 - 241920.00 USD / Year
Scale
Expiration Date: Until further notice
Requirements
  • Deep technical background in applied AI/ML: 5–10+ years in research, engineering, solutions engineering, or technical product roles working on LLMs or multimodal systems, ideally in high-stakes, customer-facing environments
  • Hands-on experience with model improvement workflows: demonstrated experience with post-training techniques, evaluation design, benchmarking, and model quality iteration
  • Ability to work on hard, ambiguous technical problems: proven track record of partnering directly with advanced customers or research teams to scope, reason through, and execute on deep technical challenges involving frontier models
  • Strong technical fluency: you can read papers, interrogate metrics, write or review complex Python/SQL for analysis, and reason about model-data trade-offs
  • Executive presence with world-class researchers and enterprise leaders; excellent writing and storytelling
  • Bias to action: you ship, learn, and iterate.
Job Responsibility
  • Translate research → product: work with client side researchers on post-training, evals, safety/alignment and build the primitives, data, and tooling they need
  • Partner deeply with core customers and frontier labs: work hands-on with leading AI teams and frontier research labs to tackle hard, open-ended technical problems related to frontier model improvement, performance, and deployment
  • Shape and propose model improvement work: translate customer and research objectives into clear, technically rigorous proposals—scoping post-training, evaluation, and safety work into well-defined statements of work and execution plans
  • Translate research into production impact: collaborate with customer-side researchers on post-training, evaluations, and alignment, and help design the data, primitives, and tooling required to improve frontier models in practice
  • Own the end-to-end lifecycle: lead discovery, write crisp PRDs and technical specs, prioritize trade-offs, run experiments, ship initial solutions, and scale successful pilots into durable, repeatable offerings
  • Lead complex, high-stakes engagements: independently run technical working sessions with senior customer stakeholders; define success metrics; surface risks early; and drive programs to measurable outcomes
  • Partner across Scale: collaborate closely with research (agents, browser/SWE agents), platform, operations, security, and finance to deliver reliable, production-grade results for demanding customers
What we offer
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity-based compensation
  • Full-time

Research Engineer, Core ML

This is a research engineering role with direct production impact. You will tran...
Location: United States, San Francisco
Salary: 200000.00 - 280000.00 USD / Year
Together AI
Expiration Date: Until further notice
Requirements
  • 3+ years of experience working on ML systems, large‑scale model training, inference, or adjacent areas (or equivalent experience via research / open source)
  • Advanced degree in Computer Science, EE, or a related field, or equivalent practical experience
  • Demonstrated experience owning complex technical projects end‑to‑end
  • Strong expertise in at least one of the following: Large‑scale inference systems (e.g., SGLang, vLLM, FasterTransformer, TensorRT, custom engines, or similar), GPU performance, distributed serving
  • RL / post‑training for LLMs or large models (e.g., GRPO, RLHF/RLAIF, DPO‑like methods, reward modeling)
  • Model architecture design for Transformers or other large neural nets
  • Distributed systems / high‑performance computing for ML
  • Strong coding ability in Python
  • Experience profiling and optimizing performance across GPU, networking, and memory layers
  • Track record of impactful work in ML systems, RL, or large‑scale model training (papers, open‑source projects, or production systems)
Job Responsibility
  • Advance inference efficiency end‑to‑end
  • Design and prototype algorithms, architectures, and scheduling strategies for low‑latency, high‑throughput inference
  • Implement and maintain changes in high‑performance inference engines
  • Profile and optimize performance across GPU, networking, and memory layers
  • Unify inference with RL / post‑training
  • Design and operate RL and post‑training pipelines
  • Make RL and post‑training workloads more efficient with inference‑aware training loops
  • Co‑design algorithms and infrastructure
  • Run ablations and scale‑up experiments to understand trade‑offs
  • Own critical systems at production scale
What we offer
  • Startup equity
  • Health insurance
  • Competitive benefits
  • Full-time

Research Engineer / Scientist - Post-training

At Luma, the Post-training team is responsible for unlocking creative control in...
Location: United States, Palo Alto
Salary: 187500.00 - 395000.00 USD / Year
Luma AI
Expiration Date: Until further notice
Requirements
  • Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies
  • Demonstrated ability to do independent research in Academic or Industry settings
  • Substantial industry experience in large-scale deep learning model training, with demonstrated expertise in at least one of Large Language Models, Vision-Language Models, Diffusion Models, or comparable generative AI architectures
  • Comprehensive technical proficiency and practical experience with leading deep learning frameworks, including advanced competency in one of PyTorch, JAX, TensorFlow, or equivalent platforms for model development and optimization
  • Strong orientation toward applied AI implementations with emphasis on translating product requirements into technical solutions, coupled with exceptional visual discrimination and dedicated focus on enhancing visual fidelity and aesthetic quality of generated content
  • Proficiency in accelerated prototyping and demonstration development for emerging features, facilitating efficient iteration cycles and comprehensive stakeholder evaluation prior to production implementation
  • Established track record of effective cross-functional teamwork, including successful partnerships with teams spanning Product, Design, Evaluation, Applied, and creative specialists
Job Responsibility
  • Optimize Luma's image and video generative models through targeted fine-tuning to improve visual quality, instruction adherence, and overall performance metrics
  • Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards
  • Partner closely with the Applied Research team to identify product requirements, understand diverse use cases across Luma's platforms, and execute targeted fine-tuning initiatives to address performance gaps and enhance user-facing capabilities
  • Conduct comprehensive side-by-side evaluations comparing model performance against leading market competitors, systematically analyzing the impact of post-training techniques on downstream performance metrics and identifying areas for improvement
  • Develop advanced post-training capabilities for Luma’s video models including Camera control, Object & character Reference, Image & Video Editing, Human Performance & Motion Transfer Approaches
  • Architect data processing pipelines for large-scale video and image datasets, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories
  • Research and deploy cutting-edge diffusion sampling methodologies and hyperparameter optimization strategies to achieve superior performance on established visual quality benchmarks
  • Research emerging post-training methodologies in generative AI, evaluate their applicability to Luma's product ecosystem, and integrate promising techniques into our Post-training recipe
  • Full-time

AI Research Lead

Perplexity is seeking an exceptional AI Research Tech Lead to drive our research...
Location: United States, San Francisco
Salary: 300000.00 - 470000.00 USD / Year
Perplexity
Expiration Date: Until further notice
Requirements
  • Minimum of 5 years of experience working on relevant AI/ML projects with 3+ years in a technical leadership role
  • Proven track record of leading and mentoring technical and research teams
  • A graduate degree in Computer Science from a premier academic institution
  • Deep expertise with large-scale LLMs and Deep Learning systems
  • Strong programming skills with versatility across multiple languages and frameworks
  • Demonstrated ability to set technical vision and drive execution
  • Experience with pre-training and post-training techniques (self-supervised learning along with SFT/DPO/GRPO/PPO)
  • Self-starter with exceptional ownership mentality and ability to work in ambiguous environments
  • Passion for solving challenging problems and pushing the boundaries of AI research
Job Responsibility
  • Define and execute the macro research direction across multiple modalities, including post-training LLMs for agent trajectories and future mid-training initiatives
  • Lead strategic research planning and roadmap development to advance Sonar model capabilities
  • Drive innovation in supervised and reinforcement learning techniques for query answering
  • Collaborate with leadership to align research priorities with product and business objectives
  • Coach and mentor a team of AI research scientists and engineers, fostering their technical and professional growth
  • Establish the long-term macro research direction across the team, including our direction across different modalities
  • Lead hiring and onboarding of new research talent
  • Create a collaborative environment that encourages knowledge sharing and innovation
  • Post-train SOTA LLMs on query answering using cutting-edge supervised and reinforcement learning techniques
  • Own and optimize the full stack data, training, and evaluation pipelines required for LLM post-training
What we offer
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
  • Full-time

Member of Technical Staff, Post-Training

Advance the state of the art for model post training, ship state of the art mode...
Salary: Not provided
Cohere
Expiration Date: Until further notice
Requirements
  • Extremely strong software engineering skills
  • Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR
  • Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray)
  • Experience using large-scale distributed training strategies
  • Hands-on experience training large models at scale
  • Hands-on experience with the post-training phase of model training, with a strong emphasis on performance optimisation
Job Responsibility
  • Design and write high-performance, scalable software for training models
  • Consistently post-train the models to reach SOTA level performance
  • Coordinate with other specialist teams (Agentic, Code…) to produce models that have strong all-round performance
  • Craft and implement techniques to improve the performance and results of our training cycles in both the SFT and RL regimes
  • Research, implement, and experiment with ideas on our supercompute and data infrastructure
  • Learn from and work with the best researchers in the field
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Full-time

AI Research Scientist, Post-Training - Meta Superintelligence Labs

Meta is seeking Research Scientists to join the Post-Training team within Meta S...
Location: United States, Menlo Park
Salary: 154000.00 - 217000.00 USD / Year
Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Ph.D. in Computer Science, Machine Learning, or a related technical field
  • 3+ years of experience in machine learning research, with a focus on deep learning, data alignment, NLP, or related areas
  • Demonstrated ability to lead technical research projects from conception to production
  • Effective communication skills and experience collaborating with technical leadership
Job Responsibility
  • Design novel methodologies for post-training data collection, curation, and synthetic data generation
  • Define data quality frameworks and alignment strategies that guide capability development across MSL, particularly for complex reasoning and agentic behaviors
  • Drive the scientific vision for eliciting high-quality data in expert domains (finance, legal, health, STEM) and complex agentic trajectories (Deep research, computer use, UI generation)
  • Conduct research to develop and optimize post-training recipes that directly improve model quality
  • Partner with cross-functional research teams across product and model training to identify and prioritize gaps in model capabilities
  • Contribute to research workstreams that shape the long-term direction of data-centric AI at MSL, working independently while also contributing to team goals and organizational priorities
What we offer
  • Bonus
  • Equity
  • Benefits
  • Full-time