CrawlJobs Logo

Ai Researcher, Post Training

lovable.dev Logo

Lovable

Location Icon

Location:
Sweden; United Kingdom , Stockholm

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

Lovable lets over 2 million people build software using plain language, and the models behind it need to be exceptional. We're hiring an engineer who has gotten their hands dirty with post-training at scale and wants to do it again for one of the fastest-growing AI products in the world. You'll own our full post-training pipeline: translating the latest research into production training recipes, adapting them for code generation and agent workloads, and putting improved models in front of users fast. The goal is to get promising research into production within days or weeks, not months. This isn't an academic research position - you'll spend as much time in production infrastructure as in training configs, and your success is measured by what ships.

Job Responsibility:

  • Own the full lifecycle of Lovable's post-training pipeline - from data curation and training runs through evaluation and deployment
  • Apply and adapt reinforcement learning, preference optimization, and supervised fine-tuning methods to make our models better at generating code, reasoning about user intent, and acting as reliable agents
  • Build the evaluation and experimentation infrastructure that tells us whether a model change actually helps users - covering helpfulness, safety, latency, and reliability
  • Develop and operate the production systems that run training jobs at scale, including GPU orchestration and data pipelines
  • Work across team boundaries with our agent, product, and infrastructure engineers to turn model gains into product improvements users can feel
  • Investigate and resolve failures end-to-end - whether the root cause is in a training recipe, a data issue, or a serving regression
  • Read papers, run experiments, and move fast: the goal is to get promising research into production within days or weeks, not months

Requirements:

  • You've personally run post-training jobs on large language models - RFT/RLVR, preference optimization, or similar. Not just called APIs or written prompts, but actually trained and iterated on models
  • You can write solid production code. The systems you build need to run reliably, not just produce interesting research artifacts
  • You're fluent in at least one major ML framework (PyTorch, JAX) and comfortable working with distributed training setups and GPU clusters
  • You understand the math behind preference optimization, reward modeling, and alignment techniques - and can reason about when each approach fits
  • You've built or significantly contributed to evaluation systems that capture real-world quality, not just benchmark scores
  • You can trace a model quality regression from user-facing symptoms back through serving, inference, and training - and you enjoy doing it
  • You want to ship. Research taste matters, but at Lovable the question is always 'how fast can we get this to users?'

Nice to have:

  • You've worked on code generation or agentic use cases specifically
  • You've put post-trained models into the hands of real users and seen how they hold up at scale
  • You've owned the full loop: curating data, running training, evaluating results, deploying, and monitoring in production
  • You have a habit of reading a paper on Monday and having a prototype running by Friday
  • You've experimented with speculative decoding or similar techniques to improve model efficiency
  • You have strong views on evaluation methodology and have built evals that actually predict user satisfaction
  • You've published or contributed meaningfully to the open-source ML ecosystem

Additional Information:

Job Posted:
May 05, 2026

Employment Type:
Fulltime
Work Type:
On-site work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Ai Researcher, Post Training

Research Scientist - Generative AI

As a Research Scientist in the Emergent Machine Intelligence Team at Hewlett Pac...
Location
Location
United States , Santa Barbara
Salary
Salary:
101900.00 - 234500.00 USD / Year
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • PhD in Computer Science, Artificial Intelligence, Machine Learning, Physics, Mathematics, or other related fields
  • 3-5 years working experience with training and fine-tuning generative AI models including LLMs, diffusion models, or Energy-Based Models
  • Proven track record of research in generative models, demonstrated through publications, patents, or publicly available projects
  • Proficiency in programming languages commonly used in AI research, such as Python, and experience with AI/ML frameworks (e.g., TensorFlow, PyTorch)
  • Deep understanding of machine learning algorithms and principles, especially in the context of generative AI
  • Strong mathematical background, with excellent skills in areas such as statistics, probability, linear algebra
  • Creative and analytical thinking abilities, with a passion for solving complex problems
  • Excellent communication skills, capable of conveying complex ideas clearly and engaging with both technical and non-technical audiences.
Job Responsibility
Job Responsibility
  • Conduct high-quality research in generative AI, including but not limited to designing algorithms for pre-training and post-training current autoregressive and diffusion models for multimodal data
  • Design, implement, and validate new algorithms and models for augmented LLMs, pushing the boundaries of AI capabilities
  • Developing and prototyping novel algorithms for fine-tuning, retrieval augmented generation, and in-context learning for various generative models
  • Developing algorithms for training and inference in Energy-Based Models
  • Collaborate with cross-functional teams to apply research findings to develop new products or enhance existing ones
  • Publish research papers in top-tier journals and conferences, sharing findings with the broader scientific community
  • Stay abreast of the latest AI research and trends, identifying opportunities for innovation and improvement
  • Mentor junior researchers and engineers, fostering a culture of knowledge sharing and collaboration
  • Develop prototypes and proof-of-concept implementations to demonstrate the potential of research findings
  • Engage with the academic community by attending conferences, workshops, and seminars.
What we offer
What we offer
  • A competitive salary and extensive social benefits
  • Diverse and dynamic work environment
  • Work-life balance and support for career development.
  • Fulltime
Read More
Arrow Right

Research Scientist - Generative AI

This role involves conducting high-quality research in generative AI, designing ...
Location
Location
United States
Salary
Salary:
101900.00 - 234500.00 USD / Year
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • PhD in Computer Science, Artificial Intelligence, Machine Learning, Physics, Mathematics, or other related fields
  • 3-5 years working experience with training and fine-tuning generative AI models including LLMs, diffusion models, or Energy-Based Models
  • Proven track record of research in generative models, demonstrated through publications, patents, or publicly available projects
  • Proficiency in programming languages commonly used in AI research, such as Python, and experience with AI/ML frameworks (e.g., TensorFlow, PyTorch)
  • Deep understanding of machine learning algorithms and principles, especially in the context of generative AI
  • Strong mathematical background, with excellent skills in areas such as statistics, probability, linear algebra
  • Creative and analytical thinking abilities, with a passion for solving complex problems
  • Excellent communication skills, capable of conveying complex ideas clearly and engaging with both technical and non-technical audiences
Job Responsibility
Job Responsibility
  • Conduct high-quality research in generative AI, including but not limited to designing algorithms for pre-training and post-training current autoregressive and diffusion models for multimodal data
  • Design, implement, and validate new algorithms and models for augmented LLMs, pushing the boundaries of AI capabilities
  • Developing and prototyping novel algorithms for fine-turning, retrieval augmented generation, and in-context learning for various generative models
  • Developing algorithms for training and inference in Energy-Based Models
  • Collaborate with cross-functional teams to apply research findings to develop new products or enhance existing ones
  • Publish research papers in top-tier journals and conferences, sharing findings with the broader scientific community
  • Stay abreast of the latest AI research and trends, identifying opportunities for innovation and improvement
  • Mentor junior researchers and engineers, fostering a culture of knowledge sharing and collaboration
  • Develop prototypes and proof-of-concept implementations to demonstrate the potential of research findings
  • Engage with the academic community by attending conferences, workshops, and seminars
What we offer
What we offer
  • A competitive salary and extensive social benefits
  • Diverse and dynamic work environment
  • Work-life balance and support for career development
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime
Read More
Arrow Right

Applied AI Researcher, Post-Training

The Post-Training team focuses on adapting foundation models to real-world perfo...
Location
Location
United States , San Francisco; New York
Salary
Salary:
130000.00 - 250000.00 USD / Year
distyl.ai Logo
Distyl AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Deep Understanding of Post-training Techniques: Familiarity with supervised fine-tuning, preference optimization (RLHF/DPO), LoRA/PEFT, and instruction-tuning pipelines
  • Experience Adapting Frontier Models: You’ve tuned or adapted LLMs/SLMs to specialized domains or behaviors through data curation, reward modeling, or continual pretraining
  • Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.)
  • Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on twitter, or somewhere else we want to see what you've done
  • Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow
  • Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer you need to be able to build prototypes of your ideas and then perform the experiments to prove the effectiveness to a F500 Head of AI
  • Biases Towards Showing vs Telling: Our customers want to see the power of AI today vs discuss the most elegant idea that will take 5 years to realize
Job Responsibility
Job Responsibility
  • Researchers develop and evaluate techniques such as supervised fine-tuning, preference optimization (DPO, RLHF, RLAIF), and continual adaptation to align models with Distyl’s enterprise systems
  • Researchers in Post-Training investigate new methods for aligning large models with human and system-level objectives. They explore trade-offs between generalization and specialization, data efficiency and robustness, capability and controllability
What we offer
What we offer
  • 100% covered medical, dental, and vision for employees and dependents
  • 401(k) with additional perks (e.g., commuter benefits, in‑office lunch)
  • Access to state‑of‑the‑art models, generous usage of modern AI tools, and real‑world business problems
  • Ownership of high‑impact projects across top enterprises
  • A mission‑driven, fast‑moving culture that prizes curiosity, pragmatism, and excellence
  • meaningful equity
  • Fulltime
Read More
Arrow Right

Machine Learning Research Scientist / Research Engineer, Post-Training

Scale works with the industry’s leading AI labs to provide high quality data and...
Location
Location
United States , San Francisco; Seattle; New York
Salary
Salary:
252000.00 - 315000.00 USD / Year
scale.com Logo
Scale
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field
  • Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning
  • Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning
  • Excellent written and verbal communication skills
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals
  • Previous experience in a customer facing role
Job Responsibility
Job Responsibility
  • Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal modalities
  • Design and experiment new approaches to preference optimization
  • Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness
  • Publish research findings in top-tier AI conferences
What we offer
What we offer
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • equity based compensation
  • commuter stipend
  • Fulltime
Read More
Arrow Right

Member of Technical Staff – Model Training

At Inflection AI, our public benefit mission is to harness the power of AI to im...
Location
Location
United States , Palo Alto
Salary
Salary:
175000.00 - 350000.00 USD / Year
inflection.ai Logo
Inflection AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Have hands-on experience training and fine-tuning large transformer models on multi-GPU / multi-node clusters
  • Are fluent in PyTorch and its ecosystem tools (Torchtune, FSDP, DeepSpeed) and enjoy digging into distributed-training internals, mixed precision, and memory-efficiency tricks
  • Have shipped or published work in RLHF, DPO, GRPO, or RLAIF and understand their practical trade-offs
  • Care deeply about training tools, pipelines, and reproducibility—you automate the boring parts so you can iterate on the fun parts
  • Balance research curiosity with product pragmatism—you know when to run an ablation and when to ship
  • Communicate crisply with both technical and non-technical teammates
  • Have a bachelor’s degree or equivalent in a related field to the offered position requirements
Job Responsibility
Job Responsibility
  • Contribute to end-to-end post-training workflows—dataset curation, hyper-parameter search, evaluation, and rollout—using PyTorch, Torchtune, FSDP/DeepSpeed, and our internal orchestration stack
  • Prototype and compare alignment techniques (e.g., curriculum RL, multi-objective reward modeling, tool-use fine-tuning) and push the best ideas into production
  • Automate training at scale: build robust pipeline components, tools, scripts, and dashboards so experiments are reproducible and easy to trace
  • Define the metrics that matter
  • run A/B tests and iterate quickly to meet aggressive quality targets
  • Collaborate with inference, safety, and product teams to land improvements in customer-facing systems
What we offer
What we offer
  • Diverse medical, dental and vision options
  • 401k matching program
  • Unlimited paid time off
  • Parental leave and flexibility for all parents and caregivers
  • Support of country-specific visa needs for international employees living in the Bay Area
  • Competitive stock options
Read More
Arrow Right

Technical Advisor

The Microsoft Superintelligence team is building Humanist Superintelligence - ad...
Location
Location
United States , Mountain View; Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in computer science or related technical field AND 4+ years technical engineering experience with coding OR equivalent experience
  • 8+ years of experience in software engineering, machine learning, or AI research
  • Advanced degree (MS/PhD) in Computer Science, Machine Learning, or related field
  • Software engineering skills with the ability to rapidly prototype and evaluate technical solutions
  • Understanding modern ML techniques, including large language models, multimodal systems, agents, or related areas
  • Written and verbal communication skills, with the ability to distill complex technical concepts for executive audiences
  • Track record of operating effectively in fast-paced, ambiguous environments
  • Ability to context-switch across diverse technical domains
  • Published research at top ML/AI venues (NeurIPS, ICML, ICLR, ACL, etc.)
  • Experience at a leading AI research lab or startup
Job Responsibility
Job Responsibility
  • Provide direct technical and strategic support to the CEO of Microsoft AI—identifying opportunities and risks across our research portfolio
  • Engage deeply with research teams on pre-training, post-training, multimodal systems, infrastructure, and safety/alignment work
  • Prototype and evaluate models, techniques, and infrastructure to develop insights and inform strategic decisions
  • Analyze technical tradeoffs and synthesize findings into clear, actionable recommendations for leadership
  • Stay at the forefront of AI research, tracking emerging techniques and evaluating their applicability to our work
  • Drive critical technical initiatives from conception through execution, ensuring alignment with strategic objectives
  • Serve as a bridge between the CEO and research teams, ensuring clear communication and alignment on priorities
  • Fulltime
Read More
Arrow Right

AI Research Lead

Perplexity is seeking an exceptional AI Research Tech Lead to drive our research...
Location
Location
United States , San Francisco
Salary
Salary:
300000.00 - 470000.00 USD / Year
perplexity.ai Logo
Perplexity
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Minimum of 5 years of experience working on relevant AI/ML projects with 3+ years in a technical leadership role
  • Proven track record of leading and mentoring technical and research teams
  • A Computer Science graduate degree at a premier academic institution
  • Deep expertise with large-scale LLMs and Deep Learning systems
  • Strong programming skills with versatility across multiple languages and frameworks
  • Demonstrated ability to set technical vision and drive execution
  • Experience with pre-training and post-training techniques (self-supervised learning along with SFT/DPO/GRPO/PPO)
  • Self-starter with exceptional ownership mentality and ability to work in ambiguous environments
  • Passion for solving challenging problems and pushing the boundaries of AI research
Job Responsibility
Job Responsibility
  • Define and execute the macro research direction across multiple modalities, including post-training LLMs for agent trajectories and future mid-training initiatives
  • Lead strategic research planning and roadmap development to advance Sonar model capabilities
  • Drive innovation in supervised and reinforcement learning techniques for query answering
  • Collaborate with leadership to align research priorities with product and business objectives
  • Coach and mentor a team of AI research scientists and engineers, fostering their technical and professional growth
  • Establish the long-term macro research direction across the team, including our direction across different modalities
  • Lead hiring and onboarding of new research talent
  • Create a collaborative environment that encourages knowledge sharing and innovation
  • Post-train SOTA LLMs on query answering using cutting-edge supervised and reinforcement learning techniques
  • Own and optimize the full stack data, training, and evaluation pipelines required for LLM post-training
What we offer
What we offer
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
  • Fulltime
Read More
Arrow Right

Research Engineer

As a Research Engineer at Mercor, you’ll work at the intersection of engineering...
Location
Location
United States , San Francisco
Salary
Salary:
130000.00 - 500000.00 USD / Year
mercor.com Logo
Mercor
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong applied research background, with a focus on post-training and/or model evaluation
  • Strong coding proficiency and hands-on experience working with machine learning models
  • Strong understanding of data structures, algorithms, backend systems, and core engineering fundamentals
  • Familiarity with APIs, SQL/NoSQL databases, and cloud platforms
  • Ability to reason deeply about model behavior, experimental results, and data quality
  • Excitement to work in person in San Francisco, five days a week (with optional remote Saturdays), and thrive in a high-intensity, high-ownership environment
Job Responsibility
Job Responsibility
  • Work on post-training and RLVR pipelines to understand how datasets, rewards, and training strategies impact model performance
  • Design and run reward-shaping experiments and algorithmic improvements (e.g., GRPO, DAPO) to improve LLM tool-use, agentic behavior, and real-world reasoning
  • Quantify data usability, quality, and performance uplift on key benchmarks
  • Build and maintain data generation and augmentation pipelines that scale with training needs
  • Create and refine rubrics, evaluators, and scoring frameworks that guide training and evaluation decisions
  • Build and operate LLM evaluation systems, benchmarks, and metrics at scale
  • Collaborate closely with AI researchers, applied AI teams, and experts producing training data
  • Operate in a fast-paced, experimental research environment with rapid iteration cycles and high ownership
What we offer
What we offer
  • Generous equity grant vested over 4 years
  • A $20K relocation bonus (if moving to the Bay Area)
  • A $10K housing bonus (if you live within 0.5 miles of our office)
  • A $1K monthly stipend for meals
  • Free Equinox membership
  • Health insurance
  • Fulltime
Read More
Arrow Right