
Applied Research - Evals & Data

Prime Intellect


Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Prime Intellect builds the infrastructure that frontier AI labs build internally, and makes it available to everyone. Our platform, Lab, unifies environments, evaluations, sandboxes, and high-performance training into a single full-stack system for post-training at frontier scale, from RL and SFT to tool use, agent workflows, and deployment. This is a customer-facing role at the intersection of cutting-edge RL/post-training methods, applied data, and agent systems. You’ll have a direct impact on shaping how advanced models are aligned, evaluated, deployed, and used in the real world.

Job Responsibility:

  • Advancing Agent Capabilities: Design and iterate on next-generation AI agents that tackle real workloads (workflow automation, reasoning-intensive tasks, and decision-making at scale)
  • Building Robust Infrastructure: Develop the distributed systems, evaluation pipelines, and coordination frameworks that enable these agents to operate reliably, efficiently, and at massive scale
  • Bridge Between Customers & Research: Translate customer needs and insights from applied data into clear technical requirements that guide product and research priorities
  • Prototype in the Field: Rapidly design and deploy agents, evals, and harnesses alongside customers to validate solutions
  • Customer-Facing Engineering: Work side-by-side with customers to deeply understand workflows, data sources, and bottlenecks
  • Post-training & Reinforcement Learning: Design and implement novel RL and post-training methods (RLHF, RLVR, GRPO, etc.) to align large models with domain-specific tasks
  • Agent Development & Infrastructure: Rapidly prototype and iterate on AI agents for automation, workflow orchestration, and decision-making

Requirements:

  • Strong background in machine learning engineering, with experience in post-training, RL, or large-scale model alignment
  • Experience with applied data workflows and evaluation frameworks for large models or agents (e.g., SWE-Bench, HELM, EvalFlow, internal eval pipelines)
  • Deep expertise in distributed training/inference frameworks (e.g., vLLM, SGLang, Ray, Accelerate)
  • Experience deploying containerized systems at scale (Docker, Kubernetes, Terraform)
  • Track record of research contributions (publications, open-source contributions, benchmarks) in ML/RL
  • Passion for advancing the state of the art in reasoning, measurement, and building practical, agentic AI systems

What we offer:

  • Competitive Compensation + equity incentives
  • Flexible Work (remote or San Francisco)
  • Visa Sponsorship & relocation support
  • Professional Development budget
  • Team Off-sites & conference attendance

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Applied Research - Evals & Data

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Germany, Berlin
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., dbt, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary, annual bonus and equity
  • Regular compensation reviews
  • Generous paid time off above statutory minimum
  • Hybrid working
  • MacBooks are our standard, but we also offer Windows for certain roles when needed
  • Fun events for Intercomrades, friends, and family
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Ireland; United Kingdom, Dublin; London
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., dbt, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews - we reward great work
  • Peace of mind with life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Open vacation policy and flexible holidays so you can take time off when you need it
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers, to let you spend valuable time with your loved ones
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done

Principal Group Product Manager - SharePoint

SharePoint powers content, collaboration, and knowledge for the enterprise. It w...
Location:
Ireland, Dublin
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND in-depth experience in leading product/service/program management or software development OR equivalent experience
  • In-depth people management and leadership experience
  • Relevant experience working with LLM/ML and building/shipping AI products to market
  • Experience presenting to leadership and executive audiences
Job Responsibility:
  • Lead a team of product managers focused on SharePoint AI solutions
  • Work with designers, researchers, data science, applied science, marketing, and business partners to expand SharePoint’s role as a leader in Enterprise Content Management
  • Own overall accountability for growing OneDrive and SharePoint AI usage by developing solutions that highlight the capabilities of SharePoint, OneDrive, Copilot, and other Microsoft AI applications
  • Drive our core AI feature investments towards our mission to deliver state-of-the-art AI solutions for customers
  • Accountability for coaching the team and raising the bar on evals to help ensure delivery of AI capabilities of the highest quality
  • Lead key partnerships with a diverse set of organizations, helping customers have the best Microsoft 365 experience they can
  • Grow talented PMs across all level bands and skillsets, improving and upgrading our talent as a team and creating the next generation of leaders
  • Engage with customers, both directly & indirectly, in formal and informal opportunities from conferences like Ignite to calls with customers to connect directly with them
  • Partner with Applied Science & Research, as well as your peers in PM across Microsoft to deliver seamless end-to-end experiences
  • Partner across our engineering teams to build a shared understanding and help organize our work across teams and services
Employment Type:
Full-time

AI Architect

We’re hiring an AI Architect to sit at the intersection of frontier AI research,...
Location:
United States, San Francisco; New York
Salary:
201600.00 - 241920.00 USD / Year
Scale
Expiration Date:
Until further notice
Requirements:
  • Deep technical background in applied AI/ML: 5–10+ years in research, engineering, solutions engineering, or technical product roles working on LLMs or multimodal systems, ideally in high-stakes, customer-facing environments
  • Hands-on experience with model improvement workflows: demonstrated experience with post-training techniques, evaluation design, benchmarking, and model quality iteration
  • Ability to work on hard, ambiguous technical problems: proven track record of partnering directly with advanced customers or research teams to scope, reason through, and execute on deep technical challenges involving frontier models
  • Strong technical fluency: you can read papers, interrogate metrics, write or review complex Python/SQL for analysis, and reason about model-data trade-offs
  • Executive presence with world-class researchers and enterprise leaders; excellent writing and storytelling
  • Bias to action: you ship, learn, and iterate.
Job Responsibility:
  • Translate research → product: work with client-side researchers on post-training, evals, and safety/alignment, and build the primitives, data, and tooling they need
  • Partner deeply with core customers and frontier labs: work hands-on with leading AI teams and frontier research labs to tackle hard, open-ended technical problems related to frontier model improvement, performance, and deployment
  • Shape and propose model improvement work: translate customer and research objectives into clear, technically rigorous proposals—scoping post-training, evaluation, and safety work into well-defined statements of work and execution plans
  • Translate research into production impact: collaborate with customer-side researchers on post-training, evaluations, and alignment, and help design the data, primitives, and tooling required to improve frontier models in practice
  • Own the end-to-end lifecycle: lead discovery, write crisp PRDs and technical specs, prioritize trade-offs, run experiments, ship initial solutions, and scale successful pilots into durable, repeatable offerings
  • Lead complex, high-stakes engagements: independently run technical working sessions with senior customer stakeholders; define success metrics; surface risks early; and drive programs to measurable outcomes
  • Partner across Scale: collaborate closely with research (agents, browser/SWE agents), platform, operations, security, and finance to deliver reliable, production-grade results for demanding customers
What we offer:
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity-based compensation
Employment Type:
Full-time

Senior Applied Scientist - Security Research

Security represents the most critical priorities for our customers in a world aw...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility:
  • Model development & optimization. Design, develop, fine‑tune, and evaluate models, summarization, and reasoning
  • Data & evaluation at scale. Build/extend data pipelines for curation/labeling/feature stores; author offline eval harnesses; run A/Bs; define guardrails and success metrics
  • Production ML engineering. Contribute to service code and configs; add monitoring, tracing, dashboards, and auto-scaling; participate in on-call and postmortems to improve live-site reliability
  • Collaboration & mentoring. Partner across PM/ENG/Research teams and beyond; identify AI technologies to create an adaptive and scalable solution to provide protection for our customers; share methods and code, review PRs, and improve reproducibility and documentation
Employment Type:
Full-time

Senior Software Engineer

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission...
Location:
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, Mathematics, Statistics, Physics, or a related field and 4 or more years in applied ML or AI research and product engineering
  • Master’s degree and 3 or more years in applied ML or AI research and product engineering
  • PhD in a relevant field and 2 or more years with generative AI, LLMs, or related ML algorithms
  • Proficiency in Python and at least one deep learning framework such as PyTorch, JAX, or TensorFlow
  • Experience deploying fine-tuned LLMs or multimodal models in live production environments
  • Experience shipping and maintaining production AI systems
  • Ability to meet Microsoft, customer, and government security screening requirements
  • Microsoft Cloud Background Check upon hire or transfer and every two years thereafter
Job Responsibility:
  • Bringing State-of-the-Art Research to Products
  • Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents); convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work; identify high-potential methods and adapt them to Microsoft problem spaces
  • End-to-End System Development
  • ML Design & Architecture: Own the end-to-end pipeline, from data prep through training, evaluation, and deployment to feedback loops
Employment Type:
Full-time

Member of Technical Staff - AI Engineer

Join our AI Engineering team to explore the boundaries of LLMs and AI, shaping T...
Location:
United Kingdom, London
Salary:
Not provided
Tessl
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience as a Software Engineer
  • Equally comfortable contributing to a mature codebase with strict CI criteria or hacking up a quick notebook to prove/disprove something
  • Proven experience collaborating with researchers and bridging between research-focused and engineering-focused teams
  • Experience with the applied use of data and statistics, and likely to spot and avoid bad data when you see it
  • Deeply curious about AI and excited about its potential to transform software engineering
Job Responsibility:
  • Use a bit of jq and grep to quickly navigate a dataset, but recognise when it’s time to use a more robust approach and move the team to something like dbt or duckdb
  • Tune a prompt in our generation workflow, eval the results and write an experiment report on your findings. Leave the eval tooling better than when you found it
  • Rapidly prototype a new language integration for our code generation pipeline, then develop a plan for a scalable implementation
  • Factor out a piece of our pipeline to use FaaS, unlocking 1,000x larger evals to run in nearly constant time
  • Add support for a new model in our elegant model abstraction library, or rewrite it when new model capabilities prove our existing design wrong
  • Work with our platform team on the next generation of eval facilities, based on your understanding of what researchers need and where the platform is heading
What we offer:
  • 25 days holiday
  • Health insurance, including dental and vision, which extends to partners and dependents
  • Company-matched pension
  • Commuting stipend for those who live outside London
  • Cycle-to-work scheme
Employment Type:
Full-time

Applied Data Scientist

As an Applied Data Scientist on our Insights team, you will help pioneer the nex...
Location:
United States
Salary:
150000.00 - 200000.00 USD / Year
Cresta
Expiration Date:
Until further notice
Requirements:
  • 3+ years building and shipping models for real-world business applications, ideally in NLP and LLM-based systems
  • Strong proficiency in Python and standard ML / data tooling (e.g., SQL, data pipelines, experiment frameworks)
  • World-class first principles thinking and ML intuition
  • Ability to turn ambiguous product asks into crisp problem statements, eval specs, metrics, and hypotheses
  • Experience working directly with customers or internal stakeholders to understand constraints, explain tradeoffs, and iterate on solutions
  • Comfort working with design-partner style engagements where requirements evolve rapidly and you’re expected to co-create the solution
  • Track record of building evaluation suites that go beyond single scalar metrics to capture reliability, safety, and qualitative user experience
  • Strong written and verbal communication skills; able to clearly explain complex technical work to both engineers and non-technical partners
Job Responsibility:
  • Co-develop new capabilities with a small number of high-impact enterprise customers along with our product, engineering, and design teams, using their real workflows and constraints as your testbed
  • Communicate effectively across all levels of the organization
  • Plan and run short, focused design-partner engagements (days to weeks) where you ship early versions, collect structured feedback, and iterate quickly
  • Generalize learnings from each design partner into reusable, productized capabilities rather than one-off bespoke models
  • Partner with domain experts to curate high-quality eval guidelines and datasets for domains such as CSAT prediction and outcome prediction (across both human<>human and human<>AI interactions)
  • Use the best tools + models for the job (simple and interpretable where it matters, sophisticated where it can drive outsized value)
  • Write clear specs and experiment reports that make tradeoffs and assumptions explicit
  • Stay close to the research frontier in ML/AI, LLMs, and evals, translating promising ideas into pragmatic, shippable improvements
  • Where applicable, help translate your solutions into publications, whitepapers, technical blogs, etc.
What we offer:
  • Comprehensive medical, dental, and vision coverage with plans to fit you and your family
  • Flexible PTO to take the time you need, when you need it
  • Paid parental leave for all new parents welcoming a new child
  • Retirement savings plan to help you plan for the future
  • Remote work setup budget to help you create a productive home office
  • Monthly wellness and communication stipend to keep you connected and balanced
  • In-office meal program and commuter benefits provided for onsite employees
  • Equity
Employment Type:
Full-time