Applied Research - Evals & Data

Prime Intellect

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Prime Intellect builds the infrastructure that frontier AI labs build internally, and makes it available to everyone. Our platform, Lab, unifies environments, evaluations, sandboxes, and high-performance training into a single full-stack system for post-training at frontier scale, from RL and SFT to tool use, agent workflows, and deployment. This is a customer-facing role at the intersection of cutting-edge RL/post-training methods, applied data, and agent systems. You’ll have a direct impact on shaping how advanced models are aligned, evaluated, deployed, and used in the real world.

Job Responsibility:

  • Advance Agent Capabilities: Design and iterate on next-generation AI agents that tackle real workloads, including workflow automation, reasoning-intensive tasks, and decision-making at scale
  • Build Robust Infrastructure: Develop the distributed systems, evaluation pipelines, and coordination frameworks that enable these agents to operate reliably, efficiently, and at massive scale
  • Bridge Between Customers & Research: Translate customer needs and insights from applied data into clear technical requirements that guide product and research priorities
  • Prototype in the Field: Rapidly design and deploy agents, evals, and harnesses alongside customers to validate solutions
  • Customer-Facing Engineering: Work side by side with customers to deeply understand workflows, data sources, and bottlenecks
  • Post-training & Reinforcement Learning: Design and implement novel RL and post-training methods (RLHF, RLVR, GRPO, etc.) to align large models with domain-specific tasks
  • Agent Development & Infrastructure: Rapidly prototype and iterate on AI agents for automation, workflow orchestration, and decision-making

Requirements:

  • Strong background in machine learning engineering, with experience in post-training, RL, or large-scale model alignment
  • Experience with applied data workflows and evaluation frameworks for large models or agents (e.g., SWE-Bench, HELM, EvalFlow, internal eval pipelines)
  • Deep expertise in distributed training/inference frameworks (e.g., vLLM, sglang, Ray, Accelerate)
  • Experience deploying containerized systems at scale (Docker, Kubernetes, Terraform)
  • Track record of research contributions (publications, open-source contributions, benchmarks) in ML/RL
  • Passion for advancing the state-of-the-art in reasoning, measurement, and building practical, agentic AI systems

What we offer:

  • Competitive Compensation + equity incentives
  • Flexible Work (remote or San Francisco)
  • Visa Sponsorship & relocation support
  • Professional Development budget
  • Team Off-sites & conference attendance

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Applied Research - Evals & Data

Senior Applied Scientist

As a Senior Applied Scientist, do you enjoy solving problems, looking at problem...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Experience with agentic frameworks and orchestration, agentic retrieval/search (multi-step, adaptive re-search) for enterprise environments, and systematic evals using rubrics + quantitative metrics to measure and improve agent quality end-to-end
  • 3+ years experience creating publications (e.g., patents, libraries, peer-reviewed academic papers)
  • Experience developing AI Agents
  • Experience presenting at conferences or other events in the outside research/industry community as an invited speaker
  • 3+ years experience conducting research as part of a research program (in academic or industry settings)
  • 1+ year(s) experience developing and deploying live production systems, as part of a product team
Job Responsibility:
  • Bringing the State of the Art to Products
  • Establishes collaborative relationships with relevant product and business groups inside or outside of Microsoft and provides expertise or technology to create business impact
  • Takes initiative and drives activities such as technology transfer attempts, standards organizations, filing patents, authoring white papers, developing or maintaining tools/services for internal Microsoft use, or consulting for product or business groups
  • May publish research to promote receiving new intellectual property for business impact
  • Brings new technology and approaches into production by applying long-term research efforts to solve immediate product needs
  • Collaborates with and bridges the gap between researchers (in community across the company, Microsoft Research [MSR], or in their own organizations) and development teams
  • Begins to negotiate across teams to ensure cutting edge technology is being applied to products in a practical way that meets key business objectives
  • Develops an understanding of research approaches used across a group or organization to leverage (and not re-invent) solutions
  • Independently works to create product impact
  • Identifies approach, and applies, improves, or creates a research-backed solution (e.g., novel, data driven, scalable, extendable) to positively impact a Microsoft product or service
Employment Type:
Full-time

Member of Technical Staff, Applied AI Engineer

We’re hiring an Applied AI Engineer to join a fast‑moving, high‑ownership team bu...
Location:
United States, Mountain View
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) or consulting experience
  • OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 2+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR equivalent experience.
  • 2+ years shipping production-level code, models, or data analysis.
  • 1+ years using AI-assisted coding and analysis techniques.
  • Experience working on small teams and mid-stage startup environments.
  • Experience working on AI products.
  • PhD in engineering, applied math, statistics, or related analytical field.
  • 4+ years shipping production-level code, models, or data analysis.
  • Deep experience building from zero-to-one.
  • Hands-on work hillclimbing AI evaluations.
Job Responsibility:
  • LLM Feature & Agent Development
  • Design and ship LLM‑powered assistant features, including conversational flows, agentic behaviors, retrieval pipelines, and multimodal interactions.
  • Build prompt architectures, system instructions, and orchestration logic that ensure reliability, grounding, and personality consistency.
  • Prototype new capabilities rapidly and iterate based on user signals and evaluation data.
  • Evaluation, Hillclimbing & Quality Systems
  • Build and maintain evaluation frameworks for correctness, safety, grounding, and UX quality.
  • Run hillclimbing loops across prompts, models, and tool‑use strategies to continuously improve assistant performance.
  • Analyze failure modes, design mitigations, and drive systematic improvements across the stack.
  • LLM Tooling & Internal Infrastructure
  • Develop internal tools for prompt experimentation, model comparison, telemetry, debugging, and automated eval pipelines
Employment Type:
Full-time

Senior Data Scientist

M365 Copilot Cadets (Customer & Analytics‑Driven Eval Team) turns real‑world cus...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data-science experience
  • OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience
  • OR equivalent experience
  • Experience with building data pipelines, performing large-scale analysis, and implementing ML workflows using Python and SQL
  • Experience in developing models or designing evaluation frameworks, including A/B testing or prompt-based assessments for LLMs
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Evaluation & Feedback Analysis: Convert multi‑source feedback (dogfood, VIP customers, production traces) into a prioritized dataset of 10–100 tasks per scenario, each with prompts and golden outputs; maintain a living failure taxonomy prioritized by volume × impact × fixability
  • Rubrics & LLM‑as‑Judge: Author crisp, binary‑first rubrics across 7–30 dimensions (e.g., correctness/completeness, refusal calibration, tool‑use quality, formatting/contract, persona/tone, trace hygiene)
  • Build grader prompts (with few‑shots and counter‑examples) that achieve ≥80% human‑match rate, track TPR/TNR on held‑out sets, and prevent reward hacking
  • Synthetic & Human‑Labeled Data: Design structured tuples to scale high‑signal synthetic data; orchestrate vendor/partner annotation sprints and live calibrations to align shared judgment
  • Ensure datasets are reproducible with linked artifacts and robust metadata/trace hygiene
  • Customer‑Grounded Scenarios: Partner with PMs/solution architects to co‑develop evals with VIP customers so tasks reflect real outcomes and workflows; quantify lift from fixes and inform the next hill‑climb
  • Team Leadership & Ways of Working: Co‑own the Cadets “feedback flywheel” with PM/Eng (instrumentation, taxonomy, guardrails vs. evaluators) and help operationalize weekly checklists, change logs, and judge refresh cadence
Employment Type:
Full-time

User Experience Researcher - AI

We are looking for a researcher to help shape next-gen compliance systems, exper...
Location:
United States, Menlo Park
Salary:
266000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree with 13+ years of relevant experience in user experience, applied research and/or product research and development or a Master’s degree and 11+ years relevant experience, or PhD and 8+ years relevant experience
  • Experience driving research direction and serving as a thought partner to leadership
  • Experience in working with Design, Data, Product, and Engineering to empower product development through foundational research and AI evaluation
  • Experience running or designing AI model evals from a UX perspective
  • Experience with code/scripting to prototype research tools or analyze data programmatically
Job Responsibility:
  • Thought partner to Risk organizational leadership on research direction
  • Determine foundational questions for the organization with a holistic view
  • Navigate ambiguity—shape direction in AI pods with minimal context
  • Act as direction lead or "unblocker" in fast-moving environments
  • Run evals and measurement (large scale benchmarking, evals, risk flywheel measurement)
  • Work with engineering and cross functional partners on evals to inform how we can improve models
  • Inform designing and architecting AI products from the start—not just evaluating after the fact
  • Develop new methods and innovate new approaches (stimuli generation, upper funnel work, etc.)
  • Apply systems thinking to connect the dots holistically across the organization
What we offer:
  • bonus
  • equity
  • benefits

Applied Scientist - Security Research

Security represents the most critical priorities for our customers in a world aw...
Location:
United States, Redmond
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 2+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 1+ year(s) experience creating publications (e.g., patents, peer-reviewed academic papers)
  • 1+ years experience bringing new technology and approaches into production by applying long-term research efforts to solve immediate product needs
  • 4+ years applied ML/NLP experience delivering models and features to production at scale
  • Languages and Technology - Python, R, C, SQL, Kusto / Azure Data Explorer, Power BI
Job Responsibility:
  • Model development & optimization. Design, develop, fine‑tune, and evaluate models, summarization, and reasoning
  • Data & evaluation at scale. Build/extend data pipelines for curation/labeling/feature stores; author offline eval harnesses; run A/Bs; define guardrails and success metrics
  • Production ML engineering. Contribute to service code and configs; add monitoring, tracing, dashboards, and auto‑scaling; participate in on‑call and postmortems to improve live‑site reliability
  • Collaboration & mentoring. Partner across PM/ENG/Research teams and beyond; identify AI technologies to create an adaptive and scalable solution to provide protection for our customers, share methods and code, review PRs, improve reproducibility and documentation
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Germany, Berlin
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary, annual bonus and equity
  • Regular compensation reviews
  • Generous paid time off above statutory minimum
  • Hybrid working
  • MacBooks are our standard, but we also offer Windows for certain roles when needed
  • Fun events for Intercomrades, friends, and family
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Ireland; United Kingdom, Dublin; London
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., DBT, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews - we reward great work
  • Peace of mind with life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Open vacation policy and flexible holidays so you can take time off when you need it
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers, to let you spend valuable time with your loved ones
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done

Principal Group Product Manager - SharePoint

SharePoint powers content, collaboration, and knowledge for the enterprise. It w...
Location:
Ireland, Dublin
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND in-depth experience in leading product/service/program management or software development OR equivalent experience
  • In-depth people management and leadership experience
  • Relevant experience working with LLM/ML and building/shipping AI products to market
  • Experience presenting to leadership and executive audiences
Job Responsibility:
  • Lead a team of product managers focused on SharePoint AI solutions
  • Work with designers, researchers, data science, applied science, marketing, and business partners to expand SharePoint’s role as a leader in Enterprise Content Management
  • Hold overall accountability for growing OneDrive and SharePoint AI usage by developing solutions that highlight the capabilities of SharePoint, OneDrive, Copilot, and other Microsoft AI applications
  • Drive our core AI feature investments towards our mission to deliver state-of-the-art AI solutions for customers
  • Accountability for coaching the team and raising the bar on evals to help ensure delivery of AI capabilities of the highest quality
  • Lead key partnerships with a diverse set of organizations, helping customers have the best Microsoft 365 experience they can
  • Grow talented PMs across all level bands and skillsets, improving and upgrading our talent as a team and creating the next generation of leaders
  • Engage with customers, both directly & indirectly, in formal and informal opportunities from conferences like Ignite to calls with customers to connect directly with them
  • Partner with Applied Science & Research, as well as your peers in PM across Microsoft to deliver seamless end-to-end experiences
  • Partner across our engineering teams to build a shared understanding and help organize our work across teams and services
Employment Type:
Full-time