Data Scientist, Evals

Perplexity

Location:
United States; United Kingdom; Serbia; Germany , San Francisco

Contract Type:
Not provided

Salary:

210,000 - 385,000 USD / Year

Job Description:

Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.

Job Responsibility:

  • Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
  • Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
  • Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
  • Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
  • Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve Answer Quality

Requirements:

  • PhD or MS in a technical field or equivalent experience
  • 4+ years of experience in data science or machine learning
  • Strong proficiency in Python and SQL (expected to write production-grade code)
  • Experience building within a modern cloud data stack, specifically AWS and Databricks
  • Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

Nice to have:

  • 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
  • Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
  • A strong research background, with experience applying research methods to real-world ML problems
  • Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets

What we offer:

  • Equity
  • Full-time U.S. employees enjoy a comprehensive benefits program including equity, health, dental, vision, retirement, fitness, commuter and dependent care accounts, and more
  • Full-time employees outside the U.S. enjoy a comprehensive benefits program tailored to their region of residence

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Work Type:
Hybrid work

Similar Jobs for Data Scientist, Evals

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...

Intercom

Location:
Germany , Berlin

Salary:
Not provided

Expiration Date:
Until further notice

Requirements:

  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., dbt, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables

Job Responsibility:

  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams

What we offer:

  • Competitive salary, annual bonus and equity
  • Regular compensation reviews
  • Generous paid time off above statutory minimum
  • Hybrid working
  • MacBooks are our standard, but we also offer Windows for certain roles when needed
  • Fun events for Intercomrades, friends, and family

Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...

Intercom

Location:
Ireland; United Kingdom , Dublin; London

Salary:
Not provided

Expiration Date:
Until further notice

Requirements:

  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., dbt, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables

Job Responsibility:

  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams

What we offer:

  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews - we reward great work
  • Peace of mind with life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Open vacation policy and flexible holidays so you can take time off when you need it
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers, to let you spend valuable time with your loved ones
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done

Data Scientist

Hunter Douglas is in a multi-year transformation to modernize operations, digita...

Hunter Douglas

Location:
Netherlands , Rotterdam

Salary:
Not provided

Expiration Date:
Until further notice

Requirements:

  • 5+ years in DS/ML with technical leadership and a consistent record of business impact
  • Fluent in Python & SQL
  • Strong ML fundamentals (statistical learning, gradient boosting, time series) and working knowledge of deep learning
  • Hands-on with GenAI: prompt design, embeddings/vector stores, fine-tuning/adapters, agent frameworks (e.g., LangChain/LlamaIndex), and LLM eval
  • Consulting-grade problem solving & communication: engage executives, structure ambiguity, tell a quantified story
  • Solid software & data practices: version control, testing, code review, ETL/ELT & APIs, and cloud (GCP/Azure)

Job Responsibility:

  • Partner with executive management on high-impact initiatives across pricing, CX, and growth; bring structured, fact-based recommendations and own delivery
  • Translate strategy to impact: convert business goals into AI/ML workstreams with clear metrics (revenue, margin, CX) and track outcomes
  • Own end-to-end delivery: problem framing → data strategy → modeling → experimentation/causal inference → deployment → monitoring
  • Build GenAI products: RAG systems, tool-using agents, workflow orchestration, guardrails/safety, and systematic LLM evaluation
  • Design high-leverage models, e.g., product optimization, demand forecasting, segmentation/CLV, recommendations, attribution
  • Integrate & ship: partner with engineering to connect legacy/modern systems and deliver quick wins that scale
  • Coach & influence: mentor data scientists/analysts

What we offer:

  • Direct exposure to senior leadership and the chance to shape a global transformation
  • Significant ownership with the stability of a market leader
  • Competitive compensation, performance bonus, and rapid growth opportunities
  • A supportive environment for professional growth

Director, AI Data Strategy and Operations

At Microsoft AI, we are on a mission to train the world’s most capable AI fronti...

Microsoft Corporation

Location:
United States , Mountain View

Salary:
106,400 - 203,600 USD / Year

Expiration Date:
Until further notice

Requirements:

  • Master's Degree in Business Administration or related field AND 4+ years of experience in strategy or finance, data operations, program/technical program management, procurement, or finance integration, OR Bachelor's Degree in Business, Finance, Economics, Computer Science, or related field AND 6+ years of experience in strategy or finance, data operations, program/technical program management, procurement, or finance integration, OR equivalent experience
  • Bachelor’s Degree AND 10+ years' experience in technical data operations, procurement program management, or finance systems integration OR equivalent experience
  • Proven experience in leading large, cross-functional programs and strategically managing AI data vendor relationships
  • Expertise in managing data operations for AI research organizations and working closely with scientists on data collections
  • Demonstrated ability to maintain and oversee budget/impact tracking systems and cross-functional programs
  • Experience leading large-scale multimodal data programs for AI research organizations

Job Responsibility:

  • Lead post-training data collection (e.g. human evals, training data), including vendor selection, contract negotiation, and management
  • Act as primary data operations POC for key research efforts and pillars, including Multimodality (images, audio, video) and multilingual
  • Support data operations across human data vendors, including tracking of POs, spend, amortization schedules, and milestone delivery
  • Coordinate cross-functional initiatives across Business Development, Procurement, Legal, Compliance, Finance, and Engineering
  • Advance the AI frontier responsibly and embody Microsoft’s culture and values

Employment Type:
Full-time

Data Scientist

As a Data Scientist on Codex, you will measure and accelerate product-market fit...

OpenAI

Location:
United States , San Francisco

Salary:
230,000 - 385,000 USD / Year

Expiration Date:
Until further notice

Requirements:

  • 5+ years in a quantitative role at a developer-facing or high-growth product
  • Fluency in SQL and Python, and comfort with experiment design and causal inference
  • Experience defining product metrics tied to user value
  • Ability to communicate clearly with PM, Eng, and Design—and to influence product direction

Job Responsibility:

  • Embed with the Codex product team to discover opportunities that improve developer outcomes and growth
  • Design and interpret A/B tests and staged rollouts of new coding models and product features
  • Define and operationalize metrics such as suggestion acceptance, edit distance, compile/test pass rates, task completion, latency, and session productivity
  • Build dashboards and analyses that help the team self-serve answers to product questions (by language, framework, repo size, task type)
  • Diagnose failure modes and partner with Research on targeted improvements (model quality signals, user feedback, evals)

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick and safe time (1 hour per 30 hours worked)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible

Employment Type:
Full-time

Research Engineer II

As a Research Engineer II at Microsoft you will apply both software engineering ...

Microsoft Corporation

Location:
India , Bangalore

Salary:
Not provided

Expiration Date:
Until further notice

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
  • 2+ years of professional experience working with generative artificial intelligence, large language models, or agent-based systems
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Job Responsibility:

  • Design and develop highly usable, scalable application capabilities, integrating AI models and enhancing existing features to meet evolving customer needs
  • Build and debug production-grade code in distributed systems
  • Translate business requirements into AI solutions, collaborating with data scientists, research scientists, product managers, and engineering teams to ensure alignment and impact
  • Optimize AI model performance and reliability in production environments, including retraining, evaluation, and continuous monitoring
  • Own deployment, quality and operation of AI systems, including automated evals, CI/CD pipelines, deployment, and monitoring with strong MLOps and DevOps practices
  • Troubleshoot live site issues as part of both product development and live site support rotations, ensuring rapid resolution and learning

Employment Type:
Full-time

Data Scientist, Platform and B2B Products

As a Data Scientist on the Platform team, you will drive a data-driven culture f...

OpenAI

Location:
United States , San Francisco

Salary:
230,000 - 385,000 USD / Year

Expiration Date:
Until further notice

Requirements:

  • 5+ years in a quantitative role in ambiguous, high-growth environments (platforms, APIs, or B2B products a plus)
  • Depth in SQL and Python, with a track record proposing, designing, and running rigorous experiments
  • Experience defining and operationalizing metrics from scratch (including reliability/latency/cost and safety)
  • Strong cross-functional communication with PMs, engineers, and executives
  • Strategic instincts beyond p-values—clear thinking about tradeoffs and business impact

Job Responsibility:

  • Embed with the Platform product team as a trusted partner, uncovering ways to improve developer experience, reliability, and usage growth
  • Define north-star metrics across the developer funnel (activation, retention, growth), as well as latency/cost guardrails for new features and models
  • Design and interpret A/B tests and controlled rollouts (e.g., new model versions, pricing/limits, new API features, new B2B products)
  • Build source-of-truth dashboards and self-serve data tools for product, engineering, and go-to-market teams
  • Translate product learnings into actionable feedback for Research (e.g., failure modes, eval gaps, model response quality)

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick and safe time (1 hour per 30 hours worked)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible

Employment Type:
Full-time

Member of Technical Staff - Data Scientist

We’re looking for data scientists to help build the next generation of post-trai...

Microsoft Corporation

Location:
United States , Mountain View

Salary:
119,800 - 234,700 USD / Year

Expiration Date:
Until further notice

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Job Responsibility:

  • Design evaluations of advanced model capabilities and use them to drive rapid, high-signal iteration loops
  • Work with vendors to produce high quality evaluation and training data
  • Build data pipelines to produce high quality evaluation and training data
  • Build data flywheels to hill-climb on model weaknesses, using data from various surfaces where our models are deployed
  • Ensure optimal quality, quantity and coverage of data across our post-training stages
  • Run post-training experiments and ablations to produce models that climb our evals
  • Embody our culture and values.

Employment Type:
Full-time