Data Scientist, Evals

Perplexity

Location:
United States; United Kingdom; Serbia; Germany , San Francisco

Contract Type:
Not provided

Salary:

210000.00 - 385000.00 USD / Year

Job Description:

Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.

Job Responsibility:

  • Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
  • Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
  • Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
  • Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
  • Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve Answer Quality

Requirements:

  • PhD or MS in a technical field or equivalent experience
  • 4+ years of experience in data science or machine learning
  • Strong proficiency in Python and SQL (expected to write production-grade code)
  • Experience building within a modern cloud data stack, specifically AWS and Databricks
  • Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

Nice to have:

  • 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
  • Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
  • A strong research background, with experience applying research methods to real-world ML problems
  • Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets

What we offer:

  • Offers Equity
  • Full-time U.S. employees enjoy a comprehensive benefits program including equity, health, dental, vision, retirement, fitness, commuter and dependent care accounts, and more
  • Full-time employees outside the U.S. enjoy a comprehensive benefits program tailored to their region of residence

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Work Type:
Hybrid work

Similar Jobs for Data Scientist, Evals

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Germany , Berlin
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., dbt, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary, annual bonus and equity
  • Regular compensation reviews
  • Generous paid time off above statutory minimum
  • Hybrid working
  • MacBooks are our standard, but we also offer Windows for certain roles when needed
  • Fun events for Intercomrades, friends, and family
Employment Type:
Full-time

Senior Data Scientist - AI Tooling

The Research, Analytics & Data Science (RAD) team turns insight into action. We ...
Location:
Ireland; United Kingdom , Dublin; London
Salary:
Not provided
Intercom
Expiration Date:
Until further notice
Requirements:
  • Proven track record of applied data science with measurable GTM impact
  • LLM/ML application experience - familiarity with RAG, prompt and tool design, vector search, and evals, plus experience leveraging AI for development
  • Excellent SQL skills and fluency in Python or R, with experience applying analytical and statistical methods to business problems
  • Experience with orchestration tools (e.g., dbt, Airflow) for deploying reliable data workflows
  • Strong communication and empathy - ability to translate complex data concepts for non-technical stakeholders
  • Collaborative product mindset - comfort working closely with Sales and Success teams to turn ambiguity into clear deliverables
Job Responsibility:
  • Design, evaluate, and ship AI-powered internal tools for GTM use cases - including account research & summaries, next-best-action recommendations, renewal propensity, pipeline risk detection, QBR/autobrief generation, and post-call summarization & follow-ups
  • Work end-to-end: Own the full lifecycle, from problem definition and data modeling to building production-ready tools, including writing Python backends and React frontends
  • Prototype fast, ship to learn: Rapidly build with users, then productionize quickly to iterate and deliver impact
  • Instrument for adoption and outcomes: Define success through real usage and measurable business impact (e.g., improved win rate, conversion, expansion)
  • Evangelize and enable: Document playbooks, run enablement sessions, and help leaders operationalize new tooling across teams
What we offer:
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews - we reward great work
  • Peace of mind with life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Open vacation policy and flexible holidays so you can take time off when you need it
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers, to let you spend valuable time with your loved ones
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done

Data Scientist

Hunter Douglas is in a multi-year transformation to modernize operations, digita...
Location:
Netherlands , Rotterdam
Salary:
Not provided
Hunter Douglas
Expiration Date:
Until further notice
Requirements:
  • 5+ years in DS/ML with technical leadership; consistent record of business impact
  • Fluent in Python & SQL; strong ML fundamentals (statistical learning, gradient boosting, time series; working knowledge of deep learning)
  • Hands-on with GenAI: prompt design, embeddings/vector stores, fine‑tuning/adapters, agent frameworks (e.g., LangChain/LlamaIndex), and LLM eval
  • Consulting-grade problem solving & communication: engage executives, structure ambiguity, tell a quantified story
  • Solid software & data practices: version control, testing, code review; ETL/ELT & APIs; cloud (GCP/Azure)
Job Responsibility:
  • Partner with executive management on high‑impact initiatives across pricing, CX, and growth; bring structured, fact‑based recommendations and own delivery
  • Translate strategy to impact: convert business goals into AI/ML workstreams with clear metrics (revenue, margin, CX); track outcomes
  • Own end-to-end delivery: problem framing → data strategy → modeling → experimentation/causal inference → deployment → monitoring
  • Build GenAI products: RAG systems, tool-using agents, workflow orchestration, guardrails/safety, and systematic LLM evaluation
  • Design high-leverage models: e.g., product optimization, demand forecasting, segmentation/CLV, recommendations, attribution
  • Integrate & ship: partner with engineering to connect legacy/modern systems; deliver quick wins that scale
  • Coach & influence: mentor data scientists/analysts
What we offer:
  • Direct exposure to senior leadership and the chance to shape a global transformation
  • Significant ownership with the stability of a market leader
  • Competitive compensation, performance bonus, and rapid growth opportunities
  • Supportive environment for professional growth

AI Data Scientist

You'll be Gamma's first Data Scientist, turning massive volumes of user data and...
Location:
United States , San Francisco
Salary:
180000.00 - 310000.00 USD / Year
Gamma
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience as a data scientist at product-focused tech companies, with experience managing or mentoring other data scientists
  • Strong statistical foundations with hands-on experience designing and analyzing A/B tests and experiments at scale
  • Experience working with large-scale data and building metrics frameworks from scratch, including modern data stack tools like dbt and Snowflake and comfort analyzing unstructured or text data
  • Experience working with AI/ML products, especially LLMs or generative AI, with familiarity evaluating model performance in production settings
  • Ability to communicate complex technical concepts to non-technical stakeholders and influence product decisions with clarity and conviction
  • A clear perspective on how agentic coding is transforming the data science role, and genuine excitement about applying AI to your own work (Nice to have)
Job Responsibility:
  • Build frameworks that help the team understand AI model performance, user behavior, and product health across Gamma's platform
  • Dig into AI model outputs across user cohorts to identify quality gaps and create evals and metrics to measure improvement
  • Partner with engineering and product to define quality metrics for AI-generated content and user satisfaction
  • Develop statistical models and frameworks that empower product teams to make data-informed decisions independently
  • Design and analyze large-scale A/B tests and experiments with statistical rigor to measure product impact and guide prioritization
What we offer:
  • Equity
Employment Type:
Full-time

Director, AI Data Strategy and Operations

At Microsoft AI, we are on a mission to train the world’s most capable AI fronti...
Location:
United States , Mountain View
Salary:
106400.00 - 203600.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Business Administration or related field AND 4+ years strategy or finance, data operations, program/technical program management, procurement, or finance integration
  • OR Bachelor's Degree in Business, Finance, Economics, Computer Science, or related field AND 6+ years strategy or finance, data operations, program/technical program management, procurement, or finance integration
  • OR equivalent experience
  • Bachelor’s Degree AND 10+ years' experience in technical data operations, procurement program management, or finance systems integration OR equivalent experience
  • Proven experience in leading large, cross-functional programs and strategically managing AI data vendor relationships
  • Expertise in managing data operations for AI research organizations and working closely with scientists on data collections
  • Demonstrated ability to maintain and oversee budget/impact tracking systems and cross-functional programs
  • Experience leading large-scale multimodal data programs for AI research organizations
Job Responsibility:
  • Lead post-training data collection (e.g. human evals, training data), including vendor selection, contract negotiation, and management
  • Act as primary data operations POC for key research efforts and pillars, including Multimodality (images, audio, video) and multilingual
  • Support data operations across human data vendors, including tracking of POs, spend, amortization schedules, and milestone delivery
  • Coordinate cross-functional initiatives across Business Development, Procurement, Legal, Compliance, Finance, and Engineering
  • Advance the AI frontier responsibly and embody Microsoft’s culture and values
Employment Type:
Full-time

Member of Technical Staff - Data Scientist

We’re looking for data scientists to help build the next generation of post-trai...
Location:
Switzerland , Zürich
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Hands‑on experience with large language models, including training or applying them in production (not just prompting)
  • Designing and running post‑training experiments (evals, ablations, preference tuning / RLHF‑style methods)
  • Building and owning scalable data pipelines for training and evaluation data
  • Strong Python skills for ML experimentation, data processing, and analysis
  • Solid statistical, experimental, and general engineering fundamentals
Job Responsibility:
  • Design evaluations of advanced model capabilities and use them to drive rapid, high-signal iteration loops
  • Work with vendors to produce high quality evaluation and training data
  • Build data pipelines to produce high quality evaluation and training data
  • Build data flywheels to hill-climb on model weaknesses, using data from various surfaces where our models are deployed
  • Ensure optimal quality, quantity and coverage of data across our post-training stages
  • Run post-training experiments and ablations to produce models that climb our evals
  • Embody our culture and values
Employment Type:
Full-time

AI Product Manager

As our AI Product Manager, you’ll be defining and delivering the next generation...
Location:
Serbia , Belgrade
Salary:
Not provided
Sokin
Expiration Date:
Until further notice
Requirements:
  • Fintech & Payments Expertise: Experience building products in fintech space — specifically in payments, treasury, FX, or financial infrastructure
  • Strong understanding of traditional payment rails and protocols (SEPA, Faster Payments, ACH, SWIFT) and how they intersect with emerging technologies like stablecoins and blockchain-based settlement
  • Understanding of how card acquiring works, the key players in the payment processing chain, and the mechanics of cross-border money movement
  • Understanding of how stablecoins work across different setups and blockchains
  • Comfort working with compliance, risk, and legal stakeholders, and an understanding of the regulatory landscape for financial services, including PCI-DSS, PSD2/PSD3, GDPR, and SOC 2
  • Agentic AI & Technical Depth: Hands-on experience with the latest LLMs and AI models (from Anthropic, OpenAI, Google, and open-source providers), with a deep understanding of their capabilities, limitations, and cost/performance trade-offs
  • Understanding of the architecture and design patterns behind agentic AI systems — multi-agent orchestration, tool use and function calling, retrieval-augmented generation (RAG), and human-in-the-loop workflows
  • Familiarity with emerging agent interoperability standards such as the Model Context Protocol (MCP) and Agent-to-Agent (A2A) protocol, and an understanding of their implications for building composable, production-grade AI products
  • Ability to design and evaluate AI systems using modern evaluation frameworks (evals, benchmarks, red-teaming) and understand the difference between measuring traditional software and probabilistic AI outputs
  • Experience with or strong understanding of prompt engineering, fine-tuning strategies, and the trade-offs between different AI deployment approaches (API-based, self-hosted, edge)
Job Responsibility:
  • Define and communicate the AI product roadmap aligned to the company’s vision across payments, treasury, compliance automation, and agentic commerce
  • Identify and prioritise high-impact use cases for agentic AI across Sokin’s product suite, from autonomous reconciliation and intelligent payment routing to AI-driven KYC/AML workflows
  • Work directly with a cross-functional team of AI/ML engineers, data scientists, and platform engineers to deliver value iteratively and often
  • Conduct market research and competitive analysis to stay ahead of trends in LLMs, AI agents, agentic payments, and fintech automation
  • Spend time with customers and internal stakeholders to discover new AI product opportunities, identify improvements, and validate solutions
  • Collaborate with engineering, design, compliance, marketing, and sales teams to assess the value, feasibility, viability, and safety of AI-powered solutions
  • Define product requirements for AI systems, including model selection criteria, evaluation frameworks (evals), guardrails, latency targets, and human-in-the-loop checkpoints
  • Own the responsible AI framework for your products, ensuring fairness, explainability, auditability, and compliance with the EU AI Act and relevant financial regulations
  • Work closely with product marketing and sales to articulate the value of AI features and launch new products to clients
  • Monitor AI product performance using both traditional product metrics and AI-specific evaluations (accuracy, hallucination rates, agent task completion, cost-per-inference) to drive continuous improvement
What we offer:
  • Agile, flexible working culture
  • Inclusive work environment
  • Opportunity to shape the future of payments
Employment Type:
Full-time

Data Scientist

As a Data Scientist on Codex, you will measure and accelerate product-market fit...
Location:
United States , San Francisco
Salary:
230000.00 - 385000.00 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • 5+ years in a quantitative role at a developer-facing or high-growth product
  • Fluency in SQL and Python; comfort with experiment design and causal inference
  • Experience defining product metrics tied to user value
  • Ability to communicate clearly with PM, Eng, and Design—and to influence product direction
Job Responsibility:
  • Embed with the Codex product team to discover opportunities that improve developer outcomes and growth
  • Design and interpret A/B tests and staged rollouts of new coding models and product features
  • Define and operationalize metrics such as suggestion acceptance, edit distance, compile/test pass rates, task completion, latency, and session productivity
  • Build dashboards and analyses that help the team self-serve answers to product questions (by language, framework, repo size, task type)
  • Diagnose failure modes and partner with Research on targeted improvements (model quality signals, user feedback, evals)
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick and safe time (1 hour per 30 hours worked)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time