
Senior Software Engineer, AI Eval

United States, San Francisco · 240,000 - 280,000 USD / year · Job posted January 22, 2026

Job Description

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for building the evaluation infrastructure that measures the accuracy, reliability, and real-world performance of our AI systems. This role is critical to ensuring that our debugging agents and AI-powered features behave correctly, safely, and predictably as they scale. You’ll design datasets, benchmarks, and test harnesses that turn ambiguous AI behavior into measurable signals, helping the team ship AI with confidence.

Job Responsibility

  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
  • Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring

Requirements

  • At least 5 years of professional experience and a Bachelor’s degree in computer science, machine learning, or a related field
  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
  • Comfort writing production-quality code (we use Python and TypeScript)
  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)

Nice to have

  • Experience evaluating LLMs, agentic systems, or AI-assisted developer tools

What we offer

  • incentive compensation
  • equity grants
  • paid time off
  • group health insurance coverage


Similar Jobs for Senior Software Engineer, AI Eval

8 matching positions

Senior Software Engineer and Principal Software Engineer - PowerPoint AI Team

The PowerPoint team is embarking on an exciting new chapter - evolving a product...
Location
United States, Redmond
Salary
119,800 - 234,700 USD / year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Must be able to meet Microsoft, customer, and/or government security screening requirements for this role
  • Microsoft Cloud Background Check: this position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 8+ years of experience in backend service engineering, including work on high-scale infrastructures
  • Proficiency in one or more systems programming languages such as C#, C++
  • 1+ years of experience in software engineering, designing and developing systems (and APIs) that deploy and integrate with AI models
  • 2+ years of experience working with rich telemetry, making data driven decisions, and carrying out rapid experimentation
  • 2+ years of experience building software for scale, performance, and reliability
  • Academic or industry experience building, fine-tuning, or deploying models (any category), or building eval-driven systems that use them
Job Responsibility
  • Lead design and delivery of complex, scalable AI features ensuring resilience and exceptional user experience
  • Drive technical strategy and architecture decisions across multiple services, influencing partner teams and aligning with compliance and security requirements
  • Champion modern engineering practices, including AI-driven approaches, automation, and cloud-native patterns, across the full development lifecycle
  • Mentor and guide engineers, fostering technical excellence and continuous improvement in security, reliability, and performance
  • Collaborate cross-org to solve challenging technical problems, streamline processes, and reduce operational costs while improving live-site health
  • Design and implement scalable backend services optimized for machine learning workflows and large language model integration
  • Develop and maintain evaluation-driven systems that leverage text and multimodal inputs (e.g., images) to power visual-creation experiences
  • Build and optimize APIs and infrastructure to support high-performance model inference and experimentation at scale
  • Collaborate with product, ML, and design teams to integrate models into user-facing features, ensuring seamless functionality and performance
  • Conduct model evaluations and experiments, analyze results, and iterate on improvements to enhance accuracy and user experience
Employment type: Full-time

Senior Software Engineer, AI Product

As a Senior Applied AI Engineer at Vanta, you will play a crucial role in shapin...
Location
United States
Salary
207,000 - 244,000 USD / year
Vanta
Expiration Date
Until further notice
Requirements
  • At least 7 years of industry experience as a software engineer
  • You’ve shipped LLM-backed products and have experience with prompting, RAG, and/or agent frameworks
  • You have experience designing, building, and scaling full-stack applications, including backend systems, APIs, and frontend interfaces
  • You have familiarity with TypeScript, React, and Node.js, or a willingness to learn
  • You have experience improving AI systems, creating eval sets, and driving quality hill-climbing
  • You have experience mentoring other engineers and collaborating with product and design
  • You have worked at rapidly scaling startups and large companies, especially in environments that prioritize a bias for action
  • You are action-driven, willing to roll up your sleeves and engage directly with users
  • You aren’t afraid to put on your product hat
  • While you bring strong opinions, you prioritize building a platform that meets users where they are
Job Responsibility
  • Work cross-functionally to design and implement AI-powered features to deliver customer value and integrate LLMs with Vanta’s existing products and systems
  • Instrument evaluations, guardrails, and monitoring, and review customer usage to continually improve quality
  • Collaborate with AI Platform engineers shaping foundational AI systems and tooling that accelerate product teams
  • Make pragmatic tradeoffs that consider business priorities, user experience, and a sustainable technical foundation
  • Mentor engineers, champion good technical and product instincts, and model a collaborative, high-ownership engineering culture
What we offer
  • Offers Equity
  • medical benefits
  • 401(k) plan
  • other company perk programs
  • Comprehensive medical, dental, and vision coverage, with 100% of employee-only benefit premiums covered for most medical plans
  • 16 weeks fully-paid Parental Leave for all new parents
  • Health & wellness stipend
  • Remote workspace, internet, and cellphone stipend
  • Commuter benefits for team members who report to the SF and NYC office
  • Family planning benefits
Employment type: Full-time

Senior Software Engineer, AI Evals

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for ...
Location
United States, San Francisco
Salary
240,000 - 280,000 USD / year
Sentry
Expiration Date
Until further notice
Requirements
  • At least 5 years of professional experience and a Bachelor’s degree in computer science, machine learning, or a related field
  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
  • Comfort writing production-quality code (we use Python and TypeScript)
  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)
Job Responsibility
  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
  • Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring
What we offer
  • Offers Equity
  • incentive compensation
  • equity grants
  • paid time off
  • group health insurance coverage
Employment type: Full-time

Senior Software Engineer - Studio - Java, AI

As a Senior Software Engineer, you’ll build the backend that powers AI features ...
Location
United States, New York
Salary
175,000 - 240,000 USD / year
Clear Street
Expiration Date
Until further notice
Requirements
  • 7+ years of strong proficiency in enterprise Java
  • Experience designing and deploying AI/ML or LLM-backed systems in production
  • Familiarity with LLM tooling and patterns (e.g., tool calling, RAG pipelines and knowledge bases, evals, cost/latency tradeoffs, basic red-teaming)
  • Experience in supporting and running systems in a production environment
  • Comfortable working in a dynamic environment, partnering with cross-functional teams, and moving from prototype to reliable production
Job Responsibility
  • Design, implement, and productionize reliable AI workflows to augment the Studio trading platform
  • Build tooling to monitor, tune, and evaluate models and workflows, as well as applicable guardrails to ensure outputs meet quality and regulatory requirements
  • Collaborate with technical and non-technical teams across the firm to identify high ROI AI opportunities
  • Build rapid prototypes and translate them into production-grade systems. Utilize the latest AI-powered development tools to iterate quickly
  • Create reusable libraries, SDKs and tooling to enable AI development throughout the firm
  • Stay current on the latest in applied AI. Read papers, evaluate new models, test out new tools
  • Participate in code review and architecture design, manage deployments, and support and contribute to the success of the overall Studio platform
What we offer
  • Competitive compensation, benefits, and perks
  • Company equity
  • 401k matching
  • Gender neutral parental leave
  • Full medical, dental and vision insurance
  • Lunch stipends
  • Fully stocked kitchens
  • Happy hours
Employment type: Full-time

Senior AI Engineer - MSC AI Innovation

MSC AI Innovation is an AI-first team that incubates, builds, and accelerates so...
Location
Israel, Tel Aviv, Herzliya
Salary
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • 8+ years professional software development
  • 4+ years of software engineering experience in the AI space (e.g., building and shipping AI/ML or GenAI features in production)
  • Proven experience with building AI agents
  • Hands-on experience with evaluation methodologies and integrating quality standards/guardrails into delivery
  • Proficiency in Python and/or C#, with experience using REST APIs and SDKs
  • Deep understanding of AI system design, including ML fundamentals, Generative AI concepts, and cloud-native architectures
Job Responsibility
  • Design and build AI agents that plan, use tools/APIs, manage state/memory, and reliably complete multi-step workflows
  • Own AI features from design through production, including deployment, monitoring, and live‑site reliability, with an eval-first development lifecycle: define success criteria, build evaluation datasets and automated harnesses, and run human-in-the-loop reviews where needed
  • Develop and maintain prompt, retrieval, and memory strategies (system prompts, few-shot examples, tool schemas, retrieval context) with proper versioning and evaluation coverage
  • Debug AI behavior using prompt analysis, data inspection, and model/tool-call traces, and translate failure patterns into targeted improvements
  • Establish and track AI quality metrics (e.g., accuracy, groundedness, relevance, hallucination rate) and integrate them into CI/CD release gates
  • Optimize runtime performance and economics (token usage, inference cost, latency, caching, model selection/routing, batching) and implement monitoring and continuous improvement loops (online signals, drift detection, structured user feedback)
  • Partner with product, design, and domain stakeholders to define use cases, acceptance criteria, and rollout plans for AI features
  • Live site responsibility
Employment type: Full-time

Senior AI Engineer - Teams Messaging AI

Are you interested in joining one of the most exciting teams and working on the ...
Location
United States, Mountain View
Salary
119,800 - 234,700 USD / year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Must be able to meet Microsoft, customer, and/or government security screening requirements for this role
  • These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check, which this position requires passing upon hire/transfer and every two years thereafter
Job Responsibility
  • Designing, implementing, and shipping multiple new messaging and large language model (LLM) agentic features
  • Building end-to-end user experiences that work across multiple devices and browsers
  • Writing and maintaining unit tests, large language model (LLM) evals, and automated integration or end-to-end tests
  • Building web and AI applications in enterprise and/or consumer markets
  • Collaborating with partner teams to meet engineering goals
  • Managing individual projects or feature priorities, deadlines, and deliverables
Employment type: Full-time

Senior AI Engineer

We're looking for a Senior AI Engineer to help us build the next generation of A...
Location
Netherlands, Germany, Romania, United Kingdom, Spain, Italy, Poland (Amsterdam, Berlin, Hannover, Iași, London, Madrid, Milano, München, Warsaw)
Salary
Not provided
Awin Global
Expiration Date
Until further notice
Requirements
  • 8+ years of professional software engineering experience, with a track record of shipping production systems
  • Strong Python and backend system design
  • Real, hands-on experience building and running production LLM-based systems, not just prototypes
  • Deep experience with agent-based architectures using frameworks such as LangGraph, Deep Agents or equivalents, including multi-step reasoning, tool use, sub-agent orchestration, state and streaming
  • Strong working knowledge of retrieval and context construction (RAG, embeddings, chunking, ranking) and good judgement about when to use those patterns inside an agentic system
  • A solid track record of evaluating and debugging AI systems: structured evals, regression tests and tracing or observability with tools such as LangSmith or similar
  • Familiarity with emerging agent orchestration standards (MCP or similar)
  • Experience with vector databases and hybrid retrieval
  • A clear understanding of LLM failure modes such as prompt injection, content-safety issues and context leakage, and how to mitigate them in real systems
  • Strong fundamentals in distributed systems, API design and cloud-native architectures on AWS (ECS, Lambda, S3, API Gateway)
Job Responsibility
  • Design, build and operate customer-facing AI features end-to-end, from conversational experiences to intelligent automation
  • Architect agent-based systems: multi-step reasoning, tool execution, sub-agent orchestration, state management and streaming, structured outputs, persistent memory and human-in-the-loop approval flows
  • Decide how the system grounds, retrieves and constructs context
  • Pick the right pattern (retrieval, structured tools, agentic flows) for each problem and be able to defend the choice
  • Treat prompt and context engineering as first-class engineering work
  • Own AI quality
  • Build evaluation datasets, regression tests for prompts and agents and the debugging discipline needed to get to root cause across model outputs, retrieval and tool calls
  • Make AI production-ready: observability, tracing, reliability, cost, latency and the safety controls that protect against hallucination, prompt injection and context leakage
  • Make and explain the trade-offs between accuracy, latency, cost and safety to engineering and product partners
  • Partner within the Product Trio to shape what we build, not just how we build it
What we offer
  • Flexi-Week and Work-Life Balance: We prioritise your mental health and wellbeing, offering you a flexible four-day Flexi-Week at full pay and with no reduction to your annual holiday allowance. We also offer a variety of different paid special leaves
  • Remote Working Allowance: You will receive a monthly allowance to cover part of your running costs. In addition, we will support you in setting up your remote workspace appropriately
  • Flexi-Office: We offer an international culture and flexibility through our Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Meal Vouchers: You will receive a set net amount to spend on a variety of lunches
  • Health & Wellbeing: The insurance covers several types of health, vision and / or dental treatments for you and for up to one additional family member
  • Remote Working Furniture Package: After 3 months of employment, you will be eligible for a furniture package, which should enable you to set up a proper workplace at your remote working location
  • Appreciation: Thank and reward colleagues by sending them a voucher through our peer-to-peer program
Employment type: Full-time

Senior Software Engineer

We are looking for a Senior Software Engineer to join our team to drive all aspe...
Location
Canada, Vancouver
Salary
114,400 - 203,900 CAD / year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Understanding of building engineering tools on the server side for scale
  • 2+ years of experience on engineering tooling or eval development
  • Prior experience in working on services at scale
  • Prior experience in driving fundamentals for AI features within web apps
  • Prior experience in working closely with AI feature teams and improving fundamentals like performance and reliability is a major plus
  • Experience solving challenging problems and cross team/organization collaboration skills
Job Responsibility
  • Build and evolve the Real-Time Intelligence evaluations platform: implement offline and online eval pipelines, including golden datasets, human review workflows, and LLM-as-judge / auto-raters for agents, anomaly detectors, and decisioning systems
  • Instrument agentic solutions for observability by wiring up telemetry, tracing, structured logging, and dashboards so quality, safety, latency, and cost are easy to monitor and debug
  • Integrate evals into the development lifecycle by connecting pipelines to CI/CD, canary and A/B experiments, and phased rollouts, making it simple for partner teams to run and interpret evaluations
  • Collaborate and mentor across product, research, and engineering teams, sharing best practices on eval design, LLM-as-judge usage, and Responsible AI, and providing code reviews and guidance that raise the bar for the AI features
Employment type: Full-time