Senior Software Engineer, AI Evals

Sentry

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

240,000 - 280,000 USD / year

Job Description:

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for building the evaluation infrastructure that measures the accuracy, reliability, and real-world performance of our AI systems. This role is critical to ensuring that our debugging agents and AI-powered features behave correctly, safely, and predictably as they scale. You’ll design datasets, benchmarks, and test harnesses that turn ambiguous AI behavior into measurable signals, helping the team ship AI with confidence.

Job Responsibilities:

  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
  • Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring

Requirements:

  • 5+ years of professional experience and a Bachelor’s degree in computer science, machine learning, or a related field
  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
  • Comfort writing production-quality code (we use Python and TypeScript)
  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)

Nice to have:

  • Experience evaluating LLMs, agentic systems, or AI-assisted developer tools

What we offer:
  • Incentive compensation
  • Equity grants
  • Paid time off
  • Group health insurance coverage

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior Software Engineer, AI Evals

Senior Software Engineer - Studio - Java, AI

As a Senior Software Engineer, you’ll build the backend that powers AI features ...
Location:
United States, New York
Salary:
175,000 - 240,000 USD / year
Clear Street
Expiration Date:
Until further notice
Requirements:
  • 7+ years of strong proficiency in enterprise Java
  • Experience designing and deploying AI/ML or LLM-backed systems in production
  • Familiarity with LLM tooling and patterns (e.g., tool calling, RAG pipelines and knowledge bases, evals, cost/latency tradeoffs, basic red-teaming)
  • Experience supporting and running systems in a production environment
  • Comfortable working in a dynamic environment, partnering with cross-functional teams, and moving from prototype to reliable production
Job Responsibilities:
  • Design, implement, and productionize reliable AI workflows to augment the Studio trading platform
  • Build tooling to monitor, tune, and evaluate models and workflows, along with guardrails that ensure outputs meet quality and regulatory requirements
  • Collaborate with technical and non-technical teams across the firm to identify high-ROI AI opportunities
  • Build rapid prototypes and turn them into production-grade systems, using the latest AI-powered development tools to iterate quickly
  • Create reusable libraries, SDKs, and tooling to enable AI development throughout the firm
  • Stay current on applied AI: read papers, evaluate new models, and test new tools
  • Participate in code review and architecture design, manage deployments, and support and contribute to the success of the overall Studio platform
What we offer:
  • Competitive compensation, benefits, and perks
  • Company equity
  • 401k matching
  • Gender neutral parental leave
  • Full medical, dental and vision insurance
  • Lunch stipends
  • Fully stocked kitchens
  • Happy hours
Employment Type:
Full-time

Senior Platform Engineer, AI Evaluation

We’re looking for an AI Platform Engineer to evolve and extend our internal eval...
Location:
United States, Mountain View
Salary:
137,871 - 172,339 USD / year
Khan Academy
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of software engineering experience, including 2+ years working on the evaluation of generative AI systems
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
  • Familiarity with the architecture of large language models and their industry-standard APIs
Job Responsibilities:
  • Evolve and extend our internal evaluation framework for assessing the quality of our AI-driven experiences
  • Work closely with ML data engineers and platform developers to help internal teams adopt an eval-driven development process incorporating offline benchmark tests and online experiments
  • Gather internal requirements, get buy-in for changes, and then develop documentation and training materials
What we offer:
  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026
  • Remote-first culture
  • Generous parental leave
  • 401(k) + 4% matching
  • Comprehensive insurance, including medical, dental, vision, and life
Employment Type:
Full-time

Senior AI Engineer - MSC AI Innovation

MSC AI Innovation is an AI-first team that incubates, builds, and accelerates so...
Location:
Israel, Tel Aviv, Herzliya
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • 8+ years of professional software development experience
  • 4+ years of software engineering experience in the AI space (e.g., building and shipping AI/ML or GenAI features in production)
  • Proven experience with building AI agents
  • Hands-on experience with evaluation methodologies and integrating quality standards/guardrails into delivery
  • Proficiency in Python and/or C#, with experience using REST APIs and SDKs
  • Deep understanding of AI system design, including ML fundamentals, Generative AI concepts, and cloud-native architectures
Job Responsibilities:
  • Design and build AI agents that plan, use tools/APIs, manage state/memory, and reliably complete multi-step workflows
  • Own AI features from design through production, including deployment, monitoring, and live-site reliability, with an eval-first development lifecycle: define success criteria, build evaluation datasets and automated harnesses, and run human-in-the-loop reviews where needed
  • Develop and maintain prompt, retrieval, and memory strategies (system prompts, few-shot examples, tool schemas, retrieval context) with proper versioning and evaluation coverage
  • Debug AI behavior using prompt analysis, data inspection, and model/tool-call traces, and translate failure patterns into targeted improvements
  • Establish and track AI quality metrics (e.g., accuracy, groundedness, relevance, hallucination rate) and integrate them into CI/CD release gates
  • Optimize runtime performance and economics (token usage, inference cost, latency, caching, model selection/routing, batching) and implement monitoring and continuous improvement loops (online signals, drift detection, structured user feedback)
  • Partner with product, design, and domain stakeholders to define use cases, acceptance criteria, and rollout plans for AI features
  • Live-site responsibility
Employment Type:
Full-time

Senior Software Engineer, AI Product

As a Senior Applied AI Engineer at Vanta, you will play a crucial role in shapin...
Location:
United States
Salary:
207,000 - 244,000 USD / year
Vanta
Expiration Date:
Until further notice
Requirements:
  • At least 7 years of industry experience as a software engineer
  • You’ve shipped LLM-backed products and have experience with prompting, RAG, and/or agent frameworks
  • You have experience designing, building, and scaling full-stack applications, including backend systems, APIs, and frontend interfaces
  • You are familiar with TypeScript, React, and Node.js, or are willing to learn them
  • You have experience improving AI systems, creating eval sets, and driving quality hill-climbing
  • You have experience mentoring other engineers and collaborating with product and design
  • You have worked at rapidly scaling startups and large companies, especially in environments that prioritize a bias for action
  • You are action-driven, willing to roll up your sleeves and engage directly with users
  • You aren’t afraid to put on your product hat
  • While you bring strong opinions, you prioritize building a platform that meets users where they are
Job Responsibilities:
  • Work cross-functionally to design and implement AI-powered features to deliver customer value and integrate LLMs with Vanta’s existing products and systems
  • Instrument evaluations, guardrails, and monitoring, and review customer usage to continually improve quality
  • Collaborate with AI Platform engineers shaping foundational AI systems and tooling that accelerate product teams
  • Make pragmatic tradeoffs that consider business priorities, user experience, and a sustainable technical foundation
  • Mentor engineers, champion good technical and product instincts, and model a collaborative, high-ownership engineering culture
What we offer:
  • Equity
  • Medical benefits
  • 401(k) plan
  • Other company perk programs
  • Comprehensive medical, dental, and vision coverage, with 100% of employee-only benefit premiums covered for most medical plans
  • 16 weeks fully-paid Parental Leave for all new parents
  • Health & wellness stipend
  • Remote workspace, internet, and cellphone stipend
  • Commuter benefits for team members who report to the SF and NYC office
  • Family planning benefits
Employment Type:
Full-time

Senior Software Engineer

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission...
Location:
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, Mathematics, Statistics, Physics, or a related field and 4 or more years in applied ML or AI research and product engineering
  • OR a Master’s degree and 3 or more years in applied ML or AI research and product engineering
  • OR a PhD in a relevant field and 2 or more years with generative AI, LLMs, or related ML algorithms
  • Proficiency in Python and at least one deep learning framework such as PyTorch, JAX, or TensorFlow
  • Experience deploying fine-tuned LLMs or multimodal models in live production environments
  • Experience shipping and maintaining production AI systems
  • Ability to meet Microsoft, customer, and government security screening requirements
  • Ability to pass the Microsoft Cloud Background Check upon hire or transfer and every two years thereafter
Job Responsibilities:
Bringing State-of-the-Art Research to Products:
  • Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate them via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents) and convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work, identify high-potential methods, and adapt them to Microsoft problem spaces
End-to-End System Development:
  • ML Design & Architecture: Own the end-to-end pipeline, including data prep, training, evaluation, deployment, and feedback loops
Employment Type:
Full-time

Senior AI Frontend Engineer (Developer Productivity)

We're seeking a Senior Frontend Engineer with a strong React/TypeScript backgrou...
Location:
United Kingdom, Belfast
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Strong expertise (5–10+ years) building modern frontend applications with React and TypeScript
  • Proficiency in JavaScript, React (or another UI framework), and TypeScript
  • Experience with state management libraries (Redux, Context API, Zustand) for building well-structured applications
  • Experience with Storybook or componentised development
  • Proficiency in implementing streaming and real-time experiences (e.g., word/token streaming, live updates, progress/status indicators)
  • Strong understanding of frontend architectures, state management, performance optimisation, and responsive design
  • Hands-on experience with tools such as LangChain, LangGraph, Vercel AI SDK, or Google ADK (Agent Development Kit)
  • Familiarity with CI/CD tools (e.g., Jenkins, Tekton, ArgoCD, Harness)
Job Responsibilities:
  • Own the user-facing layer of our next-generation Developer Productivity platform at Citi, transforming complex AI capabilities, from chat interfaces to rich data visualizations, into intuitive, trustworthy experiences
  • Collaborate closely with AI and software engineers and the product team to leverage bleeding-edge generative AI
  • Challenge, change, modernise, and enhance the experience of our 50,000 engineers globally across Citi's SDLC (Software Development Life Cycle)
  • Release to production a small new or enhanced AI-first user interface that positively impacts the work of thousands of software engineers and business analysts in software requirements engineering
  • Start raising the bar in our React.js codebase by introducing better componentisation, testing, and Storybook
  • Establish a network of UI engineers across the organisation to contribute to and learn about best practices
  • Get buy-in from the team on architectural principles, ways of working, and system requirements
  • Own and champion the implementation of best practices for interaction design within the team, establishing clear guidelines for AI-specific UX patterns
  • Mentor junior engineers on best practices for designing and implementing AI-driven user interfaces
  • Design & implement production-grade features for AI solutions
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type:
Full-time

Senior Software Engineer and Principal Software Engineer - PowerPoint AI Team

The PowerPoint team is embarking on an exciting new chapter - evolving a product...
Location:
United States, Redmond
Salary:
119,800 - 234,700 USD / year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 8+ years of experience in backend service engineering, including work on high-scale infrastructures
  • Proficiency in one or more systems programming languages such as C#, C++
  • 1+ years of experience in software engineering, designing and developing systems (and APIs) that deploy and integrate with AI models
  • 2+ years of experience working with rich telemetry, making data driven decisions, and carrying out rapid experimentation
  • 2+ years of experience building software for scale, performance, and reliability
  • Academic or industry experience building, fine-tuning, or deploying models (any category), or building eval-driven systems that use them
Job Responsibilities:
  • Lead design and delivery of complex, scalable AI features ensuring resilience and exceptional user experience
  • Drive technical strategy and architecture decisions across multiple services, influencing partner teams and aligning with compliance and security requirements
  • Champion modern engineering practices, including AI-driven approaches, automation, and cloud-native patterns, across the full development lifecycle
  • Mentor and guide engineers, fostering technical excellence and continuous improvement in security, reliability, and performance
  • Collaborate cross-org to solve challenging technical problems, streamline processes, and reduce operational costs while improving live-site health
  • Design and implement scalable backend services optimized for machine learning workflows and large language model integration
  • Develop and maintain evaluation-driven systems that leverage text and multimodal inputs (e.g., images) to power visual-creation experiences
  • Build and optimize APIs and infrastructure to support high-performance model inference and experimentation at scale
  • Collaborate with product, ML, and design teams to integrate models into user-facing features, ensuring seamless functionality and performance
  • Conduct model evaluations and experiments, analyze results, and iterate on improvements to enhance accuracy and user experience
Employment Type:
Full-time
