
Software Engineer, Product (Agents / Evals)

Sweden, Stockholm · Job Posted May 28, 2026

Job Description

We're looking for strong engineers (backend, frontend, or full-stack) who are excited about building agents. You'll help shape how we build, evaluate, orchestrate, and scale LLM-powered agents in production - and define what it means to create truly lovable AI products.

Job Responsibility

  • Build, tune, and scale agents that power lovable products
  • Add new agent skills and tools
  • Improve agent reasoning, orchestration, and efficiency
  • Design how multiple agents collaborate
  • Select the right models for different task types
  • Push the limits of what agents can reliably do in real products
  • Analyze agent behavior and performance
  • Hill-climb toward better helpfulness, safety, and reliability
  • Build evaluation frameworks and benchmarks
  • Create experimentation pipelines and feedback loops
  • Ensure agents perform well across real-world use cases

Requirements

  • Strong engineering fundamentals
  • Ability to build high-quality production systems

Nice to have

  • Have built AI agents yourself (side projects count)
  • Are deeply curious about how AI systems behave and improve
  • Have worked with LLMs or AI systems in production
  • Are excited about experimenting with new models and techniques
  • Have shipped ML or AI features to real users with uptime requirements
  • Have built evaluation systems or ML experimentation pipelines
  • Have strong opinions on safety, latency, and helpfulness, but are open to testing and learning

Similar Jobs for Software Engineer, Product (Agents / Evals)

8 matching positions

Software Engineer, Agents

At Harvey, we’re transforming how legal and professional services operate — not ...
Location: United States, San Francisco
Salary: 165,000 - 312,000 USD / Year
Company: Harvey (harvey.ai)
Expiration Date: Until further notice
Requirements
  • Passion for building effective domain-specific agents
  • Iterative mindset: you develop proof of concepts, make decisions quickly, and ship v0s
  • Comfortable deciding when and how to use evaluations to drive quality
  • Humble and adaptable about code and frameworks. We expect you to drive adoption of new best practices as they develop
  • 3+ years (post-BS/MS) of software engineering experience
  • Proficiency in Python and experience working with LLM APIs and agent frameworks
  • Experience with shipping user-facing products, either on the backend or full-stack
Job Responsibility
  • Partner with customers and PMs to understand legal workflows, design practical evaluations that capture what “excellent” means, and ship agents that get the job done
  • Optimize agent performance through prompt engineering, model selection, tool design, skill writing, context window management, and eval harness development
  • Work with our model infra team to design and implement infrastructure for low-latency agent execution, including caching strategies, parallel tool calls, or subagent patterns
  • Improve our observability and instrumentation to profile agent behavior, identify bottlenecks, and drive optimization decisions
  • Stay current on new developments in agentic systems and bring those learnings back to the products we build
What we offer
  • Comprehensive health, dental and vision coverage
  • Retirement benefits (401k match up to 4%)
  • Flexible PTO
  • Equity plan
  • Bonus
Employment type: Full-time

Senior Software Engineer, AI Product

As a Senior Applied AI Engineer at Vanta, you will play a crucial role in shapin...
Location: United States
Salary: 207,000 - 244,000 USD / Year
Company: Vanta (vanta.com)
Expiration Date: Until further notice
Requirements
  • At least 7 years of industry experience as a software engineer
  • You’ve shipped LLM-backed products and have experience with prompting, RAG, and/or agent frameworks
  • You have experience designing, building, and scaling full-stack applications, including backend systems, APIs, and frontend interfaces
  • You have familiarity with TypeScript, React, and Node.js, or a willingness to learn
  • You have experience improving AI systems, creating eval sets, and driving quality hill-climbing
  • You have experience mentoring other engineers and collaborating with product and design
  • You have worked at rapidly scaling startups and large companies, especially in environments that prioritize a bias for action
  • You are action-driven, willing to roll up your sleeves and engage directly with users
  • You aren’t afraid to put on your product hat
  • While you bring strong opinions, you prioritize building a platform that meets users where they are
Job Responsibility
  • Work cross-functionally to design and implement AI-powered features to deliver customer value and integrate LLMs with Vanta’s existing products and systems
  • Instrument evaluations, guardrails, and monitoring, and review customer usage to continually improve quality
  • Collaborate with AI Platform engineers shaping foundational AI systems and tooling that accelerate product teams
  • Make pragmatic tradeoffs that consider business priorities, user experience, and a sustainable technical foundation
  • Mentor engineers, champion good technical and product instincts, and model a collaborative, high-ownership engineering culture
What we offer
  • Equity
  • Medical benefits
  • 401(k) plan
  • Other company perk programs
  • Comprehensive medical, dental, and vision coverage, with 100% of employee-only benefit premiums covered for most medical plans
  • 16 weeks fully-paid Parental Leave for all new parents
  • Health & wellness stipend
  • Remote workspace, internet, and cellphone stipend
  • Commuter benefits for team members who report to the SF and NYC office
  • Family planning benefits
Employment type: Full-time

Software Engineer, Applied Evals

Applied Evals defines what good looks like for safe, advanced AI systems. We tur...
Location: United States, San Francisco
Salary: 230,000 - 325,000 USD / Year
Company: OpenAI (openai.com)
Expiration Date: Until further notice
Requirements
  • 4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end
  • Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
  • Familiarity with evaluation methods for LLMs, and experience with patterns like multi-agent workflows, tool use, or long context
  • Familiarity with deep learning concepts or prior exposure to training models
  • Ability to communicate clearly with technical and non-technical audiences at all levels
  • Motivated by high-impact collaboration with research and product teams, and able to thrive in ambiguity
Job Responsibility
  • Define the core evaluation signals that drive model improvement at OpenAI, turning vague product gaps into crisp, defensible measures of quality
  • Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extendable
  • Prototype solutions with real workflows and convert them into scalable feedback loops
  • Connect evaluation signals directly to research and training systems so product improvements show up in what users experience
  • Shape model interaction paradigms by partnering with engineering, research, and product teams on how models are deployed and measured
  • Build reusable systems and tools that enable contributions from across the company and steadily raise the quality bar
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment type: Full-time

Senior Software Engineer, AI Evals

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for ...
Location: United States, San Francisco
Salary: 240,000 - 280,000 USD / Year
Company: Sentry (sentry.io)
Expiration Date: Until further notice
Requirements
  • 5+ years of professional experience and a Bachelor’s degree in computer science, machine learning, or a related field
  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
  • Comfort writing production-quality code (we use Python and TypeScript)
  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)
Job Responsibility
  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
  • Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring
What we offer
  • Incentive compensation
  • Equity grants
  • Paid time off
  • Group health insurance coverage
Employment type: Full-time

Senior Software Engineer

Wells Fargo is seeking a Senior Software Engineer. In this role, you will: Lead ...
Location: India, Bengaluru
Salary: Not provided
Company: Wells Fargo (https://www.wellsfargo.com/)
Expiration Date: June 09, 2026
Requirements
  • 4+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Experience in backend application software development, with ability to quickly adapt to Java, C# and Python code bases
  • Strong understanding of Retrieval-Augmented Generation (RAG), prompt engineering and agentic workflows
  • Deep knowledge of implementing guardrails and advanced techniques for query enrichment and re-writing
  • Expertise in test- or eval-driven development, including data and error analysis, to ensure robust and scalable AI software
  • Experience architecting and implementing agentic frameworks for autonomous multi-step reasoning and planning
  • Solid grasp of parsing, chunking, indexing and re-ranking of multiple file formats
  • Experience with Generative AI Operations, and enterprise-scale AI adoption strategies
  • Familiarity with enterprise-scale software systems and their integration within large organizations
  • Experience in enterprise AI model lifecycle management, AI compliance, and risk mitigation strategies
Job Responsibility
  • Lead moderately complex initiatives and deliverables within technical domain environments
  • Contribute to large scale planning of strategies
  • Design, code, test, debug, and document for projects and programs associated with technology domain, including upgrades and deployments
  • Review moderately complex technical challenges that require an in-depth evaluation of technologies and procedures
  • Resolve moderately complex issues and lead a team to meet the needs of existing or potential new clients while leveraging a solid understanding of the function, policies, procedures, or compliance requirements
  • Collaborate and consult with peers, colleagues, and mid-level managers to resolve technical challenges and achieve goals
  • Lead projects and act as an escalation point, provide guidance and direction to less experienced staff
Employment type: Full-time

AI QA Engineer (Agents)

An AI QA Engineer (Agents) is responsible for ensuring the quality, reliability,...
Location: Ireland, Cork
Salary: Not provided
Company: Marriott Bonvoy (https://www.marriott.com)
Expiration Date: Until further notice
Requirements
  • 4+ years' total experience, including 1+ year testing AI/ML applications, LLM integrations, or conversational interfaces
  • Hands-on experience with end-to-end testing and automation for AI/agentic products
  • 3+ years of experience in software quality assurance or testing
  • 1+ years of experience testing AI/ML applications, LLM integrations, or conversational interfaces
  • Strong understanding of software testing principles, methodologies, and best practices
  • Experience writing and maintaining automated tests (unit, integration, or end‑to‑end)
  • Proficiency in at least one programming language (Python, TypeScript, JavaScript, Java, etc.)
  • Experience with API testing tools (Postman, REST Assured, etc.) or frameworks
  • Strong analytical and problem‑solving skills
  • Excellent attention to detail and ability to identify edge cases
Job Responsibility
  • Design and execute test plans for AI agents and agentic experiences
  • Write and maintain automated test suites for agent functionality (unit tests, evals, integration tests, end‑to‑end tests)
  • Perform (minimal) manual testing of agent interactions, workflows, and business logic
  • Test agent responses, accuracy, and behavior across various scenarios and edge cases
  • Identify, document, and track bugs through resolution
  • Collaborate with engineers, product managers, and business stakeholders to understand requirements and acceptance criteria
  • Participate in test planning, test case design, and test strategy discussions
  • Create and maintain test data, test scenarios, and test environments for agents
  • Participate in feature design sessions, highlighting key testing scenarios and fault zones
  • Execute performance and load testing to ensure agent scalability and response times
Employment type: Full-time

Agios AI Foundation Software Engineer

At Meta Reality Labs Research (RL-R), our goal is to explore, innovate and desig...
Location: United States, Redmond
Salary: 121,992 - 181,000 USD / Year
Company: Meta (meta.com)
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • Experience translating fast-moving prototypes and research artifacts into production systems, working directly with researchers or AI partners
  • 3+ years of software engineering experience, with a systems and Python background
  • Experience shipping LLM-powered or agentic features to production, not just prototypes
  • Experience designing developer-facing libraries, SDKs, or platforms that other engineers build on top of
  • Hands-on familiarity with modern LLM and agent tooling: prompt design, tool use, retrieval, structured output, evaluation
Job Responsibility
  • Design and build shared agent and LLM tooling: framework abstractions for tool use, retrieval, memory, and orchestration that other teams consume
  • Build evaluation infrastructure for LLM-powered features: offline eval harnesses, regression detection, prompt and model versioning, observability for agent behavior in production
  • Partner with product-facing teams to take AI prototypes into production-ready systems with clear quality, latency, and cost budgets
  • Set technical patterns and quality bars for AI work: how teams structure agents, evaluate them, ship them, and monitor them
  • Collaborate with researchers and AI engineers across RL-R to land new capabilities into the runtime in a way other engineers can build on
  • Contribute to planning, design, and code reviews across the team and adjacent groups
What we offer
  • Bonus
  • Equity
  • Benefits
Employment type: Full-time

Software Engineer, AI

At Monarch, AI is the engine powering intelligent, personalized financial experi...
Location: United States
Salary: Not provided
Company: Monarch Money (monarchmoney.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in software engineering
  • At least 2 years focused on building and operating production ML/AI systems
  • Proven track record of shipping LLM-powered features
  • Deep, hands-on expertise in prompt engineering, RAG systems, and evaluation techniques
  • Strong fundamentals in machine learning: embeddings, similarity search, classification, and probabilistic reasoning
  • Demonstrated experience building and using AI evaluation tooling (e.g., golden sets, rubric scoring, LLM-as-judge)
  • Excellent Python skills
  • History of building production-grade AI features and services
  • Strong collaboration and communication skills with a sharp product sensibility
  • Strategic mindset, comfortable making build-vs-buy decisions and designing features for long-term reliability
Job Responsibility
  • Apply AI to Real Financial Problems: Use GenAI and ML to help users make sense of their money, understand spending patterns, surface actionable insights, or automate tedious financial tasks
  • Choose the Right Tool for Each Problem: Navigate the AI toolkit thoughtfully, know when a well-crafted prompt suffices, when retrieval systems add value, and when custom models are worth the investment
  • Ship with Confidence: Leverage and enhance our sophisticated evaluation framework to ensure AI quality, design test datasets, implement new scorers, and use our Braintrust-based eval system to validate changes before they reach users
  • AI feature development, agent design and orchestration, ML model improvements, evaluation datasets and scorers, prompt engineering, and feature-level quality
What we offer
  • Work wherever you want! As a fully remote company
  • Competitive cash and equity compensation
  • Stipend to set-up your ideal working environment
  • Competitive Benefit Plans for employees based on your location (e.g. in the US we offer: Medical, dental and vision benefits and the ability to contribute to a 401k plan)
  • Unlimited PTO
  • A 3-day weekend every month! We take off the “First Friday” of every month
Employment type: Full-time