Engineer - Agents & Evals

Sweden, Stockholm · Job Posted March 03, 2026

Job Description

We’re looking for strong engineers (backend, frontend, or full-stack) who are excited about building agents. You’ll help shape how we build, evaluate, orchestrate, and scale LLM-powered agents in production - and define what it means to create truly lovable AI products.

Job Responsibility

  • Build, tune, and scale agents that power lovable products
  • Add new agent skills and tools
  • Improve agent reasoning, orchestration, and efficiency
  • Design how multiple agents collaborate
  • Select the right models for different task types
  • Push the limits of what agents can reliably do in real products
  • Analyze agent behavior and performance
  • Hill-climb toward better helpfulness, safety, and reliability
  • Build evaluation frameworks and benchmarks
  • Create experimentation pipelines and feedback loops
  • Ensure agents perform well across real-world use cases

Requirements

  • Strong engineering fundamentals
  • Ability to build high-quality production systems
  • Backend, frontend, or full-stack engineering background

Nice to have

  • You have built AI agents yourself (side projects count)
  • You are deeply curious about how AI systems behave and improve
  • You have worked with LLMs or AI systems in production
  • You are excited about experimenting with new models and techniques
  • You have shipped ML or AI features to real users with uptime requirements
  • You have built evaluation systems or ML experimentation pipelines
  • You have strong opinions on safety, latency, and helpfulness, but are open to testing and learning

Similar Jobs for Engineer - Agents & Evals

8 matching positions

Software Engineer, Product (Agents / Evals)

We're looking for strong engineers (backend, frontend, or full-stack) who are ex...
Location: Sweden, Stockholm
Salary: Not provided
Company: Lovable (lovable.dev)
Expiration Date: Until further notice
Requirements
  • Strong engineering fundamentals
  • Ability to build high-quality production systems
Job Responsibility
  • Build, tune, and scale agents that power lovable products
  • Add new agent skills and tools
  • Improve agent reasoning, orchestration, and efficiency
  • Design how multiple agents collaborate
  • Select the right models for different task types
  • Push the limits of what agents can reliably do in real products
  • Analyze agent behavior and performance
  • Hill-climb toward better helpfulness, safety, and reliability
  • Build evaluation frameworks and benchmarks
  • Create experimentation pipelines and feedback loops
Employment type: Full-time

AI QA Engineer (Agents)

An AI QA Engineer (Agents) is responsible for ensuring the quality, reliability,...
Location: Ireland, Cork
Salary: Not provided
Company: Marriott Bonvoy (https://www.marriott.com)
Expiration Date: Until further notice
Requirements
  • 4+ years' total experience, including 1+ year testing AI/ML applications, LLM integrations, or conversational interfaces
  • Hands-on experience with end-to-end testing and automation for AI/agentic products
  • 3+ years of experience in software quality assurance or testing
  • 1+ years of experience testing AI/ML applications, LLM integrations, or conversational interfaces
  • Strong understanding of software testing principles, methodologies, and best practices
  • Experience writing and maintaining automated tests (unit, integration, or end‑to‑end)
  • Proficiency in at least one programming language (Python, TypeScript, JavaScript, Java, etc.)
  • Experience with API testing tools (Postman, REST Assured, etc.) or frameworks
  • Strong analytical and problem‑solving skills
  • Excellent attention to detail and ability to identify edge cases
Job Responsibility
  • Design and execute test plans for AI agents and agentic experiences
  • Write and maintain automated test suites for agent functionality (unit tests, evals, integration tests, end‑to‑end tests)
  • Perform (minimal) manual testing of agent interactions, workflows, and business logic
  • Test agent responses, accuracy, and behavior across various scenarios and edge cases
  • Identify, document, and track bugs through resolution
  • Collaborate with engineers, product managers, and business stakeholders to understand requirements and acceptance criteria
  • Participate in test planning, test case design, and test strategy discussions
  • Create and maintain test data, test scenarios, and test environments for agents
  • Participate in feature design sessions, highlighting key testing scenarios and fault zones
  • Execute performance and load testing to ensure agent scalability and response times
Employment type: Full-time

Software Engineer, Agents

At Harvey, we’re transforming how legal and professional services operate — not ...
Location: United States, San Francisco
Salary: 165000.00 - 312000.00 USD / Year
Company: Harvey (harvey.ai)
Expiration Date: Until further notice
Requirements
  • Passion for building effective domain-specific agents
  • Iterative mindset: you develop proof of concepts, make decisions quickly, and ship v0s
  • Comfortable with when and how to use evaluations to drive quality
  • Humble and adaptable about code and frameworks. We expect you to drive adoption of new best practices as they develop
  • 3+ years (post-BS/MS) of software engineering experience
  • Proficiency in Python and experience working with LLM APIs and agent frameworks
  • Experience with shipping user-facing products, either on the backend or full-stack
Job Responsibility
  • Partner with customers and PMs to understand legal workflows, design practical evaluations that capture what “excellent” means, and ship agents that get the job done
  • Optimize agent performance through prompt engineering, model selection, tool design, skill writing, context window management, and eval harness development
  • Work with our model infra team to design and implement infrastructure for low-latency agent execution, including caching strategies, parallel tool calls, or subagent patterns
  • Improve our observability and instrumentation to profile agent behavior, identify bottlenecks, and drive optimization decisions
  • Stay current on new developments in agentic systems and bring those learnings back to the products we build
What we offer
  • Comprehensive health, dental and vision coverage
  • Retirement benefits (401k match up to 4%)
  • Flexible PTO
  • Equity plan
  • Bonus
Employment type: Full-time

LLM Inference Performance & Evals Engineer

Join the inference model team dedicated to bring up the state-of-the-art models,...
Location: Canada, Toronto
Salary: Not provided
Company: Cerebras Systems (cerebras.net)
Expiration Date: Until further notice
Requirements
  • 3+ years building high-performance ML or systems software
  • Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
  • Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration
  • Prior experience in modeling, compilers, or crafting benchmarks or performance studies, not just black-box QA tests
  • Strong passion to leverage AI agents or workflow orchestration tools to boost personal productivity
Job Responsibility
  • Prototype and benchmark cutting-edge ideas: new attentions, MoE, speculative decoding, and many more innovations as they emerge
  • Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull-requests
  • Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
  • Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Software Engineer, Applied Evals

Applied Evals defines what good looks like for safe, advanced AI systems. We tur...
Location: United States, San Francisco
Salary: 230000.00 - 325000.00 USD / Year
Company: OpenAI (openai.com)
Expiration Date: Until further notice
Requirements
  • 4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end
  • Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
  • Familiarity with evaluation methods for LLMs and experience with patterns like multi-agent workflows, tool use, or long context
  • Familiarity with deep learning concepts or prior exposure to training models
  • Ability to communicate clearly with technical and non-technical audiences at all levels
  • Motivated by high-impact collaboration with research and product teams, and able to thrive in ambiguity
Job Responsibility
  • Define the core evaluation signals that drive model improvement at OpenAI, turning vague product gaps into crisp, defensible measures of quality
  • Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extendable
  • Prototype solutions with real workflows and convert them into scalable feedback loops
  • Connect evaluation signals directly to research and training systems so product improvements show up in what users experience
  • Shape model interaction paradigms by partnering with engineering, research, and product teams on how models are deployed and measured
  • Build reusable systems and tools that enable contributions from across the company and steadily raise the quality bar
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment type: Full-time

Senior Software Engineer, AI Evals

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for ...
Location: United States, San Francisco
Salary: 240000.00 - 280000.00 USD / Year
Company: Sentry (sentry.io)
Expiration Date: Until further notice
Requirements
  • 5+ years of professional experience with a Bachelor’s degree in computer science, machine learning, or a related field
  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
  • Comfort writing production-quality code (we use Python and TypeScript)
  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)
Job Responsibility
  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
  • Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring
What we offer
  • Offers equity
  • Incentive compensation
  • Equity grants
  • Paid time off
  • Group health insurance coverage
Employment type: Full-time

RAG + Agentic AI Engineer

We are looking for a highly skilled RAG + LLM-based Agentic AI Engineer to build...
Location: India, Pune; Mumbai; Bengaluru; Indore; Jaipur
Salary: Not provided
Company: Codvo AI
Expiration Date: Until further notice
Requirements
  • Smart/component-aware chunking
  • Hybrid retrieval (BM25 + dense)
  • Reranking
  • Graph-RAG basics
  • Query rewriting & multi-hop retrieval
  • LangGraph
  • MCP tools
  • Multi-agent orchestration
  • Tool-calling workflows
  • Qdrant or Weaviate
Job Responsibility
  • Build end-to-end RAG pipelines: smart chunking, hybrid retrieval, reranking, query rewriting
  • Implement LangGraph-based agentic workflows with tools, planning, and self-healing behaviors
  • Develop and optimize vector search systems (Qdrant/Weaviate/OpenSearch)
  • Design multi-tenant, ontology-driven knowledge layers for enterprise data
  • Implement guardrails, hallucination reduction, and automated evals
  • Build and maintain document ingestion pipelines (PDF, DOCX, OCR, images)
  • Architect and deploy FastAPI-based AI microservices with caching and async execution
  • Integrate LLM systems with enterprise apps (SharePoint, CRM, databases, MCP tools)
  • Ensure security, PII safety, auditability, and governance requirements are met
  • Collaborate with solution architects and domain SMEs to translate use cases into working AI systems
What we offer
  • Work on cutting-edge agentic AI systems at production scale
  • Ownership over enterprise-grade architectures
  • Fast growth environment
  • Influence Codvo's platform and accelerators
Employment type: Full-time

Product Manager, Enterprise

Luma has built frontier multimodal capabilities — foundation models, Canvas, and...
Location: United States, SF Bay Area
Salary: 225000.00 - 325000.00 USD / Year
Company: Luma AI (lumalabs.ai)
Expiration Date: Until further notice
Requirements
  • Staff or Principal-level product management experience (L5/L6 seniority or above) with a track record of building enterprise products in high-ambiguity environments
  • You’ve built successful enterprise AI applications — especially agentic products with strong real-world adoption, not just traditional SaaS
  • You possess functional AI/ML fluency — not just academic. You can reason about model capabilities, evals, latency/cost tradeoffs, fine-tuning vs. prompting vs. tool use, and the shape of a frontier roadmap. You are technical enough to prototype a solution on a whiteboard with a customer’s ML team and credible enough that a research lead actually wants you in the room
  • You operate effectively across complex cross-functional environments involving product, engineering, research, go-to-market, and customer-facing teams
  • You have a general manager mindset with P&L visibility and experience — you think in terms of business tradeoffs, commercial viability, and revenue outcomes, not just product metrics
  • You thrive in significant ambiguity with extremely high agency and a strong self-starter mentality — this is the highest-priority trait for this role
  • You bring consultative PM skills and the ability to play an advisory role with enterprise customers, asking clarifying questions about needs and commercial viability rather than immediately saying yes to requests
  • You have a mix of large-company and earlier-stage startup experience, and can work in extremely unstructured environments with limited scaffolding
  • You can partner with and influence research teams in product development environments, helping ensure frontier model capabilities are shaped by market needs
Job Responsibility
  • Define product opportunities across enterprise segments by identifying customer profiles, workflows, jobs-to-be-done, and unmet needs in marketing, advertising, and entertainment
  • Translate Luma’s capabilities across models, Canvas, Agents and APIs into clear product definitions that serve real commercial customers and generate revenue
  • Drive cross-functional execution across research, product, engineering, go-to-market, and forward-deployed teams within a pod-based team structure
  • Partner directly with enterprise customers to understand requirements, validate opportunities, and shape product direction — showing up in customer meetings to represent research, product strategy, and vision
  • Play a consultative product role that connects customer feedback to product strategy and roadmap decisions, complementing forward-deployed engineers and creatives rather than duplicating their work
  • Bridge market signals back to research and product development teams, establishing feedback loops so that model prioritization and evaluation are informed by real commercial needs
  • Build agentic products — not just traditional SaaS — that serve clear enterprise needs and can generate revenue immediately while scaling to serve broader demand
  • Operate with a GM-style mindset, making product choices that align with revenue goals, commercial viability, and practical business outcomes
Employment type: Full-time