Engineer - Agents & Evals

Lovable

Location:
Sweden, Stockholm

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We’re looking for strong engineers (backend, frontend, or full-stack) who are excited about building agents. You’ll help shape how we build, evaluate, orchestrate, and scale LLM-powered agents in production - and define what it means to create truly lovable AI products.

Job Responsibilities:

  • Build, tune, and scale agents that power lovable products
  • Add new agent skills and tools
  • Improve agent reasoning, orchestration, and efficiency
  • Design how multiple agents collaborate
  • Select the right models for different task types
  • Push the limits of what agents can reliably do in real products
  • Analyze agent behavior and performance
  • Hill-climb toward better helpfulness, safety, and reliability
  • Build evaluation frameworks and benchmarks
  • Create experimentation pipelines and feedback loops
  • Ensure agents perform well across real-world use cases

Requirements:

  • Strong engineering fundamentals
  • Ability to build high-quality production systems
  • Backend, frontend, or full-stack engineering background

Nice to have:

  • Have built AI agents yourself (side projects count)
  • Are deeply curious about how AI systems behave and improve
  • Have worked with LLMs or AI systems in production
  • Are excited about experimenting with new models and techniques
  • Have shipped ML or AI features to real users with uptime requirements
  • Have built evaluation systems or ML experimentation pipelines
  • Hold strong opinions on safety, latency, and helpfulness, but are open to testing and learning

Additional Information:

Job Posted:
March 03, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Engineer - Agents & Evals

Software Engineer, Agent

We are building the next-generation AI-powered platform and web application for ...
Location:
United States, San Francisco
Salary:
174000.00 - 286000.00 USD / Year
Company:
Descript
Expiration Date:
Until further notice
Requirements:
  • 3+ years of professional product engineering or fullstack experience
  • Worked with TypeScript, React, and RESTful APIs (or similar)
  • Solid CS fundamentals, including data structures, algorithms, databases (Postgres, Redis)
  • High ownership and growth mindset
  • You thrive in collaborative environments and enjoy working with other functions (e.g. product, design, AI research)
Job Responsibilities:
  • Experimentation to push the limits on quality for our video editing agent (e.g., harness and tool design, token/context optimizations, RL/new model development, multimodal)
  • Building a best-in-class product experience for agentic video editing interactions and improving user retention (e.g., tuning prompt templates and other key user workflows, addressing user feedback, building key product capabilities like chat history)
  • Laying the foundations for a best-in-class developer experience for building agents (e.g., logging, evals framework, online monitoring and feedback loops)
What we offer:
  • generous healthcare package
  • 401k matching program
  • catered lunches
  • flexible vacation time
Employment Type:
Full-time

Principal Product Manager

We are looking for a deeply technical and forward thinking Principal Product Man...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Company:
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Bachelor's Degree AND 12+ years experience in product/service/program management or software development OR equivalent experience
  • 4+ years experience taking a product, feature, or experience to market (e.g., design, addressing product market fit, and launch, internal tool/framework)
  • 6+ years experience improving product metrics for a product, feature, or experience in a market (e.g., growing customer base, expanding customer usage, avoiding customer churn)
  • 6+ years experience disrupting a market for a product, feature, or experience (e.g., competitive disruption, taking the place of an established competing product)
  • Demonstrated technical depth across LLMs and line of business systems, with proven experience leading AI/LLM evaluation strategy—including offline/online eval frameworks, rubric and AI judge design, and defining measurable quality bars for agentic tools and orchestration workflows
  • Cross-functional collaboration skills, with the ability to influence across engineering, research, design and business teams
  • Exceptional written and verbal communication skills, with a knack for storytelling and clear articulation of complex ideas
Job Responsibilities:
  • Define and own the evaluation strategy for all 1P and 3P agentic tools (MCP servers, skills, etc.), including tool invocation success, tool quality, trajectory evaluation, intent detection, and scenario-level scoring
  • Develop a unified framework covering offline evals, online evals, AI‑judge‑based evals, and assertion‑based rubric design
  • Partner with engineering to evolve internal platforms like Agent 365 Evals, Agent Arena, dashboards, CI/CD‑integrated nightly evals, and metrics pipelines
  • Create grading frameworks, mapping strategies, and ground truth generation mechanisms, including automation for user‑intent derivation
  • Establish cross-model, cross-orchestrator eval infrastructure, i.e., ensure agentic tools work reliably across all major LLMs and orchestrators
  • Design and maintain evaluation suites that capture model regressions, tool invocation drift, and scenario fidelity as products evolve
  • Drive alignment with internal partners and ISV teams to ensure consistent evaluation approaches, shared pipelines, and consolidated quality dashboards
  • Define product readiness criteria for 1P/3P tools, aligning certification requirements for partner‑built agentic tools
  • Partner with responsible AI, security, governance, and compliance teams to ensure eval frameworks respect enterprise boundaries and safety constraints
  • Track the latest developments in multi‑agent evaluation frameworks, trajectory alignment research, and AI behavioral evals
Employment Type:
Full-time

AI/ML Engineer

COME BUILD THE FIRST AI-NATIVE TAX SOFTWARE. Tax software hasn't had a real shak...
Location:
United States, San Francisco
Salary:
145000.00 - 165000.00 USD / Year
Company:
Helpcare AI
Expiration Date:
Until further notice
Requirements:
  • Deep fluency in the modern AI toolkit — LLMs, RAG, agents, fine-tuning, prompt engineering, evals
  • Strong engineering chops — highly proficient in Python and comfortable picking up whatever the job demands: JS/TS, GCP infrastructure, new ML tooling
  • Obsessively product-minded
  • Fast and iterative
  • Ownership-oriented
  • Collaborative
  • Applicants must be a U.S. citizen or green card holder
Job Responsibilities:
  • Design, build, and ship the core intelligence behind Keeper's product
  • Work across the full stack of modern AI, from model development to prompt engineering to production infrastructure
  • Agentic tax filing — orchestrating LLM-powered agents that can reason about a user's full tax picture and take action on their behalf
  • Document intelligence — extending our best-in-class transformer parser to extract holistic tax context from uploaded documents
  • AI tax assistant — pushing our already industry-leading assistant further with better retrieval, richer tool use, and deeper tax reasoning
  • Evals and reliability — building the testing and evaluation infrastructure that lets us ship AI features with confidence
  • Core ML models — iterating on our NLP and classification models for transaction processing and write-off detection
Employment Type:
Full-time

Software Engineer, Agents

At Harvey, we’re transforming how legal and professional services operate — not ...
Location:
United States, San Francisco
Salary:
165000.00 - 312000.00 USD / Year
Company:
Harvey
Expiration Date:
Until further notice
Requirements:
  • Passion for building effective domain-specific agents
  • Iterative mindset: you develop proof of concepts, make decisions quickly, and ship v0s
  • Comfortable with when and how to use evaluations to drive quality
  • Humble and adaptable about code and frameworks. We expect you to drive adoption of new best practices as they develop
  • 3+ years (post-BS/MS) of software engineering experience
  • Proficiency in Python and experience working with LLM APIs and agent frameworks
  • Experience with shipping user-facing products, either on the backend or full-stack
Job Responsibilities:
  • Partner with customers and PMs to understand legal workflows, design practical evaluations that capture what “excellent” means, and ship agents that get the job done
  • Optimize agent performance through prompt engineering, model selection, tool design, skill writing, context window management, and eval harness development
  • Work with our model infra team to design and implement infrastructure for low-latency agent execution, including caching strategies, parallel tool calls, or subagent patterns
  • Improve our observability and instrumentation to profile agent behavior, identify bottlenecks, and drive optimization decisions
  • Stay current on new developments in agentic systems and bring those learnings back to the products we build
What we offer:
  • Comprehensive health, dental and vision coverage
  • retirement benefits (401k match up to 4%)
  • flexible PTO
  • equity plan
  • bonus
Employment Type:
Full-time

Applied AI Engineer - Agent

We’re hiring an Applied AI Engineer to push the boundaries of our Cofounder agen...
Location:
United States, New York
Salary:
250000.00 - 300000.00 USD / Year
Company:
The General Intelligence Company of New York
Expiration Date:
Until further notice
Requirements:
  • 4+ years backend engineering experience, preferably Python
  • Hands-on LLM experience: prompt engineering, function-calling, retrieval, embeddings, evaluation design; you’ve shipped LLM features to production
  • Track record building evaluation harnesses and using them to drive improvements (regression suites, task success metrics, cost/runtime tradeoffs)
  • Solid distributed systems fundamentals: concurrency, reliability, performance, data modeling, lifecycle management
  • Pragmatic experimentation: hypothesis → prototype → measured improvement → rollout
  • Excellent debugging and instrumentation skills; you enjoy finding and fixing edge cases in the wild
Job Responsibilities:
  • Design and implement agent improvements end-to-end: prompting strategies, tool selection, action planning, memory usage, safety/guardrails, and recovery paths
  • Build robust evaluation pipelines for the agent: offline evals (golden tasks, regression suites, behavior tests), online metrics (latency, success rate, fallout modes, cost efficiency), and experimentation frameworks (A/B, canaries, guardrail thresholds)
  • Productionize applied LLM techniques: function/tool-calling orchestration, self-reflection, retrieval/RAG, multi-agent handoffs, caching/embedding strategies, and hallucination reduction
  • Improve core backend systems: reliable job orchestration, retries/backoff, idempotency, and auditability; scalable memory and context routing; data pipelines across Gmail, Slack, Notion, Linear, Google Workspace, etc.; observability and tracing for agent actions/outcomes
  • Partner with product and infra to define success metrics and ship fast, safe iterations
  • Write clean, well-tested code; document design decisions and runbooks
What we offer:
  • Competitive salary and meaningful equity
  • Comprehensive benefits and flexible work setup
Employment Type:
Full-time

AI Engineer

Our next frontier is a strategic shift: We're evolving beyond traditional analyt...
Location:
United Kingdom, London
Salary:
Not provided
Company:
MVF
Expiration Date:
Until further notice
Requirements:
  • Python and service development: write clean, typed, production-ready code; comfortable with Pydantic, asyncio, and FastAPI; treat prompts as code: versioned, tested, and decoupled from business logic
  • Cloud-native experience: hands-on experience deploying and operating containerised services on AWS (or GCP/Azure) using CI/CD platforms (Jenkins, GitHub Actions, CircleCI, Buildkite), cloud monitoring tools (Datadog, Sumo Logic, New Relic), and container orchestrators (EKS, ECS); comfortable with Terraform for infrastructure as code
  • Hands-on LLM experience: built something real with language models, whether production systems, serious side projects, or internal tools; understand that prompting is engineering, not magic
Job Responsibilities:
  • Architect & Engineer Agentic Systems: build agents that act, not just answer; design agents that perform deterministic actions based on probabilistic reasoning; build systems that can reliably analyse data, execute function calls, and manage state across multi-step workflows without getting stuck in loops
  • Production-Grade RAG: go beyond basic vector search; implement hybrid search (keyword + semantic), re-ranking strategies, and metadata filtering
  • Structured Data Extraction: build pipelines that turn unstructured conversations into structured data that our downstream systems can use
  • Establish AI Engineering Foundations
  • Observability First: implement the "nervous system" of our AI; choose and set up tools (e.g., LangSmith, Langfuse, ADK, or custom) to trace execution chains
  • Evals as a Service: build the testing harness; create automated evaluation pipelines that test prompts against "Golden Datasets"
What we offer:
  • Summer Fridays
  • Competitive holiday benefits - 25 days a year paid holiday, plus 8 bank holidays (increases 1 day a year up to 30 days)
  • Hybrid working - 3 days a week in the office
  • Closed for Christmas holidays - Extra days not taken from your annual holiday allowance
  • Work from anywhere for 2 weeks a year
  • Life Assurance and Income Protection to protect your loved ones
  • Benefits allowance for health, dental, and vision coverage
  • Six months paid maternity leave, and one month paid paternity leave (subject to qualifying conditions) inclusive of same-sex and adoptive parents
  • Defined Contribution Pension and Salary Sacrifice Scheme
  • Be Well: Our award-winning wellbeing and mental health programme to support all MVFers and their families
Employment Type:
Full-time

Software Engineer, Applied Evals

Applied Evals defines what good looks like for safe, advanced AI systems. We tur...
Location:
United States, San Francisco
Salary:
230000.00 - 325000.00 USD / Year
Company:
OpenAI
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end
  • Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
  • Familiarity with evaluation methods for LLMs and experience with patterns like multi-agent workflows, tool use, or long context
  • Familiarity with deep learning concepts or prior exposure to training models
  • Ability to communicate clearly across technical and non-technical audiences across levels
  • Motivated by high-impact collaboration with research and product teams, and able to thrive in ambiguity
Job Responsibilities:
  • Define the core evaluation signals that drive model improvement at OpenAI, turning vague product gaps into crisp, defensible measures of quality
  • Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extendable
  • Prototype solutions with real workflows and convert them into scalable feedback loops
  • Connect evaluation signals directly to research and training systems so product improvements show up in what users experience
  • Shape model interaction paradigms by partnering with engineering, research, and product teams on how models are deployed and measured
  • Build reusable systems and tools that enable contributions from across the company and steadily raise the quality bar
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Founding AI Engineer

Flint is defining the next generation of websites: ones that can constantly buil...
Location:
United States, San Francisco
Salary:
150000.00 - 250000.00 USD / Year
Company:
Flint
Expiration Date:
Until further notice
Requirements:
  • You are a product-first engineer
  • You have built and deployed AI products that real users depend on
  • You have extensive hands-on experience with prompt engineering, context optimization, and model fine-tuning
  • You have strong full-stack engineering skills with experience integrating AI into web applications
  • You regularly use AI IDEs like Cursor or Windsurf to code
  • You take ownership of problems from start to finish and are eager to learn whatever you need to succeed
  • You’re comfortable in a fast-paced environment with evolving requirements
  • You have enthusiasm for building MVPs and iterating based on user feedback
Job Responsibilities:
  • Designing and implementing production-grade agentic systems that power web experiences and marketing workflows, leveraging advanced prompt engineering and multi-agent architectures to do so
  • Architecting and optimizing RAG (Retrieval-Augmented Generation) systems with context engineering
  • Strengthening our engineering processes through robust evals, observability, and testing
  • Developing and maintaining the core application across frontend, backend, and infrastructure
  • Shaping our technical vision and leading architectural discussions. You'll make foundational decisions that influence both Flint's technical architecture and engineering culture.
  • Creating and testing prototypes to gather rapid feedback from users
What we offer:
  • Historic office in Jackson Square (with 2 rooftops and a shower)
  • A crazy upside-down battlestation
  • 20 PTO days a year and full health benefits
  • Lunch and a fully stocked snack bar at the office, incl. Cometeer coffee pods
  • Dog friendly
  • Competitive salary + equity
Employment Type:
Full-time