Engineer - Agents & Evals Job at Lovable (Stockholm)

Software Engineer, Product (Agents / Evals)

We're looking for strong engineers (backend, frontend, or full-stack) who are ex...

Location

Sweden, Stockholm

Salary:

Not provided

Lovable

Expiration Date

Until further notice

Requirements

Strong engineering fundamentals
Ability to build high-quality production systems

Job Responsibility

Build, tune, and scale agents that power Lovable products
Add new agent skills and tools
Improve agent reasoning, orchestration, and efficiency
Design how multiple agents collaborate
Select the right models for different task types
Push the limits of what agents can reliably do in real products
Analyze agent behavior and performance
Hill-climb toward better helpfulness, safety, and reliability
Build evaluation frameworks and benchmarks
Create experimentation pipelines and feedback loops

Fulltime

AI QA Engineer (Agents)

An AI QA Engineer (Agents) is responsible for ensuring the quality, reliability,...

Location

Ireland, Cork

Salary:

Not provided

Marriott Bonvoy

Expiration Date

Until further notice

Requirements

4+ years' total experience, including 1+ year testing AI/ML applications, LLM integrations, or conversational interfaces
Hands-on experience with end-to-end testing and automation for AI/agentic products
3+ years of experience in software quality assurance or testing
1+ years of experience testing AI/ML applications, LLM integrations, or conversational interfaces
Strong understanding of software testing principles, methodologies, and best practices
Experience writing and maintaining automated tests (unit, integration, or end‑to‑end)
Proficiency in at least one programming language (Python, TypeScript, JavaScript, Java, etc.)
Experience with API testing tools (Postman, REST Assured, etc.) or frameworks
Strong analytical and problem‑solving skills
Excellent attention to detail and ability to identify edge cases

Job Responsibility

Design and execute test plans for AI agents and agentic experiences
Write and maintain automated test suites for agent functionality (unit tests, evals, integration tests, end‑to‑end tests)
Perform (minimal) manual testing of agent interactions, workflows, and business logic
Test agent responses, accuracy, and behavior across various scenarios and edge cases
Identify, document, and track bugs through resolution
Collaborate with engineers, product managers, and business stakeholders to understand requirements and acceptance criteria
Participate in test planning, test case design, and test strategy discussions
Create and maintain test data, test scenarios, and test environments for agents
Participate in feature design sessions, highlighting key testing scenarios and fault zones
Execute performance and load testing to ensure agent scalability and response times

Fulltime

Software Engineer, Agents

At Harvey, we’re transforming how legal and professional services operate — not ...

Location

United States, San Francisco

Salary:

165000.00 - 312000.00 USD / Year

Harvey

Expiration Date

Until further notice

Requirements

Passion for building effective domain-specific agents
Iterative mindset: you develop proof of concepts, make decisions quickly, and ship v0s
Comfortable with when and how to use evaluations to drive quality
Humble and adaptable about code and frameworks. We expect you to drive adoption of new best practices as they develop
3+ years (post-BS/MS) of software engineering experience
Proficiency in Python and experience working with LLM APIs and agent frameworks
Experience with shipping user-facing products, either on the backend or full-stack

Job Responsibility

Partner with customers and PMs to understand legal workflows, design practical evaluations that capture what “excellent” means, and ship agents that get the job done
Optimize agent performance through prompt engineering, model selection, tool design, skill writing, context window management, and eval harness development
Work with our model infra team to design and implement infrastructure for low-latency agent execution, including caching strategies, parallel tool calls, or subagent patterns
Improve our observability and instrumentation to profile agent behavior, identify bottlenecks, and drive optimization decisions
Stay current on new developments in agentic systems and bring those learnings back to the products we build

What we offer

Comprehensive health, dental and vision coverage
Retirement benefits (401k match up to 4%)
Flexible PTO
Equity plan
Bonus

Fulltime

LLM Inference Performance & Evals Engineer

Join the inference model team dedicated to bringing up the state-of-the-art models,...

Location

Canada, Toronto

Salary:

Not provided

Cerebras Systems

Expiration Date

Until further notice

Requirements

3+ years building high-performance ML or systems software
Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration
Prior experience in modeling, compilers, or crafting benchmarks or performance studies, not just black-box QA tests
Strong passion to leverage AI agents or workflow orchestration tools to boost personal productivity

Job Responsibility

Prototype and benchmark cutting-edge ideas: new attentions, MoE, speculative decoding, and many more innovations as they emerge
Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull-requests
Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities

What we offer

Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs

Software Engineer, Applied Evals

Applied Evals defines what good looks like for safe, advanced AI systems. We tur...

Location

United States, San Francisco

Salary:

230000.00 - 325000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end
Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
Familiarity with evaluation methods for LLMs and experience with patterns like multi-agent workflows, tool use, or long context
Familiarity with deep learning concepts or prior exposure to training models
Ability to communicate clearly across technical and non-technical audiences across levels
Motivated by high-impact collaboration with research and product teams, and able to thrive in ambiguity

Job Responsibility

Define the core evaluation signals that drive model improvement at OpenAI, turning vague product gaps into crisp, defensible measures of quality
Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extendable
Prototype solutions with real workflows and convert them into scalable feedback loops
Connect evaluation signals directly to research and training systems so product improvements show up in what users experience
Shape model interaction paradigms by partnering with engineering, research, and product teams on how models are deployed and measured
Build reusable systems and tools that enable contributions from across the company and steadily raise the quality bar

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Senior Software Engineer, AI Evals

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for ...

Location

United States, San Francisco

Salary:

240000.00 - 280000.00 USD / Year

Sentry

Expiration Date

Until further notice

Requirements

5+ years of professional experience with a Bachelor’s degree in computer science, machine learning, or a related field
Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
Comfort writing production-quality code (we use Python and TypeScript)
Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)

Job Responsibility

Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring

What we offer

Offers Equity
Incentive compensation
Equity grants
Paid time off
Group health insurance coverage

Fulltime

RAG + Agentic AI Engineer

We are looking for a highly skilled RAG + LLM-based Agentic AI Engineer to build...

Location

India, Pune; Mumbai; Bengaluru; Indore; Jaipur

Salary:

Not provided

Codvo AI

Expiration Date

Until further notice

Requirements

Smart/component-aware chunking
Hybrid retrieval (BM25 + dense)
Reranking
Graph-RAG basics
Query rewriting & multi-hop retrieval
LangGraph
MCP tools
Multi-agent orchestration
Tool-calling workflows
Qdrant or Weaviate

Job Responsibility

Build end-to-end RAG pipelines: smart chunking, hybrid retrieval, reranking, query rewriting
Implement LangGraph-based agentic workflows with tools, planning, and self-healing behaviors
Develop and optimize vector search systems (Qdrant/Weaviate/OpenSearch)
Design multi-tenant, ontology-driven knowledge layers for enterprise data
Implement guardrails, hallucination reduction, and automated evals
Build and maintain document ingestion pipelines (PDF, DOCX, OCR, images)
Architect and deploy FastAPI-based AI microservices with caching and async execution
Integrate LLM systems with enterprise apps (SharePoint, CRM, databases, MCP tools)
Ensure security, PII safety, auditability, and governance requirements are met
Collaborate with solution architects and domain SMEs to translate use cases into working AI systems

What we offer

Work on cutting-edge agentic AI systems at production scale
Ownership over enterprise-grade architectures
Fast growth environment
Influence Codvo's platform and accelerators

Fulltime

Product Manager, Enterprise

Luma has built frontier multimodal capabilities — foundation models, Canvas, and...

Location

United States, SF Bay Area

Salary:

225000.00 - 325000.00 USD / Year

Luma AI

Expiration Date

Until further notice

Requirements

Staff or Principal-level product management experience (L5/L6 seniority or above) with a track record of building enterprise products in high-ambiguity environments
You’ve built successful enterprise AI applications — especially agentic products with strong real-world adoption, not just traditional SaaS
You possess functional AI/ML fluency — not just academic. You can reason about model capabilities, evals, latency/cost tradeoffs, fine-tuning vs. prompting vs. tool use, and the shape of a frontier roadmap. You are technical enough to prototype a solution on a whiteboard with a customer’s ML team and credible enough that a research lead actually wants you in the room
You operate effectively across complex cross-functional environments involving product, engineering, research, go-to-market, and customer-facing teams
You have a general manager mindset with P&L visibility and experience — you think in terms of business tradeoffs, commercial viability, and revenue outcomes, not just product metrics
You thrive in significant ambiguity with extremely high agency and a strong self-starter mentality — this is the highest-priority trait for this role
You bring consultative PM skills and the ability to play an advisory role with enterprise customers, asking clarifying questions about needs and commercial viability rather than immediately saying yes to requests
You have a mix of large-company and earlier-stage startup experience, and can work in extremely unstructured environments with limited scaffolding
You can partner with and influence research teams in product development environments, helping ensure frontier model capabilities are shaped by market needs

Job Responsibility

Define product opportunities across enterprise segments by identifying customer profiles, workflows, jobs-to-be-done, and unmet needs in marketing, advertising, and entertainment
Translate Luma’s capabilities across models, Canvas, Agents and APIs into clear product definitions that serve real commercial customers and generate revenue
Drive cross-functional execution across research, product, engineering, go-to-market, and forward-deployed teams within a pod-based team structure
Partner directly with enterprise customers to understand requirements, validate opportunities, and shape product direction — showing up in customer meetings to represent research, product strategy, and vision
Play a consultative product role that connects customer feedback to product strategy and roadmap decisions, complementing forward-deployed engineers and creatives rather than duplicating their work
Bridge market signals back to research and product development teams, establishing feedback loops so that model prioritization and evaluation are informed by real commercial needs
Build agentic products — not just traditional SaaS — that serve clear enterprise needs and can generate revenue immediately while scaling to serve broader demand
Operate with a GM-style mindset, making product choices that align with revenue goals, commercial viability, and practical business outcomes

Fulltime
