Manager, Agent Evaluation

Comcast Advertising

Location:
United States, Washington D.C.

Contract Type:
Not provided

Salary:
183063.62 - 274595.42 USD / Year

Job Description:

The Agent Evaluation team is responsible for testing whether AI agents return the correct and expected responses. We build the framework, metrics, and test cases that validate agent behavior, accuracy, and reliability before release. Our goal is to ensure agents perform consistently and meet product and user expectations. The Manager, Agent Evaluation will lead the team responsible for building and scaling the evaluation framework that tests whether AI agents return accurate, reliable, and expected responses across real-world scenarios.

Job Responsibility:

  • Lead and grow a team focused on agent and model evaluation
  • Define the strategy, roadmap, and standards for agent testing and validation
  • Oversee development of metrics, benchmarks, and testing frameworks to measure response quality, accuracy, safety, and performance
  • Ensure evaluation coverage aligns with product, UX, and business requirements
  • Partner closely with Product, Engineering, Research, and Platform teams to integrate evaluation into the development lifecycle
  • Drive experimentation and continuous improvement of evaluation methodologies
  • Establish reporting mechanisms to clearly communicate evaluation results and trade-offs to leadership
  • Implement best practices for model versioning, monitoring, and release validation
  • Stay current with advancements in LLMs, AI agents, and evaluation techniques

Requirements:

  • Strong foundation in machine learning fundamentals and applied ML systems
  • Hands-on experience with model and agent evaluation methodologies
  • Familiarity with LLMs, AI agents, and prompt-driven systems
  • Proficiency in Python and modern ML frameworks (e.g., PyTorch, TensorFlow)
  • Experience defining metrics, benchmarks, and experimentation frameworks
  • Solid understanding of MLOps practices, including model versioning, monitoring, and CI/CD
  • Ability to collaborate effectively with product, platform, and research teams
  • Clear communicator of technical trade-offs, evaluation insights, and results
  • Master's Degree
  • 5-7 Years Relevant Work Experience

What we offer:

  • Paid Time off
  • Physical Wellbeing benefits
  • Financial Wellbeing benefits
  • Emotional Wellbeing benefits
  • Life Events + Family Support benefits

Additional Information:

Job Posted:
February 13, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Manager, Agent Evaluation

Senior Product Manager, AI Agents

This role owns AI research, messaging, and context—spanning both the user experi...
Location:
United States
Salary:
187000.00 - 250000.00 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 5+ years in product management
  • 2+ years experience launching AI/ML new products and scaling existing products
  • Track record of shipping AI features that drove measurable business outcomes
  • Experience with LLM-powered applications, prompt engineering, evaluation frameworks, and model selection tradeoffs
  • Comfortable working in Python/SQL to analyze data, prototype prompts, and evaluate outputs
  • Understanding of LLM architectures, RAG pipelines, agent frameworks, and inference optimization
  • Obsession with quality over speed
  • GTM or sales tech experience (strongly preferred)
  • Familiarity with sales workflows, prospecting tools, or CRM systems
  • Understanding of why sales teams are skeptical of AI tools and what it takes to earn their trust
Job Responsibility:
  • Develop and execute a strategic roadmap for AI research, messaging, and context capabilities
  • Enhance Apollo's AI research agents to surface actionable insights from the web
  • Define how AI understands each user's business
  • Own AI-powered messaging tools that create personalized, context-aware emails at scale
  • Build and scale evaluation infrastructure across accuracy, relevance, clarity, and tone
  • Partner with engineering, design, prompt writers, and sales to deliver cohesive AI experiences
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
Employment Type:
Fulltime

AI Engineering Manager - Internal AI Agent

We are looking for an AI Engineering Manager to drive Mirakl's internal AI trans...
Location:
France, Paris
Salary:
Not provided
Mirakl
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in AI/ML or software engineering
  • Proven track record building AI agents using LLMs, RAG, MCP and related technologies
  • Strong technical proficiency in Python and multiple programming languages, with architectural design experience
  • Production deployment expertise - you've shipped AI solutions to real users
  • Technical pragmatism - ability to match the right technology to the use case
  • Curiosity and continuous learning - you stay current with AI/ML trends
  • 1+ years of experience in a lead or management role (team management or technical leadership)
  • Strong leadership skills - you inspire and develop high-performing engineering teams
  • Cross-functional stakeholder management - you build relationships and excel at working with all organizational levels & functions
  • Strong communication & presentation skills - in both English and French
Job Responsibility:
  • Partner closely with Mirakl teams & leadership to identify & prioritize opportunities, redesign workflows around AI agents, and drive adoption at scale
  • Lead and mentor a team of cross-functional AI engineers, defining your team’s roadmap to support strategic AI initiatives
  • Build advanced Mirakl-specific AI agents centrally, owning the complete delivery cycle from discovery to production deployment and operations
  • Foster organization-wide AI adoption by facilitating internal communities and providing self-service tools, training & support to empower teams as autonomous AI builders
  • Establish & scale technical standards & stack to ensure secure, compliant & high-quality deliverables across all internal AI projects
  • Explore emerging AI paradigms, evaluate new tools and technologies, and maintain active technology watch

AI Engineering Manager - Internal AI Agent

We are looking for an AI Engineering Manager to drive Mirakl's internal AI trans...
Location:
France
Salary:
Not provided
Mirakl
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in AI/ML or software engineering
  • Proven track record building AI agents using LLMs, RAG, MCP and related technologies
  • Strong technical proficiency in Python and multiple programming languages, with architectural design experience
  • Production deployment expertise - you've shipped AI solutions to real users
  • Technical pragmatism - ability to match the right technology to the use case
  • Curiosity and continuous learning
  • 1+ years of experience in a lead or management role (team management or technical leadership)
  • Strong leadership skills
  • Cross-functional stakeholder management
  • Strong communication & presentation skills - in both English and French
Job Responsibility:
  • Partner closely with Mirakl teams & leadership to identify & prioritize opportunities, redesign workflows around AI agents, and drive adoption at scale
  • Lead and mentor a team of cross-functional AI engineers, defining your team’s roadmap to support strategic AI initiatives
  • Build advanced Mirakl-specific AI agents centrally, owning the complete delivery cycle from discovery to production deployment and operations
  • Foster organization-wide AI adoption by facilitating internal communities and providing self-service tools, training & support to empower teams as autonomous AI builders
  • Establish & scale technical standards & stack to ensure secure, compliant & high-quality deliverables across all internal AI projects
  • Explore emerging AI paradigms, evaluate new tools and technologies, and maintain active technology watch

AI Engineering Manager - Internal AI Agent

We are looking for an AI Engineering Manager to drive Mirakl's internal AI trans...
Location:
France, Bordeaux
Salary:
Not provided
Mirakl
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in AI/ML or software engineering
  • Proven track record building AI agents using LLMs, RAG, MCP and related technologies
  • Strong technical proficiency in Python and multiple programming languages, with architectural design experience
  • Production deployment expertise - you've shipped AI solutions to real users
  • Technical pragmatism - ability to match the right technology to the use case
  • Curiosity and continuous learning
  • 1+ years of experience in a lead or management role (team management or technical leadership)
  • Strong leadership skills
  • Cross-functional stakeholder management
  • Strong communication & presentation skills - in both English and French
Job Responsibility:
  • Partner closely with Mirakl teams & leadership to identify & prioritize opportunities, redesign workflows around AI agents, and drive adoption at scale
  • Lead and mentor a team of cross-functional AI engineers, defining your team’s roadmap to support strategic AI initiatives
  • Build advanced Mirakl-specific AI agents centrally, owning the complete delivery cycle from discovery to production deployment and operations
  • Foster organization-wide AI adoption by facilitating internal communities and providing self-service tools, training & support to empower teams as autonomous AI builders
  • Establish & scale technical standards & stack to ensure secure, compliant & high-quality deliverables across all internal AI projects
  • Explore emerging AI paradigms, evaluate new tools and technologies, and maintain active technology watch

Assistant Construction Project Manager

This role focuses on managing a range of Private Residential projects, from priv...
Location:
United Kingdom, London
Salary:
28000.00 - 38000.00 GBP / Year
Brandon James
Expiration Date:
Until further notice
Requirements:
  • Holds a degree in Project Management or an equivalent qualification
  • Aspiring to achieve chartership in the future
  • Strong communication skills, both written and verbal
  • Keen interest in the field of high-end private residential construction
  • Able to effectively support senior team members in project management tasks
Job Responsibility:
  • Assist in the setup and governance of high-end private residential projects
  • Monitor project processes, ensuring compliance and efficiency
  • Conduct due diligence and quality assurance checks
  • Assist in financial monitoring and progress reporting of projects
  • Participate in project audits and post-project evaluations
What we offer:
  • 25 Days holiday + Bank holidays
  • Hybrid working
  • Pension contribution
  • APC Support
  • Clear progression pathway
  • Supportive culture
  • Internal training programmes
  • Flexible working conditions
  • Birthday off
  • Company phone and laptop
Employment Type:
Fulltime

Sr. Software Engineer (Agentic Runtime)

Dialpad’s AI Engineering organization is responsible for building and maintainin...
Location:
Argentina, Buenos Aires
Salary:
Not provided
Dialpad
Expiration Date:
Until further notice
Requirements:
  • 3–6 years of experience in distributed systems, platform engineering, or ML infrastructure, with exposure to LLM-based or agentic systems strongly preferred
  • Strong understanding of agent architectures, including ReAct, plan-and-execute, and multi-agent coordination patterns
  • Deep knowledge of context management, prompt lifecycle, tool-call protocols (e.g., function calling, MCP), and agent memory strategies (short-term, episodic, and long-term)
  • Experience integrating and managing external tool ecosystems, including web search, code interpreters, databases, and third-party APIs
  • Familiarity with retrieval-augmented generation (RAG) and how retrieval fits into broader agentic pipelines
  • Understanding of LLM output reliability challenges — hallucination, non-determinism, and retry/fallback strategies at runtime
  • Proficiency in Go and Python 3 (experience with Rust or TypeScript is a plus)
  • Strong understanding of distributed systems, microservices, and event-driven architectures suited to long-running agent tasks
  • Passion for real-time performance optimization, including streaming responses, async execution, and parallel tool invocation
  • Experience with API design using OpenAPI, Swagger, or equivalent, with an eye toward agentic interaction patterns
Job Responsibility:
  • Contribute to the design, development, and maintenance of agentic runtime systems, including agent orchestration, tool execution pipelines, and multi-step reasoning loops
  • Build and optimize core runtime components, including task planners, action dispatchers, memory managers, and context window management systems
  • Work on agent coordination techniques, including dynamic tool selection, parallel agent execution, state management, and result aggregation across multi-agent workflows
  • Maintain and enhance highly scalable agentic platforms with a focus on low-latency execution, cost efficiency, and deterministic behavior
  • Ensure high availability, reliability, and fault tolerance in agent runtime services, including graceful degradation when LLM or tool calls fail
  • Collaborate with cross-functional teams — including ML researchers, product, and platform engineers — to translate agentic product requirements into robust runtime infrastructure
  • Develop and optimize real-time distributed systems, microservices, and event-driven architectures powering agentic task execution
  • Design and implement sandboxed execution environments for safe agent use of tools, code execution, and external API calls
  • Implement and maintain monitoring, alerting, and performance metrics covering agent run success rates, token consumption, latency, and cost attribution
  • Evaluate and integrate emerging agentic frameworks, LLM APIs, and tooling ecosystems to continuously improve platform capabilities
What we offer:
  • Competitive benefits and perks
  • Robust training program
  • Inclusive office environment
  • Recognized Great Place to Work culture

Unit Business Risk & Compliance Agent

You could think that we have supernatural powers, but the truth is that our team...
Location:
Canada, Richmond
Salary:
19.37 CAD / Hour
IKEA
Expiration Date:
Until further notice
Requirements:
  • You have previous experience in the Health and Safety and Security sector and/or Safety and Security experience within a Retail environment
  • You’re knowledgeable of relevant safety standards and regulations, security processes, tools and working methods
  • You’re energized by the implementation of safeguards that bring value to the business and protect the financial and moral position
  • You can ensure the integrity of safety and security systems, guidelines and documentation
  • You know how to conduct a risk assessment and implement the hierarchy of controls
  • You have good communication and documentation skills in dealings with various levels of management
  • You think and work in a risk-based way (i.e., you evaluate trade-offs between potential costs and benefits and act accordingly)
  • You have good analytical and numerical skills
Job Responsibility:
  • Promote risk management in the unit, informing and sharing expertise in order to develop risk-aware decision taking in relation to unit goals and unit business plan
  • Support co-workers, by providing expertise, in acting in accordance with Ingka Risk & Compliance Rules and Local legislation on Health Safety and Security to secure a safe environment for customers and co-workers
  • Promote, facilitate, and ensure completion of required training for unit employees
  • Support a Risk & Compliance culture by utilizing systems to detect, analyze and reduce business loss and financial impact
  • Ensure the reporting of relevant figures for co-workers, customer and visitor incidents to establish progress and areas for improvement
What we offer:
  • Wellness days (in addition to your vacation days!)
  • Extended health, dental, and vision coverage (for you and your family)
  • RRSP with IKEA contribution matching options
  • Eligibility for our annual IKEA bonus incentive plan
  • Flexible spending account
  • Life insurance
  • Merchandise and restaurant discounts (plus free drinks and different healthy meal options in the co-worker restaurant, where available)
  • Parental leave
  • Bereavement leave
  • Employee assistance program (that helps you support your mental, physical, and financial wellbeing)
Employment Type:
Fulltime

Principal Product Manager

We are looking for a deeply technical and forward thinking Principal Product Man...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Bachelor's Degree AND 12+ years experience in product/service/program management or software development OR equivalent experience
  • 4+ years experience taking a product, feature, or experience to market (e.g., design, addressing product market fit, and launch, internal tool/framework)
  • 6+ years experience improving product metrics for a product, feature, or experience in a market (e.g., growing customer base, expanding customer usage, avoiding customer churn)
  • 6+ years experience disrupting a market for a product, feature, or experience (e.g., competitive disruption, taking the place of an established competing product)
  • Demonstrated technical depth across LLMs and line of business systems, with proven experience leading AI/LLM evaluation strategy—including offline/online eval frameworks, rubric and AI judge design, and defining measurable quality bars for agentic tools and orchestration workflows
  • Cross-functional collaboration skills, with the ability to influence across engineering, research, design and business teams
  • Exceptional written and verbal communication skills, with a knack for storytelling and clear articulation of complex ideas
Job Responsibility:
  • Define and own the evaluation strategy for all 1P and 3P agentic tools (such as MCP servers and skills), including tool invocation success, tool quality, trajectory evaluation, intent detection, and scenario-level scoring
  • Develop a unified framework covering offline evals, online evals, AI‑judge‑based evals, and assertion‑based rubric design
  • Partner with engineering to evolve internal platforms like Agent 365 Evals, Agent Arena, dashboards, CI/CD‑integrated nightly evals, and metrics pipelines
  • Create grading frameworks, mapping strategies, and ground truth generation mechanisms, including automation for user‑intent derivation
  • Establish cross-model, cross-orchestrator eval infrastructure, i.e., ensure agentic tools work reliably across all major LLMs and orchestrators
  • Design and maintain evaluation suites that capture model regressions, tool invocation drift, and scenario fidelity as products evolve
  • Drive alignment with internal partners and ISV teams to ensure consistent evaluation approaches, shared pipelines, and consolidated quality dashboards
  • Define product readiness criteria for 1P/3P tools, aligning certification requirements for partner‑built agentic tools
  • Partner with responsible AI, security, governance, and compliance teams to ensure eval frameworks respect enterprise boundaries and safety constraints
  • Track the latest developments in multi‑agent evaluation frameworks, trajectory alignment research, and AI behavioral evals
Employment Type:
Fulltime