We’re looking for a Platform Engineer, Applied Evaluations to define and operationalize quality for the agentic systems that power Antimetal’s investigation and automation engine. This role is core to our product. You’ll own online and offline evaluation pipelines that operate over petabytes of infrastructure data, and shape agent platform abstractions where necessary to ensure our agents are measurable, debuggable, and reliable. You’ll partner closely with platform, product, and research, leveraging quality signals to accelerate iteration across the company.
Job Responsibilities:
Own the evaluation stack: Build online and offline eval pipelines that measure agent quality across ephemeral, voluminous MELT data, code, and unstructured docs. Set the metrics that define the experience
Define quality at scale: Production incidents span hundreds of services; the underlying data is ephemeral and high-volume, and ground truth is only approximate. Design evals that capture trajectory quality, not just final outputs, and validate that your metrics predict real outcomes
Build platform abstractions for agents: Design core agent architectures and extend internal frameworks (e.g. sub-agents, MCPs, middleware) that let product, platform, and research iterate with confidence and ship faster
Productionize: Own latency, observability, and uptime
Requirements:
At least 3 years of experience in ML platform engineering, data engineering, or a related role, preferably at a high-growth company
Prior experience designing evaluation systems where ground truth is noisy, high-volume, and hard to label (e.g. computer vision, deep research pipelines)
Strong system design skills: you think about how data flows through distributed systems and how decisions compound at scale
Proven ability to write clean, scalable code, along with strong data modeling skills
Demonstrated ability to bring ambiguous goals from prototype to production, using data and experimentation to drive product and architectural decisions
Proficient in Python and TypeScript, with experience using common ML libraries and data engineering tools
Nice to have:
Experience with SRE best practices and modern observability (OTEL, distributed tracing)
Strong ML fundamentals: classification/regression, clustering, dimensionality reduction, evaluation and error analysis, probabilistic ML
Experience with agent architectures: multi-step reasoning, tool use, context management
What we offer:
Pay & ownership — Competitive salary with generous equity grants
Full coverage + retirement — Fully covered health, dental, and vision, plus retirement benefits
Unlimited PTO — Take the time you need to recharge
Dinner on late nights — Working late? Dinner is on us
Fitness stipend — Monthly support for your health and wellness
Tools of the trade — Any equipment you need to do your best work