AI agents are rapidly becoming foundational across industries, transforming how organizations automate workflows, make decisions, and scale productivity. Unlike traditional prompt-based systems, modern agents can reason, plan, and execute actions autonomously, unlocking new business scenarios and operational efficiencies. By 2028, more than 1.3 billion agents are expected to be in active use worldwide, creating not only unprecedented opportunities but also a massive new ecosystem that organizations must govern, monitor, and secure. Agent 365 is Microsoft’s unified control plane for AI agents, providing a comprehensive platform for agent registry, lifecycle management, access governance, observability, interoperability, and security. The Agent 365 Tools Team builds the tooling, infrastructure, and evaluation systems that empower agents to perform reliably at scale. We are seeking a Research Engineer II to develop core tooling, frameworks, and infrastructure that enable agents to use tools effectively, evaluate tool performance, and drive continuous improvement across the agent ecosystem.
Job Responsibilities:
Design and develop evaluation pipelines for both offline and online experimentation, enabling rapid iteration on tool quality. Automate the measurement of critical performance metrics such as tool success rates, groundedness, latency, reliability, and cost efficiency.
Build and operationalize tool comparison frameworks, including scorecards, dashboards, and automated A/B tests, to support data-driven rollout decisions across large fleets of agents.
Implement telemetry, logging, and real-time monitoring systems to diagnose issues, refine tool interactions, and improve agent-tool performance over time.
Strengthen and scale backend infrastructure by implementing caching strategies, rate-limiting mechanisms, security hardening, and safety filters to ensure tools are robust, secure, and production-ready.
Develop packages and internal tools to streamline tool onboarding, lifecycle management, QA, and integration for partner teams and agent developers.
Collaborate closely with Program Managers, Engineering teams, and Responsible AI partners to design, build, and deploy end-to-end solutions aligned with safety, compliance, and product requirements.
Contribute to long-term architecture and strategy for the Agent 365 tooling ecosystem, influencing standards for tool quality, interoperability, and governance.
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Nice to have:
Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Experience building backend services or APIs in a production environment using at least one modern programming language (e.g., C#, Java, Python, or TypeScript).
Experience working with large language models (LLMs) or similar AI systems in any capacity (development, integration, or evaluation).
Experience deploying or operating distributed systems or cloud-based services (e.g., Azure, AWS, or GCP) in a production environment.