Copilot Tuning is a new product that enables organizations to customize large language models (LLMs) using tenant data, unlocking task-specific agents tailored to real enterprise workflows. We are a team advancing how models are adapted, evaluated, and deployed within Microsoft 365 Copilot, bridging cutting-edge research with production systems to transform the LLM experience in the enterprise.

We are looking for candidates for a Research Engineer II role who are passionate about translating research ideas into scalable, reliable ML systems. This role is ideal for candidates with strong engineering fundamentals, experience working with machine learning systems, and an interest in prototyping, experimentation, and system-building. The candidate will focus on turning research prototypes into production-ready capabilities: building data pipelines, developing and iterating on model tuning workflows, and operationalizing LLM-based systems that meet real-world performance, safety, and quality requirements. The candidate will work closely with applied scientists and engineers to bring research innovations into production, rapidly iterating on ideas, validating them with data, and integrating them into robust services. This includes enabling task-specific agents that reflect organizational knowledge, improving consistency and efficiency, and ensuring solutions generalize across diverse enterprise scenarios while maintaining strong security and compliance guarantees.

We provide a nurturing environment for engineers excited about working at the intersection of research and production. We are looking for candidates who excel in problem-solving, experimentation, and system design, and who can navigate ambiguity while driving ideas from concept to shipped experience in a fast-paced, collaborative setting.
Job Responsibilities:
Design and build inferencing‑time orchestration services that dynamically adapt Copilot behavior based on tenant context, user intent, and enterprise policies.
Develop runtime systems that integrate LLM inferencing with Microsoft 365 Substrate signals, APIs, and permission models for grounded Copilot responses.
Implement inferencing pipelines that support tool selection, prompt composition, and policy enforcement during Copilot execution.
Enable tenant‑safe Copilot execution by incorporating identity, compliance boundaries, and contextual grounding into inferencing workflows.
Build telemetry and feedback loops to evaluate inferencing performance across latency, relevance, grounding quality, and safety metrics.
Partner with platform teams to support real‑time routing, experimentation, and configuration of Copilot inferencing paths.
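The responsibilities above center on inference-time orchestration: selecting a tool, composing a grounded prompt, and enforcing tenant policy before any model call. A minimal sketch of that control flow is shown below; every name in it (`Tool`, `select_tool`, `enforce_policy`, `orchestrate`, the roles and registry) is hypothetical for illustration and is not a Copilot or Microsoft 365 API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Tool:
    name: str
    keywords: tuple
    allowed_roles: frozenset

# Toy tool registry; a real registry would be tenant-configured.
TOOLS = [
    Tool("calendar_lookup", ("meeting", "schedule"), frozenset({"employee", "admin"})),
    Tool("hr_records", ("salary", "benefits"), frozenset({"admin"})),
]

def select_tool(query: str) -> Optional[Tool]:
    """Pick the first tool whose keywords appear in the query (toy heuristic)."""
    q = query.lower()
    for tool in TOOLS:
        if any(k in q for k in tool.keywords):
            return tool
    return None

def enforce_policy(tool: Tool, user_role: str) -> None:
    """Block tool use when the caller's role is outside the tool's allow-list."""
    if user_role not in tool.allowed_roles:
        raise PermissionError(f"{user_role!r} may not call {tool.name}")

def compose_prompt(query: str, tool: Tool) -> str:
    """Assemble the grounded prompt the model would receive."""
    return f"Tool: {tool.name}\nUser query: {query}\nAnswer using the tool result."

def orchestrate(query: str, user_role: str) -> str:
    """One inference-time step: select a tool, gate it, compose the prompt."""
    tool = select_tool(query)
    if tool is None:
        return f"User query: {query}"  # no grounding tool matched
    enforce_policy(tool, user_role)    # tenant-safety gate before execution
    return compose_prompt(query, tool)
```

The key design point the role description implies is that policy enforcement sits between tool selection and prompt composition, so an unauthorized tool is never reflected in the model's context.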
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Experience developing and scaling machine learning systems from research prototypes to production, including experimentation, evaluation, UI, and iteration on models and pipelines.
Experience collaborating with research scientists to implement, validate, and productize systems, ensuring robustness, reproducibility, and measurable impact.
Experience building and supporting ML-powered services, including model deployment, monitoring, evaluation pipelines, and maintaining reliability and performance in production environments.
Experience with cloud platforms (e.g., Azure, AWS, or GCP) and modern engineering practices (e.g., CI/CD, testing, code reviews) applied to ML systems and experimentation workflows.
Familiarity with Generative AI concepts such as large language models (LLMs), prompt engineering, fine-tuning, or Retrieval-Augmented Generation (RAG), and experience adapting these techniques into real-world applications.
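As a concrete illustration of the Retrieval-Augmented Generation (RAG) concept named in the last requirement, the sketch below retrieves the most relevant document and grounds the prompt in it. The retrieval is a naive token-overlap heuristic and all names are illustrative; production systems use embedding-based search and an actual model call.

```python
# Toy RAG flow: retrieve a document, then prepend it to the prompt as context.
def retrieve(query: str, docs: list) -> str:
    """Return the document sharing the most whitespace tokens with the query."""
    q_tokens = set(query.lower().split())
    return max(docs, key=lambda d: len(q_tokens & set(d.lower().split())))

def augment_prompt(query: str, docs: list) -> str:
    """Ground the question in the retrieved context before the model sees it."""
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer from the context."

# Stand-in corpus of tenant documents.
docs = [
    "The expense policy caps meals at 50 dollars per day.",
    "Vacation requests must be filed two weeks in advance.",
]

print(augment_prompt("What is the meal expense cap?", docs))
```

The point of the pattern is that the model answers from retrieved tenant content rather than from its parametric memory, which is what makes responses grounded and auditable.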