We are seeking a Principal Engineering Manager to lead the architecture, delivery and live‑site excellence of RDX deployment services supporting CPS and MASP platforms. This role is responsible for enabling safe, compliant and data‑driven deployment of Microsoft 365 client updates across global enterprise environments by building and operating large‑scale distributed services that power release orchestration, rollout governance and automated recovery. You will work closely with Office application teams and partner platform organizations to ensure changes are introduced through staged, observable and reversible deployment workflows that minimize regressions and reduce customer‑visible impact.
Job Responsibilities:
Own CPS and MASP service architecture supporting Office client release and deployment workflows
Drive reliability, scalability and availability of deployment services used across M365 app teams
Enable Safe‑to‑Change release infrastructure through staged rollouts and automated safeguards
Deliver automation‑first rollback and remediation capabilities to minimize customer impact
Define telemetry pipelines and data signals used for release gating, validation and rollback
Leverage usage, reliability and performance data to inform deployment decisions
Enable deployment observability across client environments and tenant segments
Build predictive deployment health signals to proactively detect regressions
Partner with Office product teams to onboard changes safely through CPS/MASP
Reduce customer escalations and regressions through service‑based deployment controls
Improve release reliability across Monthly Enterprise Channel delivery
Establish measurable deployment quality and recovery SLAs
Lead and grow a team responsible for Tier‑0 deployment services
Coach engineers in distributed systems design, production resiliency and live‑site excellence
Drive cross‑org collaboration across Office apps, platform engineering and compliance teams
Influence engineering strategy across the RDX and Office release ecosystem
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Nice to have:
Bachelor’s or Master’s degree in Computer Science, Engineering or related field (or equivalent experience)
12+ years of experience building and operating production software or services
5+ years of engineering management experience
Experience with large‑scale distributed systems and service architecture
Experience in deployment safety, reliability engineering or release governance platforms
Experience working with data pipelines, experimentation systems or telemetry platforms
Experience building or operating large‑scale distributed deployment or release orchestration platforms used by multiple product teams
Experience running production cloud services with accountability for availability, reliability and live‑site health (SLO/SLA ownership)
Experience designing or operating safe deployment systems such as staged rollouts, experimentation frameworks, flighting or progressive exposure pipelines
Familiarity with client‑service deployment workflows or enterprise update delivery models at scale
Experience with telemetry pipelines, experimentation platforms, health monitoring systems or deployment validation signals
Demonstrated ability to use production data signals (reliability, performance, usage) to drive release gating or automated rollback decisions
Experience developing automated mitigation, remediation or recovery systems to minimize customer‑visible impact from regressions
Experience onboarding partner engineering teams onto shared infrastructure or platform services
Familiarity with compliance‑aware change management, privacy‑sensitive data collection or enterprise deployment governance models
Experience partnering across multiple engineering organizations to drive adoption of shared platform capabilities
Background in reliability engineering, deployment safety engineering or change management platforms in enterprise environments
Experience improving production outcomes such as reduced customer escalations, lower regression rates, improved release predictability or faster recovery times