Are you a customer-obsessed, engineering-minded program leader who thrives in high-stakes, regulated environments? Do you want to build a new function from the ground up, one that prevents customer outages before they happen and transforms how Microsoft supports its most sensitive cloud customers?
Join Advanced Cloud Engineering & Supportability (ACES), a global Azure engineering support organization within Azure Engineering Operations (EngOps). ACES delivers engineering-led, world-class support across Azure's Government and Sovereign cloud portfolio, including US Government (Fairfax) and National Partner Clouds in France (Bleu), Germany (Delos), and Singapore (Merlion).
We are building a new Gov Customer Resiliency function within ACES that brings proactive reliability engineering in-house for Government customers. This is not reactive support; it is about changing the probability, blast radius, and recovery time of customer outages through engineering-led detection, readiness, and prevention.
The Role:
We are hiring a Principal Customer Experience Program Manager to lead two interconnected workstreams under ACES Sovereign & Government:
1. Gov Customer Resiliency (60%) - You will build and operate a new Gov Customer Resiliency function, standing it up from scratch, starting with a named high-profile Government customer and scaling to a portfolio of 3-5 top Gov/Azure Engineering Direct customers. This function brings proactive resiliency capabilities in-house for Government customers under the Sovereign & Government business.
2. Sovereign Cloud Operations & Readiness (40%) - You will drive support readiness, operational maturity, and customer experience strategy across Microsoft's Sovereign Cloud portfolio (Bleu, Delos, Merlion).
Job Responsibility:
Stand up a new proactive resiliency function for Government cloud customers: define the charter, build playbooks, establish operating cadences, and own the end-to-end engagement model
Own the full resiliency lifecycle: proactive detection and monitoring, incident and crisis coordination, post-incident root cause analysis, and architecture/DR guidance
Drive Gov-vs-Commercial parity closure across monitoring, tooling, incident response, and remediation maturity
Lead resiliency and reliability workshops and customer conversations, including with Field enablement teams, to deliver customer value
Scale the resiliency model from a single anchor customer to a portfolio of 3-5 top Government customers using a repeatable, metrics-driven playbook
Develop and deliver internal enablement content such as training materials, case studies, and learning sessions
Define and report on success metrics including mean time to detect, time to engage, incident recurrence, proactive detection rates, and customer confidence
Leverage telemetry, monitoring data, and trend analysis to proactively identify and address emerging risks before they become customer-reported incidents
Partner with reliability engineering, product teams, and delivery leadership to ensure resiliency insights feed into upstream engineering actions, product improvements, and prevention strategies
Drive end-to-end support readiness (people, process, technology) for Microsoft's Sovereign Cloud portfolio across multiple regions and future launches
Design escalation pathways, incident handling standards, and compliance-aligned operational processes for Sovereign environments
Own readiness frameworks for new Sovereign cloud launches and influence design decisions upstream to prevent customer impact
Lead operational reporting and insights: translate data into risk assessments and executive-ready recommendations
Represent Sovereign and Government customer needs in cross-org forums, influencing priorities and investments to strengthen long-term customer trust
Leverage AI, automation, and data-driven insights to proactively identify gaps, reduce risk, and improve customer experience at scale
Extend the Gov Resiliency playbook to Sovereign clouds as they mature, building a unified approach across regulated environments
Drive alignment across geographically distributed teams and operating partners spanning multiple countries and time zones
Embody our culture and values
Requirements:
Bachelor's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 6+ years' experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
US Citizenship & Citizenship Verification
Microsoft Cloud Background Check
Nice to have:
Master's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 8+ years' experience in engineering, product/technical program management, data analysis, or product development OR Bachelor's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 12+ years' experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
Experience in CRE, SRE, ACE, or operational reliability roles within a cloud hyperscaler environment
Hands-on experience with resiliency tooling, platform monitoring, and similar detection and incident management systems
Deep knowledge of Sovereign compliance, data residency, and geo-centric architecture models
Track record of executive-level customer engagement
Demonstrated experience in customer-facing resiliency, reliability engineering, or incident management roles
Experience working with government agencies, sovereign entities, or regulated industries
Strong understanding of Azure services and cloud technologies
Proven ability to build new functions or programs from scratch