This Principal Software Engineering Manager role is about delivering step-change improvements in telemetry, detection, and recovery across core infrastructure and foundational services. You will lead efforts that enable rapid, localized issue detection and resilient recovery at global scale, ensuring Azure Core meets the highest standards of performance and operational excellence. You will collaborate across Azure and Microsoft to integrate with existing systems while introducing modern approaches that maximize impact and efficiency. This role involves leveraging and contributing to open-source frameworks and communities. This is a rare opportunity to build a new team, shape platform-wide observability strategy, and deliver solutions that matter at hyperscale.
Job Responsibilities:
Build and lead a team focused on observability for Azure Core services, driving improvements in telemetry, detection, and recovery
Advance existing standards and introduce innovations that raise reliability and operational excellence
Design scalable, resilient systems that enable rapid issue detection and automated recovery across global deployments
Collaborate with engineering and product teams to integrate with existing systems while introducing modern approaches for maximum impact
Engage with open-source communities and frameworks to adopt best practices and contribute improvements
Guide incident response and ensure high availability for critical services
Foster a culture of quality, innovation, and continuous improvement
Requirements:
Bachelor's Degree in Computer Science or related technical field
12+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
5+ years of software engineering experience, including hands-on technical management
5+ years of experience recruiting and managing technical teams, including performance management
3+ years of demonstrated experience in distributed systems, observability tooling, and operational excellence
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
Microsoft Cloud Background Check
Nice to have:
Experience managing managers or multiple engineering teams in a global environment
Demonstrated ability to improve operational excellence and reliability at scale, including automation and recovery strategies
Experience in service reliability engineering and incident management for mission-critical systems
Proficiency in contributing to or adopting open-source frameworks and standards, including engagement with developer communities