Microsoft Substrate is the foundational cloud platform that powers many of Microsoft’s most critical services, including Exchange Online and M365 Copilot. It provides the shared infrastructure, identity, messaging, storage, and service-to-service capabilities used across Microsoft 365 and related cloud offerings. Substrate services operate at global scale and are designed to deliver high availability, reliability, and security for some of the world’s most demanding workloads.
As a Site Reliability Engineer II, you will take ownership of reliability and operational outcomes for specific components or services. You will independently diagnose and resolve production issues, design and implement automation to reduce toil, and contribute to service improvements that enhance availability, scalability, and efficiency. This role requires deeper technical judgment, stronger software engineering fundamentals, and close collaboration with partner teams to ensure that reliability, diagnosability, security, and compliance are built into services from design through operation, particularly for services operating in highly regulated environments.
Job Responsibilities
Own reliability and operational health for one or more Substrate components or services in highly regulated environments
Serve as an actively engaged on-call engineer (OCE), participating in an on-call rotation and independently responding to incidents for owned services
Respond to, diagnose, and resolve production incidents with minimal supervision
Design and implement automation to reduce operational toil and improve service stability
Develop and maintain monitoring, alerting, and telemetry to support SLOs and operational metrics
Lead post-incident reviews for owned incidents, focusing on root cause analysis and durable fixes
Collaborate with software engineering teams to embed reliability and operability into service design
Write and maintain production-quality code and automation that improves reliability, scalability, and operational efficiency
Requirements
Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
4+ years technical experience in software engineering, network engineering, or systems administration
Ability to meet Microsoft, customer, and/or government security screening requirements
Ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
Ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
Must pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Nice to have
Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
2+ years technical experience working with large-scale cloud or distributed systems
What we offer
You may be eligible for benefits and other compensation
Additional benefits and pay information is available at https://careers.microsoft.com/us/en/us-corporate-pay