Senior Site Reliability Engineer Job at Microsoft Corporation (Redmond)

Job Description

Are you interested in working on cutting-edge cloud security products? Would you like to be part of one of the world's most advanced cyber-security solutions and protect millions of computers from thousands of active attack attempts every month? Look no further than the Microsoft Defender engineering team. We are looking for a Senior Site Reliability Engineer who will build and deliver cloud solutions at a scale that few companies in the industry are required to support. Leveraging state-of-the-art technologies, you will be instrumental in delivering holistic protection within highly sensitive and secure government environments. The Microsoft Defender team is responsible for delivering a constantly evolving set of services and solutions to meet the challenges posed by ever-evolving attackers. This team provides on-call operational support and improves the operational posture of the Microsoft Defender products within US Government clouds. You will operate our production services and work closely with other engineering teams to ensure services and systems are highly stable, meet performance SLAs, and meet the expectations of internal and external customers and users. Microsoft's mission is to empower every person and every organization on the planet to achieve more.

Job Responsibility

Ensure 24x7 Service Reliability: Act as a Designated Responsible Individual (DRI) in an on-call rotation, leading incident response and resolution to maintain uptime and performance for Microsoft's most critical services
Support and Automate Deployments: Execute and improve manual operations and deployments for our products, while designing automation to scale and streamline those processes across environments
Build Scalable Systems: Develop automation for monitoring, alerting, debugging, and deployment to reduce manual effort and accelerate safe, reliable delivery
Drive Compliance and Security: Ensure systems meet Microsoft's standards for security, privacy, and accessibility, especially when onboarding new technologies
Lead Post-Incident Learning: Conduct postmortems, share insights, and implement solutions that prevent recurrence, fostering a culture of learning and continuous improvement
Collaborate Across Teams: Partner with engineering and product teams to align reliability goals with customer needs and deliver seamless user experiences
Stay Ahead Technically: Continuously invest in your technical growth to improve system availability, observability, and performance at scale
Embody our company's Culture and Values

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Candidates must have an active TS or TS/SCI clearance and be willing and eligible to upgrade to TS/SCI (with polygraph). This role requires candidates to maintain the TS/SCI (with polygraph) clearance
The ability to meet Microsoft, customer, and/or government security screening requirements is required pre-offer and post-hire for this role
Failure to maintain or obtain the appropriate clearance and/or customer screening requirements may result in employment action up to and including termination
This position requires successful verification of the stated security clearance to meet federal government customer requirements
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Nice to have

Doctorate Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Master's Degree in Computer Science, Information Technology, or related field AND 6+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 8+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
3+ years technical experience working with large-scale cloud or distributed systems
Demonstrated experience applying software engineering principles to production systems, including designing, building, or improving services and platforms
Proficiency in one or more programming languages such as C#, Go, Java, or Python, with the ability to develop and maintain production-quality code
Experience with automation that results in measurable improvements
Experience with debugging and troubleshooting complex distributed systems in production environments
Ability to independently identify problems and implement solutions that improve system reliability and operational efficiency
Hands-on experience with CI/CD pipelines, testing, deployment, and reliability tooling
