Are you interested in working on cutting-edge cloud security products? Would you like to be part of one of the world’s most advanced cyber-security solutions and protect millions of computers from thousands of active attack attempts every month? Look no further than the Microsoft Defender engineering team.
We are looking for a Site Reliability Engineer II who will build and deliver cloud solutions at a scale that few companies in the industry are required to support. Leveraging state-of-the-art technologies, you will be instrumental in delivering holistic protection within highly sensitive and secure government environments.
The Microsoft Defender team is responsible for delivering a constantly evolving set of services and solutions to meet the challenging landscape of our ever-evolving attackers. The team provides on-call operational support and improves the operational posture of the Microsoft Defender products within US Government clouds. You will operate our production services and work closely with other engineering teams to ensure services and systems are highly stable, meet performance SLAs, and meet the expectations of internal and external customers and users.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Job Responsibilities
Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems
Requirements
Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Candidates must be able to meet Microsoft, customer and/or government security screening requirements
Candidates must hold an active TS or TS/SCI clearance and be willing and eligible to upgrade to TS/SCI (with polygraph)
Nice to have
Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
2+ years technical experience working with large-scale cloud or distributed systems
Demonstrated experience applying software engineering principles to production systems, including designing, building, or improving services and platforms
Proficiency in one or more programming languages such as C#, Go, Java, or Python, with the ability to develop and maintain production-quality code
Experience with automation that results in measurable improvements (e.g., reduced toil, fewer manual steps, improved system reliability)
Experience with debugging and troubleshooting complex distributed systems in production environments
Ability to independently identify problems and implement solutions that improve system reliability and operational efficiency
Hands-on experience with CI/CD pipelines, testing, deployment, and reliability tooling