The High Availability (HA) team, part of M365 Core, is seeking a Senior Software Engineer - Chaos Engineering. This role is crucial because HA has been a cornerstone of the Substrate backend solution. We continue to explore opportunities to improve and optimize service reliability, and our drive to provide the best service to our customers goes beyond optimizing the storage stack. We work relentlessly to reduce Microsoft's capital and operational expenses, exploring more paths for optimization while maintaining a reliable 4.5 nines (99.995%) of availability. To achieve that, HA has extended its charter beyond traditional database availability and redundancy solutions toward optimizing power efficiency, platform costs, and networking costs. Networking cost optimization will be the major focus of the talented engineer who joins our team.

Chaos Engineering is the discipline of experimenting on a system to build confidence in the system's capability to withstand turbulent conditions in production. As part of the Chaos team in HA, you will work closely with partners (Azure, Exchange Online (EXO), and Microsoft Research (MSR)) to build the next-generation Chaos platform for Substrate. The platform will validate the resilience, architecture choices, and predictability of critical components in M365 distributed systems, as well as the monitoring and incident response processes around them.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities:
Own feature projects that directly impact the behavior of the High Availability component of Exchange Online (EXO), which reliably provides 4.5 nines of availability
Write production, monitoring, and test code; create reports; and conduct performance analysis of the storage engine, database replication, and networking layer
Research Chaos experiments and identify opportunities to improve the testing and operational readiness of critical service components
Engage with EXO, Azure, and MSR partners to build interfaces for a modern Chaos experience, improve service resilience, and improve the predictability and observability of M365 distributed systems
Embody our Culture and Values
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
3+ years of software design and development experience with backend services
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Nice to have:
Cloud and services experience
Azure cloud experience
Experience writing services and microservices on the middle or back-end tier
Experience with networking-layer optimization and tuning, deploying and maintaining large-scale cluster products, and defining and testing the performance characteristics of backend solutions
Analytical skills and a systematic, structured approach to software design
Experience building reliable and well-tested code