We are looking for a Senior Software Development Engineer to build, operate, and improve highly reliable, secure, and scalable software systems. In this role, you will own features and services end‑to‑end, contribute to design and architecture decisions within your problem space, and deliver high‑quality code that meets Microsoft’s standards for security, performance, and reliability. You will work closely with product managers, partner teams, and other engineers to translate customer and business requirements into well‑engineered solutions. This role is hands‑on, with strong expectations around coding, testing, live‑site ownership, and continuous improvement.
Job Responsibilities:
Define and drive long‑term technical vision and strategy for AI‑assisted, cloud‑based production services, influencing direction across multiple teams or a critical platform area
Own end‑to‑end service health at scale, ensuring reliability, security, performance, and operational readiness across the full lifecycle—from architecture and design through live‑site operations
Make high‑impact architectural and investment decisions, evaluating complex trade‑offs that balance customer value, business priorities, risk, and long‑term sustainability
Establish and uphold a high bar for engineering excellence and code quality, setting standards, design patterns, and review practices adopted broadly across teams
Act as a force multiplier by mentoring engineers across levels, unblocking highly ambiguous problems, and raising overall organizational capability through technical leadership and influence
Partner closely with product, security, SRE, and operations teams, as well as senior leadership, to align engineering strategy with business objectives and customer commitments
Own live‑site strategy and operational governance, including incident response frameworks, operational reviews, and driving sustained reliability improvements through post‑incident learnings
Drive adoption and evolution of modern DevOps and operational practices to improve deployment velocity, reduce toil, and consistently meet service‑level objectives and agreements (SLOs/SLAs)
Requirements:
Demonstrated experience setting technical direction at organization or division scope, with sustained impact across multiple teams or large platform surfaces
8+ years of software engineering experience, with 4+ years owning production services in cloud/distributed systems
Proven ownership of high‑scale, cloud‑based services, with deep accountability for reliability, security, availability, and performance in production environments
Strong track record of end‑to‑end service ownership, spanning architecture, implementation, deployment, monitoring, live‑site operations, and continuous improvement
Ability to solve highly ambiguous, cross‑cutting problems with no clear owner, applying strong systems thinking, judgment, and technical depth
Demonstrated expertise in diagnosing and resolving complex production issues involving performance, availability, correctness, and multi‑service dependencies
Deep experience applying and evolving DevOps, SRE, and operational excellence practices to improve service health and reduce operational overhead at scale
Proven ability to influence senior leadership and executive stakeholders, clearly articulating technical risks, trade‑offs, and long‑term impact in business terms
Willingness to participate in escalation paths and executive‑level incident response, shaping long‑term reliability strategy rather than focusing solely on short‑term remediation