We are looking for a Senior Software Engineer to join our team to own and drive reliability outcomes across the stack for one of the world’s largest modern collaboration platforms, Microsoft Teams. Our team obsesses over achieving and sustaining three‑nines reliability across hundreds of high‑value customer scenarios spanning platforms, segments, and release rings. We partner deeply with feature teams and cross‑organization stakeholders to ensure product health, reduce customer impact, and prevent regression at scale. We build and operate reliability tooling and services that leverage telemetry, customer feedback, support signals, and performance data, increasingly augmented by AI, and integrate these insights directly into engineering and release processes (“shift left”) to ensure major issues are identified and fixed before reaching customers. As a team, we value continuous learning, deep technical exploration, and evidence‑based decision‑making to turn reliability opportunities into durable improvements for Microsoft customers through strong collaboration.
Job Responsibilities:
Design, build, and own reliability tooling and services across Microsoft Teams, leveraging data and AI to drive actionable insights
Own end‑to‑end reliability metrics, including signal definition, instrumentation, monitoring, alerting, and ongoing metric quality
Act as a Designated Responsible Individual (DRI) for live‑site reliability, including on‑call participation, incident mitigation, post‑incident reviews, and driving long‑term corrective actions
Partner with feature teams to influence design‑for‑reliability and resiliency decisions, preventing regressions before release
Analyze telemetry and customer feedback to identify reliability gaps and trends, integrating learnings into the engineering lifecycle
Collaborate with and mentor engineers across product, research, and engineering teams by sharing best practices in telemetry, feedback loops, and reliability, and by providing technical guidance and code reviews that raise the overall engineering bar
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
4+ years of experience building and operating full‑stack, production‑grade software systems at scale
2+ years of experience working with large‑scale telemetry systems and data analysis using SQL‑based query languages
Experience using modern AI‑assisted development tools such as GitHub Copilot or Claude Code to improve engineering productivity
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Nice to have:
Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Hands‑on experience using modern AI‑assisted development tools such as GitHub Copilot or Claude Code to improve engineering productivity
Experience improving core fundamentals such as reliability, availability, and performance in customer‑facing systems
Proven ability to solve complex technical problems through cross‑team and cross‑organization collaboration