We are seeking a Senior Site Reliability Engineer to lead a team that builds and operates Microsoft CISO security engineering services in highly regulated environments, including U.S. Government Cloud deployments. In this space, success requires both operational rigor and strong software engineering fundamentals: maintainable code, extensible design, robust telemetry, and disciplined lifecycle practices that make reliability a built-in feature. This role is rooted in software engineering as a reliability lever. You will work with teams that deliver production code, automation, and self-healing capabilities, and partner with feature engineering teams to bake in reliability, diagnosability, security, and compliance from design through operations. You will help operate and evolve large-scale enterprise applications and multi-petabyte data platforms where availability, resilience, and uptime are mission critical. You will amplify impact by developing engineers, setting reliability strategies, and influencing how services are built and run across organizational boundaries.
Job Responsibilities
Write secure, high-quality code that is maintainable, scalable, and performant
Architect, implement, and optimize hybrid and cloud infrastructure using containers and Infrastructure as Code (e.g., AKS, Bicep, Terraform) to improve availability, scale, security, and operational efficiency
Design and implement data governance, storage, backup, and disaster recovery for a multi-petabyte Azure environment, ensuring integrity, security, and performance
Build and operate large-scale data pipelines and data transformations to support analytics, governance, and operational needs
Evaluate emerging engineering tools and practices and incorporate them into the roadmap to continuously improve efficiency, reliability, and scale
Deliver automation to improve service health, manageability, reliability, telemetry, and alerting, with a focus on resiliency
Create and maintain clear technical documentation and design specifications aligned with best practices
Partner with engineering, project management, and operations to evolve services and optimize infrastructure in support of organizational goals
Participate in an on-call rotation to operate live services: troubleshoot and mitigate complex issues, escalate as needed, and write post-incident reviews to share learnings
Identify opportunities for automation using scripts, pipelines, policy‑driven guardrails, or AI‑enabled tooling to reduce manual toil and increase engineering productivity
Requirements
Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings: The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph
This position requires verification of U.S. citizenship due to citizenship-based legal restrictions
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Nice to have
Doctorate Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Master's Degree in Computer Science, Information Technology, or related field AND 6+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 8+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
4+ years of experience building, deploying, and operating containerized applications and infrastructure as code (e.g., Docker, Kubernetes, Azure Container Apps/AKS/ACI, Terraform, Azure Bicep, ARM templates)
4+ years of experience writing and maintaining scripts for deployment, orchestration, and automation (e.g., PowerShell, Python, Bash)
Experience working with large datasets, data pipelines, and data transformation patterns (batch and/or streaming)
Experience with one or more major cloud platforms (Azure, AWS, or Google Cloud)
Hands-on experience with Azure services and infrastructure (e.g., ARM templates, IaaS, VMs, Key Vault, Event Hubs, Synapse, Spark/Hadoop), or equivalent services in AWS or Google Cloud
Familiarity with data pipeline and transformation tooling (e.g., Spark, Hadoop) and operating at scale
Familiarity with large-scale Microsoft enterprise services (e.g., Microsoft 365: Exchange, SharePoint, Skype, Teams)
Familiarity with petabyte-scale datasets and building reliable data pipelines and transformations that support mission-critical services
Proficiency in at least one programming language (e.g., C# or Java) and scripting languages such as PowerShell, Bash, and Python