The Azure Kubernetes Service (AKS) team is responsible for running Kubernetes at global cloud scale. On AKS, millions of containers are started, healed, and routed to serve production traffic every day. The team delivers essential control-plane and data-plane capabilities, and the work directly impacts reliability, performance, and developer productivity for customers around the world. As a Senior Software Engineer on Azure Kubernetes Service, you will design, build, and operate cloud services that provision, upgrade, secure, and monitor Kubernetes clusters across global infrastructure. This role involves working across distributed systems, networking, storage, and platform automation to deliver resilient customer experiences. It offers opportunities to grow your expertise in large-scale systems, deepen your knowledge of Kubernetes and cloud engineering, and strengthen your skills in Site Reliability Engineering (SRE) practices. Flexible work arrangements are supported, including hybrid and partial remote options. This position is ideal for individuals interested in building scalable, secure, and reliable cloud-native solutions. You will collaborate with a diverse team to solve complex technical challenges and contribute to the evolution of Microsoft Azure’s container orchestration capabilities. The work is impactful, fast-paced, and aligned with the needs of developers and enterprises worldwide.
Job Responsibilities:
Collaborate with product managers, architects, and partner teams to clarify scenarios and user requirements for AKS features and platform investments.
Drive design for new or improved AKS components (e.g., cluster lifecycle, upgrades, networking/CNI, storage/CSI, policy, security, observability) including dependency mapping, design docs, and API contracts.
Create, implement, optimize, and refactor production code and automation to improve reliability, performance, maintainability, and cost efficiency across control-plane and data-plane services.
Leverage subject-matter expertise in Kubernetes and Azure to plan releases, break down work, and lead execution across a workgroup; provide technical mentorship and code reviews.
Act as a Designated Responsible Individual (DRI): participate in on-call, follow runbooks/playbooks, monitor for degradation, triage incidents, communicate status, and drive mitigations/RCAs for complex issues.
Proactively adopt new patterns and technologies to improve availability, reliability, efficiency, observability, and performance; champion consistency in telemetry, alerting, and operations at scale.
Uphold security and compliance best practices (least privilege, secrets management, supply-chain security, vulnerability remediation) across services and CI/CD.
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, Golang, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
1+ year(s) experience building or operating distributed systems or cloud services in production environments, including:
Microservices architecture
Remote Procedure Call (RPC) frameworks
Messaging systems
Data store technologies
1+ year(s) experience working with containerization and orchestration technologies such as Docker and Kubernetes, along with foundational Linux knowledge in:
Networking
Process management
Storage systems
1+ year(s) experience owning services in production environments, including:
On-call responsibilities or Designated Responsible Individual (DRI) duties
Monitoring and incident response
Post-incident analysis and continuous improvement
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Nice to have:
Bachelor's Degree in Computer Science OR related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, Golang, C, C++, C#, Java, JavaScript, or Python OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, Golang, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
1+ year(s) experience with systems programming and container orchestration, including:
Proficiency in Go (Golang) and/or C# for cloud services
Familiarity with Kubernetes internals such as controllers, webhooks, Custom Resource Definitions (CRDs), scheduler, and kubelet
Knowledge of cloud networking and storage technologies including Container Network Interface (CNI), load balancers, virtual networks (VNETs), Domain Name System (DNS), Ingress, Container Storage Interface (CSI), disks/files, and snapshots
Experience with infrastructure-as-code tools such as Azure Resource Manager (ARM), Bicep, and Terraform, and continuous integration/continuous delivery (CI/CD) pipelines
1+ year(s) experience applying reliability engineering practices, including:
Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Chaos and upgrade testing
Capacity and performance tuning
Telemetry pipelines and observability tools such as Kusto, Prometheus, and Grafana