As a Site Reliability Engineer II within the APX SRE organization, you’ll focus on delivering practical, scalable solutions to support the reliability and performance of our mission-critical, cloud-native global Kubernetes platform and the services that run on it. You care deeply about system stability, clear documentation, and creating tools that improve the developer experience.
Job Responsibilities:
Build robust, easy-to-use Kubernetes platforms and tools that enable engineering teams to provision and operate services rapidly, consistently, and securely
Exemplify cloud-native site reliability best practices
Write code that is performant, maintainable, clear, and concise
Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems
Influence and educate the engineering organization to adopt new and improved architectural patterns
Provide robust documentation for use by engineers to promote self-service
Continually seek improvements to the reliability, operability, and cost efficiency of our Kubernetes platform
Take calculated risks, champion new ideas, and cultivate your craft
Requirements:
U.S. citizenship (due to handling of classified federal data)
3+ years of applicable experience in platform engineering and container orchestration
Experience building platforms on clouds such as Azure and AWS
Experience building, operating, and innovating on clustering solutions for Kubernetes platforms such as AKS, EKS, or similar in production at scale
Experience with programming languages such as Python, Go, C#, Java, or similar
Experience with code collaboration and delivery tools such as GitHub, ArgoCD, or similar
Experience utilizing CI/CD platforms to automate infrastructure provisioning, software builds, tests, and releases
Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
Experience using Infrastructure as Code tools such as Terraform, Pulumi, or similar to provision infrastructure
Experience designing tooling to simplify the operational management of SaaS/PaaS systems
Familiarity with building flexible and testable Infrastructure as Code modules
Empathy to support the needs of software engineers