As a Site Reliability Engineer II within the APX SRE organization, you’ll focus on delivering practical, scalable solutions to support the reliability and performance of our mission-critical, cloud-native global Kubernetes platform and the services that run on it. You care deeply about system stability, clear documentation, and creating tools that improve the developer experience.
Job Responsibilities:
Build robust, easy-to-use Kubernetes platforms and tools that enable engineering teams to provision and operate services rapidly, consistently, and securely.
Exemplify cloud-native site reliability best practices.
Write code that is performant, maintainable, clear, and concise.
Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems.
Influence and educate the engineering organization to adopt new and improved architectural patterns.
Provide robust documentation for use by engineers to promote self-service.
Continually seek ways to improve the reliability, operability, and cost efficiency of our Kubernetes platform.
Take calculated risks, champion new ideas, and cultivate your craft.
Requirements:
This position involves handling classified federal data; under federal regulations, it is open to U.S. citizens only.
3+ years of applicable experience in platform engineering and container orchestration.
Experience building platforms on clouds such as Azure and AWS.
Experience building, operating, and innovating on clustering solutions for Kubernetes platforms such as AKS, EKS, or similar in production at scale.
Experience with programming languages such as Python, Go, C#, Java, or similar.
Experience with code collaboration tools such as GitHub, ArgoCD, or similar.
Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases.
Experience using observability tools such as APM, logging, and metrics to assist with debugging issues.
Experience using Infrastructure as Code tools such as Terraform, Pulumi, or similar to provision infrastructure.
Experience designing tooling to simplify the operational management of SaaS/PaaS systems.
Familiarity with building flexible and testable Infrastructure as Code modules.
Empathy to support the needs of software engineers.