Principal Site Reliability Engineer (AIOps) Job at Palo Alto Networks (Santa Clara)

Job Description

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability. Our stack includes Kubernetes, Docker, GCP, AWS, Ansible, Terraform, Vault, GitLab, Spinnaker, TensorFlow, Datadog, Elasticsearch, Kafka, Hadoop, MySQL, Percona, MongoDB, Python, and Go. We don’t expect you to know all of these, but we do expect you to learn the ones needed for this role.

Job Responsibilities

Contribute to the success of SRE and DevOps
Develop expertise in new technologies
Work with developers, researchers, data scientists, and security experts
Design, build and operate reliable, secure Cloud infrastructure
Ensure that applications are production-ready, scalable, and reliable
Develop tools and automation frameworks
Automate the reliable deployment of robust services
Orchestrate end-to-end monitoring and alerting
Participate with SRE and Dev teams in the on-call rotation
Lead root cause analysis of critical business and production issues
Mentor and champion SRE culture
Participate in design reviews

Requirements

BS or MS in Computer Science, a related field, or equivalent professional experience
Expertise in configuration management with a framework such as Ansible, Terraform, or Helm
Experience in Production Engineering, DevOps, or Site Reliability
Expertise in private or public cloud
Strong Linux administration, internals, and network troubleshooting
Proficiency with programming languages like Python, Golang, and shell scripting to automate tasks
Familiarity with CI/CD pipelines; GitLab and GitHub preferred
Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
Excellent written and verbal communication, able to collaborate and rally support
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
Passion for infrastructure and monitoring as code
Ready to understand and dissect new technology stacks quickly

Nice to have

GitLab
GitHub

What we offer

restricted stock units
bonus
employee benefits
