Senior Staff Site Reliability Engineer

Palo Alto Networks

Location:
Israel, Tel Aviv

Contract Type:
Not provided

Salary:
Not provided

Job Description:

As a Site Reliability Engineer on the SASE Platform team, you will play a critical role in building and operating highly available, secure, and globally distributed services. Your mission is to ensure our cloud-native security and networking platform is reliable, scalable, and performant from day one, protecting the users, applications, and data for the world's largest enterprises as they adopt cloud, remote work, and AI.

Job Responsibility:

  • Proactively collaborate with development teams to embed reliability, scalability, and operability into services from the earliest design stages
  • Design, review, and evolve cloud-native architectures to improve availability, performance, cost efficiency, and fault tolerance
  • Build and operate automation for provisioning, deploying, and managing global infrastructure using Infrastructure as Code (IaC)
  • Improve CI/CD pipelines and release processes to enable safe, fast, and repeatable deployments
  • Drive observability best practices, including metrics, logs, traces, and SLIs/SLOs to enable data-driven incident analysis
  • Participate in on-call rotations, reducing mean time to resolution (MTTR) through automation and proactive reliability improvements
  • Challenge existing processes by championing reliability, security, and operational maturity across the organization

Requirements:

  • 5+ years of experience working with Unix/Linux systems, including shell, tools, networking, and kernel concepts
  • 2+ years of hands-on experience with microservices architectures running on Kubernetes and container platforms
  • Proven experience operating workloads in public cloud environments (e.g., AWS, GCP, Azure) at scale
  • Proficiency in building automation and tools in at least one scripting or programming language (e.g., Python, Go, Java)
  • Strong experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible
  • Bachelor’s degree in Engineering, Computer Science, or a related technical field, or equivalent practical experience

Nice to have:

  • Deep expertise in designing and operating monitoring, alerting, and observability systems (e.g., Prometheus, Grafana, ELK Stack)
  • Advanced networking expertise, including TCP/IP, DNS, BGP, routing, and cloud networking concepts relevant to SASE architectures
  • Prior experience operating or supporting SASE, SD-WAN, Zero Trust, or network security platforms
  • Familiarity with using AI/LLM technologies to improve operational workflows (e.g., incident analysis, automation)

Additional Information:

Job Posted:
April 12, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Senior Staff Site Reliability Engineer

FX Applications Support Senior Analyst

This hybrid role involves working as part of the FX Applications Support team to...
Location:
Australia, Sydney
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5-8 years’ experience in an Application Support role
  • experience installing, configuring or supporting business applications
  • experience with some programming languages and willingness/ability to learn
  • advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • effective written and verbal communications including ability to explain technical issues in simple terms that non-IT staff can understand
  • demonstrated analytical skills
  • issue tracking and reporting using tools
  • knowledge/experience of problem management tools
  • good all-round technical skills
  • ability to effectively share information with other support team members and with other technology teams
Job Responsibility:
  • provides technical and business support for users of Citi applications
  • maintains application systems running in daily operations
  • manages, maintains and supports applications and their environments
  • performs start-of-day checks, continuous monitoring, and regional handovers
  • performs same-day risk reconciliations
  • develops and maintains technical support documentation
  • assesses risk and impact and escalates in a timely manner
  • ensures storage and archiving procedures are functioning correctly
  • participates in application releases, from development to post-implementation analysis
  • identifies risks, vulnerabilities and security issues
What we offer:
  • rewarding work
  • supportive environment
  • clear opportunities for progression
  • exciting company benefits
Employment Type:
Full-time

FX Applications Support Senior Analyst

As an FX Application Support Analyst, you will play a key role in running and ma...
Location:
Australia, Sydney
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5-8 years’ experience in an Application Support role
  • experience installing, configuring or supporting business applications
  • experience with some programming languages and willingness/ability to learn
  • advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • effective written and verbal communications including ability to explain technical issues in simple terms that non-IT staff can understand
  • demonstrated analytical skills
  • issue tracking and reporting using tools
  • knowledge/experience of problem management tools
  • good all-round technical skills
  • ability to effectively share information with other support team members and with other technology teams
Job Responsibility:
  • provides technical and business support for users of Citi Applications
  • maintains application systems that have completed development stage and are running in daily operations
  • manages, maintains and supports applications and their operating environments, focusing on stability, quality and functionality
  • performs start-of-day checks, continuous monitoring, and regional handover
  • performs same-day risk reconciliations
  • develops and maintains technical support documentation
  • identifies ways to maximize potential of applications used
  • assesses risk and impact of production issues and escalates to business and technology management
  • ensures storage and archiving procedures are in place and functioning correctly
  • formulates and defines scope and objectives for complex application enhancements and problem resolution
What we offer:
  • rewarding work in a supportive environment
  • clear opportunities for progression
  • exciting company benefits
  • diverse team of professionals
  • global network of people, data and relationships
Employment Type:
Full-time

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
Finland, Helsinki
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • at least 3 of those years operating in a Senior+ SRE position
  • Strong background in running production SaaS systems at scale
  • Proficiency in at least one programming/scripting language (Python, Go, or similar)
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • Familiarity with advanced observability (OTEL, continuous profiling)
  • Proven incident management experience, including leading high-severity incidents and postmortems
  • Strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
India, Bengaluru
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • At least 3 of those years operating in a Senior+ SRE position
  • Strong background in running production SaaS systems at scale
  • Proficiency in at least one programming/scripting language (Python, Go, or similar)
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • Familiarity with advanced observability (OTEL, continuous profiling)
  • Proven incident management experience, including leading high-severity incidents and postmortems
  • Strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
India, Delhi
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • at least 3 of those years operating in a Senior+ SRE position
  • strong background in running production SaaS systems at scale
  • proficiency in at least one programming/scripting language (Python, Go, or similar)
  • hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • familiarity with advanced observability (OTEL, continuous profiling)
  • proven incident management experience, including leading high-severity incidents and postmortems
  • strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
India, Pune
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • at least 3 of those years operating in a Senior+ SRE position
  • strong background in running production SaaS systems at scale
  • proficiency in at least one programming/scripting language (Python, Go, or similar)
  • hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • familiarity with advanced observability (OTEL, continuous profiling)
  • proven incident management experience, including leading high-severity incidents and postmortems
  • strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Staff Site Reliability Engineer

As a Staff Site Reliability Engineer, you will be a technical leader and strateg...
Location:
Singapore; Australia, Singapore; Melbourne
Salary:
Not provided
Airwallex
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience in SRE, DevOps, or infrastructure engineering roles, with progressive responsibility
  • Proven ability to lead SRE strategy and execution for large-scale, complex, cross-functional projects
  • Deep expertise with cloud platforms (AWS/GCP), Kubernetes, container orchestration, observability, and incident response frameworks
  • Strong experience supporting production systems with stringent high availability, compliance, and security requirements
  • Demonstrated leadership in mentoring and growing technical teams
  • Excellent collaboration and communication skills, able to influence stakeholders at all levels
  • Degree in Computer Science or related field
Job Responsibility:
  • Drive the strategic vision and roadmap for Site Reliability Engineering at Airwallex, aligned with business objectives and product goals
  • Architect and oversee the implementation of highly scalable, secure, and resilient cloud infrastructure for new services and platform-wide initiatives
  • Lead and mentor senior engineers and cross-functional teams in reliability engineering best practices, automation, and incident management
  • Champion and evolve operational excellence through advanced observability, SLO management, runbooks, and proactive risk mitigation
  • Lead incident response for high-severity incidents, facilitating post-mortems and driving continuous improvements
  • Collaborate closely with Product, Engineering, Security, and DevOps leadership to ensure compliance, resilience, and alignment across functions
  • Influence and shape engineering culture around reliability, scalability, and DevOps principles across multiple teams
  • Advocate for innovation in tooling, automation, and infrastructure to improve developer productivity and service uptime
Employment Type:
Full-time

Senior Staff Site Reliability Engineer

As a Site Reliability Engineer on the SASE Platform team, you will play a critic...
Location:
Israel, Tel Aviv
Salary:
Not provided
Palo Alto Networks Italia
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience working with Unix/Linux systems, including shell, tools, networking, and kernel concepts
  • 2+ years of hands-on experience with microservices architectures running on Kubernetes and container platforms
  • Proven experience operating workloads in public cloud environments (e.g., AWS, GCP, Azure) at scale
  • Proficiency in building automation and tools in at least one scripting or programming language (e.g., Python, Go, Java)
  • Strong experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible
  • Bachelor’s degree in Engineering, Computer Science, or a related technical field, or equivalent practical experience
Job Responsibility:
  • Proactively collaborate with development teams to embed reliability, scalability, and operability into services from the earliest design stages
  • Design, review, and evolve cloud-native architectures to improve availability, performance, cost efficiency, and fault tolerance
  • Build and operate automation for provisioning, deploying, and managing global infrastructure using Infrastructure as Code (IaC)
  • Improve CI/CD pipelines and release processes to enable safe, fast, and repeatable deployments
  • Drive observability best practices, including metrics, logs, traces, and SLIs/SLOs to enable data-driven incident analysis
  • Participate in on-call rotations, reducing mean time to resolution (MTTR) through automation and proactive reliability improvements
  • Challenge existing processes by championing reliability, security, and operational maturity across the organization
Employment Type:
Full-time