Senior Staff Site Reliability Engineer Job at Palo Alto Networks Italia (Tel Aviv)

Senior Staff Site Reliability Engineer

Fivetran is looking for a high-performance, experienced engineer to be a part of...

Location

India, Bengaluru

Salary:

Not provided

Fivetran

Expiration Date

Until further notice

Requirements

12+ years of experience working with SaaS products at scale
Working knowledge of managed Kubernetes (EKS, AKS and GKE)
Knowledge of Cloud Platforms and related tooling: AWS, Azure, Google Cloud (GCP), Terraform, Ansible, Buildkite, Pulumi and ArgoCD
Experience in Python/Shell scripting and Go. Bonus if you have Java experience
Experience with Linux operating systems internals and administration
Experience with cloud networking such as Site-to-Site VPNs, PrivateLink, and Private Service Connect (GCP)

Job Responsibility

Responsible for ongoing reliability and robustness of Fivetran’s production infrastructure by monitoring availability, capacity, and throughput
Evolve systems by adding reliability into our product roadmap
Coordinate the re-prioritization or fixing of critical bugs for support or sales requirements as needed
Recommend improvements to production infrastructure, working with engineering to ensure 100% availability
Ensure scalable artifact deployment to all environments through automation scripts
Constantly monitor infrastructure vulnerabilities and remedy them by working with the security team

What we offer

100% employer-paid medical insurance
Generous paid time-off policy (PTO), plus paid sick time, inclusive parental leave policy, holidays, and volunteer days off
RSU stock grants
Professional development and training opportunities
Company virtual happy hours, free food, and fun team-building activities
Monthly cell phone stipend
Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents

Fulltime

Staff Site Reliability Engineer

As a Staff Site Reliability Engineer, you will be a technical leader and strateg...

Location

Singapore; Australia, Singapore; Melbourne

Salary:

Not provided

Airwallex

Expiration Date

Until further notice

Requirements

10+ years of experience in SRE, DevOps, or infrastructure engineering roles, with progressive responsibility
Proven ability to lead SRE strategy and execution for large-scale, complex, cross-functional projects
Deep expertise with cloud platforms (AWS/GCP), Kubernetes, container orchestration, observability, and incident response frameworks
Strong experience supporting production systems with stringent high availability, compliance, and security requirements
Demonstrated leadership in mentoring and growing technical teams
Excellent collaboration and communication skills, able to influence stakeholders at all levels
Degree in Computer Science or related field

Job Responsibility

Drive the strategic vision and roadmap for Site Reliability Engineering at Airwallex, aligned with business objectives and product goals
Architect and oversee the implementation of highly scalable, secure, and resilient cloud infrastructure for new services and platform-wide initiatives
Lead and mentor senior engineers and cross-functional teams in reliability engineering best practices, automation, and incident management
Champion and evolve operational excellence through advanced observability, SLO management, runbooks, and proactive risk mitigation
Lead incident response for high-severity incidents, facilitating post-mortems and driving continuous improvements
Collaborate closely with Product, Engineering, Security, and DevOps leadership to ensure compliance, resilience, and alignment across functions
Influence and shape engineering culture around reliability, scalability, and DevOps principles across multiple teams
Advocate for innovation in tooling, automation, and infrastructure to improve developer productivity and service uptime

Fulltime

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...

Location

Finland, Helsinki

Salary:

Not provided

AlphaSense

Expiration Date

Until further notice

Requirements

8+ years of experience in Site Reliability Engineering, DevOps, or a similar role, with at least 3 of those years in a Senior+ SRE position
Strong background in running production SaaS systems at scale
Proficiency in at least one programming/scripting language (Python, Go, or similar)
Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
Familiarity with advanced observability (OTEL, continuous profiling)
Proven incident management experience, including leading high-severity incidents and postmortems
Strong troubleshooting skills across the full stack

Job Responsibility

Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services
Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...

Location

United States, Santa Clara

Salary:

126000.00 - 203500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
Strong problem-solving skills and ability to work across teams

Job Responsibility

Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
Lead improvements across production systems, including performance, availability, and incident response
Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
Partner with development teams to improve system reliability, observability, and cloud-native design patterns
Define and implement monitoring, alerting, and observability strategies across distributed systems
Lead incident response efforts, including root cause analysis and long-term remediation strategies
Identify and eliminate operational toil through automation and system improvements
Mentor engineers and contribute to raising the bar for production engineering practices

What we offer

Restricted stock units
Bonus

Fulltime

Senior Staff Engineer, NPI

Senior Engineer, Global NPI, provides technical leadership and execution oversig...

Location

United States, Shakopee

Salary:

Not provided

CommScope

Expiration Date

Until further notice

Requirements

Bachelor's degree in a related field such as Engineering, with 8-10 years of work experience in a relevant or related field
Proficient in FBC, SAP, Lean Six Sigma, SharePoint

Job Responsibility

Lead all phases of New Product Introduction (NPI), from concept through production launch
Define manufacturing strategies to ensure scalability, yield, and cost targets are achieved
Ensure alignment between product design and manufacturing through advanced DFM guidance
Drive technical leadership in optical test, interferometry, and continuity verification; develop and implement advanced processes leveraging emerging technologies and external partners, ensuring scalable deployment across sites without compromising quality or customer performance
Lead functional business analysis for Fiber Barcode (FBC) systems; define software requirements and strategy to ensure product compliance, drive development activities, and validate system performance for robustness and reliability
Provide direction across all NPI phases, ensuring execution aligns with PDP requirements, procedures, and regulatory standards; coordinate with R&D and PLM on schedules, resources, and capital planning while driving process capability, yield, and cost performance through PFMEA, VSM, and throughput analysis
Specify and lead implementation of automation equipment for fiber manufacturing systems, including concept, design, build coordination (internal and external), validation, and operator readiness

Fulltime

Principal Site Reliability Engineer

Arcadia’s customers rely on us to securely process and deliver high-value health...

Location

Salary:

Not provided

The Muse

Expiration Date

Until further notice

Requirements

8+ years of experience in SRE, platform engineering, systems engineering, or related roles operating production services at scale
Demonstrated principal-level impact: leading cross-team initiatives, influencing architecture decisions, and driving sustained improvements in reliability and operations
Expertise in Kubernetes operations and troubleshooting, including safe rollout/rollback patterns, workload debugging, and operational guardrails
Strong GitOps experience with Argo CD; experience building delivery workflows and automation using Argo Workflows
Strong infrastructure orchestration and provisioning experience with Crossplane and Terraform; ability to define reusable platform patterns and controls
Deep AWS experience (IAM, networking/VPC, compute, storage, managed services, observability) and strong understanding of reliability and failure modes in cloud systems
Proficiency in Python for building automation, tooling, and reliability improvements
Strong incident management and on-call leadership experience, including measurable improvements (availability, MTTR, alert quality, cost, or operational maturity)

Job Responsibility

Act as the technical leader for reliability for one or more domains; set direction and standards while remaining hands-on where it matters most
Drive reliability strategy across critical services: define SLOs/SLIs, error budgets, and reliability KPIs aligned to customer journeys and outcomes
Own incident response maturity: lead complex incidents, improve incident command practices, and ensure high-quality RCAs with prioritized, tracked remediation
Architect and implement automation to reduce toil and risk: runbook automation, self-service tools, and safe operational workflows (Python + Argo Workflows)
Advance GitOps delivery practices using Argo CD: promotion strategies, progressive delivery/canaries, and guardrails that reduce deploy risk
Scale infrastructure management with Crossplane and Terraform: reusable patterns, policy controls, and paved roads for teams
Lead operational readiness and reliability reviews for new features/architectural changes; reinforce non-functional requirements (availability, latency, security, cost)
Improve performance and cost efficiency through capacity planning, load testing, right-sizing, and architecture recommendations across AWS services

What we offer

Pet Insurance
Health Insurance
Dental Insurance
Vision Insurance
FSA
HSA
HSA With Employer Contribution
Life Insurance
Short-Term Disability
Long-Term Disability

Senior Staff DevOps Engineer (Secure Cloud Access)

As part of your role, you will design, implement, and deploy products and infras...

Location

Israel, Southern District

Salary:

Not provided

Palo Alto Networks Italia

Expiration Date

Until further notice

Requirements

4+ years of experience as a DevOps engineer or Site Reliability Engineer
Deep knowledge and experience in cloud infrastructure such as AWS, Azure, or Google Cloud
Strong hands-on experience operating production workloads on AWS, with an emphasis on serverless systems (Lambda, DynamoDB, OpenSearch, S3, API Gateway, EventBridge, SQS/SNS, CloudFront, IAM, and CloudWatch)
Experience with Infrastructure as Code (IaC) tools such as Terraform, AWS CDK, or CloudFormation
Experience with containerization and orchestration technologies like Docker, Kubernetes, or ECS/ECR
Experience with CI/CD tools and configuration management systems like Jenkins, Git, or Ansible
Practical experience with high-availability design, disaster recovery planning, backups, restores, and rollbacks across multiple AWS regions
Proficiency in scripting with Bash and Python
Experience with end-to-end system ownership, including on-call participation, incident response, and root-cause analysis
Fluent in English with strong writing skills

Job Responsibility

Design and manage Continuous Integration/Deployment Services, including build, packaging, and deployment
Design, document, implement, and maintain scripts to enhance current and future build and release processes
Incorporate new development projects into existing build structures
Continually evaluate tools and technologies to improve the overall release process

Fulltime

Senior Staff DevOps Engineer (Secure Cloud Access)

As part of your role, you will design, implement, and deploy products and infras...

Location

Israel, Southern District

Salary:

Not provided

Palo Alto Networks

Expiration Date

Until further notice

Requirements

4+ years of experience as a DevOps engineer or Site Reliability Engineer
Deep knowledge and experience in cloud infrastructure such as AWS, Azure, or Google Cloud
Strong hands-on experience operating production workloads on AWS, with an emphasis on serverless systems (Lambda, DynamoDB, OpenSearch, S3, API Gateway, EventBridge, SQS/SNS, CloudFront, IAM, and CloudWatch)
Experience with Infrastructure as Code (IaC) tools such as Terraform, AWS CDK, or CloudFormation
Experience with containerization and orchestration technologies like Docker, Kubernetes, or ECS/ECR
Experience with CI/CD tools and configuration management systems like Jenkins, Git, or Ansible
Practical experience with high-availability design, disaster recovery planning, backups, restores, and rollbacks across multiple AWS regions
Proficiency in scripting with Bash and Python
Experience with end-to-end system ownership, including on-call participation, incident response, and root-cause analysis
Fluent in English with strong writing skills

Job Responsibility

Design and manage Continuous Integration/Deployment Services, including build, packaging, and deployment
Design, document, implement, and maintain scripts to enhance current and future build and release processes
Incorporate new development projects into existing build structures
Continually evaluate tools and technologies to improve the overall release process

Fulltime
