Staff SRE Engineer (Platform) Job at Phantom

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...

Location

United States, Santa Clara

Salary:

126000.00 - 203500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
Strong problem-solving skills and ability to work across teams

Job Responsibility

Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
Lead improvements across production systems, including performance, availability, and incident response
Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
Partner with development teams to improve system reliability, observability, and cloud-native design patterns
Define and implement monitoring, alerting, and observability strategies across distributed systems
Lead incident response efforts, including root cause analysis and long-term remediation strategies
Identify and eliminate operational toil through automation and system improvements
Mentor engineers and contribute to raising the bar for production engineering practices

What we offer

Restricted stock units
Bonus

Fulltime

Staff Engineer, Site Reliability Engineer

OnStar is a cornerstone of General Motors' connected services—bringing safety, s...

Location

Ireland, Dublin

Salary:

Not provided

General Motors

Expiration Date

Until further notice

Requirements

8+ years in SRE, DevOps, or systems engineering, including experience managing or mentoring high-impact teams
Track record of building and maintaining high-scale, cloud-native systems (preferably AWS, GCP, or Azure)
Expertise in container orchestration and deployment strategies using Kubernetes and CI/CD pipelines
Proficiency in Python, Go, or Java, with strong code review and readability standards
Experience leading cross-functional infrastructure projects, configuration strategy, or organizational tooling initiatives
Ability to think and act under pressure
Strong communication skills

Job Responsibility

Lead the design and implementation of scalable, fault-tolerant, and observable infrastructure supporting OnStar mobile and web experiences, in-vehicle services, and the backend platforms and integrations that power them
Champion configuration management, infrastructure refactoring, and testing frameworks to strengthen system resilience
Partner across SRE, development, and product teams to improve service reliability, deployment safety, and incident response practices
Drive internal consultation and strategic planning on reliability standards for new OnStar capabilities, customer-facing releases, and platform initiatives
Define and evolve observability strategy using tools such as Prometheus, Grafana, and Datadog, with automated alerting and actionable SLO dashboards
Own and improve on-call practices, manage blameless postmortems, and guide root cause analysis to eliminate recurring failures
Mentor engineers and help shape a high-performance culture rooted in extreme ownership and operational excellence
Support compliance and privacy-driven engineering initiatives across connected services, with potential crossover into areas like data retention and safety certification tooling

Fulltime

Sr Staff Engineer Software (Prisma AIRS - Runtime BackEnd)

As a Senior Staff Software Engineer on the Prisma AIRS Runtime Security team, yo...

Location

United States, Santa Clara

Salary:

126000.00 - 204500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science or a related field with 5+ years of experience, or a Master's degree with 3+ years of experience, or a PhD
Expertise in building scalable distributed systems with excellent Python or Golang programming skills
Proven experience with modern backend frameworks, relational databases (SQL), and cloud platforms, specifically GCP (Google Cloud Platform)
Demonstrated ability to work collaboratively with senior and junior engineers in a dynamic, fast-paced environment
Experience with container platforms like Kubernetes, CI/CD pipelines (GitLab pipelines, ArgoCD), and observability and monitoring solutions like Grafana and Prometheus

Job Responsibility

Lead cross-functionally with Product Management, SRE, Software, and Quality Engineering teams to deliver new security-as-a-service offerings in a timely fashion
Analyze and solve complex problems by evaluating requirements and applying advanced engineering techniques to achieve high-quality results
Proactively identify problems and opportunities, proposing and developing simple, attainable solutions to enhance the team's development process and product quality
Evangelize and implement engineering best practices, including test-driven development and spec-driven development, within the team
Lead the architectural design and implementation of new features, ensuring the scalability, performance, and maintainability of the backend codebase
Mentor junior engineers, fostering a culture of technical excellence and continuous learning within the team

What we offer

Restricted stock units
Bonus

Fulltime

Site Reliability Engineer Staff

Site Reliability Engineer Staff. This role has been designed as 'Hybrid' with an...

Location

United States, San Juan

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Minimum of 4 years of hands-on experience in Infra Ops, Dev Ops, or Site Reliability Engineering (SRE)
Proficiency with Linux systems, especially Debian-based distributions
Strong experience with cloud platforms such as AWS and GCP
Expertise in Infrastructure as Code tools like Terraform, Packer, and Ansible
Solid programming skills in Python and/or Golang
Deep understanding of containerization (Docker, containerd) and orchestration tools (AWS EKS, GCP GKE)
Experience with GitOps workflows
Proven track record in implementing and maintaining CI/CD pipelines
Strong background in security and familiarity with security programs
Experience with monitoring and logging tools (Prometheus, Grafana, ELK)

Job Responsibility

Enhance Infrastructure as Code (IaC) and enforce best practices
Optimize cloud infrastructure for scalability, security, and cost-effectiveness
Develop internal tools to support and streamline cloud platform operations
Improve CI/CD pipelines and deployment workflows using FluxCD and Jenkins
Address container image vulnerabilities and standardize remediation processes
Build Amazon Machine Images (AMIs) aligned with CIS and STIG benchmarks
Strengthen monitoring, alerting, and observability using Prometheus, Grafana, and logging tools
Troubleshoot complex production issues to ensure system reliability and customer satisfaction
Fine-tune distributed systems such as Apache Kafka and Cassandra
Collaborate with development, security, and operations teams to align infrastructure with application needs

What we offer

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

Fulltime

Senior Staff Engineer Software

We are seeking a highly motivated Senior Software Engineer to lead and grow deve...

Location

India, Bengaluru

Salary:

Not provided

Palo Alto Networks

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
5+ years of experience in backend software development, with a strong track record of designing and delivering scalable, distributed systems
Expertise in Java, Go and/or Python for backend development
Strong understanding of software architecture principles, including microservices, event-driven architecture, and distributed systems patterns
Proven experience with system design, data modeling, and API design (RESTful, gRPC)

Job Responsibility

Lead the design and implementation of significant features and components within complex backend systems and microservices for our Prisma Cloud & Cortex Cloud platforms
Develop and implement high-quality, resilient, and scalable backend services primarily using Go, Java, and Python
Drive technical design discussions and decisions for specific features, ensuring solutions align with the overall architectural vision
Collaborate closely with cross-functional teams, including product management, frontend engineers, security researchers, and SRE, to define, design, and ship new features and platform enhancements
Contribute to defining and promoting best practices for backend development, testing, and deployment within the organization, particularly for cloud-native security solutions
Analyze and resolve complex technical challenges and production issues, ensuring the reliability and performance of Prisma Cloud & Cortex Cloud platforms
Actively participate in code reviews, design reviews, and architectural reviews
Stay up to date on emerging technologies and industry trends in cloud and backend development, evaluating and recommending their adoption where appropriate

Fulltime

Senior Staff Engineer, Hybrid Cloud Fabric

Become a key player in GEICO's tech transformation! We are seeking a Senior or S...

Location

United States, Palo Alto; Dallas; Chevy Chase; Seattle

Salary:

120000.00 - 260000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Service mesh expertise (dev): familiar with mesh architecture, components, and configuration options, including advanced traffic management, security policies, and telemetry customization
Service mesh experience (ops): designed, implemented, and managed service mesh solutions at scale, addressing challenges related to performance, security, and observability
Programming skills: Experience with Go is a must; Rust is a bonus
Linux OS: In-depth knowledge of Linux operating systems, including performance tuning, troubleshooting, and security best practices
Networking: Advanced understanding of networking concepts and tools (e.g., iptables, netfilter, traffic shaping) for analyzing and optimizing service mesh performance within the hybrid cloud environment
Kubernetes and containerization: Extensive experience with Kubernetes and container orchestration platforms, including networking, security, and service management
Microservices architecture: Deep understanding of microservices design patterns, service discovery mechanisms, API gateways, and distributed tracing
Observability and monitoring: Expertise in tools like Prometheus, Grafana, Jaeger, and Kiali to monitor service mesh performance and troubleshoot issues
Security best practices: Knowledge of zero-trust security principles, authentication and authorization mechanisms, and encryption technologies within the context of service mesh

Job Responsibility

Design and implement a robust service mesh architecture, encompassing traffic management, security, observability, and resilience for microservices across public and private clouds within our on-premises data centers
Integrate the service mesh with existing infrastructure and applications, ensuring seamless operation and interoperability with various platforms and technologies, including legacy systems
Establish and enforce service mesh best practices, including security policies, traffic routing rules, circuit breakers, and access control mechanisms, to maintain a secure and reliable application environment
Develop comprehensive monitoring and observability dashboards to provide deep insights into service mesh health, performance, and potential issues, enabling proactive problem identification and resolution
Guide and mentor engineers on service mesh principles and best practices, fostering knowledge sharing and expertise development within the team, empowering them to contribute effectively to the service mesh implementation
Work closely with networking and security teams to ensure secure and efficient integration of the service mesh with on-premises infrastructure and networks, addressing potential challenges and ensuring smooth operation
Partner with SREs to establish service mesh observability, monitoring, and alerting strategies for maintaining high availability and performance, collaborating to define SLOs, SLIs, and error budgets
Actively engage with the Istio community, contribute to open-source projects, and represent GEICO's leadership in service mesh adoption

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Engineer, Hybrid Cloud Fabric

Become a key player in GEICO's tech transformation! We are seeking a Senior or S...

Location

United States, Palo Alto; Washington; Dallas; Seattle

Salary:

110000.00 - 230000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Service mesh expertise (dev): familiar with mesh architecture, components, and configuration options
Service mesh experience (ops): designed, implemented, and managed service mesh solutions at scale
Programming skills: Experience with Go is a must; Rust is a bonus
Linux OS: In-depth knowledge of Linux operating systems
Networking: Advanced understanding of networking concepts and tools
Kubernetes and containerization: Extensive experience with Kubernetes and container orchestration platforms
Microservices architecture: Deep understanding of microservices design patterns
Observability and monitoring: Expertise in tools like Prometheus, Grafana, Jaeger, and Kiali
Security best practices: Knowledge of zero-trust security principles

Job Responsibility

Design and implement a robust service mesh architecture
Integrate the service mesh with existing infrastructure and applications
Establish and enforce service mesh best practices
Develop comprehensive monitoring and observability dashboards
Guide and mentor engineers on service mesh principles and best practices
Work closely with networking and security teams
Partner with SREs to establish service mesh observability, monitoring, and alerting strategies
Actively engage with the Istio community, contribute to open-source projects, and represent GEICO's leadership in service mesh adoption

What we offer

Comprehensive Total Rewards program
401K savings plan vested from day one that offers a 6% match
performance and recognition-based incentives
tuition assistance
mental healthcare
fertility and adoption assistance
workplace flexibility
GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime
