Staff SRE Engineer (Platform)

Phantom

Location:

Contract Type:
Not provided

Salary:

200000.00 - 250000.00 USD / Year

Job Description:

Phantom is the modern money app used by tens of millions around the world. Our product combines everything people need to manage, spend, and grow their money in one simple, intuitive experience. Phantom brings all the control and flexibility of crypto-powered finance, without unnecessary complexity, into mainstream consumer finance.

Job Responsibility:

  • Kubernetes Ownership: Manage and scale Kubernetes clusters on AWS EKS, ensuring reliability, performance, and security
  • Infrastructure Automation: Implement and maintain Infrastructure-as-Code (Terraform/Pulumi) to automate infrastructure provisioning and management
  • Performance Optimization: Monitor and optimize system performance, scalability, and resource utilization
  • Blockchain Infrastructure: Configure and maintain crypto nodes across multiple blockchains to support our wallet’s operations
  • Database Scaling: Optimize and scale database infrastructure to handle terabytes of blockchain data efficiently
  • System Reliability: Continuously improve system uptime, monitoring, and observability using tools like Datadog and OpenTelemetry
  • Collaboration: Work closely with backend and product teams to support feature development and system scaling

Requirements:

  • 5+ years in an SRE or Software Engineer role
  • Strong hands-on experience with Kubernetes (EKS) in production environments
  • Proficiency with AWS infrastructure and services (EC2, S3, RDS, IAM)
  • Solid experience with Docker and Infrastructure-as-Code tools like Terraform or Pulumi
  • Monitoring and observability experience using tools like Datadog or OpenTelemetry

What we offer:

  • Competitive salary and equity
  • You will be eligible to participate in the Company's performance bonus program
  • Comprehensive insurance (medical/dental/vision) — 100% covered
  • Stipend for your ideal remote set-up
  • Flexible hours and a supportive remote environment
  • Unlimited vacation: Take time when you need it (and we really mean it!)
  • 401(k) retirement plan
  • Monthly wellness benefit
  • Weekly meal benefit
  • Global off-sites

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Staff SRE Engineer (Platform)

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...
Location:
France, Paris
Salary:
Not provided
Ledger
Expiration Date:
Until further notice
Requirements:
  • 8+ years of cloud engineering at scale, in organizations operating SaaS solutions
  • Proficiency in working in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, Argo CD, Ansible, configuration management, etc.
  • Strong knowledge of observability practices, with experience implementing and managing logging, monitoring, and alerting frameworks with solutions such as Datadog or Prometheus/Grafana/Loki
  • Experience with cross-functional work and a demonstrably collaborative approach to building key relationships across the organization and defining project scope, goals, plans, and deliverables
  • Customer-focused, with the ability to identify and understand the needs of both internal and external customers
  • Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
  • Excellent presentation and written communication
  • Ability to deal with ambiguity, high pressure, and rapidly changing environments
  • Engineering degree.
Job Responsibility:
  • Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
  • Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines
  • Perform integration of platform software components
  • Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems
  • Influence and create standards & best practices in support of service level objectives
  • Automate key SRE metrics including SLOs/SLAs and error budgets
  • Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
  • Apply analytics on past incidents and usage patterns to predict issues and take proactive actions
  • Ensure control of technical debt and promote quality practices
  • Apply SRE and chaos engineering approaches across all strategic systems, in coordination with Service Design, to predict and prevent outages and improve solution availability
What we offer:
  • Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
  • Flexibility: A hybrid work policy
  • Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
  • Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
  • Well-being: Personal development, coaching & fitness with our dedicated partners
  • Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
  • High tech: Access to high performance office equipment and gadgets, including Apple products
  • Transport: Ledger reimburses part of your preferred means of transportation
  • Discounts: Employee discount on all our products.
Employment Type:
Fulltime

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location:
United States, Hartford
Salary:
130295.00 - 260590.00 USD / Year
CVS Health
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting languages such as Ansible, PowerShell, Bash for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility:
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs
  • Confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
Employment Type:
Fulltime

Software Engineer Staff

Designs, develops, troubleshoots and debugs software programs for software enhan...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • A minimum of 10 years of professional software development experience
  • Proven expertise in one or more backend programming languages such as Golang (highly preferred), Java, Python, or C/C++
  • Deep understanding of networking protocols, network architectures, network security, and common networking concepts
  • Proven experience in designing, building, and deploying scalable microservices using Docker, Kubernetes, etc.
  • Significant experience in building, deploying, and operating scalable SaaS applications in a Public Cloud (AWS/GCP) environment
  • Strong understanding of distributed systems principles, including concurrency, scalability, fault tolerance, and consistency
  • Experience with various database technologies, including relational (e.g., PostgreSQL, MySQL) and NoSQL (e.g., DynamoDB, Redis) databases
  • Experience designing, building, and consuming RESTful APIs and other integration technologies like WebSocket, Kafka, etc.
  • Experience with network security principles, threat modelling, and secure coding practices is an added advantage
  • Excellent analytical and problem-solving skills
Job Responsibility:
  • Technical Leadership: Work with product managers, architects, and other engineers to understand the software requirements, and define corresponding functional and design specifications
  • Software Development: Design, develop, test, deploy, and maintain high-quality, production-grade software, with a strong emphasis on backend systems
  • System Design & Optimization: Design and implement micro-services for high availability, scalability, performance, and security within our SaaS platform
  • Networking Expertise: Apply deep knowledge of networking protocols (e.g., TCP/IP, HTTP/S, DNS, NAT), network security, and cloud networking concepts to build robust and secure solutions
  • SaaS & Cloud Native Development: Design and implement solutions leveraging cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Kubernetes, Docker)
  • Collaboration: Collaborate effectively with cross-functional teams including product management, QA, SRE, and Juniper technical assistance team
  • Code Quality & Best Practices: Champion best practices in software development, including code reviews, testing methodologies, CI/CD, and DevOps principles
  • Problem Solving: Troubleshoot and resolve complex technical issues in a timely and effective manner, often in production environments
  • Innovation & Research: Stay abreast of emerging technologies and industry trends in networking, SaaS, and software engineering
  • Documentation: Create and maintain comprehensive technical documentation for designs, APIs, and operational procedures
What we offer:
  • Health & Wellbeing: Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Personal & Professional Development: Specific programs catered to helping you reach any career goals you have
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness
Employment Type:
Fulltime

Software Engineer Staff

We are seeking a talented and motivated Staff Software Engineer to join our dyna...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • A minimum of 10 years of professional software development experience
  • Proven expertise in one or more backend programming languages such as Golang (highly preferred), Java, Python, or C/C++
  • Deep understanding of networking protocols, network architectures, network security, and common networking concepts
  • Proven experience in designing, building, and deploying scalable microservices using Docker, Kubernetes, etc.
  • Significant experience in building, deploying, and operating scalable SaaS applications in a Public Cloud (AWS/GCP) environment
  • Strong understanding of distributed systems principles, including concurrency, scalability, fault tolerance, and consistency
  • Experience with various database technologies, including relational (e.g., PostgreSQL, MySQL) and NoSQL (e.g., DynamoDB, Redis) databases
  • Experience designing, building, and consuming RESTful APIs and other integration technologies like WebSocket, Kafka, etc.
  • Experience with network security principles, threat modelling, and secure coding practices is an added advantage
  • Excellent analytical and problem-solving skills
Job Responsibility:
  • Technical Leadership: Work with product managers, architects, and other engineers to understand the software requirements, and define corresponding functional and design specifications
  • Software Development: Design, develop, test, deploy, and maintain high-quality, production-grade software, with a strong emphasis on backend systems
  • System Design & Optimization: Design and implement micro-services for high availability, scalability, performance, and security within our SaaS platform
  • Networking Expertise: Apply deep knowledge of networking protocols (e.g., TCP/IP, HTTP/S, DNS, NAT), network security, and cloud networking concepts to build robust and secure solutions
  • SaaS & Cloud Native Development: Design and implement solutions leveraging cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Kubernetes, Docker)
  • Collaboration: Collaborate effectively with cross-functional teams including product management, QA, SRE, and Juniper technical assistance team
  • Code Quality & Best Practices: Champion best practices in software development, including code reviews, testing methodologies, CI/CD, and DevOps principles
  • Problem Solving: Troubleshoot and resolve complex technical issues in a timely and effective manner, often in production environments
  • Innovation & Research: Stay abreast of emerging technologies and industry trends in networking, SaaS, and software engineering
  • Documentation: Create and maintain comprehensive technical documentation for designs, APIs, and operational procedures
What we offer:
  • Health & Wellbeing: Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Personal & Professional Development: Programs catered to helping you reach any career goals
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness
Employment Type:
Fulltime

Engineering Manager, Infrastructure

As an Engineering Manager for the Infrastructure team, you’ll lead the engineers...
Location:
Canada; United States
Salary:
195000.00 - 285000.00 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 5+ years of hands-on software or infrastructure engineering experience
  • 2+ years of experience leading teams of senior and staff-level engineers in platform, SRE, or infrastructure domains
  • Proven ability to design and operate large-scale distributed systems in cloud environments (preferably GCP or AWS)
  • Expertise with Kubernetes, Docker, Terraform, Ubuntu, and CI/CD pipelines
  • Familiarity with observability tools (Grafana, Prometheus, ELK, Datadog, NewRelic) and performance tuning
  • Strong grounding in networking, security, and reliability principles
  • Experience managing infrastructure costs, availability SLAs, and high-throughput systems at scale
Job Responsibility:
  • Lead, coach, and grow a distributed team of high-impact Infrastructure Engineers
  • Partner with senior engineering leadership on strategic initiatives such as cloud migration, infrastructure scaling, platform reliability, and cost efficiency
  • Define and implement modern operational excellence practices, including SLOs, error budgets, incident reviews, and performance monitoring
  • Guide technical decision-making across key areas like Kubernetes, GCP, observability, networking, CI/CD, and IaC (Terraform, Ansible)
  • Collaborate with AI, Data, and Product Engineering teams to ensure infrastructure scalability for ML and AI-native workloads
  • Run effective 1:1s, career development conversations, and quarterly performance reviews
  • Support recruiting efforts to attract top engineering talent across time zones
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
Employment Type:
Fulltime

Principal Site Reliability Engineer

Arcadia’s customers rely on us to securely process and deliver high-value health...
Location:
Salary:
Not provided
The Muse
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in SRE, platform engineering, systems engineering, or related roles operating production services at scale
  • Demonstrated principal-level impact: leading cross-team initiatives, influencing architecture decisions, and driving sustained improvements in reliability and operations
  • Expertise in Kubernetes operations and troubleshooting, including safe rollout/rollback patterns, workload debugging, and operational guardrails
  • Strong GitOps experience with Argo CD; experience building delivery workflows and automation using Argo Workflows
  • Strong infrastructure orchestration and provisioning experience with Crossplane and Terraform; ability to define reusable platform patterns and controls
  • Deep AWS experience (IAM, networking/VPC, compute, storage, managed services, observability) and strong understanding of reliability and failure modes in cloud systems
  • Proficiency in Python for building automation, tooling, and reliability improvements
  • Strong incident management and on-call leadership experience, including measurable improvements (availability, MTTR, alert quality, cost, or operational maturity)
Job Responsibility:
  • Act as the technical leader for reliability for one or more domains; set direction and standards while remaining hands-on where it matters most
  • Drive reliability strategy across critical services: define SLOs/SLIs, error budgets, and reliability KPIs aligned to customer journeys and outcomes
  • Own incident response maturity: lead complex incidents, improve incident command practices, and ensure high-quality RCAs with prioritized, tracked remediation
  • Architect and implement automation to reduce toil and risk: runbook automation, self-service tools, and safe operational workflows (Python + Argo Workflows)
  • Advance GitOps delivery practices using Argo CD: promotion strategies, progressive delivery/canaries, and guardrails that reduce deploy risk
  • Scale infrastructure management with Crossplane and Terraform: reusable patterns, policy controls, and paved roads for teams
  • Lead operational readiness and reliability reviews for new features/architectural changes; reinforce non-functional requirements (availability, latency, security, cost)
  • Improve performance and cost efficiency through capacity planning, load testing, right-sizing, and architecture recommendations across AWS services
What we offer:
  • Pet Insurance
  • Health Insurance
  • Dental Insurance
  • Vision Insurance
  • FSA
  • HSA
  • HSA With Employer Contribution
  • Life Insurance
  • Short-Term Disability
  • Long-Term Disability

Staff Site Reliability Engineer

As a Staff Site Reliability Engineer, you will be a technical leader and strateg...
Location:
Singapore; Australia , Singapore; Melbourne
Salary:
Not provided
Airwallex
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience in SRE, DevOps, or infrastructure engineering roles, with progressive responsibility
  • Proven ability to lead SRE strategy and execution for large-scale, complex, cross-functional projects
  • Deep expertise with cloud platforms (AWS/GCP), Kubernetes, container orchestration, observability, and incident response frameworks
  • Strong experience supporting production systems with stringent high availability, compliance, and security requirements
  • Demonstrated leadership in mentoring and growing technical teams
  • Excellent collaboration and communication skills, able to influence stakeholders at all levels
  • Degree in Computer Science or related field
Job Responsibility:
  • Drive the strategic vision and roadmap for Site Reliability Engineering at Airwallex, aligned with business objectives and product goals
  • Architect and oversee the implementation of highly scalable, secure, and resilient cloud infrastructure for new services and platform-wide initiatives
  • Lead and mentor senior engineers and cross-functional teams in reliability engineering best practices, automation, and incident management
  • Champion and evolve operational excellence through advanced observability, SLO management, runbooks, and proactive risk mitigation
  • Lead incident response for high-severity incidents, facilitating post-mortems and driving continuous improvements
  • Collaborate closely with Product, Engineering, Security, and DevOps leadership to ensure compliance, resilience, and alignment across functions
  • Influence and shape engineering culture around reliability, scalability, and DevOps principles across multiple teams
  • Advocate for innovation in tooling, automation, and infrastructure to improve developer productivity and service uptime
Employment Type:
Fulltime

Staff Infrastructure Security Engineer

Crusoe’s mission is to accelerate the abundance of energy and intelligence. We’r...
Location:
United States, San Francisco
Salary:
210000.00 - 265000.00 USD / Year
Crusoe
Expiration Date:
Until further notice
Requirements:
  • 8+ years of hands-on experience in infrastructure engineering, SRE, or security engineering
  • Deep understanding of security principles across the stack, from Linux and container runtimes to cloud control planes
  • Proven experience using Infrastructure-as-Code (Terraform) to manage complex, multi-environment infrastructure at scale
  • Strong knowledge of cryptography, secrets management, PKI, and modern authentication standards
  • Experience securing public cloud (AWS, GCP) and/or bare-metal environments
  • Strong networking fundamentals, including routing, segmentation, firewalls, and Zero Trust architectures
  • Hands-on experience with Kubernetes and container security, including secure secrets injection into microservices
  • Fluency in at least one programming language (Go or Python preferred) for automation and tooling
Job Responsibility:
  • Architecting security controls across compute, networking, and storage layers of a global cloud platform
  • Championing Infrastructure-as-Code (IaC) standards (e.g., Terraform) to enforce secure defaults, immutability, and drift detection
  • Building automated security guardrails embedded directly into CI/CD and deployment pipelines
  • Collaborating on a centralized Vault-as-a-Platform service to manage secrets, encryption keys, and internal PKI
  • Designing and operating certificate lifecycles (X.509, SSH) to support secure machine-to-machine trust
  • Driving adoption of short-lived, Just-In-Time (JIT) access models to reduce standing privileges and improve auditability
  • Securing core network foundations, including global DNS architecture, service discovery, and network authentication systems
  • Designing and maintaining authentication controls for network infrastructure to ensure secure, monitored access
  • Partnering closely with infrastructure, platform, and SRE teams to identify and remediate security gaps in foundational systems
What we offer:
  • Bonus
  • Restricted Stock Units
Employment Type:
Fulltime