Staff SRE Engineer (Platform)

Phantom

Location:

Contract Type:
Not provided

Salary:

200000.00 - 250000.00 USD / Year

Job Description:

Phantom is the modern money app used by tens of millions around the world. Our product combines everything people need to manage, spend, and grow their money in one simple, intuitive experience. Phantom brings all the control and flexibility of crypto-powered finance, without unnecessary complexity, into mainstream consumer finance.

Job Responsibility:

  • Kubernetes Ownership: Manage and scale Kubernetes clusters on AWS EKS, ensuring reliability, performance, and security
  • Infrastructure Automation: Implement and maintain Infrastructure-as-Code (Terraform/Pulumi) to automate infrastructure provisioning and management
  • Performance Optimization: Monitor and optimize system performance, scalability, and resource utilization
  • Blockchain Infrastructure: Configure and maintain crypto nodes across multiple blockchains to support our wallet’s operations
  • Database Scaling: Optimize and scale database infrastructure to handle terabytes of blockchain data efficiently
  • System Reliability: Continuously improve system uptime, monitoring, and observability using tools like Datadog and OpenTelemetry
  • Collaboration: Work closely with backend and product teams to support feature development and system scaling

Requirements:

  • 5+ years in an SRE or Software Engineer role
  • Strong hands-on experience with Kubernetes (EKS) in production environments
  • Proficiency with AWS infrastructure and services (EC2, S3, RDS, IAM)
  • Solid experience with Docker and Infrastructure-as-Code tools like Terraform or Pulumi
  • Monitoring and observability experience using tools like Datadog or OpenTelemetry

What we offer:

  • Competitive salary and equity
  • You will be eligible to participate in the Company's performance bonus program
  • Comprehensive insurance (medical/dental/vision) — 100% covered
  • Stipend for your ideal remote set-up
  • Flexible hours and a supportive remote environment
  • Unlimited vacation: Take time when you need it (and we really mean it!)
  • 401(k) retirement plan
  • Monthly wellness benefit
  • Weekly meal benefit
  • Global off-sites

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Staff SRE Engineer (Platform)

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...
Location
France, Paris
Salary:
Not provided
Ledger
Expiration Date
Until further notice
Requirements
  • 8+ years in cloud engineering at scale, in organizations operating SaaS solutions
  • Proficiency in working in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, Argo CD, Ansible, configuration management, etc.
  • Strong knowledge of observability practices, with experience implementing and managing logging, monitoring, and alerting frameworks with solutions such as Datadog or Prometheus/Grafana/Loki
  • Experience of cross-functional work and the ability to demonstrate a collaborative approach to building key relationships across the organization and defining project scope, goals, plans, and deliverables
  • Customer-focused, with the ability to identify and understand both internal and external customers' needs
  • Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
  • Excellent presentation and written communication
  • Ability to deal with ambiguity, high level of pressure and rapidly changing environments
  • Engineering degree.
Job Responsibility
  • Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
  • Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines
  • Perform integration of platform software components
  • Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems
  • Influence and create standards & best practices in support of service level objectives
  • Automate key SRE metrics including SLOs/SLAs and error budgets
  • Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
  • Apply analytics on past incidents and usage patterns to predict issues and take proactive actions
  • Ensure control of technical debt and promote quality practices
  • Apply SRE and chaos engineering approaches across all strategic systems, in coordination with Service Design, to predict and prevent outages and improve solution availability
What we offer
  • Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
  • Flexibility: A hybrid work policy
  • Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
  • Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
  • Well-being: Personal development, coaching & fitness with our dedicated partners
  • Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
  • High tech: Access to high performance office equipment and gadgets, including Apple products
  • Transport: Ledger reimburses part of your preferred means of transportation
  • Discounts: Employee discount on all our products.
Employment Type:
Full-time

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location
United States, Hartford
Salary:
130295.00 - 260590.00 USD / Year
CVS Health
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting languages such as Ansible, PowerShell, Bash for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer
  • Affordable medical plan options
  • A 401(k) plan (including matching company contributions)
  • An employee stock purchase plan
  • No-cost programs for all colleagues, including wellness screenings, tobacco cessation, and weight management programs
  • Confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
Employment Type:
Full-time

Software Engineer Staff

Designs, develops, troubleshoots and debugs software programs for software enhan...
Location
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • A minimum of 10 years of professional software development experience
  • Proven expertise in one or more backend programming languages such as Golang (highly preferred), Java, Python, or C/C++
  • Deep understanding of networking protocols, network architectures, network security, and common networking concepts
  • Proven experience in designing, building, and deploying scalable microservices using Docker, Kubernetes, etc.
  • Significant experience in building, deploying, and operating scalable SaaS applications in a Public Cloud (AWS/GCP) environment
  • Strong understanding of distributed systems principles, including concurrency, scalability, fault tolerance, and consistency
  • Experience with various database technologies, including relational (e.g., PostgreSQL, MySQL) and NoSQL (e.g., DynamoDB, Redis) databases
  • Experience designing, building, and consuming RESTful APIs and other integration technologies like WebSocket, Kafka, etc.
  • Experience with network security principles, threat modelling, and secure coding practices is an added advantage
  • Excellent analytical and problem-solving skills
Job Responsibility
  • Technical Leadership: Work with product managers, architects, and other engineers to understand the software requirements, and define corresponding functional and design specifications
  • Software Development: Design, develop, test, deploy, and maintain high-quality, production-grade software, with a strong emphasis on backend systems
  • System Design & Optimization: Design and implement micro-services for high availability, scalability, performance, and security within our SaaS platform
  • Networking Expertise: Apply deep knowledge of networking protocols (e.g., TCP/IP, HTTP/S, DNS, NAT), network security, and cloud networking concepts to build robust and secure solutions
  • SaaS & Cloud Native Development: Design and implement solutions leveraging cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Kubernetes, Docker)
  • Collaboration: Collaborate effectively with cross-functional teams, including product management, QA, SRE, and the Juniper technical assistance team
  • Code Quality & Best Practices: Champion best practices in software development, including code reviews, testing methodologies, CI/CD, and DevOps principles
  • Problem Solving: Troubleshoot and resolve complex technical issues in a timely and effective manner, often in production environments
  • Innovation & Research: Stay abreast of emerging technologies and industry trends in networking, SaaS, and software engineering
  • Documentation: Create and maintain comprehensive technical documentation for designs, APIs, and operational procedures
What we offer
  • Health & Wellbeing: Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Personal & Professional Development: Specific programs catered to helping you reach any career goals you have
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness
Employment Type:
Full-time

Software Engineer Staff

We are seeking a talented and motivated Staff Software Engineer to join our dyna...
Location
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • A minimum of 10 years of professional software development experience
  • Proven expertise in one or more backend programming languages such as Golang (highly preferred), Java, Python, or C/C++
  • Deep understanding of networking protocols, network architectures, network security, and common networking concepts
  • Proven experience in designing, building, and deploying scalable microservices using Docker, Kubernetes, etc.
  • Significant experience in building, deploying, and operating scalable SaaS applications in a Public Cloud (AWS/GCP) environment
  • Strong understanding of distributed systems principles, including concurrency, scalability, fault tolerance, and consistency
  • Experience with various database technologies, including relational (e.g., PostgreSQL, MySQL) and NoSQL (e.g., DynamoDB, Redis) databases
  • Experience designing, building, and consuming RESTful APIs and other integration technologies like WebSocket, Kafka, etc.
  • Experience with network security principles, threat modelling, and secure coding practices is an added advantage
  • Excellent analytical and problem-solving skills
Job Responsibility
  • Technical Leadership: Work with product managers, architects, and other engineers to understand the software requirements, and define corresponding functional and design specifications
  • Software Development: Design, develop, test, deploy, and maintain high-quality, production-grade software, with a strong emphasis on backend systems
  • System Design & Optimization: Design and implement micro-services for high availability, scalability, performance, and security within our SaaS platform
  • Networking Expertise: Apply deep knowledge of networking protocols (e.g., TCP/IP, HTTP/S, DNS, NAT), network security, and cloud networking concepts to build robust and secure solutions
  • SaaS & Cloud Native Development: Design and implement solutions leveraging cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Kubernetes, Docker)
  • Collaboration: Collaborate effectively with cross-functional teams, including product management, QA, SRE, and the Juniper technical assistance team
  • Code Quality & Best Practices: Champion best practices in software development, including code reviews, testing methodologies, CI/CD, and DevOps principles
  • Problem Solving: Troubleshoot and resolve complex technical issues in a timely and effective manner, often in production environments
  • Innovation & Research: Stay abreast of emerging technologies and industry trends in networking, SaaS, and software engineering
  • Documentation: Create and maintain comprehensive technical documentation for designs, APIs, and operational procedures
What we offer
  • Health & Wellbeing: Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Personal & Professional Development: Programs catered to helping you reach any career goals
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness
Employment Type:
Full-time

Engineering Manager, Infrastructure

As an Engineering Manager for the Infrastructure team, you’ll lead the engineers...
Location
Canada; United States
Salary:
195000.00 - 285000.00 USD / Year
Apollo.io
Expiration Date
Until further notice
Requirements
  • 5+ years of hands-on software or infrastructure engineering experience
  • 2+ years of experience leading teams of senior and staff-level engineers in platform, SRE, or infrastructure domains
  • Proven ability to design and operate large-scale distributed systems in cloud environments (preferably GCP or AWS)
  • Expertise with Kubernetes, Docker, Terraform, Ubuntu, and CI/CD pipelines
  • Familiarity with observability tools (Grafana, Prometheus, ELK, Datadog, New Relic) and performance tuning
  • Strong grounding in networking, security, and reliability principles
  • Experience managing infrastructure costs, availability SLAs, and high-throughput systems at scale
Job Responsibility
  • Lead, coach, and grow a distributed team of high-impact Infrastructure Engineers
  • Partner with senior engineering leadership on strategic initiatives such as cloud migration, infrastructure scaling, platform reliability, and cost efficiency
  • Define and implement modern operational excellence practices, including SLOs, error budgets, incident reviews, and performance monitoring
  • Guide technical decision-making across key areas like Kubernetes, GCP, observability, networking, CI/CD, and IaC (Terraform, Ansible)
  • Collaborate with AI, Data, and Product Engineering teams to ensure infrastructure scalability for ML and AI-native workloads
  • Run effective 1:1s, career development conversations, and quarterly performance reviews
  • Support recruiting efforts to attract top engineering talent across time zones
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
Employment Type:
Full-time

AI Azure Enterprise Automation Engineer

Baptist Health Information Services is looking for an Enterprise Automation Engi...
Location
United States, Jacksonville
Salary:
Not provided
Baptist Health (Florida)
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree or Equivalent Experience
  • Over 5 years of Information Technology Experience Required
  • Experience designing or implementing AI-driven automation agents that support IT operations, observability, or cloud management by autonomously identifying and resolving issues
  • Familiarity with Large Language Model (LLM) integration (e.g., OpenAI, Claude, Gemini) for code generation, decision support, or infrastructure recommendations
  • Exposure to multi-agent orchestration frameworks such as LangChain, AutoGen, or Microsoft Autonomous Agents for coordinating complex, layered workflows
  • Integration of AI agents into DevOps workflows or incident response tooling
  • Understanding of prompt engineering, retrieval-augmented generation (RAG), or vector database utilization (e.g., Azure Cognitive Search, Weaviate) in the context of enterprise systems
  • Contributions to open-source automation or AI platforms that demonstrate thought leadership or technical innovation
  • Familiarity with healthcare IT standards and constraints (e.g., HIPAA compliance, identity management in clinical workflows) as they apply to automation and AI integration
  • Azure VMs, Virtual Networks, Storage Accounts, Azure AD
Job Responsibility
  • Expert-level engineering skills across a broad range of technology stacks and programming languages
  • As an SRE at Baptist Health you will be a member of a team dedicated to improving our resiliency, reliability, observability, and scalability through different methodologies and tools
  • You will have the drive to improve and define how we automate, observe, scale, and operate enterprise services
  • Design and build infrastructure & systems that provide high levels of scalability, reliability, performance, and security across Azure and on-prem environments
  • Automate manual processes by designing and implementing end-to-end automation pipelines that reduce operational friction, eliminate repetitive tasks, and enforce consistency through Infrastructure-as-Code and CI/CD practices
  • Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for all core services
  • Improve observability of all enterprise services with actionable monitoring, logging, and alerting using tools like Azure Monitor, Application Insights, and SolarWinds
  • Develop playbooks and runbooks to guide operations teams and support staff in managing infrastructure efficiently and safely
  • Partner with Digital Cloud Development Operations, Application Development, and Product teams to ensure new systems are designed for reliability and maintainability
  • Work closely with vendors and cloud providers (Azure, AWS, GCP) to optimize infrastructure and troubleshoot escalated issues
Employment Type:
Full-time

Staff Site Reliability Engineer

As a Staff Site Reliability Engineer, you will be a technical leader and strateg...
Location
Singapore; Australia, Singapore; Melbourne
Salary:
Not provided
Airwallex
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in SRE, DevOps, or infrastructure engineering roles, with progressive responsibility
  • Proven ability to lead SRE strategy and execution for large-scale, complex, cross-functional projects
  • Deep expertise with cloud platforms (AWS/GCP), Kubernetes, container orchestration, observability, and incident response frameworks
  • Strong experience supporting production systems with stringent high availability, compliance, and security requirements
  • Demonstrated leadership in mentoring and growing technical teams
  • Excellent collaboration and communication skills, able to influence stakeholders at all levels
  • Degree in Computer Science or related field
Job Responsibility
  • Drive the strategic vision and roadmap for Site Reliability Engineering at Airwallex, aligned with business objectives and product goals
  • Architect and oversee the implementation of highly scalable, secure, and resilient cloud infrastructure for new services and platform-wide initiatives
  • Lead and mentor senior engineers and cross-functional teams in reliability engineering best practices, automation, and incident management
  • Champion and evolve operational excellence through advanced observability, SLO management, runbooks, and proactive risk mitigation
  • Lead incident response for high-severity incidents, facilitating post-mortems and driving continuous improvements
  • Collaborate closely with Product, Engineering, Security, and DevOps leadership to ensure compliance, resilience, and alignment across functions
  • Influence and shape engineering culture around reliability, scalability, and DevOps principles across multiple teams
  • Advocate for innovation in tooling, automation, and infrastructure to improve developer productivity and service uptime
Employment Type:
Full-time

Staff Software Engineer

As a Senior Staff Software Engineer at NMI, you operate beyond the scope of a si...
Location
United States
Salary:
130000.00 - 160000.00 USD / Year
Parking Network B.V.
Expiration Date
March 13, 2026
Requirements
  • Bachelor’s degree in Computer Science, Information Technology, or equivalent practical experience
  • 8+ years of experience developing complex software applications in a commercial environment, with demonstrated impact at the Staff or Senior Staff engineer level
  • Advanced, hands-on experience building and maintaining large-scale systems using .NET Framework / C# (preferred) and/or PHP, with a strong understanding of object-oriented design principles and software architecture
  • Strong experience working with relational databases, particularly Microsoft SQL Server, including schema design, query optimization, performance tuning, and maintaining data integrity in production systems
  • Proven experience designing, coding, deploying, and operating cloud-based solutions hosted on AWS, with an understanding of scalability, fault tolerance, security, and cost-aware design
  • Experience designing and architecting scalable, distributed systems, with consideration for performance, reliability, and long-term maintainability
  • Deep understanding of the Software Development Life Cycle (SDLC) and agile development methodologies
  • Strong knowledge of security best practices, including secure coding principles and compliance requirements (e.g., OWASP Top Ten, PCI DSS, SOC 2, HIPAA, or similar)
  • Solid understanding of networking fundamentals, including HTTPS, DNS, SSL/TLS, and service-to-service communication patterns
  • Deep knowledge of design patterns and their practical application in real-world systems
Job Responsibility
  • Provide technical leadership for the team, influencing architecture and design decisions that span multiple teams
  • Own and evolve critical platform areas including partner onboarding, developer tooling, authentication, user management, and the unified partner portal
  • Identify long-term technical risks and opportunities, and lead initiatives to address scalability, reliability, security, and maintainability
  • Set and reinforce engineering standards, patterns, and best practices across teams
  • Collaborate closely with Engineering Managers and Directors to align technical strategy with delivery plans and team goals
  • Partner with Product Managers, Directors, and Designers to translate product vision into technically sound, scalable solutions
  • Act as a trusted technical advisor across teams, helping resolve complex cross-team dependencies and tradeoffs
  • Drive alignment and consistency across partner-facing systems and experiences
  • Design, implement, and review high-impact code, particularly in complex or high-risk areas
  • Lead technical discovery and execution for ambiguous or strategically important initiatives
What we offer
  • A remote-first culture
  • Flex PTO
  • Health, dental, and vision insurance
  • 13 paid holidays
  • Company volunteer days
  • Bonus
Employment Type:
Full-time