Staff Software Engineer, Observability

United States, New York City · 200000.00 - 230000.00 USD / Year · Job Posted February 01, 2026

Job Description

At Astronomer, our R&D organization is dedicated to providing an exceptional experience in operating data orchestration based on Apache Airflow at many of the world’s largest companies. We are building out a world-class Observability team to deliver data observability capabilities that provide our customers visibility, reliability, and actionable insights into their data pipelines and products across some of the world’s largest enterprises.

Job Responsibility

  • Lead the end-to-end architecture and evolution of major platform components
  • Build scalable, reliable, and performant features that provide visibility into customer data pipelines
  • Collaborate closely with product, design, and engineering teams
  • Write high-quality, maintainable code
  • Set and champion engineering standards
  • Help improve and evolve the observability platform’s tooling, infrastructure, and processes
  • Lead and coordinate responses to complex production incidents
  • Proactively identify technical opportunities, risks, and gaps
  • Mentor engineers of all levels

Requirements

  • Strong technical experience building complex distributed systems
  • Hands-on experience designing, developing, and scaling production infrastructure
  • Excellent collaboration and communication skills
  • Focus on delivering customer value
  • Enthusiasm for contributing to a healthy, inclusive engineering culture
  • Comfort working in areas of high ambiguity

Nice to have

  • Experience building or scaling data observability products
  • Familiarity with Apache Airflow or similar workflow orchestrators
  • Background in observability, monitoring, or infrastructure at scale (e.g., Datadog, Honeycomb, or similar)

What we offer

  • Equity component
  • Comprehensive benefits package

Similar Jobs for Staff Software Engineer, Observability

8 matching positions

Staff Software Engineer, DevProd (Observability)

We have an opening for a Staff Software Engineer on our Infrastructure Team, wit...
Location
United States
Salary
196000.00 - 245000.00 USD / Year
Temporal
Expiration Date
Until further notice
Requirements
  • User-first mindset
  • Motivated by impact
  • Strong opinions about tools and technology balanced by a pragmatic drive for impact
  • Ability to work in a self-directed manner in a fast-paced environment
  • Excellent collaboration and communication skills
  • Demonstrated ability to develop horizontally scalable, resilient, and high performance distributed systems in a production environment
  • Experience designing, implementing, deploying, and supporting large scale, geographically distributed observability and/or high throughput data streaming/processing pipelines, or similar
  • Expert in one or more high-level programming languages, preferably Go
  • Expert-level Kubernetes skills
  • Expert-level query development skills, preferably SQL
Job Responsibility
  • Lead the end-to-end Software Development Lifecycle: goals & requirements solicitation, design & review, implementation, operationalization & deployment, support & maintenance
  • Lead feature design, review with stakeholders, iterate to incorporate feedback and drive consensus
  • Clearly document design choices and operational knowledge to successfully deploy and manage the software you develop
  • Provide appropriate test and production-readiness coverage (unit, integration, and performance) for your feature ownership area
  • Set a high bar for technical excellence and take pride in the software you develop
  • Design and build multi-component, distributed systems that operate at scale
  • Investigate issues with a methodical approach to identify a root cause
  • Understand performance and reliability implications of design options at scale, and make related tradeoffs
  • Participate in the team’s on-call rotation
What we offer
  • Unlimited PTO, 12 Holidays + 2 Floating Holidays
  • 100% Premiums Coverage for Medical, Dental, and Vision
  • AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available)
  • Empower 401K Plan
  • Additional Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend and more
  • $3,600 / Year Work from Home Meals
  • $1,800 / Year Professional Enrichment (Career Development & Professional Memberships)
  • $1,200 / Year Lifestyle Spending Account
  • $1,000 / Year In-Home Office Setup (In addition to Temporal issued equipment)
  • $74 / Month Reimbursement for Internet
  • Full-time

Staff Software Engineer - Authentication and Security Observability

The Login Services team sits within Core Security Engineering and owns Uber’s au...
Location
United States, Sunnyvale
Salary
232000.00 - 258000.00 USD / Year
Uber
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 8+ years of industry experience building large-scale backend platforms, with deep experience in distributed systems and production infrastructure
  • Strong programming experience in multiple languages (e.g., Go, Java, Python, Node.js/TypeScript), with a track record of shipping reliable systems
  • Demonstrated expertise designing and operating scalable distributed services, including reliability engineering and operational excellence (observability, incident response, SLAs)
  • Strong background in security engineering, preferably in identity/authentication and building or operating security-critical pipelines at scale
  • Proven ability to own complex systems end-to-end—from architecture and implementation to rollout, monitoring, and long-term maintainability—in large-scale environments
Job Responsibility
  • Lead architecture and execution of core authentication capabilities for human and non-human identities, delivering secure, resilient, and frictionless login experiences at Uber scale
  • Own and evolve Uber’s tier-zero authentication and SSO infrastructure, maintaining high availability, security, and performance for core login flows and enabling secure, policy-driven access to internal and third-party applications
  • Build and evolve platform services (APIs, workflows, policy enforcement) with strong engineering fundamentals: reliability, performance, observability, and safe rollout/rollback
  • Develop the Security Knowledge Platform, building the data/graph foundations and risk signals to categorize identity + asset risk and power multiple security and product use cases
  • Build the next generation of automation and intelligence—agentify IAM operations to reduce toil/cost and develop the Security Knowledge Platform to power identity + asset risk insights across Security Engineering
  • Partner cross-functionally and raise the bar—align stakeholders across Security/IT/Ops/Product, mentor engineers through design reviews and incident learning, and set technical direction for the team
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • All full-time employees are eligible to participate in a 401(k) plan
  • Eligible for various benefits
  • Full-time

Staff Software Development Engineer-Automation Engineer

We’re building a world of health around every individual — shaping a more connec...
Location
United States
Salary
106605.00 USD / Year
CVS Health
Expiration Date
June 29, 2026
Requirements
  • Extensive experience in software development and production support for enterprise systems
  • Strong expertise in automation/RPA platforms, scripting, and debugging complex workflows
  • Proven ability to lead incident response and root cause analysis in high-availability environments
  • Deep understanding of SDLC, CI/CD, release management, and production readiness standards
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
Job Responsibility
  • Serve as the technical owner for production support of automation and RPA solutions across critical business processes
  • Lead incident triage, root cause analysis, and permanent remediation for high-severity automation failures
  • Establish and enforce runbooks, support models, escalation paths, and on-call readiness for automation platforms
  • Proactively identify systemic issues and implement stability, resiliency, and performance improvements
  • Provide hands-on technical leadership for automation design, debugging, and optimization in production environments
  • Review automation code and configurations to ensure adherence to standards, security, and reliability best practices
  • Partner with development teams to ensure production readiness of new automations before release
  • Guide architectural decisions that reduce operational complexity and technical debt
  • Design and maintain monitoring, alerting, and health dashboards for automation platforms
  • Drive adoption of AIOps, SRE, and automation-first support practices where applicable
What we offer
  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
  • Full-time

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...
Location
United States, Santa Clara
Salary
126000.00 - 203500.00 USD / Year
Palo Alto Networks
Expiration Date
Until further notice
Requirements
  • 5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
  • Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
  • Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
  • Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
  • Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
  • Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
  • Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
  • Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
  • Strong problem-solving skills and ability to work across teams
Job Responsibility
  • Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
  • Lead improvements across production systems, including performance, availability, and incident response
  • Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
  • Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
  • Partner with development teams to improve system reliability, observability, and cloud-native design patterns
  • Define and implement monitoring, alerting, and observability strategies across distributed systems
  • Lead incident response efforts, including root cause analysis and long-term remediation strategies
  • Identify and eliminate operational toil through automation and system improvements
  • Mentor engineers and contribute to raising the bar for production engineering practices
What we offer
  • Restricted stock units
  • Bonus
  • Full-time

Staff Software Engineer

Work Arrangement: This role is categorized as hybrid. This means the successful ...
Location
United States, Austin, Texas; Warren, Michigan
Salary
Not provided
General Motors
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's in CS, Engineering, or related field (or equivalent experience)
  • 8+ years of experience building enterprise-grade backend platforms and services
  • Deep expertise in Java, Spring Boot, and scalable microservice architectures
  • Experience in generative AI using LLMs, MCP, and/or predictive maintenance
  • Experience with distributed systems, event-driven architecture, and technologies like Apache Kafka
  • Hands-on experience with containerization (Docker, K8s/AKS) and Microsoft Azure
  • Familiarity with PostgreSQL, Redis, and cloud-native storage solutions
  • Track record of mentoring engineers and lead
Job Responsibility
  • Architect and evolve distributed systems with a focus on performance, scalability, and maintainability
  • Lead the strategy and implementation of automation across broad technical areas, integrating work across multiple teams and stakeholders to eliminate manual processes, improve reliability, and establish scalable operational mechanisms with measurable KPI impact aligned to business priorities
  • Drive modernization of legacy platforms by influencing key technical decisions, championing change across dependent teams, and delivering solutions that improve scalability, maintainability, and long-term operational efficiency across multiple departments
  • Lead the development of backend APIs and services that power customer-facing digital platforms
  • Write clean, well-tested, secure code—and guide others to do the same through code reviews and mentorship
  • Build and optimize synchronous and asynchronous integrations (REST, GraphQL, Kafka, messaging queues)
  • Mentor engineers on architectural design, modern development patterns, and industry best practices
  • Evaluate and integrate emerging technologies to improve system capabilities and developer efficiency
  • Collaborate with cross-functional teams to align technical execution with business goals
  • Champion CI/CD, automated testing, observability, and system performance
What we offer
  • This job may be eligible for relocation benefits.
  • Full-time

Staff Software Engineer I - Confluent Platform

We’re not just building better tech. We’re rewriting how data moves and what the...
Location
India
Salary
Not provided
Confluent
Expiration Date
Until further notice
Requirements
  • 10+ years of hands-on software development experience, with a proven ability to anticipate future technical needs and execute toward them
  • A strong track record of taking ideas from concept to production in complex, high-scale systems
  • Willingness to roll up your sleeves—design, code, debug, and operate critical systems
  • Deep experience building and operating large-scale distributed systems, including strong fundamentals in OS, networking, storage, and cloud infrastructure
  • Excellent grounding in distributed systems, concurrency, and multi-threaded programming
  • A proactive, self-starter mindset with strong problem-solving skills—identifying root causes and driving durable fixes
  • Ability to balance short-term execution with long-term architectural integrity; ship incrementally and iterate with urgency
  • Strong influence skills—able to drive technical decisions across teams and senior leadership through clear, data-driven communication
  • Experience handling high-severity production issues, including on-call ownership, deep debugging, and mitigation under pressure
Job Responsibility
  • Technically lead the evolution of the Confluent Platform, with deep ownership of USM, hybrid-first management, and platform-wide operational capabilities
  • Partner closely with product management, engineering leadership, and cross-org stakeholders (including Confluent Cloud) to define and execute the CP roadmap
  • Act as a strong external technical voice for Confluent Platform across the company
  • Champion domain health, operational hygiene, and platform reliability; raise the bar through design and code reviews
  • Lead architecture and design of large, complex systems spanning distributed systems, Kubernetes, security, and observability
  • Mentor and grow senior engineers and technical leads, providing hands-on guidance and career mentorship
  • Represent and strengthen engineering leadership in India, setting standards for execution, communication, and engineering excellence
  • Build and evolve processes that enable teams to operate at scale without sacrificing quality or velocity
What we offer
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Full-time

Staff Software Engineer, Connectivity (C++)

We’re searching for a Staff Software Engineer to join Aurora’s Vehicle Connectiv...
Location
United States, Pittsburgh
Salary
171000.00 - 273000.00 USD / Year
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • BS/MS/PhD in Computer Science or related field, or equivalent industry experience
  • Expert-level C++ programming skills and the ability to design high-performance, thread-safe, and memory-efficient systems in a Linux environment
  • Ability to work across multiple programming languages and paradigms
  • Experience working with networking protocols, such as TCP, UDP, gRPC, HTTP and network health monitoring frameworks
  • A passion for writing robust, intuitive, and pragmatic production code
  • Experience with Linux network configuration and troubleshooting
  • Ability to navigate and work effectively in large codebases
  • Strong verbal and written communication skills
  • Ability to work autonomously while remaining a great team player with colleagues across time zones
Job Responsibility
  • Define vehicle connectivity and communication architecture
  • Design and implement a highly reliable, low-latency vehicle communications framework that handles diverse mobile network conditions
  • Address connectivity-specific concerns for vehicle runtime, e.g. networking, performance, and observability
  • Work with autonomy engineers to meet performance and efficiency requirements for data throughput
  • Provide engineering support for field testing and fleet operations
  • Mentor senior engineers, set coding standards, and drive the long-term roadmap for vehicle-to-cloud connectivity
What we offer
  • Annual bonus
  • Equity compensation
  • Benefits
  • Full-time

Staff Software Engineer, Connectivity

Location
United States, Mountain View
Salary
180000.00 - 303000.00 USD / Year
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • BS/MS/PhD in Computer Science or related field, or equivalent industry experience
  • Expert-level C++ programming skills and the ability to design high-performance, thread-safe, and memory-efficient systems in a Linux environment
  • Ability to work across multiple programming languages and paradigms
  • Experience working with networking protocols, such as TCP, UDP, gRPC, HTTP and network health monitoring frameworks
  • A passion for writing robust, intuitive, and pragmatic production code
  • Experience with Linux network configuration and troubleshooting
  • Ability to navigate and work effectively in large codebases
  • Strong verbal and written communication skills
  • Ability to work autonomously while remaining a great team player with colleagues across time zones
Job Responsibility
  • Define vehicle connectivity and communication architecture
  • Design and implement a highly reliable, low-latency vehicle communications framework that handles diverse mobile network conditions
  • Address connectivity-specific concerns for vehicle runtime, e.g. networking, performance, and observability
  • Work with autonomy engineers to meet performance and efficiency requirements for data throughput
  • Provide engineering support for field testing and fleet operations
  • Mentor senior engineers, set coding standards, and drive the long-term roadmap for vehicle-to-cloud connectivity
What we offer
  • Annual bonus
  • Equity compensation
  • Benefits
  • Full-time