Senior Software Engineer - Observability and Reliability Jobs

1 Job Offer

Senior Software Engineer, Site Reliability
Join Babylist as a Senior Site Reliability Engineer on our Platform team. You will ensure system stability and scalability using AWS, Terraform, and Kubernetes. This remote role in the US/Canada offers strong benefits, including comprehensive health insurance and a supportive, AI-forward environm...
Location
United States; Canada
Salary
186818.00 - 224183.00 USD; CAD / Year
Babylist
Expiration Date
Until further notice
Senior Software Engineer - Observability and Reliability jobs represent a critical and highly specialized career path at the intersection of software development, systems engineering, and operations. Professionals in this role are the architects of resilience, dedicated to building and maintaining systems that are not only functional but also robust, scalable, and transparent. Their core mission is to ensure that complex digital services, whether user-facing applications or internal platforms, remain healthy, performant, and available around the clock. This goes beyond traditional support; it involves proactive engineering to prevent issues before they impact users and to create systems that are inherently easier to understand and troubleshoot.

The typical responsibilities for a Senior Software Engineer in this domain are multifaceted. A primary focus is on designing, implementing, and managing comprehensive observability frameworks. This involves instrumenting code and infrastructure to generate meaningful logs, metrics, and traces, providing a deep, actionable view into system behavior. They build and refine alerting systems that are precise and actionable, reducing noise and enabling rapid incident response. A significant portion of their work is dedicated to reliability engineering: conducting chaos experiments, defining and tracking Service Level Objectives (SLOs), and implementing automation for failover, scaling, and recovery. They also champion and build developer tools and platforms that enhance the entire engineering lifecycle, from continuous integration and deployment (CI/CD) pipelines to infrastructure-as-code (IaC) templates, empowering other teams to build reliable software by default.

To excel in these jobs, individuals typically possess a hybrid skill set. A strong software engineering background is non-negotiable, with proficiency in languages like Go, Python, or Java to build tools and automation. Deep expertise in cloud platforms (e.g., AWS, GCP, Azure) and container orchestration with Kubernetes is standard. They are adept with observability stacks such as Prometheus, Grafana, OpenTelemetry, and distributed tracing tools. Equally important are the foundational principles of Site Reliability Engineering (SRE), including capacity planning, performance analysis, and post-incident review processes. Senior-level roles demand excellent problem-solving skills, the ability to debug complex, cross-system issues, and strong collaboration and communication skills to work with diverse product teams and articulate technical trade-offs. Leadership in establishing best practices, mentoring engineers, and driving a culture of reliability across the organization is a key expectation.

For those passionate about creating stable, efficient, and observable systems that serve millions, Senior Software Engineer - Observability and Reliability jobs offer a challenging and impactful career.
