Cloud Software Engineer - Observability Platform Job at ClickHouse

Cloud Software Engineer - Observability Platform

ClickHouse is looking for an experienced engineer to join our Observability team...

Location

Canada

Salary:

Not provided

ClickHouse

Expiration Date

Until further notice

Requirements

5+ years building and running production systems at scale
Proficiency in Golang
Experience with Kubernetes, Helm, ArgoCD, and Terraform or similar IaC tools
Comfortable working with at least one major cloud provider (AWS, GCP, Azure)
Experience with OpenTelemetry, Prometheus, Grafana, or similar tools
Experience with ClickHouse preferred

Job Responsibility

Design, build, and operate distributed systems that power observability across ClickHouse Cloud
Own reliability, performance, and cost-efficiency of our telemetry pipeline and storage systems
Take part in the on-call rotation and help drive root-cause resolution and long-term fixes
Build tooling and automation to eliminate repetitive operational work
Help shape the roadmap for observability by identifying bottlenecks and scaling challenges
Collaborate with other engineering teams to improve their observability posture
Contribute to design discussions and architecture reviews, and mentor teammates

What we offer

Flexible work environment - ClickHouse is a globally distributed, remote-friendly company, currently operating in 20 countries
Healthcare - Employer contributions towards your healthcare
Equity in the company - Every new team member who joins our company receives stock options
Time off - Flexible time off in the US, generous entitlement in other countries
$500 toward a home office setup for remote employees
Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Senior+ Software Engineer - Cloud Availability Platform Engineering (Observability)

We are looking for a highly skilled engineer with deep expertise in building and...

Location

United States, San Francisco

Salary:

166,000 - 201,000 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

7+ years of experience in infrastructure or platform engineering, with a focus on observability and monitoring systems
Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/OpenSearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)
Strong programming skills in Go or Python for automation, operators, and custom integrations
Experience running observability platforms on Kubernetes and operating them at scale across multi-datacenter environments
Proven ability to design, optimize, and scale telemetry pipelines handling high-cardinality, high-throughput data
Solid understanding of distributed systems, performance engineering, and debugging complex workloads
Strong collaboration skills and the ability to influence engineering teams to adopt observability best practices

Job Responsibility

Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments
Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks
Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs
Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
Partnering with engineering teams to embed observability into applications, services, and infrastructure

What we offer

Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Full-time

Senior Cloud Platform Software Engineer

We are seeking a Senior Cloud Platform Software Engineer to join our team and be...

Location

United Kingdom, London

Salary:

Not provided

Zenobē

Expiration Date

Until further notice

Requirements

Strong hands-on experience with AWS services such as EC2, S3, IAM, RDS, and Control Tower
Hands-on experience with, and day-to-day management of, Kafka
A working knowledge of Kubernetes
Proficiency in Terraform for managing cloud infrastructure at scale
Familiarity with monitoring/logging tools (e.g., Prometheus, Grafana, ELK, CloudWatch)
Strong automation skills (e.g., Ansible, GitHub Actions) for reliability and operational tasks
Solid understanding of, and practical experience with, GitOps principles and tools, CI/CD pipelines, and DevOps best practices
Proficient with version control using Git and collaboration via Git-based workflows
Excellent communication skills, able to present technical information clearly to non-technical stakeholders
Experience mentoring junior engineers and leading others by example

Job Responsibility

Designing, implementing, and managing scalable, secure, and highly available cloud infrastructure
Helping develop our AWS cloud architecture using automation and DevOps practices
Collaborating closely with development teams to troubleshoot complex issues, optimise performance, and enforce compliance with industry standards
Evaluating emerging cloud technologies to align with business goals and drive innovation
Mentoring other engineers, helping your team grow, and taking on some team and project leadership activities
Being a go-to person when another team or another Cloud team member is facing an unknown issue with a production or pre-production workload
Planning, leading and executing on our ideas for a more reliable and scalable usage of AWS
Collaborate across teams to deliver scalable, real-time and batch data pipelines that support our products and analytics
Support and mentor teammates, sharing knowledge and reviewing designs and code
Contribute to the architecture and evolution of our data platform

What we offer

Up to 33% annual bonus
25 days holiday, increasing with length of service up to 30 days, plus bank holidays
Private Medical Insurance
£1,500 training budget per year
EV Salary Sacrifice Scheme
Pension scheme with up to 8% matched contributions
Enhanced parental leave
Cash-back health plan

Senior Software Engineer, Platform Observability

Everlaw is looking for a Senior Software Engineer that brings experience in buil...

Location

United States, Oakland

Salary:

164,000 - 208,000 USD / Year

Everlaw

Expiration Date

Until further notice

Requirements

BS or MS in Computer Science, or equivalent coursework
At least 3 years of experience building logging, metrics, and tracing infrastructure
Proficiency in coding in a language such as C, C++, C#, Java, Python, JavaScript, Go, or Rust
Experience with Infrastructure as Code and container solutions to manage cloud environments (e.g., Terraform, Ansible, Docker)
At least 1 year of experience leading multi-developer efforts, including planning, technical breakdown, and coordination
Excellent communication and collaboration skills
Please note that at this time, Everlaw is not sponsoring U.S. employment visas for this role. Due to federal contract requirements, Everlaw may only hire US citizens for this position.

Job Responsibility

Build observability strategies to support application and infrastructure metrics, logs, traces, dashboards, and alerts
Develop and maintain infrastructure as code (IaC) using tools such as Terraform and Ansible
Monitor usage trends to identify opportunities to optimize efficiency and performance of our metrics database and logging tools
Improve our on-call and incident management processes by encouraging deeper understanding, communication, and trust
Support developer projects by influencing design and implementation of infrastructure features as well as providing technical guidance
Support compliance efforts by promoting continuous documentation of our processes and involvement in audits
Provide technical mentorship to other engineers by sharing your technical knowledge and becoming an expert in an area of our code base

What we offer

Equity program
401(k) retirement plan with company matching
Health, dental, and vision
Flexible Spending Accounts for health and dependent care expenses
Paid parental leave and approximately 10 days (80 hours) per year of sick leave
Seventeen paid vacation days plus 11 federal holidays
Membership to Modern Health to help employees prioritize mental health and wellness
Annual allocation for Learning & Development opportunities and applicable professional membership dues
Company-sponsored life and disability insurance
Work in Uptown Oakland, just steps from the BART line and dozens of restaurants and walking distance to Lake Merritt

Full-time

Senior Software Engineer, Platform Observability

Everlaw is looking for a Senior Software Engineer that brings experience in buil...

Location

United States, Oakland

Salary:

164,000 - 239,000 USD / Year

Everlaw

Expiration Date

Until further notice

Requirements

BS or MS in Computer Science, or equivalent coursework
At least 3 years of experience building logging, metrics, and tracing infrastructure
Proficiency in coding in a language such as C, C++, C#, Java, Python, JavaScript, Go, or Rust
Experience with Infrastructure as Code and container solutions to manage cloud environments (e.g., Terraform, Ansible, Docker)
At least 1 year of experience leading multi-developer efforts, including planning, technical breakdown, and coordination
Excellent communication and collaboration skills that can motivate and move the team towards a common direction
Please note that at this time, Everlaw is not sponsoring U.S. employment visas for this role
Due to federal contract requirements, Everlaw may only hire US citizens for this position

Job Responsibility

Build observability strategies to support application and infrastructure metrics, logs, traces, dashboards, and alerts
Develop and maintain infrastructure as code (IaC) using tools such as Terraform and Ansible
Build custom libraries and plugins in Java and Python to allow engineers to generate meaningful metrics, logs and traces
Monitor usage trends to identify opportunities to optimize efficiency and performance of our metrics database and logging tools
Improve our on-call and incident management processes by encouraging deeper understanding, communication, and trust
Support developer projects by influencing design and implementation of infrastructure features as well as providing technical guidance
Support compliance efforts by promoting continuous documentation of our processes and involvement in audits
Provide technical mentorship to other engineers by sharing your technical knowledge and becoming an expert in an area of our code base
Act as a code reviewer, reviewing code written by others using your knowledge of programming languages, design patterns, and best practices
Contribute to documentation for internal engineering consumption or for external users of the Everlaw platform

What we offer

Competitive compensation
Comprehensive benefits package that includes medical, dental, wellness program
Paid parental leave
Professional development
Fully stocked kitchen
Equity program
401(k) retirement plan with company matching
Health, dental, and vision
Flexible Spending Accounts for health and dependent care expenses
Paid parental leave and approximately 10 days (80 hours) per year of sick leave

Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

15+ years in software engineering with a track record of architecting distributed systems or platforms at scale
Strong hands‑on experience in Golang and one scripting language (e.g., Python or Shell)
Experience operating observability at PB-scale ingestion and hundreds of millions of series
Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
Deep experience building systems at scale and operating cloud infrastructure with Kubernetes
Strong proficiency with service mesh technologies (Istio/Envoy), infrastructure‑as‑code (Terraform), and experience in multi‑cloud (AWS, GCP)
Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
Proven experience integrating security as part of infrastructure and platform development
Exceptional cross‑functional communication and effective collaboration with both technical and non‑technical stakeholders

Job Responsibility

Architect and lead Roku’s observability platform across metrics, logs, and traces
Evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
Extend and harden open‑source observability systems
Overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
Implement features such as pre‑aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform
Augment and automate CI/CD flows and onboarding
Integrate security into infrastructure and platform services
Ensure robust multi‑tenant, multi‑cluster, and multi‑cloud designs
Contribute improvements back to open source and CNCF‑aligned projects

What we offer

Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision)
Life, accident, disability, commuter, and retirement options (401(k)/pension)
Time off in accordance with local leave policies

Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

We are building a next-generation observability and cloud platform that is high-...

Location

United Kingdom, Cambridge

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

Extensive experience in software engineering, with a track record of architecting distributed systems or platforms at scale
Strong hands-on experience in Golang and one scripting language (e.g., Python or Shell)
Experience operating observability at PB-scale ingestion and hundreds of millions of series
Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
Deep experience building systems at scale and operating cloud infrastructure with Kubernetes
Strong proficiency with service mesh technologies (Istio/Envoy), infrastructure-as-code (Terraform), and experience in multi-cloud (AWS, GCP)
Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
Proven experience integrating security as part of infrastructure and platform development
Exceptional cross-functional communication and effective collaboration with both technical and non-technical stakeholders

Job Responsibility

Architect and lead Roku’s observability platform across metrics, logs, and traces
Evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
Extend and harden open-source observability systems
Overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
Implement features such as pre-aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform
Augment and automate CI/CD flows and onboarding
Integrate security into infrastructure and platform services
Ensure robust multi-tenant, multi-cluster, and multi-cloud designs
Contribute improvements back to open source and CNCF-aligned projects

What we offer

Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision)
Life, accident, disability, commuter, and retirement options (401(k)/pension)
Time off work for vacation and other personal reasons

Full-time

Senior Software Engineer II - Cloud Compute Platform

As a Software Engineer on the Compute Platform team, you will be a key technical...

Location

United States

Salary:

197,400 - 232,000 USD / Year

Confluent

Expiration Date

Until further notice

Requirements

8+ years of experience delivering scalable software solutions
Proven track record of leading the delivery of large-scale, highly available, low-latency systems
Deep expertise in Kubernetes including controller development, operator patterns, and multi-cluster architectures
Strong proficiency in Go with experience building production-grade distributed systems
Experience with multi-tenant platform architectures and security isolation patterns
Familiarity with gRPC, Protobuf, and API design for internal platform services
Experience with observability tools and operational excellence practices
Experience with multi-cloud environments (AWS, GCP, Azure) and cloud-provider integrations
Track record of providing technical leadership and mentorship
Track record of working collaboratively across teams including product management, SRE, and other engineering teams

Job Responsibility

Drive the overall technical charter for the Compute Platform, including multi-cluster orchestration, workload placement, and security architecture
Design and implement platform APIs and Kubernetes operators using Go to support evolving workload requirements
Work closely with product management and engineering leadership to build and drive the roadmap for Confluent's Compute Platform, enabling new business opportunities across Confluent
Deliver high-impact initiatives in areas such as workload scheduling, disruption management, network isolation, rolling update strategies, and cross-cluster resource management
Lead technical design reviews and drive architectural decisions across organizational boundaries
Mentor and grow other engineers on the team through code reviews, pairing, and technical guidance
Own operational aspects including availability, reliability, performance monitoring, emergency response, and disaster recovery for our global compute infrastructure

What we offer

Remote-First Work
Robust Insurance Benefits
Flexible Time Away
The Best Teammates
Experience Ambassadors
Open and Honest Culture
Well-Being and Growth
Offers Equity

Full-time
