Senior Software Engineer - Cloud Infrastructure & Observability

Roku

Location:
United Kingdom, Cambridge

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are building a next-generation observability and cloud platform that is high-performance, cost-efficient, secure, and scalable across multi-region, multi-cloud clusters. You will lead the architecture and evolution of Roku’s observability and cloud infrastructure stack. This includes metrics, logs, traces, telemetry pipelines, service mesh, developer experience, and reliability of systems that power thousands of services and millions of devices. You will drive a vision where developers gain deep visibility with minimal overhead, onboarding is seamless, and insights are available in real time. Your work will directly help Roku scale efficiently while maintaining reliability, cost control, and performance.

Job Responsibility:

  • Architect and lead Roku’s observability platform across metrics, logs, and traces
  • Evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
  • Extend and harden open-source observability systems
  • Overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
  • Implement features such as pre-aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
  • Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform
  • Augment and automate CI/CD flows and onboarding
  • Integrate security into infrastructure and platform services
  • Ensure robust multi-tenant, multi-cluster, and multi-cloud designs
  • Contribute improvements back to open source and CNCF-aligned projects
  • Shape standards adoption (OpenTelemetry, OpenMetrics) across the company
  • Mentor engineers
  • Establish best practices for reliability, efficiency, and cost management across service mesh and observability domains

Requirements:

  • Extensive software engineering experience, with a track record of architecting distributed systems or platforms at scale
  • Strong hands-on experience in Golang and one scripting language (e.g., Python or Shell)
  • Experience operating observability platforms at petabyte-scale ingestion and hundreds of millions of series
  • Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
  • Deep experience building systems at scale and operating cloud infrastructure with Kubernetes
  • Strong proficiency with service mesh technologies (Istio/Envoy) and infrastructure-as-code (Terraform), plus experience in multi-cloud environments (AWS, GCP)
  • Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
  • Proven experience integrating security as part of infrastructure and platform development
  • Exceptional cross-functional communication
  • Effective collaboration with both technical and non-technical stakeholders
  • Culture fit: independent thinker, pragmatic problem-solver, low-ego collaborator who moves fast and focuses on company success
  • Experience integrating AI tools to improve processes and reduce toil
  • Open-source contributions to CNCF projects are preferred but not mandatory

Nice to have:

Open-source contributions in CNCF projects

What we offer:
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off work for vacation and other personal reasons

Additional Information:

Job Posted:
April 23, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior Software Engineer - Cloud Infrastructure & Observability

Senior Director of Engineering, Infrastructure

Senior Director of Engineering role leading the Infrastructure group at PagerDut...
Location:
United States, San Francisco
Salary:
233000.00 - 392000.00 USD / Year
PagerDuty
Expiration Date:
Until further notice
Requirements:
  • Proven experience in senior engineering leadership roles, managing multiple layers of managers
  • Significant experience as a hands-on technical contributor earlier in your career
  • Deep knowledge of modern infrastructure and software delivery: high availability, distributed systems, public cloud (AWS), microservices, containers, CI/CD pipelines, observability, and automation
  • Track record of building and scaling high-performing, inclusive engineering organizations
Job Responsibility:
  • Define and drive the multi-year strategy for PagerDuty's infrastructure and platform foundations
  • Strong ownership of PagerDuty's reliability patterns and practices
  • Bar raiser for all engineering functions
  • Lead, mentor, and scale a diverse team of Engineering Managers, Senior Managers, and technical leaders across multiple geographies
  • Ensure the reliability, scalability, and security of PagerDuty's global SaaS platform
  • Partner with peers in Engineering, Product, and Security to deliver large cross-functional initiatives
  • Champion engineering excellence: CI/CD maturity, observability best practices, operational rigor, and incident readiness
  • Manage budgets, headcount, and vendor relationships to optimize infrastructure investments
  • Represent Infrastructure externally with customers and partners, and internally with executives, as a trusted voice on technical and business tradeoffs
  • Foster a culture of inclusion, accountability, collaboration, and growth
What we offer:
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
Employment Type:
Full-time

Senior Director of Engineering, Infrastructure

Senior Director of Engineering to lead the Infrastructure group at PagerDuty, se...
Location:
United States, Atlanta
Salary:
233000.00 - 392000.00 USD / Year
PagerDuty
Expiration Date:
Until further notice
Requirements:
  • Proven experience in senior engineering leadership roles, managing multiple layers of managers
  • Significant experience as a hands-on technical contributor earlier in your career
  • Deep knowledge of modern infrastructure and software delivery: high availability, distributed systems, public cloud (AWS), microservices, containers, CI/CD pipelines, observability, and automation
  • Track record of building and scaling high-performing, inclusive engineering organizations
Job Responsibility:
  • Define and drive the multi-year strategy for PagerDuty's infrastructure and platform foundations
  • Strong ownership of PagerDuty's reliability patterns and practices
  • Lead, mentor, and scale a diverse team of Engineering Managers, Senior Managers, and technical leaders across multiple geographies
  • Ensure the reliability, scalability, and security of PagerDuty's global SaaS platform
  • Partner with peers in Engineering, Product, and Security to deliver large cross-functional initiatives
  • Champion engineering excellence: CI/CD maturity, observability best practices, operational rigor, and incident readiness
  • Manage budgets, headcount, and vendor relationships to optimize infrastructure investments
  • Represent Infrastructure externally with customers and partners, and internally with executives
  • Foster a culture of inclusion, accountability, collaboration, and growth
What we offer:
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Paid volunteer time off: 20 hours per year
Employment Type:
Full-time

Senior Software Engineer (Infrastructure) - HyperDX

Join us in revolutionizing Observability for Developers! We’re on a mission to r...
Location:
Netherlands
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of backend engineering experience
  • Strong TypeScript and Node.js skills (bonus for additional languages)
  • Deep understanding of APIs, event-driven systems, and high-throughput data pipelines
  • Proficiency in SQL and experience working with analytical databases (ClickHouse experience a plus)
  • Experience with Docker and Kubernetes, plus Helm for managing production deployments
  • Experience with infrastructure-as-code (Terraform, Pulumi, or similar)
  • Familiarity with CI/CD pipelines, monitoring systems, and production-grade alerting practices
  • A passion for building reliable, maintainable, cloud-native systems
Job Responsibility:
  • Build the core platform: Design and implement backend systems and APIs that power HyperDX, enabling engineers to ingest, query, and analyze observability data at massive scale
  • Scale deployments and infrastructure: Architect, deploy, and maintain cloud-native systems that ensure reliability, scalability, and performance. You’ll use Kubernetes, Helm, and infrastructure-as-code to make deployments simple and resilient
  • Ensure maintainability and operational excellence: Define best practices for CI/CD, monitoring, logging, and alerting. Drive automation across testing, scaling, and incident response to keep our platform healthy and developer-friendly
  • Engineer for scale: Design and operate ingestion and data processing pipelines that remain performant, resilient, and observable—even as we grow to petabyte-level workloads
  • Engage with the community: Collaborate with open-source contributors and customers, solve their challenges, and incorporate their feedback into our roadmap
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Employment Type:
Full-time

Senior Software Engineer (Infrastructure) - HyperDX

Join us in revolutionizing Observability for Developers! We’re on a mission to r...
Location:
Germany
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of backend engineering experience
  • Strong TypeScript and Node.js skills (bonus for additional languages)
  • Deep understanding of APIs, event-driven systems, and high-throughput data pipelines
  • Proficiency in SQL and experience working with analytical databases (ClickHouse experience a plus)
  • Experience with Docker and Kubernetes, plus Helm for managing production deployments
  • Experience with infrastructure-as-code (Terraform, Pulumi, or similar)
  • Familiarity with CI/CD pipelines, monitoring systems, and production-grade alerting practices
  • A passion for building reliable, maintainable, cloud-native systems
Job Responsibility:
  • Build the core platform: Design and implement backend systems and APIs that power HyperDX, enabling engineers to ingest, query, and analyze observability data at massive scale
  • Scale deployments and infrastructure: Architect, deploy, and maintain cloud-native systems that ensure reliability, scalability, and performance. You’ll use Kubernetes, Helm, and infrastructure-as-code to make deployments simple and resilient
  • Ensure maintainability and operational excellence: Define best practices for CI/CD, monitoring, logging, and alerting. Drive automation across testing, scaling, and incident response to keep our platform healthy and developer-friendly
  • Engineer for scale: Design and operate ingestion and data processing pipelines that remain performant, resilient, and observable—even as we grow to petabyte-level workloads
  • Engage with the community: Collaborate with open-source contributors and customers, solve their challenges, and incorporate their feedback into our roadmap
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Employment Type:
Full-time

Senior Software Engineer - Postgres

ClickHouse is launching a strategic Postgres initiative to extend our developer-...
Location:
United States
Salary:
140000.00 - 208000.00 USD / Year
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 6+ years in software engineering, ideally with experience building or operating database or cloud platform systems
  • Deep understanding of Postgres — configuration, extensions, operations, and performance tuning
  • Strong programming experience in Ruby, Go, or Python (or willingness to work across languages)
  • Familiarity with cloud infrastructure, APIs, and automation tools (Terraform, Kubernetes, CI/CD)
  • Understanding of distributed systems, data replication, and service orchestration patterns
  • Pragmatic, detail-oriented, and comfortable with both greenfield development and operational ownership
  • Happy to contribute where needed — from backend APIs and platform automation to Postgres internals and debugging
  • Strong communicator who works effectively across teams in a fast-paced, cross-functional environment
  • Operate with a founder’s mindset — take initiative, move quickly, and care deeply about outcomes
Job Responsibility:
  • Design and build backend services that orchestrate and manage database clusters in ClickHouse Cloud
  • Extend our platform control plane — written in Ruby, Go, and TypeScript — to support new Postgres capabilities
  • Contribute to automation and tooling that simplify cluster provisioning, scaling, and lifecycle management
  • Collaborate with infrastructure, SRE, and product teams to ensure operational excellence, performance, and reliability
  • Develop APIs and integrations that expose new Postgres functionality to customers and internal systems
  • Improve observability, deployment safety, and debugging workflows for database services
  • Participate in design discussions, code reviews, and on-call rotations, contributing to the overall reliability and velocity of the team
  • Operate with autonomy — identifying opportunities, driving execution, and delivering meaningful impact
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Employment Type:
Full-time

Senior Software Engineer (Infrastructure) - HyperDX

Join us in revolutionizing Observability for Developers! We’re on a mission to r...
Location:
United Kingdom
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of backend engineering experience
  • Strong TypeScript and Node.js skills (bonus for additional languages)
  • Deep understanding of APIs, event-driven systems, and high-throughput data pipelines
  • Proficiency in SQL and experience working with analytical databases (ClickHouse experience a plus)
  • Experience with Docker and Kubernetes, plus Helm for managing production deployments
  • Experience with infrastructure-as-code (Terraform, Pulumi, or similar)
  • Familiarity with CI/CD pipelines, monitoring systems, and production-grade alerting practices
  • A passion for building reliable, maintainable, cloud-native systems
Job Responsibility:
  • Build the core platform: Design and implement backend systems and APIs that power HyperDX, enabling engineers to ingest, query, and analyze observability data at massive scale
  • Scale deployments and infrastructure: Architect, deploy, and maintain cloud-native systems that ensure reliability, scalability, and performance. You’ll use Kubernetes, Helm, and infrastructure-as-code to make deployments simple and resilient
  • Ensure maintainability and operational excellence: Define best practices for CI/CD, monitoring, logging, and alerting. Drive automation across testing, scaling, and incident response to keep our platform healthy and developer-friendly
  • Engineer for scale: Design and operate ingestion and data processing pipelines that remain performant, resilient, and observable—even as we grow to petabyte-level workloads
  • Engage with the community: Collaborate with open-source contributors and customers, solve their challenges, and incorporate their feedback into our roadmap
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Senior Software Engineer - Postgres

ClickHouse is launching a strategic Postgres initiative to extend our developer-...
Location:
Canada
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 6+ years in software engineering, ideally with experience building or operating database or cloud platform systems
  • Deep understanding of Postgres — configuration, extensions, operations, and performance tuning
  • Strong programming experience in Ruby, Go, or Python (or willingness to work across languages)
  • Familiarity with cloud infrastructure, APIs, and automation tools (Terraform, Kubernetes, CI/CD)
  • Understanding of distributed systems, data replication, and service orchestration patterns
  • Pragmatic, detail-oriented, and comfortable with both greenfield development and operational ownership
  • Happy to contribute where needed — from backend APIs and platform automation to Postgres internals and debugging
  • Strong communicator who works effectively across teams in a fast-paced, cross-functional environment
  • Operate with a founder’s mindset — take initiative, move quickly, and care deeply about outcomes
Job Responsibility:
  • Design and build backend services that orchestrate and manage database clusters in ClickHouse Cloud
  • Extend our platform control plane — written in Ruby, Go, and TypeScript — to support new Postgres capabilities
  • Contribute to automation and tooling that simplify cluster provisioning, scaling, and lifecycle management
  • Collaborate with infrastructure, SRE, and product teams to ensure operational excellence, performance, and reliability
  • Develop APIs and integrations that expose new Postgres functionality to customers and internal systems
  • Improve observability, deployment safety, and debugging workflows for database services
  • Participate in design discussions, code reviews, and on-call rotations, contributing to the overall reliability and velocity of the team
  • Operate with autonomy — identifying opportunities, driving execution, and delivering meaningful impact
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Employment Type:
Full-time

Senior Software Engineer - Postgres

ClickHouse is launching a strategic Postgres initiative to extend our developer-...
Location:
India
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 6+ years in software engineering, ideally with experience building or operating database or cloud platform systems
  • Deep understanding of Postgres — configuration, extensions, operations, and performance tuning
  • Strong programming experience in Ruby, Go, or Python (or willingness to work across languages)
  • Familiarity with cloud infrastructure, APIs, and automation tools (Terraform, Kubernetes, CI/CD)
  • Understanding of distributed systems, data replication, and service orchestration patterns
  • Pragmatic, detail-oriented, and comfortable with both greenfield development and operational ownership
  • Happy to contribute where needed — from backend APIs and platform automation to Postgres internals and debugging
  • Strong communicator who works effectively across teams in a fast-paced, cross-functional environment
  • Operate with a founder’s mindset — take initiative, move quickly, and care deeply about outcomes
Job Responsibility:
  • Design and build backend services that orchestrate and manage database clusters in ClickHouse Cloud
  • Extend our platform control plane — written in Ruby, Go, and TypeScript — to support new Postgres capabilities
  • Contribute to automation and tooling that simplify cluster provisioning, scaling, and lifecycle management
  • Collaborate with infrastructure, SRE, and product teams to ensure operational excellence, performance, and reliability
  • Develop APIs and integrations that expose new Postgres functionality to customers and internal systems
  • Improve observability, deployment safety, and debugging workflows for database services
  • Participate in design discussions, code reviews, and on-call rotations, contributing to the overall reliability and velocity of the team
  • Operate with autonomy — identifying opportunities, driving execution, and delivering meaningful impact
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Employment Type:
Full-time