Senior Cloud Engineer

Senior+ Software Engineer - Cloud Availability Platform Engineering (Observability)

We are looking for a highly skilled engineer with deep expertise in building and...

Location

United States, San Francisco

Salary:

166,000 - 201,000 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

7+ years of experience in infrastructure or platform engineering, with a focus on observability and monitoring systems
Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/OpenSearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)
Strong programming skills in Go or Python for automation, operators, and custom integrations
Experience running observability platforms on Kubernetes and operating them at scale across multi-datacenter environments
Proven ability to design, optimize, and scale telemetry pipelines handling high-cardinality, high-throughput data
Solid understanding of distributed systems, performance engineering, and debugging complex workloads
Strong collaboration skills and the ability to influence engineering teams to adopt observability best practices

Job Responsibility

Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments
Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks
Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs
Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
Partnering with engineering teams to embed observability into applications, services, and infrastructure

What we offer

Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to Health Savings Accounts (HSAs)
Paid parental leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Full-time

Senior Cloud Engineer – AV Cloud Engineering

We are hiring a Senior Cloud Engineer to join the AV Cloud Engineering team with...

Location

United States, Austin, Texas; Sunnyvale, California

Salary:

170,000 - 230,000 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

Full proficiency in Kubernetes (GKE) and the Google Cloud Platform (GCP) ecosystem, including VPC, IAM, etc.
Hands-on experience implementing high-availability systems and managing the lifecycle of production-grade clusters
Strong proficiency in software engineering and DevOps principles, specifically using Golang/Python and Terraform
Ability to operate independently in an ambiguous environment and translate high-level requirements into clear technical tasks
A "Growth-based Mindset" with a commitment to continuous upskilling and a belief that team capacities are developed through effort and coaching
Professional experience managing the trade-offs between hardware-level performance (GPU passthrough) and clean cloud abstractions

Job Responsibility

Architectural Execution: Implement and manage the lifecycle of Kubernetes (GKE) clusters across hybrid and multi-cloud environments, ensuring production safety through automated patching and upgrades
Platform Implementation: Develop and maintain self-service PaaS features that abstract infrastructure complexity, providing reliable and performant access to specialized hardware like GPUs
Connectivity & Traffic: Implement and optimize high-throughput ingress patterns and service mesh (Istio) configurations to support distributed AV data and ML workloads
Project Independence: Take ownership of complex cloud initiatives from design through deployment, identifying technical gaps and proactively implementing robust solutions
Operational Excellence: Eliminate "human duct tape" by replacing manual cloud-management tasks with declarative state-enforcement (Terraform/GitOps) and custom automation
Reliability & Observability: Define and monitor SLIs/SLOs for cloud services, ensuring the platform meets the availability targets required for Super Cruise validation
Peer Mentorship: Proactively share technical lessons learned and participate in rigorous code and architecture reviews to support a healthy, high-trust engineering culture

What we offer

Medical
Dental
Vision
Health Savings Account
Flexible Spending Accounts
Retirement savings plan
Sickness and accident benefits
Life insurance
Paid vacation & holidays
Tuition assistance programs

Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

15+ years in software engineering with a track record of architecting distributed systems or platforms at scale
Strong hands‑on experience in Golang and one scripting language (e.g., Python or Shell)
Experience operating observability platforms at petabyte-scale ingestion and hundreds of millions of series
Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
Deep experience building systems at scale and operating cloud infrastructure with Kubernetes
Strong proficiency with service mesh technologies (Istio/Envoy) and infrastructure‑as‑code (Terraform), and experience in multi‑cloud (AWS, GCP)
Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
Proven experience integrating security as part of infrastructure and platform development
Exceptional cross‑functional communication and effective collaboration with both technical and non‑technical stakeholders

Job Responsibility

Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
Extend and harden open‑source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
Implement features such as pre‑aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
Integrate security into infrastructure and platform services; ensure robust multi‑tenant, multi‑cluster, and multi‑cloud designs
Contribute improvements back to open source and CNCF‑aligned projects

What we offer

Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision)
Life, accident, disability, commuter, and retirement options (401(k)/pension)
Time off in accordance with local leave policies

Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

We are building a next-generation observability and cloud platform that is high-...

Location

United Kingdom, Cambridge

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

Extensive software engineering experience, with a track record of architecting distributed systems or platforms at scale
Strong hands-on experience in Golang and one scripting language (e.g., Python or Shell)
Experience operating observability platforms at petabyte-scale ingestion and hundreds of millions of series
Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
Deep experience building systems at scale and operating cloud infrastructure with Kubernetes
Strong proficiency with service mesh technologies (Istio/Envoy) and infrastructure-as-code (Terraform), and experience in multi-cloud (AWS, GCP)
Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
Proven experience integrating security as part of infrastructure and platform development
Exceptional cross-functional communication and effective collaboration with both technical and non-technical stakeholders

Job Responsibility

Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
Extend and harden open-source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
Implement features such as pre-aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
Integrate security into infrastructure and platform services; ensure robust multi-tenant, multi-cluster, and multi-cloud designs
Contribute improvements back to open source and CNCF-aligned projects

What we offer

Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision)
Life, accident, disability, commuter, and retirement options (401(k)/pension)
Time off work for vacation and other personal reasons

Full-time

Senior Platform Engineer – AV Cloud Engineering

We are hiring a Senior Platform Engineer to join the Autonomous Vehicle (AV) Clo...

Location

United States, Austin; Sunnyvale

Salary:

170,000 - 230,000 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

Full proficiency in Kubernetes (GKE) and the Google Cloud Platform (GCP) ecosystem, including VPC, IAM, etc.
Hands-on experience implementing high-availability systems and managing the lifecycle of production-grade clusters
Strong proficiency in software engineering and DevOps principles, specifically using Golang/Python and Terraform
Ability to operate independently in an ambiguous environment and translate high-level requirements into clear technical tasks
A 'Growth-based Mindset' with a commitment to continuous upskilling and a belief that team capacities are developed through effort and coaching
Professional experience managing the trade-offs between hardware-level performance (GPU passthrough) and clean cloud abstractions
5+ years of experience, or a proven record of defining and executing technical strategy that required coordination across multiple teams, senior executives, and front-line engineers
Bachelor's degree in Computer Science or a related field, or equivalent work experience
Hands-on experience with Kubernetes in production and strong familiarity with at least one major cloud ecosystem such as GCP, AWS, or Azure
Strong software engineering skills in Go, Python, or similar languages, with the ability to build reusable automation, services, APIs, or controllers

Job Responsibility

Build and evolve internal platform capabilities, self-service workflows, APIs, and automation that make the right path the easiest path for AV engineering teams
Design clean abstractions that allow product and research teams to consume complex infrastructure, including specialized hardware such as GPUs, without deep infrastructure expertise
Improve platform primitives across traffic, service mesh, runtime configuration, and connectivity for distributed AV workloads
Reduce toil and improve reliability through reusable platform tooling, declarative automation, strong service ownership, observability, and SLIs/SLOs
Take ownership of complex platform initiatives from problem definition through design, implementation, and adoption
Partner with adjacent AV infrastructure teams to reduce developer friction and improve the reliability of the platform AV engineers depend on
Contribute to a healthy engineering culture through design reviews, code reviews, mentorship, and clear technical communication

What we offer

Medical
Dental
Vision
Health Savings Account
Flexible Spending Accounts
Retirement savings plan
Sickness and accident benefits
Life insurance
Paid vacation & holidays
Tuition assistance programs

Full-time

Senior Cloud Engineer

The Senior Cloud Engineer designs, builds, and optimizes cloud‑native applicatio...

Location

United States, Wilton

Salary:

Not provided

ASML

Expiration Date

Until further notice

Requirements

5+ years supporting large-scale software or cloud environments
Expertise with Google Cloud Platform (Compute, IAM, Networking, GCS, Cloud Build, GKE, AI Platform)
Experience with Azure and hybrid cloud architectures
Strong background in Linux (RHEL/CentOS)
Networking (TCP/IP, UDP)
CI/CD pipelines and DevOps tools
Version control (Git/SVN)
Application lifecycle tools (Jira, Confluence, Bitbucket)
VM and storage management (NFS, SMB, ZFS, NAS)
VMware and enterprise hardware environments

Job Responsibility

Architect, deploy, and maintain GCP-based platforms, integrating AI/ML services, data pipelines, and automated infrastructure
Build and support cloud-native applications, including installation, patching, performance tuning, and systems hardening
Implement monitoring, observability, and logging using tools such as Splunk and native GCP services
Troubleshoot complex distributed systems using diagnostic tools and structured problem analysis
Serve as the Tier 1/Tier 2 escalation point for cloud and platform issues, ensuring fast resolution and clear communication
Drive CI/CD automation, Git-based workflows, and cloud release management
Propose and implement improvements to system performance, reliability, cloud cost efficiency, and developer experience
Ensure compliance with IT standards, security policies, and service-level requirements
Support engineering use cases by provisioning environments, automating workflows, and optimizing cloud resource utilization

What we offer

Flexible workplace arrangement (up to two days a week remote)

Full-time

Senior Cloud Engineer

We believe that there is a smarter, more data-driven way to make decisions in he...

Location

United States, Boston

Salary:

71,250 - 143,750 USD / Year

SOPHiA GENETICS

Expiration Date

Until further notice

Requirements

5+ years of experience as a Cloud Engineer, DevOps Engineer, SRE or similar role
Proven experience designing and implementing secure cloud infrastructure solutions
Expert Kubernetes knowledge
Strong Linux knowledge
Experience with infrastructure as code tools (e.g., Terraform, Ansible)
Strong understanding of cloud platform security features (specific to chosen platform)
Experience with observability tools (Prometheus, Grafana, ELK, Loki, etc.)
Excellent communication, collaboration, and problem-solving skills
A passion for cloud and staying current with the latest advancements
Strong knowledge of cloud computing platforms, especially Azure

Job Responsibility

Design, architect, and implement secure cloud infrastructure solutions on cloud platforms (Azure in particular)
Kubernetes design, implementation, and operations at scale
Mentor junior cloud engineers
Act as a subject matter expert for Microsoft Azure and Kubernetes
Help engineering teams troubleshoot and perform root cause analysis
Lead infrastructure and cloud related projects

What we offer

Outstanding medical, dental, and vision coverage with 90% employer contribution
Company-matched 401(k) at 4%
Company-paid short- and long-term disability insurance
FSA commuter benefits
20 days PTO, increasing to 25 with tenure
5 sick days and 14 public holidays
Free Employee Assistance Program (EAP)

Full-time

Senior Cloud Engineer - Crypto

As a Senior Cloud Engineer on Sokin’s Crypto team, you will architect, deploy, a...

Location

Serbia, Belgrade

Salary:

Not provided

Sokin

Expiration Date

Until further notice

Requirements

5+ years of professional cloud engineering experience, with at least 2 years in a fintech, payments, or other regulated industry
Proven track record of designing and managing production AWS infrastructure at scale
Hands-on experience with Terraform (or equivalent IaC) and CI/CD pipelines (GitHub Actions preferred)
Experience with containerisation and orchestration (Docker, Kubernetes, ECS/EKS)
Demonstrable experience building or operating blockchain node infrastructure and crypto-related cloud services
Proficiency in AWS with relevant certifications (AWS Solutions Architect Professional or equivalent strongly preferred)
Expertise in scripting and automation (Python, Bash)
Strong understanding of networking (VPC, subnets, VPN, load balancers, service mesh)
Knowledge of blockchain infrastructure — running nodes, RPC providers, on-chain data indexing, and wallet/key management systems
Understanding of stablecoin mechanics — minting/burning, settlement flows, liquidity management, and the major stablecoin protocols and issuers

Job Responsibility

Architect and deploy secure, scalable, and cost-optimised AWS infrastructure to support Sokin’s payments platform, stablecoin settlement services, and real-time transaction processing
Design high-availability, multi-region architectures that meet the latency and throughput demands of cross-border payment flows
Own the end-to-end infrastructure lifecycle — from capacity planning and provisioning through monitoring, optimisation, and decommissioning
Build and maintain blockchain node infrastructure (RPC endpoints, indexers, event listeners) required for on-chain settlement and stablecoin operations
Design and operate secure key management and custody infrastructure for blockchain wallets and transaction signing
Implement infrastructure for fiat on-ramp/off-ramp services, ensuring reliable connectivity between traditional payment rails and blockchain networks
Monitor blockchain node health, chain synchronisation, and mempool conditions to ensure transaction reliability
Develop and maintain IaC using Terraform to automate all infrastructure provisioning, configuration, and lifecycle management
Integrate cloud infrastructure with CI/CD pipelines using GitHub Actions, enabling seamless and repeatable deployments
Implement GitOps workflows and automated testing for infrastructure changes, including policy-as-code compliance checks

Full-time

Senior Cloud Engineer – Observability & Performance Engineering
