Senior Production Engineer, Cloud Infrastructure

Crusoe

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

172000.00 - 209000.00 USD / Year

Job Description:

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

We’re looking for an experienced Production Engineer to design, build, and operate the cloud infrastructure that powers our AI-first compute environment. In this role, you will be responsible for the reliability, scalability, and operational excellence of our production systems. You’ll build automation, guardrails, and resilient platforms that enable engineering teams to move quickly while maintaining strong security, performance, and uptime standards.

Job Responsibilities:

  • Design, build, and manage core cloud infrastructure across compute, networking, storage, and IAM
  • Architect, operate, and scale Kubernetes-based platforms
  • Deploy and manage Kubernetes workloads using Helm charts and continuous deployment systems
  • Help operate the observability platforms for cloud and Kubernetes workloads using tools such as VictoriaMetrics and Grafana
  • Develop and maintain Terraform modules to define automated, auditable, and secure cloud environments
  • Own VPC design, routing, load balancers, interconnects, peering, and network security boundaries
  • Implement policies and guardrails across IAM, resource hierarchy, service accounts, and VPC-SC
  • Build automation for provisioning, lifecycle management, and blue/green or canary deploy patterns
  • Partner closely with security and platform teams on monitoring, logging, compliance, and operational readiness
  • Optimize cloud costs, quotas, and capacity planning across multiple projects and regions
  • Troubleshoot complex production issues across compute, storage, and networking layers

Requirements:

  • 5–8+ years operating large-scale production workloads on major cloud providers such as GCP or AWS
  • Deep knowledge of GCE, Kubernetes (GKE in particular), VPC networking, load balancers, firewall rules, interconnect, and GCS
  • Strong experience managing Kubernetes workloads and authoring Helm charts
  • Strong Terraform experience and a track record of building automated multi-environment infrastructure
  • Hands-on experience with Kubernetes internals, workload orchestration, scaling, and observability
  • Ability to debug complex distributed systems across compute, storage, and network boundaries
  • Strong cloud security fundamentals, including least privilege, secrets management, and policy enforcement
  • Proficiency with Python, Go, or Shell for automation and tooling
  • Experience influencing design decisions and partnering with cross-functional teams

Nice to have:

  • Experience supporting high-performance or AI/ML workloads in GCP
  • Familiarity with service mesh or multi-cluster Kubernetes operations
  • Background in hybrid or multi-cloud infrastructure
  • Strong SRE fundamentals including SLOs, incident response, and postmortems
  • Experience with Spanner, BigQuery, Bigtable, or large-scale data platforms

What we offer:

  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit: $300 per month

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Work Type:
On-site work

Similar Jobs for Senior Production Engineer, Cloud Infrastructure

Senior Cloud Infrastructure Engineer

HPE Aruba Networking is a leading provider of next-generation networking solutio...
Location:
United States, San Jose
Salary:
133500.00 - 307000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Minimum expected industry experience is around 6 years
  • Minimum education at BS or MS level in Computer Science or related fields
  • Proven record of developing and releasing cloud applications in the production environment
  • Experience with DevOps and Cloud Infrastructure Deployment and Automation in Python, Terraform, Ansible, GitOps, GitLab, and Jenkins/Spinnaker
  • Experience in RDBMS (Postgres), GraphQL, and NoSQL (Cassandra, OpenSearch, ClickHouse, etc.)
  • Experience in cloud stacks such as Redis, Kafka, RabbitMQ, Hazelcast
  • Experience in development in Kubernetes and Docker containers
  • Programming language experience with Shell Scripts, Python, Golang, or Java
  • Ability to deploy various techniques to ‘scale’ an application in a cloud environment
  • Demonstrated abilities to work with QA and Remote Teams
Job Responsibilities:
  • Participate in architecture and design discussions
  • Develop scalable applications that run on top of Next Generation Central
  • Contribute to multiple technical programs simultaneously
What we offer:
  • Health benefits
  • Comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • Personal and professional development programs
  • Inclusion and diversity initiatives
  • Exciting and fun work culture
  • Innovation and growth opportunities
Employment Type:
Full-time

Senior Director of Engineering, Infrastructure

Senior Director of Engineering role leading the Infrastructure group at PagerDut...
Location:
United States, San Francisco
Salary:
233000.00 - 392000.00 USD / Year
PagerDuty
Expiration Date:
Until further notice
Requirements:
  • Proven experience in senior engineering leadership roles, managing multiple layers of managers
  • Significant experience as a hands-on technical contributor earlier in your career
  • Deep knowledge of modern infrastructure and software delivery: high availability, distributed systems, public cloud (AWS), microservices, containers, CI/CD pipelines, observability, and automation
  • Track record of building and scaling high-performing, inclusive engineering organizations
Job Responsibilities:
  • Define and drive the multi-year strategy for PagerDuty's infrastructure and platform foundations
  • Strong ownership of PagerDuty's reliability patterns and practices
  • Bar raiser for all engineering functions
  • Lead, mentor, and scale a diverse team of Engineering Managers, Senior Managers, and technical leaders across multiple geographies
  • Ensure the reliability, scalability, and security of PagerDuty's global SaaS platform
  • Partner with peers in Engineering, Product, and Security to deliver large cross-functional initiatives
  • Champion engineering excellence: CI/CD maturity, observability best practices, operational rigor, and incident readiness
  • Manage budgets, headcount, and vendor relationships to optimize infrastructure investments
  • Represent Infrastructure externally with customers and partners, and internally with executives, as a trusted voice on technical and business tradeoffs
  • Foster a culture of inclusion, accountability, collaboration, and growth
What we offer:
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
Employment Type:
Full-time

Senior Director of Engineering, Infrastructure

Senior Director of Engineering to lead the Infrastructure group at PagerDuty, se...
Location:
United States, Atlanta
Salary:
233000.00 - 392000.00 USD / Year
PagerDuty
Expiration Date:
Until further notice
Requirements:
  • Proven experience in senior engineering leadership roles, managing multiple layers of managers
  • Significant experience as a hands-on technical contributor earlier in your career
  • Deep knowledge of modern infrastructure and software delivery: high availability, distributed systems, public cloud (AWS), microservices, containers, CI/CD pipelines, observability, and automation
  • Track record of building and scaling high-performing, inclusive engineering organizations
Job Responsibilities:
  • Define and drive the multi-year strategy for PagerDuty's infrastructure and platform foundations
  • Strong ownership of PagerDuty's reliability patterns and practices
  • Lead, mentor, and scale a diverse team of Engineering Managers, Senior Managers, and technical leaders across multiple geographies
  • Ensure the reliability, scalability, and security of PagerDuty's global SaaS platform
  • Partner with peers in Engineering, Product, and Security to deliver large cross-functional initiatives
  • Champion engineering excellence: CI/CD maturity, observability best practices, operational rigor, and incident readiness
  • Manage budgets, headcount, and vendor relationships to optimize infrastructure investments
  • Represent Infrastructure externally with customers and partners, and internally with executives
  • Foster a culture of inclusion, accountability, collaboration, and growth
What we offer:
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Paid volunteer time off: 20 hours per year
Employment Type:
Full-time

Senior Software Engineer, Cloud Platform

As a Senior Software Engineer, Cloud Platform at Chef Robotics, you'll be respon...
Location:
United States, San Francisco
Salary:
150000.00 - 240000.00 USD / Year
Chef Robotics
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 5+ years of professional experience in cloud infrastructure and DevOps roles
  • Expert knowledge of cloud infrastructure and deployment (AWS, GCP, or Azure)
  • Strong proficiency with containerization (Docker) and orchestration (Kubernetes) technologies
  • Extensive experience with CI/CD practices and infrastructure-as-code principles
  • Experience with system monitoring, logging, and performance optimization
  • Understanding of secure data pipeline design and implementation
  • Understanding of infrastructure requirements for robotics or automation systems
  • Experience with real-time or near-real-time systems and cloud architecture
  • Background in developing reliable systems with high availability requirements
Job Responsibilities:
  • Design and implement cloud infrastructure to support robotics platform deployment and operations
  • Provision robots for seamless deployment across diverse customer environments
  • Enable remote software updates to enhance performance and reliability of deployed systems
  • Implement containerization (Docker) and orchestration (Kubernetes) for scalable deployments
  • Manage cloud infrastructure across AWS, GCP, or Azure platforms
  • Improve the performance and reliability of cloud services supporting the Chef system
  • Implement fault-tolerant design patterns to ensure reliability in production environments
  • Establish performance benchmarks and optimize systems to meet latency requirements for robotics operations
  • Implement comprehensive logging, monitoring, and alerting for cloud infrastructure
  • Create diagnostic tools and dashboards for operational visibility
What we offer:
  • medical insurance
  • dental insurance
  • vision insurance
  • commuter benefits
  • flexible paid time off (PTO)
  • catered lunch
  • 401(k) matching
  • early-stage equity
Employment Type:
Full-time

Senior Cloud Engineer - Product Metrics

The Product Metrics team owns the collection, storage, and serving of metrics co...
Location:
United States
Salary:
141000.00 - 208000.00 USD / Year
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
  • 2+ years of software application development experience using Golang
  • Experience with at least one of the major Cloud Service Providers such as AWS, GCP or Azure
  • Experience with storing, shipping, and retrieving large volumes of data efficiently using technologies such as ClickHouse
  • Experience with technologies such as Kubernetes, Helm, ArgoCD, Temporal as well as infrastructure-as-code tools such as Terraform
Job Responsibilities:
  • Take an active part in determining the roadmap for the Product Metrics team
  • Work closely within the team to deliver new features, iterate and improve them
  • Design, build, operate, and maintain business-critical petabyte-scale systems
  • Be responsible for the performance, reliability, availability and cost-efficiency of the Product Metrics systems
  • Mentor and support other team members, participate in design discussions and collaborate with the team
  • Be a part of on-call rotation and take ownership of the services you're running
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Employment Type:
Full-time

Senior Cloud Engineer - Product Metrics

The Product Metrics team owns the collection, storage, and serving of metrics co...
Location:
Canada
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
  • 2+ years of software application development experience using Golang
  • Experience with at least one of the major Cloud Service Providers such as AWS, GCP or Azure
  • Experience with storing, shipping, and retrieving large volumes of data efficiently using technologies such as ClickHouse
  • Experience with technologies such as Kubernetes, Helm, ArgoCD, Temporal as well as infrastructure-as-code tools such as Terraform
Job Responsibilities:
  • Take an active part in determining the roadmap for the Product Metrics team
  • Work closely within the team to deliver new features, iterate and improve them
  • Design, build, operate, and maintain business-critical petabyte-scale systems
  • Be responsible for the performance, reliability, availability and cost-efficiency of the Product Metrics systems
  • Mentor and support other team members, participate in design discussions and collaborate with the team
  • Be a part of on-call rotation and take ownership of the services you're running
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Senior Software Engineer - Cloud Infrastructure

The Cloud Infrastructure Engineering team builds and manages the foundational bl...
Location:
Australia
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
  • Software development experience in Go, C/C++, Java, or another OOP language
  • Experience with cloud technologies such as AWS, Azure, or GCP, including infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
  • Experience developing cloud infrastructure services, preferably with Kubernetes
  • Experience developing cloud-native edge or service mesh services, preferably with Envoy and Istio
  • Experience leading and shipping large scope technical projects in collaboration with multiple experienced engineers
  • Understanding of network topologies, protocols, and security principles, such as VPNs, firewalls, and load balancers
  • Knowledge of cloud security best practices, including encryption, access controls, and compliance standards like SOC2 and GDPR
  • You have excellent communication skills and the ability to work well within a global team
  • You are a strong problem-solver and have solid production debugging skills
Job Responsibilities:
  • Architect and build a robust, scalable, and highly available distributed infrastructure
  • Build a cutting-edge cloud-native platform on top of the public cloud, and automate our cloud resource management
  • Work closely with our ClickHouse core database development team and security team, and partner with them to produce the SaaS offering
  • Work on routing and traffic components to improve the reliability and scalability of our cloud service
  • Systematically improve availability by applying industry and distributed systems best practices
  • Design and build security components & tooling: firewall, PKI and certificate infra, zero trust network, etc.
  • Improve performance and cost efficiency of our infrastructure
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Senior Software Engineer - Cloud Infrastructure

About ClickHouse: Recognized on the 2025 Forbes Cloud 100 list, ClickHouse is on...
Location:
Singapore
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
  • Software development experience in Go, C/C++, Java, or another OOP language
  • Experience with cloud technologies such as AWS, Azure, or GCP, including infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
  • Experience developing cloud infrastructure services, preferably with Kubernetes
  • Experience developing cloud-native edge or service mesh services, preferably with Envoy and Istio
  • Experience leading and shipping large scope technical projects in collaboration with multiple experienced engineers
  • Understanding of network topologies, protocols, and security principles, such as VPNs, firewalls, and load balancers
  • Knowledge of cloud security best practices, including encryption, access controls, and compliance standards like SOC2 and GDPR
  • You have excellent communication skills and the ability to work well within a global team
  • You are a strong problem-solver and have solid production debugging skills
Job Responsibilities:
  • Architect and build a robust, scalable, and highly available distributed infrastructure
  • Build a cutting-edge cloud-native platform on top of the public cloud, and automate our cloud resource management
  • Work closely with our ClickHouse core database development team and security team, and partner with them to produce the SaaS offering
  • Work on routing and traffic components to improve the reliability and scalability of our cloud service
  • Systematically improve availability by applying industry and distributed systems best practices
  • Design and build security components & tooling: firewall, PKI and certificate infra, zero trust network, etc.
  • Improve performance and cost efficiency of our infrastructure
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites