Reliability Engineer Jobs (Remote work)

42 Job Offers

Senior Site Reliability Engineer

Join our team as a Senior Site Reliability Engineer, focusing on our self-hosted product platform. You will architect and maintain containerized systems (Kubernetes, Docker) and ensure seamless customer deployments. This remote US role offers competitive salary, equity, and comprehensive benefits...

Location

United States

Salary

200000.00 - 220000.00 USD / Year

Tines

Expiration Date

Until further notice

Site Reliability Engineering Manager

Lead a globally distributed SRE team at the Wikimedia Foundation, supporting infrastructure used by hundreds of millions. Utilize your hands-on expertise in cloud, Linux, Kubernetes, and IaC to guide critical projects and ensure reliability. This remote US role offers the chance to mentor enginee...

Location

United States of America

Salary

132439.00 - 208378.00 USD / Year

Wikimedia Foundation

Expiration Date

Until further notice

Staff Site Reliability Engineer

Join Affirm in Spain as a Staff Site Reliability Engineer. You will define technical strategy and frameworks to ensure system reliability at scale using AWS, Kubernetes, and Python/Kotlin. This senior role requires 8+ years of backend and SRE experience, focusing on incident management and distri...

Location

Spain

Salary

101000.00 - 131000.00 EUR / Year

Affirm

Expiration Date

Until further notice

Staff Site Reliability Engineer

Lead our Site Reliability Engineering vision in Poland as a Staff SRE. You will design scalable backend systems using AWS, Kubernetes, and Python/Kotlin, while driving incident management and system resilience. This role offers major benefits like full health premium coverage and flexible lifesty...

Location

Poland

Salary

358000.00 - 458000.00 PLN / Year

Affirm

Expiration Date

Until further notice

Senior Site Reliability Engineer

Join Affirm in Poland as a Senior Site Reliability Engineer. You will design and operate highly available distributed systems using AWS, Kubernetes, and Python/Kotlin. Drive reliability frameworks, lead incident management, and support a global engineering team. Enjoy premium benefits, including ...

Location

Poland

Salary

301000.00 - 401000.00 PLN / Year

Affirm

Expiration Date

Until further notice

Senior Site Reliability Engineer

Join Affirm in Spain as a Senior Site Reliability Engineer. Design and launch scalable backend systems using Python, Kotlin, AWS, and Kubernetes. Drive reliability, incident management, and tooling for honest financial products. Enjoy comprehensive benefits, including full health coverage and fle...

Location

Spain

Salary

85000.00 - 115000.00 EUR / Year

Affirm

Expiration Date

Until further notice

Customer Reliability Engineer

Join Endor Labs as a Customer Reliability Engineer, the top-tier technical expert on our Customer Success team. You'll resolve complex, high-priority escalations using deep software engineering and DevOps expertise. This US-based role offers competitive benefits, flexible PTO, and a collaborative...

Location

United States

Salary

Not provided

Endor Labs

Expiration Date

Until further notice

Principal Site Reliability Engineer

Lead the CVML Platform team as a Principal SRE, architecting a secure, cost-effective hybrid infrastructure for robotics. Integrate edge devices, on-prem, and cloud (AWS, K8s) using Terraform, Python, and Go. Optimize performance and stability while collaborating cross-functionally in the autonom...

Location

United States

Salary

166000.00 - 293000.00 USD / Year

Blue River Technology

Expiration Date

Until further notice

FedRAMP Site Reliability Engineer

Join Confluent as a FedRAMP Site Reliability Engineer in Canada. You will ensure high operational standards for our real-time data streaming platform used by federal agencies. This remote-first role requires deep Kubernetes, Terraform, and cloud-native expertise to maintain compliance and system ...

Location

Canada

Salary

144200.00 - 169400.00 CAD / Year

Confluent

Expiration Date

Until further notice

Senior Site Reliability Engineer

Join our team as a Senior Site Reliability Engineer in the United States. You will enhance system reliability through automation, CI/CD, and Azure cloud expertise. This role requires deep experience in scalable, distributed systems and observability practices. Drive incident resolution and influe...

Location

United States of America

Salary

Not provided

VantageLinks

Expiration Date

Until further notice

Site Reliability Engineer

Join our DevOps team in Limerick as a Site Reliability Engineer. You will develop and automate software applications, working with AWS, Terraform, and C#. This role requires experience with real-time production systems, IaC, and firewall technologies to ensure reliability and performance.

Location

Ireland, Limerick

Salary

Not provided

Solas IT Recruitment

Expiration Date

Until further notice

Staff Site Reliability Engineer

Lead our infrastructure reliability strategy as a Staff Site Reliability Engineer. Architect large-scale, fault-tolerant AWS systems using Terraform and ECS expertise. Drive technical initiatives, mentor engineers, and tackle complex operational challenges. This remote US role offers a discretion...

Location

United States

Salary

151040.00 - 188800.00 USD / Year

Bugcrowd

Expiration Date

Until further notice

Senior Site Reliability Engineer

Join our agile infrastructure team as a Senior Site Reliability Engineer. Design and maintain scalable AWS infrastructure using Terraform and ECS. You'll automate CI/CD, ensure system reliability, and collaborate in an international tech environment. This remote US role offers a bonus program for...

Location

United States

Salary

129280.00 - 161600.00 USD / Year

Bugcrowd

Expiration Date

Until further notice

Site Reliability Engineer

Join our team as a Site Reliability Engineer in Gurgaon. You will ensure system reliability and performance using AWS, Terraform, and CDN technologies. Collaborate with development teams to build scalable systems and robust automation. Bring your 5+ years of SRE/DevOps experience to make a direct...

Location

India, Gurgaon

Salary

Not provided

Rackspace

Expiration Date

Until further notice

Staff Site Reliability Engineer

Join AlphaSense as a Staff Site Reliability Engineer to architect core reliability platforms and drive SRE best practices globally. You'll need 8+ years of SRE/DevOps experience, cloud and Kubernetes expertise, and strong incident leadership. This high-impact, US-based role offers equity and focu...

Location

United States

Salary

150000.00 - 225000.00 USD / Year

AlphaSense

Expiration Date

Until further notice

Senior Site Reliability Engineer - Data Pipeline

Join our Data Pipeline team as a Senior Site Reliability Engineer in Slovakia. You will build and maintain a robust GCP/Kubernetes ecosystem, ensuring high observability and scalability. We seek an expert in Terraform, HELM, and DevOps culture who values infrastructure as code. Enjoy a virtual-fi...

Location

Slovakia

Salary

3500.00 EUR / Month

Bloomreach

Expiration Date

Until further notice

Site Reliability Engineer

Join Luma AI to architect the physical and digital foundation of AGI. As a Site Reliability Engineer, you will build and optimize massive-scale, multi-vendor GPU supercomputers in Palo Alto or London. Your elite HPC knowledge will design high-performance clusters, optimizing low-level networking ...

Location

United States, Palo Alto; United Kingdom, London

Salary

170000.00 - 360000.00 USD / Year

Luma AI

Expiration Date

Until further notice

Software Engineer - Reliability

Join Luma as a Software Engineer - Reliability in Palo Alto. Architect and scale next-gen AI infrastructure across AWS and OCI. Utilize your deep Linux and system performance expertise to ensure high availability for GPU clusters. Thrive in a fast-paced role solving complex hardware/software chal...

Location

United States, Palo Alto

Salary

170000.00 - 360000.00 USD / Year

Luma AI

Expiration Date

Until further notice

Software Engineer - Reliability GPU Infrastructure

Shape the future of creative AI as a Software Engineer for GPU Infrastructure at Luma AI. You will architect and own our massive-scale, multi-cloud and on-premise compute substrate. This role requires deep expertise in distributed systems and infrastructure as code, based in Palo Alto or London.

Location

United States, Palo Alto; United Kingdom, London

Salary

170000.00 - 360000.00 USD / Year

Luma AI

Expiration Date

Until further notice

Principal Site Reliability Engineer (AI-first SRE)

Lead the AI-driven reliability transformation at Groupon as a Principal SRE. You will architect self-healing systems using AI/ML, GCP/AWS, and Kubernetes to ensure 99.9%+ availability. This role requires 10+ years of experience, expertise in AIOps, and offers a chance to shape scalable, predictiv...

Location

Salary

Not provided

Groupon

Expiration Date

Until further notice

Explore a dynamic and critical career path with Reliability Engineer jobs, a profession dedicated to ensuring systems and assets operate with maximum uptime, performance, and efficiency. Reliability Engineers are the guardians of operational integrity, applying engineering principles and data-driven analysis to prevent failures, optimize performance, and implement sustainable processes. This field broadly splits into two key domains: IT/Software Reliability and Industrial/Physical Asset Reliability, both united by the core mission of building and maintaining resilient systems.

In the technology sector, often titled Site Reliability Engineer (SRE), professionals blend software engineering and systems administration to create scalable and highly reliable software platforms. Their general responsibilities include designing and automating infrastructure deployment, building robust monitoring and alerting systems, and managing incident response through on-call rotations. They focus on key service level indicators (SLIs) and objectives (SLOs) to measure and improve user experience. Typical tasks involve writing code for automation, conducting post-incident reviews (blameless postmortems), and collaborating with development teams to embed reliability into the software lifecycle from the start. Common requirements for these roles include proficiency in programming/scripting (e.g., Python, Go), expertise in cloud platforms (AWS, Azure, GCP), container orchestration (Kubernetes), and infrastructure-as-code tools, alongside a strong grasp of CI/CD pipelines and observability stacks.

Conversely, in industrial settings like manufacturing, energy, or oil and gas, Reliability Engineers focus on physical assets such as rotating equipment, electrical systems, and production machinery. Their work is centered on predictive and preventive maintenance strategies. General duties involve analyzing equipment performance data, conducting Root Cause Failure Analysis (RCFA), developing maintenance procedures, and managing reliability-centered maintenance (RCM) programs. They use statistical analysis and reliability modeling to predict asset lifecycles, recommend improvements, and manage risk. Typical skills include a strong mechanical or electrical engineering foundation, knowledge of condition monitoring technologies (vibration analysis, thermography), familiarity with Computerized Maintenance Management Systems (CMMS), and expertise in process safety and lifecycle cost analysis.

Across both domains, successful Reliability Engineers are systematic problem-solvers with a proactive mindset. They possess strong analytical skills to interpret complex data, excellent communication skills to collaborate across teams and justify investments, and a relentless focus on continuous improvement. Whether ensuring a global web service remains online or a refinery operates safely and efficiently, Reliability Engineer jobs are foundational to modern operational excellence. For those passionate about building systems that don't fail and optimizing performance through engineering, this profession offers a challenging and impactful career with opportunities spanning virtually every industry.