Customer Reliability Engineer

Endor Labs

Location:
United States

Contract Type:
Not provided

Salary:
Not provided

Job Description:

As a Customer Reliability Engineer at Endor Labs on our Customer Success team, you will serve as the highest-level technical support resource, handling complex, high-priority issues that require deep product and systems expertise.

Job Responsibility:

  • Own technical escalations from Customer Success Engineers, Solution Architects, and Implementation Engineers, ensuring swift reproduction and resolution of critical issues
  • Collaborate with Engineering and Product teams to triage and resolve bugs or architectural issues
  • Provide insight to and work closely with our engineering teams, translating customer feedback and troubleshooting insights into tangible product improvements
  • Act promptly when technical issues emerge, applying your advanced troubleshooting skills and understanding of programming and DevOps practices to ensure our customers are successful
  • Conduct deep diagnostics, including logs, APIs, and infrastructure troubleshooting
  • Serve as a bridge between the customer and R&D for complex or systemic issues
  • Document and share solutions for long-term knowledge management and root cause prevention

Requirements:

  • Strong background in software engineering, with 4-10 years of experience and a deep understanding of programming languages, application security, and DevOps practices
  • Demonstrated experience in developing custom technical solutions and actively engaging in customer-facing roles, with a proven ability to handle project-based work effectively
  • A passionate advocate for customer success, with a focus on building secure, scalable solutions from the ground up
  • Exceptional communication skills, capable of breaking down complex technical topics into clear, understandable terms for a variety of audiences
  • Proactive and anticipatory approach to problem-solving, with the ability to foresee customer needs and craft strategic solutions that align with their overarching goals

What we offer:

  • Competitive salary and comprehensive benefits package including Health, Dental, Vision and Mental Health plans
  • 401(k) plan to support your long-term financial goals
  • Flexible PTO to maintain a healthy work-life balance
  • Opportunities for co-working and team meetups to foster collaboration
  • A dog-friendly office environment for those who love to bring their fur babies along

Additional Information:

Job Posted:
December 27, 2025

Work Type:
Remote work

Similar Jobs for Customer Reliability Engineer

Database Reliability Engineer

We are committed to providing our customers with reliable and secure services at...
Location:
Netherlands
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA or customer facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; knowledge of ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility:
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems customers encounter in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core related outages including working with support and Cloud teams to communicate to the impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Fulltime

Database Reliability Engineer

We are committed to providing our customers with reliable and secure services at...
Location:
Germany
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA or customer facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; knowledge of ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility:
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems customers encounter in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core related outages including working with support and Cloud teams to communicate to the impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Database Reliability Engineer - Core Team

We are committed to providing our customers with reliable and secure services at...
Location:
United Kingdom
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA or customer facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; knowledge of ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility:
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems customers encounter in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core related outages including working with support and Cloud teams to communicate to the impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Reliability & Maintainability Engineering Manager

At Boeing, we innovate and collaborate to make the world a better place. We’re c...
Location:
United States, Everett; Renton
Salary:
147050.00 - 198950.00 USD / Year
Boeing
Expiration Date:
January 16, 2026
Requirements:
  • Bachelor of Science degree from an accredited course of study in engineering, engineering technology (includes manufacturing engineering technology), chemistry, physics, mathematics, data science, or computer science
  • 5+ years of experience leading engineering teams in R&M or related functional areas
  • Knowledge of the basic Principles, Processes and Lifecycle of Systems Engineering
  • Understanding of the concept of Technical Performance Measures (a customer-centric view of product performance)
  • Knowledge of basic definitions of Reliability, Maintainability, Durability, and Availability
  • General knowledge of probability & statistics and the basis of such in Reliability & Safety analysis
  • Knowledge of System Modeling methods and relation to R&M modeling & analysis (Model Based Engineering)
  • High level knowledge of Airplane Systems and Structures of commercial or military airplanes
  • Demonstrated ability to work in a multi-discipline engineering environment
Job Responsibility:
  • Develops project plans aligned to an Airplane Development Program and R&M strategy and objectives
  • Implements plans to ensure business, technical and customer requirements are achieved
  • Develops and monitors appropriate metrics to ensure performance to plan
  • Provides technical direction and guidance to the team regarding processes, tools, technology and deliverables
  • Ensures team products and processes meet customer, company, and regulatory requirements for quality and safety
  • Coaches, counsels, mentors and provides developmental opportunities to improve employee satisfaction and retain a skilled and motivated team
  • Forecasts resource needs, negotiates them with internal customers and other R&M managers, and recruits personnel if needed
  • Collaborates with other SEIT managers and team members
  • Establishes partnerships and good working relationships with internal customers, stakeholders, peers and direct reports
What we offer:
  • Generous company match to your 401(k)
  • Industry-leading tuition assistance program pays your institution directly
  • Fertility, adoption, and surrogacy benefits
  • Up to $10,000 gift match when you support your favorite nonprofit organizations
  • Relocation based on candidate eligibility
  • Opportunity to enroll in a variety of benefit programs, generally including health insurance, flexible spending accounts, health savings accounts, retirement savings plans, life and disability insurance programs, and a number of programs that provide for both paid and unpaid time away from work
  • Fulltime

Field Service Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location:
United States, Hammond, Indiana
Salary:
Not provided
Advanced Technology Products
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrates ability to perform full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete a failure mode effects analysis, cause and effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technology / trends
  • Robust problem solving, mathematical, analytical, and decision making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility:
  • Extensive travel required. (Local, National, International)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability centered maintenance efforts
  • Mentors, coaches, and provides reliability best practices for applications in customer facilities, by customer personnel
  • Identifies top potential issues leading to lost production and preventable maintenance spending. Communicates findings with leadership
  • Provides solutions to root cause deficiencies and demonstrates economic benefits to their correction
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes / technologies to increase equipment performance and uptime
  • Champions systems and best practice procedures towards a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Fulltime

Field Service Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location:
United States, Hammond
Salary:
Not provided
Advanced Technology Products
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrates ability to perform full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete a failure mode effects analysis, cause and effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technology / trends
  • Robust problem solving, mathematical, analytical, and decision making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility:
  • Extensive travel required. (Local, National, International)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability centered maintenance efforts
  • Mentors, coaches, and provides reliability best practices for applications in customer facilities, by customer personnel
  • Identifies top potential issues leading to lost production and preventable maintenance spending. Communicates findings with leadership
  • Provides solutions to root cause deficiencies and demonstrates economic benefits to their correction
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes / technologies to increase equipment performance and uptime
  • Champions systems and best practice procedures towards a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Fulltime

Site Reliability Engineer

About LogRocket: Founded in 2016, LogRocket's goal is to make every experience o...
Location:
United States, Boston
Salary:
135000.00 - 220000.00 USD / Year
LogRocket
Expiration Date:
Until further notice
Requirements:
  • At least 4 years of experience as a Site Reliability Engineer, or related job
  • Ability to read and understand product code
  • Familiarity with the state of the art in cloud technologies, including common providers, specific tools of the trade, and their strengths and weaknesses
  • Experience operating applications and databases with demanding scalability or availability requirements
  • Proven expertise in modern container orchestration practices
  • A strong understanding of the performance, architecture, tooling, and cost of cloud systems
  • A security focused mindset with a solid understanding of incident response and risk mitigation
  • A strong collaborator who is transparent about progress on tasks, seeks feedback early and often, works effectively with the team and customers
Job Responsibility:
  • Improve quality of pager alerts while reducing noise
  • Maintain awareness of engineering initiatives across the organization and monitor their impact on stability, cost, and performance
  • Keep infrastructure up-to-date to take advantage of security patches and new features
  • Improve operational security without sacrificing engineering independence
What we offer:
  • Catered lunch and an impressive array of your favorite snacks
  • Unlimited vacation policy
  • Health, Dental, Vision benefits, 401k, commuter benefits
  • Generous stock options
  • Regular team outings and activities
  • Fulltime

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...
Location:
United States, Deerfield
Salary:
96000.00 - 132000.00 USD / Year
Baxter
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, IT, or related field (or equivalent experience)
  • Prior experience in Site Reliability Engineering and cloud-based infrastructure management
  • Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
  • Azure administration and operations experience, with certifications a plus
  • Knowledge of related technologies, including cloud, encryption, and security protocols
  • Systems administration experience in Windows and Linux environments
  • Proven problem-solving skills and experience with scripting and automation tools
  • Ability to create accurate documentation and reports, with excellent communication skills
  • Applicants must be authorized to work for any employer in the U.S.
  • Unable to sponsor or take over sponsorship of an employment visa at this time.
Job Responsibility:
  • Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
  • Manage and administer Azure resources, including virtual machines, databases, and networking components
  • Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in cloud
  • Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
  • Define and refine Operations SLAs to maintain high level of Customer Satisfaction
  • Establish non-functional requirements to meet SLAs
  • Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
  • Define key performance indicators that can be monitored, measured, and used to derive opportunities
  • Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
  • Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes.
What we offer:
  • Support for Parents
  • Continuing Education/Professional Development
  • Employee Health & Well-Being Benefits
  • Paid Time Off
  • 2 Days a Year to Volunteer
  • Medical and dental coverage starting day one
  • Insurance coverage for basic life, accident, short-term and long-term disability
  • Business travel accident insurance
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan
  • Fulltime