CrawlJobs Logo

Customer Reliability Engineer

https://www.endorlabs.com Logo

Endor Labs

Location Icon

Location:
United States

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

As a Customer Reliability Engineer at Endor Labs on our Customer Success team, you will serve as the highest-level technical support resource, handling complex, high-priority issues that require deep product and systems expertise.

Job Responsibility:

  • Own technical escalations from Customer Success Engineers, Solution Architects and Implementation Engineers ensuring swift reproduction and resolution of critical issues
  • Collaborate with Engineering and Product teams to triage and resolve bugs or architectural issues
  • Provide insight and build closely with our engineering teams, translating customer feedback and troubleshooting insights into tangible product improvements
  • Act promptly when technical issues emerge, applying your advanced troubleshooting skills and understanding of programming and DevOps practices to ensure our customers are successful
  • Conduct deep diagnostics, including logs, APIs, and infrastructure troubleshooting
  • Serve as a bridge between the customer and R&D for complex or systemic issues
  • Document and share solutions for long-term knowledge management and root cause prevention

Requirements:

  • Strong background in software engineering, with 4 -10 years of deep understanding of programming languages, application security, and DevOps practices
  • Demonstrated experience in developing custom technical solutions and actively engaging in customer-facing roles, with a proven ability to handle project-based work effectively
  • A passionate advocate for customer success, with a focus on building secure, scalable solutions from the ground up
  • Exceptional communication skills, capable of breaking down complex technical topics into clear, understandable terms for a variety of audiences
  • Proactive and anticipatory approach to problem-solving, with the ability to foresee customer needs and craft strategic solutions that align with their overarching goals
What we offer:
  • Competitive salary and comprehensive benefits package including Health, Dental, Vision and Mental Health plans
  • 401(k) plan to support your longterm financial goals
  • Flexible PTO to maintain a healthy work-life balance
  • Opportunities for co-working and team meetups to foster collaboration
  • A dog-friendly office environment for those who love to bring their fur babies along

Additional Information:

Job Posted:
December 27, 2025

Work Type:
Remote work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Customer Reliability Engineer

Database Reliability Engineer

We are committed to providing our customers with reliable and secure services at...
Location
Location
Netherlands
Salary
Salary:
Not provided
clickhouse.com Logo
ClickHouse
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA or customer facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL, particularly ClickHouse is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility
Job Responsibility
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems encountered by customers in Clickhouse Core to identify the root cause of problems and submit bug fixes, issue reports and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core related outages including working with support and Cloud teams to communicate to the impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Fulltime
Read More
Arrow Right

Database Reliability Engineer

We are committed to providing our customers with reliable and secure services at...
Location
Location
Germany
Salary
Salary:
Not provided
clickhouse.com Logo
ClickHouse
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA or customer facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL, particularly ClickHouse is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility
Job Responsibility
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems encountered by customers in Clickhouse Core to identify the root cause of problems and submit bug fixes, issue reports and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core related outages including working with support and Cloud teams to communicate to the impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Read More
Arrow Right

Database Reliability Engineer - Core Team

We are committed to providing our customers with reliable and secure services at...
Location
Location
United Kingdom
Salary
Salary:
Not provided
clickhouse.com Logo
ClickHouse
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA or customer facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL, particularly ClickHouse is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility
Job Responsibility
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems encountered by customers in Clickhouse Core to identify the root cause of problems and submit bug fixes, issue reports and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core related outages including working with support and Cloud teams to communicate to the impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Read More
Arrow Right

Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location
Location
United States , Tupelo, Mississippi
Salary
Salary:
Not provided
atpchemical.com Logo
Advanced Technology Products
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in engineering (ABET accredited) or equivalent experience (ex. heavy industrial maintenance, reliability, or operations experience)
  • Minimum of one year of reliability experience
  • Demonstrates ability to use reliability tool sets
  • Experience in Performance of RCA
  • Involvement with RCM & FMEA
  • Master Level Proficiency in Predictive Technology
  • Vibration I Certification
  • Machine Health Monitoring Intermediate Proficiency
  • Experience with Work Execution Management
  • Technical understanding of electrical or mechanical components, tools, and designs
Job Responsibility
Job Responsibility
  • Promotes and adheres to the ATS safety culture
  • Ensures compliance with regulatory requirements and ATS policies and procedures
  • Partners with internal/external customer for engineered solutions to improve reliability and throughput
  • Identifies opportunities for Capital Expenditures for equipment replacement (develops and communicates ROI)
  • Highly knowledgeable in operating systems, critical elements, and best practices to enable a precision reliability culture
  • Knowledgeable application of common precision tools and practices
  • Partners with peers to perform reliability centered maintenance and deliverables (equipment specific maintenance plan -ESMP)
  • Actively collaborates with maintenance team on the use of predictive, preventative, and precision maintenance technologies and strategies designed to identify or control risks prior to failure and ensure optimum maintenance execution
  • Partners with peers to perform failure mode & effects analysis
  • Understands Work Execution Management (WEM) & improvements identified through reliability strategy session performance
  • Fulltime
Read More
Arrow Right

Field Service Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location
Location
United States , Hammond
Salary
Salary:
Not provided
atpchemical.com Logo
Advanced Technology Products
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrates ability to perform full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete a failure mode effects analysis, cause and effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technology / trends
  • Robust problem solving, mathematical, analytical, and decision making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility
Job Responsibility
  • Extensive travel required. (Local, National, International)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability centered maintenance efforts
  • Mentors, coaches, and provides reliability best practices for applications in customer facilities, by customer personnel
  • Identifies top potential issues leading to lost production and preventable maintenance spending. Communicates findings with leadership
  • Provides solutions to root cause deficiencies and demonstrates economic benefits to their correction
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes / technologies to increase equipment performance and uptime
  • Champions systems and best practice procedures towards a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Fulltime
Read More
Arrow Right

Field Service Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location
Location
United States , Hammond, Indiana
Salary
Salary:
Not provided
atpchemical.com Logo
Advanced Technology Products
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrates ability to perform full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete a failure mode effects analysis, cause and effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technology / trends
  • Robust problem solving, mathematical, analytical, and decision making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility
Job Responsibility
  • Extensive travel required. (Local, National, International)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability centered maintenance efforts
  • Mentors, coaches, and provides reliability best practices for applications in customer facilities, by customer personnel
  • Identifies top potential issues leading to lost production and preventable maintenance spending. Communicates findings with leadership
  • Provides solutions to root cause deficiencies and demonstrates economic benefits to their correction
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes / technologies to increase equipment performance and uptime
  • Champions systems and best practice procedures towards a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer

About LogRocket: Founded in 2016, LogRocket's goal is to make every experience o...
Location
Location
United States , Boston
Salary
Salary:
135000.00 - 220000.00 USD / Year
logrocket.com Logo
LogRocket
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • At least 4 years of experience as a Site Reliability Engineer, or related job
  • Ability to read and understand product code
  • Familiarity with the state of the art in cloud technologies, including common providers, specific tools of the trade, and their strengths and weaknesses
  • Experience operating applications and databases with demanding scalability or availability requirements
  • Proven expertise in modern container orchestration practices
  • A strong understanding of the performance, architecture, tooling, and cost of cloud systems
  • A security focused mindset with a solid understanding of incident response and risk mitigation
  • A strong collaborator who is transparent about progress on tasks, seeks feedback early and often, works effectively with the team and customers
Job Responsibility
Job Responsibility
  • Improve quality of pager alerts while reducing noise
  • Maintain awareness of engineering initiatives across the organization and monitor their impact on stability, cost, and performance
  • Keep infrastructure up-to-date to take advantage of security patches and new features
  • Improve operational security without sacrificing engineering independence
What we offer
What we offer
  • Catered lunch and an impressive array of your favorite snacks
  • Unlimited vacation policy
  • Health, Dental, Vision benefits, 401k, commuter benefits
  • Generous stock options
  • Regular team outings and activities
  • Fulltime
Read More
Arrow Right

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...
Location
Location
United States , Deerfield
Salary
Salary:
96000.00 - 132000.00 USD / Year
https://www.baxter.com/ Logo
Baxter
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in computer science, IT, or related field (or equivalent experience)
  • Prior experience in Site Reliability Engineering and cloud-based infrastructure management
  • Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
  • Azure administration and operations experience, with certifications a plus
  • Knowledge of related technologies, including cloud, encryption, and security protocols
  • Systems administration experience in Windows and Linux environments
  • Proven problem-solving skills and experience with scripting and automation tools
  • Ability to create accurate documentation and reports, with excellent communication skills
  • Applicants must be authorized to work for any employer in the U.S.
  • Unable to sponsor or take over sponsorship of an employment visa at this time.
Job Responsibility
Job Responsibility
  • Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
  • Manage and administer Azure resources, including virtual machines, databases, and networking components
  • Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in cloud
  • Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
  • Define and refine Operations SLAs to maintain high level of Customer Satisfaction
  • Establish non-functional requirements to meet SLAs
  • Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
  • Define key performance indicators that can be monitored, measured, and used to derive opportunities
  • Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
  • Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes.
What we offer
What we offer
  • Support for Parents
  • Continuing Education/Professional Development
  • Employee Health & Well-Being Benefits
  • Paid Time Off
  • 2 Days a Year to Volunteer
  • Medical and dental coverage starting day one
  • Insurance coverage for basic life, accident, short-term and long-term disability
  • Business travel accident insurance
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan
  • Fulltime
Read More
Arrow Right