Reliability Engineer 3

U.S. Bank National Association

Location:
United States, Irving

Contract Type:
Not provided

Salary:

105400.00 - 124000.00 USD / Year
Job offer has expired

Job Description:

As a Reliability Engineer, you will support production applications while proactively looking for ways to automate your discoveries, prevent incidents from recurring, and reduce the time it takes to get our customers back up and running. In addition, you'll focus on improving the availability, latency, performance, and efficiency of our applications, along with effective proactive monitoring. The Reliability Engineer works with business users, development teams, and system administrators to ensure systems meet their business needs and specifications.

Job Responsibility:

  • Developing, coordinating, and conducting technical reliability studies on engineering designs to assess the likelihood that a product/process performs its intended function over the intended lifecycle
  • Measuring and analyzing the reliability of the design, materials, processes, cost, and final products of production
  • Recommending design or test methods and statistical process control procedures for achieving required levels of product reliability
  • Completing risk analysis studies of new designs and processes
  • Undertaking testing and analysis on failures, proposing changes in design or formulation to improve system and/or process reliability
  • Supporting production applications and proactively looking for ways to automate your discoveries, prevent incidents from recurring, and/or reduce the time it takes to get our customers back up and running
  • Improving the availability, latency, performance, and efficiency of our applications, along with effective proactive monitoring
  • Interfacing with business users, development teams, and system administrators to ensure systems meet their business needs and specifications

Requirements:

  • Bachelor's degree, or equivalent work experience
  • Five to seven years of relevant work experience in business and risk analysis, IT Service Management, production support, product/project management, or application development
  • Proven experience as a Site Reliability Engineer or similar role
  • Strong knowledge of monitoring tools and incident management
  • Proficiency in Python or PowerShell
  • Excellent problem-solving and troubleshooting skills
  • Strong experience with AWS or Azure services
  • Experience with Docker and container clustering technologies like AWS ECS or Kubernetes
  • Experience with monitoring and logging tools such as Datadog, Splunk, Elasticsearch, Kibana, and CloudWatch
  • Experience using GitLab/GitHub for version control and/or work tracking
  • Strong communication and collaboration abilities

Nice to have:

  • Financial Services industry experience

What we offer:
  • Healthcare (medical, dental, vision)
  • Basic term and optional term life insurance
  • Short-term and long-term disability
  • Pregnancy disability and parental leave
  • 401(k) and employer-funded retirement plan
  • Paid vacation (from two to five weeks depending on salary grade and tenure)
  • Up to 11 paid holiday opportunities
  • Adoption assistance
  • Sick and Safe Leave accruals of one hour for every 30 worked, up to 80 hours per calendar year unless otherwise provided by law

Additional Information:

Job Posted:
February 20, 2026

Expiration:
February 25, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Reliability Engineer 3

Senior Electrical Reliability Engineer

Champion efforts that maintain and continuously improve the reliability of the m...
Location:
United States, Ashdown
Salary:
Not provided
Domtar
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Electrical Engineering
  • Minimum of 3 years of applicable experience in electrical reliability, distribution systems, or related field
  • Strong commitment to safety and safe work practices
  • Proficient computer skills and familiarity with reliability tracking systems
Job Responsibility:
  • Lead Root Cause Problem Elimination (RCPE) efforts for downtime and slowback events
  • Assist in capital planning for the Electrical Distribution system
  • Support turbine generator repairs, upgrades, and overhauls
  • Serve as a technical resource for operators and maintenance personnel
  • Track and report Key Performance Indicators (KPIs) related to electrical reliability, providing monthly reports
  • Lead and maintain Electrical Reliability Programs
  • Provide support for mill-wide projects and ISO compliance requirements
What we offer:
  • Competitive compensation
  • Supportive working environment
  • Rewarding career paths
  • Plenty of opportunities for learning and growth
Employment Type:
Full-time

Reliability Engineer

With our specialist asset reliability services, we help prevent machinery failur...
Location:
United Kingdom, Derby; Leicester; Nottingham
Salary:
37000.00 GBP / Year
BES Group
Expiration Date:
Until further notice
Requirements:
  • Ideally a background in engineering within the marine, rail, or manufacturing environment
  • Level 1 Vibration Analysis or working towards this is essential
  • Experience or knowledge of Vibration Analysis
  • Experience or knowledge of condition monitoring hardware and software
  • A Mechanical Engineering qualification (level 3 upwards) is highly regarded
  • Flexibility to work away and travel as per business and customer requirements
  • Full UK driving license
Job Responsibility:
  • Carry out condition based maintenance (CBM) techniques utilising vibration analysis, ultrasound, thermography and oil analysis
  • Perform data collection and analysis of equipment performance, failure data and corrective maintenance history
  • Assess and report on machine performance and recommend improvements
  • Specify, set up and install online and wireless systems or remote sensors
  • Always provide the exceptional level of customer service expected from our team, whilst representing our brilliant company professionally
What we offer:
  • £5000 annual car cash allowance
  • Company Pension Scheme
  • Annual salary review
  • 25 days annual leave plus 8 bank holidays
  • An extra day’s holiday to take on Christmas Eve each year
  • Access to our buy and sell holiday scheme
  • Opportunity for flexible working
  • Electric Vehicle salary sacrifice scheme
  • Discounts and savings via our employee benefits portal
  • Health and wellbeing support via our Employee Assistance Programme
Employment Type:
Full-time

Database Reliability Engineer

The Database Reliability Engineer (DBRE) is responsible for managing, building, ...
Location:
United States
Salary:
120000.00 - 179000.00 USD / Year
PointClickCare
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience working with relational database systems
  • Strong hands-on experience with MySQL (administration, performance tuning, replication, HA/DR)
  • 1+ years in a DBRE or database-focused engineering role
  • Experience working in cloud environments (AWS, GCP, or Azure — Azure preferred)
  • Coding and automation experience (Python, PowerShell, SQL, etc.)
  • Experience with Infrastructure-as-Code tools such as Ansible and Terraform
  • Experience working with source control systems such as Git
  • MySQL experience preferred
  • PostgreSQL is a plus
  • Experience working with VLDBs (1+ TB) and managing large database fleets (100+ instances)
Job Responsibility:
  • Manage, build, maintain, monitor, and troubleshoot the cloud-based MySQL database infrastructure that our mission-critical SaaS application depends on
  • Focus heavily on automation and coding to reduce operational toil
  • Collaborate closely with Engineering and SRE teams to support new product development and ensure reliable database integration across the platform
  • Work on observability of MySQL database metrics and ensure database performance and reliability objectives are consistently met
  • Work with the DBA team to identify areas of operational toil and implement automations/processes to manage PCC’s MySQL database systems at scale
  • Apply a data-driven approach to performance tuning, availability improvements, and operational optimization
  • Provide database support to Engineering and SRE teams, including review of database migrations, query performance, schema/design improvements, and standardizing MySQL configuration and deployment patterns
  • Assist the DBA team with performance troubleshooting and root-cause analysis
What we offer:
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
Employment Type:
Full-time

Reliability Engineer Intern

Amazon develops innovative consumer-centric product solutions. As a reliability ...
Location:
China, Shenzhen
Salary:
Not provided
Amazon Pforzheim GmbH
Expiration Date:
Until further notice
Requirements:
  • Master's degree or above in mechanical engineering, electrical engineering, material science, physics or equivalent
  • Speak, write, and read fluently in Mandarin
  • Available to work 5 days per week during the summer holiday for at least 3 months
  • Willing to work in Shenzhen
Job Responsibility:
  • Participate in creating reliability test plans including resource allocations, validation schedule assumptions, and validation items scope.
  • Participate in implementing specific validation items in reliability test plans with schedule.
  • Work closely with the senior reliability engineer to report reliability execution progress and issues.
  • Participate in evaluating and developing reliability test methodologies to reduce test time and increase test coverage.
  • Travel domestically to supplier sites as projects require.
Employment Type:
Full-time

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location:
United States, Birmingham
Salary:
Not provided
Genuine Parts Company
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility:
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer:
  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay
Employment Type:
Full-time

Site Reliability Engineer

We’re looking for a passionate Site Reliability Engineer to pioneer our SRE stra...
Location:
Spain, Barcelona
Salary:
45000.00 - 59000.00 EUR / Year
Edpuzzle
Expiration Date:
Until further notice
Requirements:
  • At least 3 years of experience in Site Reliability Engineering, DevOps Engineering, System Administration or Cloud Infrastructure Engineering for a web-based product with a focus on observability and reliability
  • Good knowledge of Amazon Web Services (AWS), CloudWatch and Datadog
  • Experience with software release management and deployment pipelines (Git, CI/CD)
  • Experience with Infrastructure as Code using AWS CDK
  • Experience writing JavaScript, TypeScript or Node.js code
  • Pragmatic with technologies: you understand tech is a tool to solve a product problem, tech is never the end goal
  • Excellent ability to communicate your ideas, regardless of the audience
  • Product-oriented: You make all your technology decisions with the final user in mind
  • You are naturally drawn towards understanding the bigger picture and recognize when there's a need for improvement, applying your intentional and rational thought process to address complex issues
  • You are able to work independently, plan and exercise conscious control of time spent on specific goals to reach deadlines effectively, and you don’t hesitate to pursue a goal despite the difficulties, all while maintaining a flexible mindset
Job Responsibility:
  • Work with the Product, Infrastructure and Engineering teams to find the best technical solutions by participating in discussions and sharing your opinions
  • Take ownership of the problems that are being worked on, understanding why they are needed by the users, carrying out your own research, making your own proposals and working on the implementation while relying on your teammates for help when needed
  • Communicate effectively in a team in order to maximize productivity, ownership, and focus to help projects reach the finish line with the best possible outcome and by the project deadline
  • Design a cloud infrastructure that is secure, scalable, and highly available on AWS
  • Engage in proactive monitoring and observability with comprehensive tools and practices that not only detect and warn, but also predict potential system issues before they affect our users
  • Lead the charge in root cause analysis for production and infrastructure issues, transforming challenges into learning opportunities
  • Provision, configure and maintain cloud infrastructure as code
  • Participate in the on-call rotation, ensuring reliability and uptime for our users
  • Write technical documentation, contributing to our technical knowledge base and empowering your peers
  • Perform other exciting duties as opportunities and needs arise.
What we offer:
  • On-call compensation
  • 24 days’ paid holidays plus December 24th and 31st
  • Flexible working hours and reduced working time on Fridays to support work-life balance
  • €2000 annual allowance for meals with Cobee
  • Private health insurance policy with AXA
  • Access to Wellhub to support physical and emotional well-being
  • Flexible remuneration for childcare
  • Flexible remuneration for public transport
  • Flexible remuneration for health insurance of immediate family members (spouse and/or children)
  • Training and development (CodelyTV, Cloud Academy, etc.)
Employment Type:
Full-time

Site Reliability Engineer 2

Join us. At PagerDuty, you'll tackle complex problems, collaborate with kind and...
Location:
Portugal, Lisbon
Salary:
Not provided
PagerDuty
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Experience with Kubernetes and container orchestration
  • Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
  • Proficiency in at least one programming language (e.g. Python, Ruby, Go)
  • Experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
Job Responsibility:
  • Deploy, configure, monitor and optimize highly available Kubernetes clusters on AWS/EKS
  • Help maintain the overall health of the platform, including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
  • Continuously strive to improve the internal developer experience and the software development lifecycle
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Participate in a 24/7 on-call rotation
What we offer:
  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
Employment Type:
Full-time

Site Reliability Engineering Manager

The Wikimedia Foundation is looking for an Engineering Manager to join our SRE t...
Location:
United States of America
Salary:
132439.00 - 208378.00 USD / Year
Wikimedia Foundation
Expiration Date:
Until further notice
Requirements:
  • Prior experience managing teams
  • Prior hands-on experience with software or reliability engineering (within the last 3 years preferred)
  • Ability to analyze complex systems, troubleshoot issues, and devise effective solutions under pressure
  • Proficiency in project management methodologies to effectively plan, execute, and track new and existing initiatives
  • Strong understanding of cloud computing, networking, Linux systems administration, containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible) to be able to provide technical support to the team
  • Aptitude for automation and streamlining of tasks
  • Communicate effectively in both spoken and written English
  • Ability to work independently, as an effective part of a globally distributed team
  • Ability to travel several times a year for occasional in-person meetings
  • B.S. or M.S. in Computer Science or the equivalent in related work experience
Job Responsibility:
  • Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization
  • Providing guidance, mentorship, and support to ensure the team's effectiveness and growth
  • Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path
  • Recruiting, hiring, and helping onboard new team members
  • Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members
  • Coordinating and communicating with other members of the Wikimedia product & engineering teams on relevant projects, executing complex projects and contributing to the organizational strategy
  • Continuously developing the roadmap of the team in alignment with other SRE and Product & Technology teams, and helping to draft and execute the team’s annual and quarterly plans
  • Project managing new and existing initiatives
  • Leading the definition, refinement, and execution of the processes through which the team manages and performs work
  • Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure
Employment Type:
Full-time