As a Reliability Engineer, you will combine supporting production applications with proactively looking for ways to automate your discoveries, prevent incidents from recurring, and/or reduce the time it takes to get our customers back up and running. You will also focus on improving the availability, latency, performance, and efficiency of our applications, as well as their proactive monitoring. The Reliability Engineer works with business users, development teams, and system administrators to ensure systems meet their business needs and specifications.
Job Responsibilities:
Developing, coordinating, and conducting technical reliability studies on engineering designs to assess the likelihood that a product/process performs its intended function over the intended lifecycle
Measuring and analyzing the reliability and cost of designs, materials, processes, and final production output
Recommending design or test methods and statistical process control procedures for achieving required levels of product reliability
Completing risk analysis studies of new designs and processes
Undertaking testing and analysis on failures, proposing changes in design or formulation to improve system and/or process reliability
Supporting production applications and proactively looking for ways to automate your discoveries, prevent incidents from recurring, and/or reduce the time it takes to get our customers back up and running
Improving the availability, latency, performance, efficiency, and proactive monitoring of our applications
Interfacing with business users, development teams, and system administrators to ensure systems meet their business needs and specifications
Requirements:
Bachelor's degree, or equivalent work experience
Five to seven years of relevant work experience in business and risk analysis, IT Service Management, production support, product/project management, or application development
Proven experience as a Site Reliability Engineer or similar role
Strong knowledge of monitoring tools and incident management
Proficiency in Python or PowerShell
Excellent problem-solving and troubleshooting skills
Strong experience with AWS or Azure services
Experience with Docker and container clustering technologies like AWS ECS or Kubernetes
Experience with monitoring and logging tools such as Datadog, Splunk, Elasticsearch, Kibana, and CloudWatch
Experience using GitLab or GitHub for version control and/or work tracking
Strong communication and collaboration abilities
Nice to have:
Financial Services industry experience a plus
What we offer:
Healthcare (medical, dental, vision)
Basic term and optional term life insurance
Short-term and long-term disability
Pregnancy disability and parental leave
401(k) and employer-funded retirement plan
Paid vacation (from two to five weeks depending on salary grade and tenure)
Up to 11 paid holiday opportunities
Adoption assistance
Sick and Safe Leave accruals of one hour for every 30 hours worked, up to 80 hours per calendar year unless otherwise provided by law