CrawlJobs

Lead Service Reliability Engineer

Singapore, Singapore · Job Posted January 12, 2026

Job Description

As a Service Reliability Engineer (SRE) in the DAMO service line, you will take a multifaceted approach to ensuring technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you will take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you will drive the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you will cultivate a collaborative culture that enables organizations to meet and exceed their reliability and business objectives.

Job Responsibility

  • You will be responsible for understanding requirements and SRE goals in depth from both technical and business perspectives
  • You will provide solutions to improve reliability, including identifying and implementing mechanisms and architectures that enable fault tolerance and faster median time to respond and median time to detect
  • You will be responsible for enhancing the incident management process, including the development of an incident prioritization matrix, triage, communication, mitigation, post-mortem analysis and implementation of corrective actions
  • You will manage client stakeholder expectations and queries during production incidents, provide detailed technical analysis of issues along with remediation plans for mitigation and future prevention, and act as the interface to C-level executives when needed
  • You will act as a liaison with client engineering teams, building trust and productive relationships with senior client stakeholders and team leads to influence them toward better decisions
  • You will be responsible for identifying opportunities for enhancing system performance and reliability in alignment with business SLAs, SLOs, KPIs and objectives, and provide guidance and assistance to SRE teams in implementing the identified improvements
  • As an SRE expert, you will collaborate with Thoughtworks application development leads and solution architects, recommending changes in system design and adopting best practices for improved reliability from day one
  • You will oversee and mentor other SREs on the team, contributing to their growth and development

Requirements

  • You can program in one or more high-level languages such as Python, Golang, Ruby or Java, and in shell scripts
  • You are familiar with DevOps and GitOps practices and can drive the integration of observability automation into CI/CD pipelines such as GitLab, Jenkins, CircleCI or equivalent
  • You have in-depth knowledge of configuration management and Infrastructure as Code (IaC) tools such as Terraform, Ansible, ARM templates and CloudFormation for provisioning and managing infrastructure
  • You have expertise in observability, logging, tracing and monitoring tools such as Grafana (Loki and Tempo), Prometheus, Graylog, Jaeger, Zipkin, the ELK stack or equivalent
  • You have a strong understanding of container-based architecture and hands-on experience with orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.
  • You have in-depth experience in application and infrastructure performance tuning and scaling to handle heavy loads under different scenarios, such as periodic traffic loads and tsunami (sudden surge) patterns
  • You have a good understanding of essential concepts such as quality gates encompassing SLI/SLO/SLA, chaos engineering, golden signals, blameless postmortem methodologies, synthetic monitoring, distributed tracing, end-user monitoring and performance testing
  • You have experience with network load balancing, security tech stacks, Transport Layer Security (TLS) and certificate management, and an understanding of standard networking protocols and configurations
  • You have strong communication and articulation skills, and are proficient in English
  • You are able to convey resolutions to audiences with varying degrees of technical/business proficiency and bring them to consensus
  • You have excellent problem-solving and analytical skills, with a focus on continuous improvement
  • You have good listening and presentation skills
  • You solve challenging problems and difficult-to-debug issues with a never-give-up attitude
  • You can collaborate with cross-functional engineering teams to conduct capacity planning and scalability assessments, and design solutions for handling current and future growth
  • You can work under pressure and remain composed during production incidents
  • You understand requirements provided by the client on both technical and business aspects, and can break them down for successful implementation
  • You are willing to be part of a team that provides 24x7 availability on a rotational and as-needed basis
  • Candidates must be Singapore citizens or already hold Singapore Permanent Residency (PR) at the time of application

What we offer

  • There is no one-size-fits-all career path
  • Your career is supported by interactive tools, numerous development programs and teammates who want to help you grow

Similar Jobs for Lead Service Reliability Engineer

8 matching positions

Field Service Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location: United States, Hammond
Salary: Not provided
Advanced Technology Products
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrated ability to apply the full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete failure mode and effects analyses, cause-and-effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technologies and trends
  • Robust problem-solving, mathematical, analytical, and decision-making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility
  • Extensive travel required (local, national and international)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability-centered maintenance efforts
  • Mentors and coaches customer personnel, and provides reliability best practices for them to apply in customer facilities
  • Identifies the top potential issues leading to lost production and preventable maintenance spending, and communicates findings to leadership
  • Provides solutions to root cause deficiencies and demonstrates the economic benefits of correcting them
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes and technologies to increase equipment performance and uptime
  • Champions systems and best-practice procedures that foster a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Full-time

Field Reliability Services Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location: United States, Greenville
Salary: Not provided
Advanced Technology Products
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrated ability to apply the full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete failure mode and effects analyses, cause-and-effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technologies and trends
  • Robust problem-solving, mathematical, analytical, and decision-making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility
  • Extensive travel required (local, national and international)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability-centered maintenance efforts
  • Mentors and coaches customer personnel, and provides reliability best practices for them to apply in customer facilities
  • Identifies the top potential issues leading to lost production and preventable maintenance spending, and communicates findings to leadership
  • Provides solutions to root cause deficiencies and demonstrates the economic benefits of correcting them
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes and technologies to increase equipment performance and uptime
  • Champions systems and best-practice procedures that foster a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Full-time

Field Service Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location: United States, Hammond, Indiana
Salary: Not provided
Advanced Technology Products
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrated ability to apply the full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete failure mode and effects analyses, cause-and-effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technologies and trends
  • Robust problem-solving, mathematical, analytical, and decision-making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility
  • Extensive travel required (local, national and international)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability-centered maintenance efforts
  • Mentors and coaches customer personnel, and provides reliability best practices for them to apply in customer facilities
  • Identifies the top potential issues leading to lost production and preventable maintenance spending, and communicates findings to leadership
  • Provides solutions to root cause deficiencies and demonstrates the economic benefits of correcting them
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes and technologies to increase equipment performance and uptime
  • Champions systems and best-practice procedures that foster a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Full-time

Field Reliability Services Engineer

Field Reliability Services Engineer role requiring 95% travel. Promotes safety, ...
Location: United States, Greenville
Salary: Not provided
Advanced Technology Products
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrated ability to apply the full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete failure mode and effects analyses, cause-and-effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technologies and trends
  • Robust problem-solving, mathematical, analytical, and decision-making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility
  • Extensive travel required (local, national and international)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability-centered maintenance efforts
  • Mentors and coaches customer personnel, and provides reliability best practices for them to apply in customer facilities
  • Identifies the top potential issues leading to lost production and preventable maintenance spending, and communicates findings to leadership
  • Provides solutions to root cause deficiencies and demonstrates the economic benefits of correcting them
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes and technologies to increase equipment performance and uptime
  • Champions systems and best-practice procedures that foster a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Full-time

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location: India, Bangalore
Salary: Not provided
Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs.
Job Responsibility
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs.
What we offer
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace.
  • Full-time

Engineer Reliability Fixed Equipment

HF Sinclair in El Dorado, KS is seeking a Fixed Equipment Engineer. This positio...
Location: United States, El Dorado
Salary: Not provided
HF Sinclair
Expiration Date: Until further notice
Requirements
  • A minimum of eight years of progressive work experience in a specific engineering discipline and project management experience is required
  • Emphasis on plant or refinery engineering, fixed equipment and/or mechanical integrity is required
  • A minimum of a Bachelor's degree in an engineering discipline is required
  • Technical expert in area of specialty
  • Advanced ability to stay abreast of new technology developments and processes and apply knowledge analytically
  • Strong knowledge of Microsoft products and commonly used engineering concepts and experience with engineering software
  • Familiarity with standards and practices of the specific discipline
  • Ability to communicate effectively in both written and verbal form, with advanced reading and writing skills and the ability to perform advanced mathematical calculations
  • Ability to operate and drive all assigned company vehicles at company standard insurance rates is essential
  • Valid state driver's license and proof of insurance required
Job Responsibility
  • Defines engineering projects by determining objectives, evaluating technical strategies, and providing plant-engineering support to assigned business unit(s)
  • Plans and leads engineering work by writing specifications, developing schedules and budgets, and identifying improvements to existing equipment, inspection practices, and mechanical integrity programs
  • Implements engineering solutions by monitoring performance, coordinating with Operations/Inspection/Maintenance, taking corrective actions, updating procedures and reports, and securing materials, supplies, and services
  • Completes projects by delivering final outputs, closing administrative requirements, and evaluating overall project performance, including lessons learned for future inspection, maintenance, and reliability work
  • Analyzes the economics of each project where appropriate and calculates ROI for proposed projects
  • Provides engineering documentation, mechanical integrity analysis, operating analysis, and recommendations for management
  • Supports mechanical integrity initiatives, including fixed equipment evaluations, repair plans, inspection scope development, and root-cause investigations
  • Develops and improves procedures for inspection, maintenance, and engineering tasks to enhance consistency, effectiveness, and compliance with applicable standards
  • Collaborates with multi-disciplinary teams (Operations, Maintenance, Inspection, Process) to prioritize and execute reliability and mechanical integrity improvements
What we offer
  • Medical Insurance
  • Vision Insurance
  • Dental Insurance
  • Paid Time-Off
  • 401(k) Retirement Plan with match
  • Educational Reimbursement
  • Parental Bonding Time
  • Employee Discounts
  • Full-time

Site Reliability Engineer

Corporate Tools is looking for a Site Reliability Engineer. You will be a tradit...
Location: United States
Salary: 175,000 USD / year
Corporate Tools
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience
  • 5+ years of experience in software engineering
  • 2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles
  • Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi
  • Strong proficiency with Kubernetes, Docker, and container orchestration in production environments
  • Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic
  • Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts
  • Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached)
  • Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement
  • Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis
Job Responsibility
  • Stop problems before they start
  • Fix issues quickly and learn from them
  • Help keep systems steady, secure, and running
  • Work closely with DevOps engineers to build out tools and automation
  • Take ownership
What we offer
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days Paid Time Off accrued annually, and 4 holidays
  • After 3 years, PTO increases to 29 days
  • Employees transition to flexible time off after 5 years with the company: not accrued, not capped, take time off when you want
  • Paid Parental Leave
  • Up to 6% company matching 401(k) with no vesting period
  • Quarterly allowance
  • Open concept office with friendly coworkers
  • Creative environment where you can make a difference
  • Full-time

Engineering Lead Analyst

Engineering Lead Analyst position in Citi's Cloud Technology Services (CTS) team...
Location: India, Pune
Salary: Not provided
Citi
Expiration Date: Until further notice
Requirements
  • 12+ years of relevant experience in an engineering role
  • Deep understanding of public cloud services adoption at scale
  • Expert-level understanding of AWS/GCP Cloud Network across Internet Application Hosting, B2B Connectivity, and Application Resiliency
  • Hands-on Infrastructure as Code (IaC) expertise with Python and Go
  • CI/CD experience with Terraform, Harness, Tekton, Jenkins, etc.
  • Testing automation experience with Terratest, Cucumber, pytest-bdd, AWS Fault Injection Simulator (FIS), Chaos Mesh, etc.
  • Familiarity with Agile development, DevOps, and SRE practices
  • Demonstrated ability to quickly learn new technologies and adapt to changing project requirements
  • Experience evaluating complex requirements and rationalizing them into a consistent service offering
  • Excellent communication skills
Job Responsibility
  • Technical Expertise: hands-on technical contribution within a product team focused on public cloud networking
  • Collaborative Development: contribute to a team of cloud engineers and full-stack software developers
  • Automation: Identify and develop automation initiatives to improve processes related to public cloud services consumption
  • Cross-Functional Partnership: collaborate with teams across Citi's technology landscape
  • Engineering Excellence: contribute to defining and measuring success criteria for service availability and reliability
  • Compliance Advocacy: ensure adherence to relevant standards, policies, and regulations
  • Serve as technology subject matter expert for internal and external stakeholders
  • Provide direction for firm mandated controls and compliance initiatives
  • Define necessary system enhancements to deploy new products and process enhancements
  • Recommend product customization for system integration
What we offer
  • Career growth opportunities
  • Opportunity to give back to community
  • Make real impact
  • Global team environment
  • Well-being support
  • Work-life balance programs
  • Full-time