CrawlJobs Logo

SRE Lead

https://www.randstad.com Logo

Randstad

Location Icon

Location:
Malaysia , Kuala Lumpur

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

15000.00 - 19000.00 MYR / Month

Job Description:

A top-tier global leader in the insurance and reinsurance industry with operations in over 50 countries. Distinguished by exceptional financial strength and a massive global distribution network. Employs a diverse workforce of approximately 43,000 professionals worldwide. Operates with a "global scale, local touch" approach to serve a wide range of clients. An inclusive employer committed to equal opportunity and fair hiring practices across all regions.

Job Responsibility:

  • Founding Leadership: You will be a key hire for a newly established department, responsible for building and scaling a new SRE team from the ground up
  • Global Systems Support: Provide senior-level SRE expertise for mission-critical applications deployed and used across the globe
  • Reliability Engineering: Lead initiatives to improve the availability, performance, and resilience of complex infrastructure
  • Strategic Automation: Champion an "automation-first" mindset to eliminate manual toil and optimize system monitoring and response
  • Performance Standards: Own the definition and measurement of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for the platform

Requirements:

  • Extensive hands-on experience managing and scaling environments within Azure and AWS
  • Deep technical knowledge of Linux/Unix systems, including advanced networking fundamentals
  • Proficiency in programming and configuration languages such as Python, Ansible, PowerShell, .Net, or Java
  • Expert-level experience with Docker and Kubernetes for managing distributed microservices
  • Mastery of monitoring tools like App Dynamics, Dynatrace, Grafana, and the ELK stack to drive data-driven reliability

Nice to have:

Any relevant certificates and studies would be good to have

Additional Information:

Job Posted:
February 18, 2026

Expiration:
March 21, 2026

Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for SRE Lead

Lead SRE

We are looking for a Lead SRE to join our Inetum Team and be part of a work cult...
Location
Location
Portugal , Lisbon
Salary
Salary:
Not provided
https://www.inetum.com Logo
Inetum
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • SRE IT production processes
  • Agile / DevOps Mindset Problem Solving
  • Scripting: Python, YML, Shell
  • Monitoring: Dynatrace, Nagios
  • Linux
  • Admin Network (DNS, Firewall, Switch)
  • DevOps stack: Git & Git Flow, Artifactory, Jenkins or Gitlab CI, Ansible Tower, Digital ai Release
  • Cloud: Kubernetes, Docker, Argo CD, ArgoCD, Vault, Helm
  • End-to-end IT organization and processes (from development to run / operate)
  • Technical Architecture
Job Responsibility
Job Responsibility
  • Train SREs and their managers on SRE practices
  • Co-construct the transformation strategy and the support plan by participating in workshops, brainstorming with the transformation team and producing training content
  • Coach and support
  • Fulltime
Read More
Arrow Right

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location
Location
Ireland , Dublin
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Solid SRE process experience
  • 5+ years of Leading high-performance, 24x7, DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience on ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility
Job Responsibility
  • Taking end-to-end Ownership of Application Support for Production Systems Issues resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with SRE mind set
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and and/or associated production systems diagnostics, and network health monitoring
  • Coordination and/or for deploying hands-on fixes, patches and software updates at the application level, and as appropriate at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within organization, along with observations from investigative and diagnostic assessments
  • Co-ordinating in the investigation of repeated technical issues affecting user system and seeing through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer
What we offer
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
  • Fulltime
Read More
Arrow Right

Lead Site Reliability Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location
Location
India , Bangalore
Salary
Salary:
Not provided
groupon.com Logo
Groupon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years in systems engineering
  • at least 5+ years in SRE or DevOps roles
  • expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker)
  • proficiency in programming and scripting languages like Python, Go, and Bash
  • advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • deep understanding of networking, DNS, load balancing, and security principles
  • proven track record of managing high-availability systems in demanding environments
  • exceptional analytical and problem-solving skills
Job Responsibility
Job Responsibility
  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher
  • drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools
  • create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery
  • build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack
  • collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs
  • lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues
  • design and execute performance testing, capacity planning, and scalability strategies for evolving workloads
  • proactively identify and resolve bottlenecks, increasing system performance and developer efficiency
  • mentor junior engineers, fostering a collaborative and growth-oriented team environment
  • guide architectural decisions that drive innovation and enhance system reliability
What we offer
What we offer
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • a collaborative and innovative work values alignment that values your expertise and contributions
  • professional growth and leadership development pathways tailored to your aspirations
  • a chance to leave a lasting impact by shaping the future of reliable and scalable systems
Read More
Arrow Right

Engineering Lead Analyst

The Engineering Lead Analyst is a senior level position responsible for leading ...
Location
Location
India , Pune
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 6-10 years of relevant experience in an Engineering role
  • Experience working in Financial Services or a large complex and/or global environment
  • Project Management experience
  • Consistently demonstrates clear and concise written and verbal communication
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement
Job Responsibility
Job Responsibility
  • Serve as a technology subject matter expert for internal and external stakeholders
  • Provide direction for all firm mandated controls and compliance initiatives
  • Lead projects within the group and create a technology domain roadmap
  • Ensure that all integration of functions meet business goals
  • Define necessary system enhancements to deploy new products and process enhancements
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Exhibit knowledge of how own specialty area contributes to the business
  • Apply knowledge of competitors, products and services
  • Advise or mentor junior team members
  • Fulltime
Read More
Arrow Right

Director, Service Reliability Engineering

As Director of SRE, you will lead the team responsible for accelerating and auto...
Location
Location
United States , Bethesda
Salary
Salary:
125600.00 - 203700.00 USD / Year
https://www.marriott.com Logo
Marriott Bonvoy
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Undergraduate degree in computer science, software engineering, or a related field (or equivalent experience)
  • 10+ years of experience in SRE, devsecops or IT operations
  • At least 5 years’ experience in a previous leadership role within SRE, devsecops or IT Operations
  • At least five years of experience in the following technologies - Presentation Management: HTML, CSS, JS, Backbone, Node JS, Android, iOS, Application Platforms: NGINX, Java, Akana, Play Framework, Tomcat, Docker, Openshift, Application Data: PostgreSQL, Couchbase, Cassandra, Integration Services: Apache Kafka, Apache Spark, Akana, Analytics Platforms: Hadoop, dashDB, Cognos, Tableau, Security: Forgerock, OpenID, OAUTH, Ping Identity, Public Cloud: Azure, Google Cloud, AliCloud, Amazon Web Services, CI/CD: Harness
  • Experience with test automation
  • Working knowledge and proven track record of implementing disaster indifferent architecture
  • Experience with CDN and Akamai tools
  • Linux/Unix system administration experience
  • Proficient in scripting and programming languages (like Python, Go, Bash, Shell)
  • Hands on experience with infrastructure as code (like Terraform), container orchestration (like Kubernetes), and reliability automation
Job Responsibility
Job Responsibility
  • Define and execute Marriott’s SRE vision, aligning with business objectives and technology roadmaps
  • Build, mentor and lead a high-performing SRE team, fostering a culture of collaboration and innovation
  • Establish reliability, observability and automation goals to improve system uptime, performance and scalability
  • Partner with engineering, operations and security teams to drive best practices and continuous improvement
  • Implement reliability-focused engineering practices, including SLAs, SLOs/SLIs and error budgets
  • Design and maintain resilient, scalable and fault-tolerant architectures across cloud and hybrid environments
  • Develop strategies to proactively identify and mitigate risks to system performance and availability
  • Drive root cause analysis (RCA) and post-mortem processes to prevent recurring incidents
  • Champion automation in monitoring, deployment and incident resolution to reduce toil and enhance efficiency
  • Lead and optimize incident response processes, ensuring rapid detection, diagnosis, and resolution of system failures
What we offer
What we offer
  • Bonus program
  • comprehensive health care benefits
  • 401(k) plan with up to 5% company match
  • employee stock purchase plan at 15% discount
  • accrued paid time off (including sick leave where applicable)
  • life insurance
  • group disability insurance
  • travel discounts
  • adoption assistance
  • paid parental leave
  • Fulltime
Read More
Arrow Right

SRE Observability Lead Engineer

The SRE Observability Lead Engineer is a hands-on leader responsible for shaping...
Location
Location
United Kingdom , London
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Relevant experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including several years in senior leadership roles
  • Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
  • Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, GCP, Azure), and container platforms (ECS, Kubernetes)
  • Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems
  • Experience leading teams and managing people across geographically distributed locations
  • Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale
  • Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations
  • Strong collaboration skills and experience working across federated teams, building consensus and delivering change
  • Ability to stay up to date with industry trends and apply them to improve internal tooling and design decisions
  • Excellent written and verbal communication skills
Job Responsibility
Job Responsibility
  • Define and own the strategic vision and multi-year roadmap for Observability across Services Technology, aligned with enterprise reliability and production goals
  • Translate strategy into an actionable delivery plan in partnership with Services Architecture & Engineering function, delivering incremental, high-value milestones toward a unified, scalable observability architecture
  • Lead and mentor SREs across Services, fostering a technical growth and SRE mindset
  • Build and offer a suite of central observability services across LoBs – including standardized telemetry libraries, onboarding templates, dashboard packs, and alerting standards
  • Drive reusability and efficiency by creating common patterns and golden paths for observability adoption across critical client flows and platforms
  • Partner with infrastructure, CTO and other SMBF tooling teams, to ensure observability tooling is scalable, resilient, and avoids duplication (“cottage industries”)
  • Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments
  • Collaborate closely with the architecture function to support implementation of observability NFRs in the SDLC, ensuring new apps go live with sufficient coverage and insight
  • Support SRE Communities of Practice (CoP) and foster strong relationships with SREs, developers, and platform leads across Services and beyond to accelerate adoption & promote SRE best practices like SLO adoption, Capacity Planning
  • Use Jira/Agile workflows to track and report on observability maturity across Services LoBs – coverage, adoption, and contribution to improved client experience
What we offer
What we offer
  • 27 days annual leave (plus bank holidays)
  • A discretional annual performance related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
  • Fulltime
Read More
Arrow Right

Orion Tech SRE Lead - Senior Vice President

The Orion Tech- SRE Lead is a hands-on leader responsible for shaping and delive...
Location
Location
India , Chennai; Pune
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 16+ years of experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including 5+ years in senior leadership roles
  • Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
  • Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, Google Cloud), and container platforms (ECS, Kubernetes)
  • Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems
  • Experience leading teams and managing people across geographically distributed locations
  • Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale
  • Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations
  • Strong collaboration skills and experience working across horizontal infrastructure teams, building consensus and delivering changes
  • Ability to stay up to date with market trends and apply them to improve internal tooling and design decisions
  • Good understanding of AI tech stack, should be able to create a business case and solve using Citibank AI solutions
Job Responsibility
Job Responsibility
  • Define and own the roadmap for Engineering enablers for Project Orion team aligned with enterprise reliability and SRE Services organization goals
  • Translate Organization strategy into an actionable delivery plan in partnership with Services Products, Operations & Engineering function, delivering incremental, high-value milestones
  • Understand Critical Business Services functional scope and translate into End-to-End monitoring solutions
  • Periodic review and analyze application monitoring TOIL and collaborate with stakeholders and remediate them as per organization goal
  • Identify manual operations use cases which are performed by Level 1 functions. Create a strategic plan to automate
  • Drive reusability and efficiency by tracking problem statements raised by Orion Level 1 Function by providing milestone delivery plan
  • Ability to Design & Build strategic observability dashboard including gold signals like SLO, SLI, Latency & business metrics in a single pane of glass
  • Lead and mentor SREs, fostering a technical growth and SRE mindset
  • Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments
  • Use Jira/Agile workflows to track and report on strategic enablers coverage, adoption, and contribution to improved client experience
  • Fulltime
Read More
Arrow Right

Engineering Manager for Observability/CI/CD and Cloud

Lead the AI-Driven Evolution of Groupon’s Global Engineering Platform. At Groupo...
Location
Location
Dublin; Madrid; Prague; Valencia; Warsaw
Salary
Salary:
Not provided
groupon.com Logo
Groupon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years’ experience leading infrastructure, DevOps, or SRE teams (5+ people), ideally in high-change, scale-up environments
  • Deep technical expertise in cloud-native platforms, observability, infrastructure as code, and CI/CD tooling
  • Proven success operationalizing AI tools within engineering workflows
  • Strategic, resilient, and pragmatic approach: ready to own results and thrive under shifting priorities
  • Exceptional communication: able to simplify complexity and effectively partner with C-level and global teams
  • Bachelor’s or Master’s in Computer Science (or similar)—or equivalent industry experience
Job Responsibility
Job Responsibility
  • Lead & Inspire: Build and mentor a high-performing, globally distributed team of CI/CD and Observability engineers (5-10 direct reports), coaching them in cutting-edge AI-assisted workflows and best practices
  • Modernize Core Infrastructure: Spearhead the migration from legacy platforms (Jenkins, ELK) to cloud-native solutions (GitHub Actions, Google Cloud Logging, GCP Prometheus/Grafana). Eliminate “straggler” pipelines and drive cost-efficient, reliable operations
  • AI-First Engineering: Operationalize AI tools (Claude Code, Copilot, ChatGPT, etc.) for everything from log analysis and incident summaries to automated infrastructure as code, making AI-augmented engineering a daily norm
  • Architect & Optimize: Oversee a hybrid tech stack (Kubernetes, Envoy, Terraform, GCP, AWS), ensuring platforms are fast, scalable, and “self-healing” via LLM integrations
  • Collaborate Globally: Act as a thought leader and cross-functional partner, advocating for AI-driven developer experience and collaborating with leaders in SRE, Product, and Cloud
  • Drive Transformation: Deliver strategic projects with tight deadlines and direct business impact, such as the Jenkins-to-GHA and ELK-to-GCP migrations, while maintaining a high standard of technical excellence and cost efficiency
What we offer
What we offer
  • Drive real, high-visibility change at the heart of a company undergoing major transformation
  • Work on complex technical and operational challenges in a fast-paced, AI-first environment
  • Accelerate your impact—and your team’s—using industry-leading AI and automation tools
  • Influence engineering practices across a global platform impacting millions of users
Read More
Arrow Right