Lead DevOps Engineer (Reliability)

Mastercard

Location:
Norway, Oslo

Contract Type:
Not provided

Salary:

Not provided

Job Description:

The Mastercard Payment Services Team is looking for a “BizOps Lead” who will be based in Oslo, Norway. The role of business operations is to be the production readiness steward for the platform. This is accomplished by closely partnering with other technical teams to design, build, implement, and support technology services. Business operations teams ensure that operational criteria such as system availability, capacity, performance, monitoring, self-healing, and deployment automation are implemented throughout the delivery process.

Job Responsibility:

  • Participate in knowledge transfer sessions for new systems/platforms/applications
  • Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement
  • Align product and customer focused priorities with operational needs to protect the platform and customer experience
  • Proactively manage production events and participate in change activities to maximize customer experience and increase the overall value of supported applications
  • Practice sustainable and timely incident response (24/7/365) according to ITSM and Mastercard standards; create/update necessary incident and related problem records; and engage with global and local teams to facilitate incident resolution
  • Ensure necessary internal and external incident notifications are performed according to SLAs
  • Perform necessary incident follow-up tasks with relevant teams, such as root cause analysis, preparation of incident documentation, and blameless postmortems
  • Take a holistic approach to problem solving by connecting the dots during a production event through the various technology stacks that make up the platform, to optimize mean time to recover
  • Support daily operations with a hyper focus on triage and then root cause by understanding the business implications of our products. Shift left to be more proactive and upfront in the development process
  • Ensure any new products or product enhancements have the appropriate operational support structure to deliver promised business outcomes
  • Ensure any documented service commitments are monitored and appropriate mitigation steps taken to restore or maintain service commitments
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Work with a global team spread across tech hubs in multiple geographies and time zones
  • Share knowledge and mentor junior colleagues

Requirements:

  • BS degree in Computer Science or related technical field or equivalent practical experience
  • Extensive experience in ITIL standards, service event management activities, incident management and application development lifecycle
  • Proven track record in supporting production applications to facilitate change and incident activities
  • Understanding of algorithms, data structures, scripting, pipeline management, and software design
  • Systematic problem-solving approach coupled with strong communication skills and a sense of ownership and drive
  • Experience in dealing with difficult situations and making decisions with a sense of urgency
  • Interest in understanding, analyzing and troubleshooting large-scale distributed systems
  • Experience in Site Reliability Engineering (SRE) practices and “Run” activities
  • Experience in customer support and delivery roles
  • Experience with financial oversight and process efficiencies
  • Scripting
  • CI/CD Pipelines
  • Monitoring tools
  • Jenkins
  • Unix/Linux operating system commands
  • Network logic understanding
  • Networking concepts
  • F5 understanding

Additional Information:

Job Posted:
January 02, 2026

Employment Type:
Fulltime

Similar Jobs for Lead DevOps Engineer (Reliability)

Lead Site Reliability Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location:
India, Bangalore
Salary:
Not provided
Groupon
Expiration Date:
Until further notice
Requirements:
  • 10+ years in systems engineering
  • at least 5 years in SRE or DevOps roles
  • expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker)
  • proficiency in programming and scripting languages like Python, Go, and Bash
  • advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • deep understanding of networking, DNS, load balancing, and security principles
  • proven track record of managing high-availability systems in demanding environments
  • exceptional analytical and problem-solving skills
Job Responsibility:
  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher
  • drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools
  • create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery
  • build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack
  • collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs
  • lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues
  • design and execute performance testing, capacity planning, and scalability strategies for evolving workloads
  • proactively identify and resolve bottlenecks, increasing system performance and developer efficiency
  • mentor junior engineers, fostering a collaborative and growth-oriented team environment
  • guide architectural decisions that drive innovation and enhance system reliability
What we offer:
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • a collaborative and innovative work environment that values your expertise and contributions
  • professional growth and leadership development pathways tailored to your aspirations
  • a chance to leave a lasting impact by shaping the future of reliable and scalable systems

Lead DevOps Engineer

David Zwirner seeks an experienced and strategic Lead DevOps Engineer to guide t...
Location:
United Kingdom, London
Salary:
Not provided
David Zwirner Gallery
Expiration Date:
Until further notice
Requirements:
  • Legal authorization to work in the UK
  • Track record in a senior/lead DevOps, SRE, or Platform role, including mentorship of engineers
  • Expert‑level Terraform (including importing existing resources and taming legacy estates)
  • Deep, hands‑on experience with AWS (ECS, RDS, ElastiCache, Lambda, ALB, WAF, S3, CloudFront, EventBridge, CloudWatch) and production networking/IAM
  • Proven design and maintenance of CI/CD pipelines (GitHub Actions) and container workflows (Docker, ECS Fargate or Kubernetes)
  • Proficiency with modern observability/monitoring (Datadog, CloudWatch, Sentry, PagerDuty), incident response, and incident retrospectives
  • Strong background in cloud security principles and practical hardening
  • Ability to define and execute a technical roadmap and communicate with both technical and non‑technical stakeholders
Job Responsibility:
  • Provide direction and mentorship for the DevOps team
  • set technical direction for infrastructure and security
  • foster a culture of ownership, reliability, and continuous improvement
  • Define, own, and drive the Infrastructure & Security Roadmap, prioritizing infrastructure ownership, profound monitoring, disaster recovery, developer experience, and security hardening
  • Inventory and capture unmanaged resources in Terraform (and CDK/SST where required)
  • create reusable modules and guardrails
  • institute code reviews and change management
  • Design and operate services built on ECS (Fargate), ECR, RDS, ElastiCache, S3, ALB/CloudFront, WAF, Lambda, EventBridge, CloudWatch
  • improve networking, IAM, and resilience
  • Modernize critical workloads

Lead DevOps Engineer

David Zwirner seeks an experienced and strategic Lead DevOps Engineer to guide t...
Location:
France, Paris
Salary:
Not provided
David Zwirner Gallery
Expiration Date:
Until further notice
Requirements:
  • Legal authorization to work in the EU
  • Track record in a senior/lead DevOps, SRE, or Platform role, including mentorship of engineers
  • Expert‑level Terraform (including importing existing resources and taming legacy estates)
  • Deep, hands‑on experience with AWS (ECS, RDS, ElastiCache, Lambda, ALB, WAF, S3, CloudFront, EventBridge, CloudWatch) and production networking/IAM
  • Proven design and maintenance of CI/CD pipelines (GitHub Actions) and container workflows (Docker, ECS Fargate or Kubernetes)
  • Proficiency with modern observability/monitoring (Datadog, CloudWatch, Sentry, PagerDuty), incident response, and incident retrospectives
  • Strong background in cloud security principles and practical hardening
  • Ability to define and execute a technical roadmap and communicate with both technical and non‑technical stakeholders
Job Responsibility:
  • Leadership: Provide direction and mentorship for the DevOps team
  • set technical direction for infrastructure and security
  • foster a culture of ownership, reliability, and continuous improvement
  • Roadmap Ownership & Strategy: Define, own, and drive the Infrastructure & Security Roadmap, prioritizing infrastructure ownership, profound monitoring, disaster recovery, developer experience, and security hardening
  • Infrastructure as Code (IaC): Inventory and capture unmanaged resources in Terraform (and CDK/SST where required)
  • create reusable modules and guardrails
  • institute code reviews and change management
  • Platform Operations (AWS‑first): Design and operate services built on ECS (Fargate), ECR, RDS, ElastiCache, S3, ALB/CloudFront, WAF, Lambda, EventBridge, CloudWatch
  • improve networking, IAM, and resilience
  • Resilience & Reliability: Modernize critical workloads

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location:
Ireland, Dublin
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Solid SRE process experience
  • 5+ years of leading a high-performance, 24x7 DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience on ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility:
  • Taking end-to-end Ownership of Application Support for Production Systems Issues resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with an SRE mindset
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and/or associated production systems, diagnostics, and network health monitoring
  • Coordinating and/or deploying hands-on fixes, patches and software updates at the application level and, as appropriate, at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within organization, along with observations from investigative and diagnostic assessments
  • Coordinating the investigation of recurring technical issues affecting user systems and seeing them through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer:
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
  • Fulltime

Site Reliability Engineer

Corporate Tools is looking for a Site Reliability Engineer. You will be a tradit...
Location:
United States
Salary:
175000.00 USD / Year
Corporate Tools
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience
  • 5+ years of experience in software engineering
  • 2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles
  • Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi
  • Strong proficiency with Kubernetes, Docker, and container orchestration in production environments
  • Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic
  • Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts
  • Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached)
  • Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement
  • Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis
Job Responsibility:
  • Stop problems before they start
  • Fix issues quickly and learn from them
  • Help keep systems steady, secure, and running
  • Work closely with DevOps engineers to build out tools and automation
  • Take ownership
What we offer:
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days Paid Time Off accrued annually, and 4 holidays
  • After 3 years, PTO increases to 29 days
  • Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
  • Paid Parental Leave
  • Up to 6% company matching 401(k) with no vesting period
  • Quarterly allowance
  • Open concept office with friendly coworkers
  • Creative environment where you can make a difference
  • Fulltime

Site Reliability Engineer

Join our client, a leading financial institution at the forefront of innovation,...
Location:
United States, Austin
Salary:
57.00 - 63.33 USD / Hour
Aquent
Expiration Date:
Until further notice
Requirements:
  • Proven experience leading engineering teams and delivering projects using Scrum and efficient release practices
  • Strong background in converting high-level designs into low-level designs and providing technical oversight
  • Demonstrated experience in designing, architecting, and deploying cloud-native applications, specifically on GCP
  • Proficiency with various database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL
  • Expertise in containerization technologies such as Docker and Kubernetes, and building/managing CI/CD pipelines
  • Experience leveraging AI-Driven software development tools to enhance productivity, code comprehension, and documentation
  • Proven track record of integrating and applying AI/Machine Learning models for data analytics, visualization, automation, and problem-solving
  • Ability to maintain high quality standards while delivering within tight schedules
  • Exceptional collaborative mindset with a bias for action, engaging effectively with product management, architects, and other domains
  • Strong ability to work with internal, external, and offshore stakeholders
Job Responsibility:
  • Drive Technical Leadership & Project Delivery: Lead engineering teams through the entire project lifecycle, leveraging agile methodologies like Scrum to ensure efficient delivery and robust release practices
  • Architect & Design Cloud-Native Solutions: Translate high-level architectural visions into detailed low-level designs, providing expert technical oversight for the development and deployment of cutting-edge cloud-native applications
  • Champion Reliability & Scalability: Design, architect, and deploy highly available and scalable cloud-native applications on platforms such as GCP, ensuring optimal performance and resilience
  • Optimize Data Management: Leverage your expertise with diverse database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL, to build and maintain robust data solutions
  • Advance DevOps & Automation: Implement and optimize containerization strategies using technologies like Docker and Kubernetes, and establish sophisticated CI/CD pipelines to streamline development and deployment
  • Innovate with AI/ML: Integrate and apply AI/Machine Learning models to enhance data analytics, visualization, automation, and creatively solve complex business and technical challenges
  • Foster Collaboration & Mentorship: Work closely with diverse stakeholders across product management, architecture, and other engineering domains, while actively mentoring and coaching multiple teams to elevate technical capabilities
  • Influence & Present Solutions: Effectively engage subject matter experts, present complex architectural solutions to governance boards and stakeholders, and advocate for data-driven proposals
What we offer:
  • subsidized health, vision, and dental plans
  • paid sick leave
  • retirement plans with a match

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs.
Job Responsibility:
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs.
What we offer:
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace.
  • Fulltime

Public Cloud Engineering Lead

As the Public Cloud Infrastructure Lead, you will play a pivotal role in shaping...
Location:
United States, Irving, Texas
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 15+ years of relevant experience in engineering
  • at least 5 years in AWS and Cloud
  • experience in applications development
  • experience in management
  • experience managing global technology teams
  • working knowledge of industry practices and standards
  • consistently demonstrates clear and concise written and verbal communication
  • Bachelor's degree or equivalent work experience
Job Responsibility:
  • provide technical authority for all engineering activities in the public cloud foundational infrastructure space
  • lead and grow a team of deeply technical cloud specialists and full-stack software developers
  • drive client satisfaction and business value by identifying and developing process improvement and automation initiatives
  • establish partnerships across the broader Citi technology landscape to align with business growth initiatives and priorities
  • define measurable success criteria and routinely assess service availability and reliability
  • drive compliance with applicable standards, policies, and regulations, always assessing risk with Citi's reputation, clients, and assets in mind
What we offer:
  • medical, dental & vision coverage
  • 401(k)
  • life, accident & disability insurance
  • wellness programs
  • planned time off (vacation)
  • unplanned time off (sick leave)
  • paid holidays
  • Fulltime