Lead SRE

Inetum

Location:
Portugal, Lisbon

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are looking for a Lead SRE to join our Inetum Team and be part of a work culture focused on innovation!

Job Responsibility:

  • Train SREs and their managers on SRE practices
  • Co-construct the transformation strategy and the support plan by participating in workshops, brainstorming with the transformation team and producing training content
  • Coach and support

Requirements:

  • SRE IT production processes
  • Agile / DevOps mindset
  • Problem solving
  • Scripting: Python, YAML, Shell
  • Monitoring: Dynatrace, Nagios
  • Linux
  • Network administration (DNS, firewall, switch)
  • DevOps stack: Git & Git Flow, Artifactory, Jenkins or GitLab CI, Ansible Tower, Digital.ai Release
  • Cloud: Kubernetes, Docker, Argo CD, Vault, Helm
  • End-to-end IT organization and processes (from development to run / operate)
  • Technical Architecture

Additional Information:

Job Posted:
July 14, 2025

Employment Type:
Fulltime

Similar Jobs for Lead SRE

Internal Kubernetes Platform Lead SRE

HSBC is seeking an IKP Support Engineer (SRE) to join the IKP Team within the Hy...
Location
Poland
Salary:
Not provided
HSBC
Expiration Date
February 17, 2026
Requirements
  • Solid technical knowledge and experience with Kubernetes administration
  • 3+ years of hands-on experience with Kubernetes administration
  • Strong knowledge of Kubernetes concepts and operations and troubleshooting tools
  • Understanding of containerization and orchestration
  • Experience with Unix administration
  • Experience with Service Meshes is a plus
  • Understanding of ITIL processes and automation skills
  • Familiarity with infrastructure as code
  • Strong analytical and communication skills
  • Proficiency in English.
Job Responsibility
  • Ensure the reliability, availability, and performance of the infrastructure platform
  • Collaborate in diagnosing and resolving IKP infrastructure issues
  • Support the deployment, configuration, and maintenance of Kubernetes platform
  • Troubleshoot and resolve incidents, performance issues, and integration failures
  • Perform root cause analysis and implement reliability improvements
  • Provide 24x7 support as part of an on-call rota
  • Plan duties and other administrative tasks for the team in line with the Polish Labor Code.
What we offer
  • Competitive salary
  • Annual performance-based bonus
  • Additional bonuses for recognition awards
  • Multisport card
  • Private medical care
  • Life insurance
  • One-time reimbursement of home office set-up (up to 800 PLN)
  • Corporate parties & events
  • CSR initiatives
  • Nursery discounts
  • Fulltime

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location
Ireland, Dublin
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Solid SRE process experience
  • 5+ years leading a high-performance, 24x7 DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience on ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility
  • Taking end-to-end ownership of application support and issue resolution for production systems
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with an SRE mindset
  • Providing the first line of post-deployment technical support at L1 and L2 level for applications and/or associated production systems, including diagnostics and network health monitoring
  • Coordinating and/or deploying hands-on fixes, patches and software updates at the application level and, as appropriate, at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within organization, along with observations from investigative and diagnostic assessments
  • Coordinating the investigation of recurring technical issues affecting user systems and seeing them through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
  • Fulltime

Lead Site Reliability Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location
India, Bangalore
Salary:
Not provided
Groupon
Expiration Date
Until further notice
Requirements
  • 10+ years in systems engineering
  • at least 5 years in SRE or DevOps roles
  • expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker)
  • proficiency in programming and scripting languages like Python, Go, and Bash
  • advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • deep understanding of networking, DNS, load balancing, and security principles
  • proven track record of managing high-availability systems in demanding environments
  • exceptional analytical and problem-solving skills
Job Responsibility
  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher
  • drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools
  • create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery
  • build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack
  • collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs
  • lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues
  • design and execute performance testing, capacity planning, and scalability strategies for evolving workloads
  • proactively identify and resolve bottlenecks, increasing system performance and developer efficiency
  • mentor junior engineers, fostering a collaborative and growth-oriented team environment
  • guide architectural decisions that drive innovation and enhance system reliability
What we offer
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • a collaborative and innovative work environment that values your expertise and contributions
  • professional growth and leadership development pathways tailored to your aspirations
  • a chance to leave a lasting impact by shaping the future of reliable and scalable systems

Engineering Lead Analyst

The Engineering Lead Analyst is a senior level position responsible for leading ...
Location
India, Pune
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • 6-10 years of relevant experience in an Engineering role
  • Experience working in Financial Services or a large complex and/or global environment
  • Project Management experience
  • Consistently demonstrates clear and concise written and verbal communication
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement
Job Responsibility
  • Serve as a technology subject matter expert for internal and external stakeholders
  • Provide direction for all firm mandated controls and compliance initiatives
  • Lead projects within the group and create a technology domain roadmap
  • Ensure that all integration of functions meets business goals
  • Define necessary system enhancements to deploy new products and process enhancements
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Exhibit knowledge of how own specialty area contributes to the business
  • Apply knowledge of competitors, products and services
  • Advise or mentor junior team members
  • Fulltime

Director, Service Reliability Engineering

As Director of SRE, you will lead the team responsible for accelerating and auto...
Location
United States, Bethesda
Salary:
125600.00 - 203700.00 USD / Year
Marriott Bonvoy
Expiration Date
Until further notice
Requirements
  • Undergraduate degree in computer science, software engineering, or a related field (or equivalent experience)
  • 10+ years of experience in SRE, DevSecOps or IT operations
  • At least 5 years’ experience in a previous leadership role within SRE, DevSecOps or IT operations
  • At least five years of experience in the following technologies: Presentation Management (HTML, CSS, JS, Backbone, Node JS, Android, iOS); Application Platforms (NGINX, Java, Akana, Play Framework, Tomcat, Docker, Openshift); Application Data (PostgreSQL, Couchbase, Cassandra); Integration Services (Apache Kafka, Apache Spark, Akana); Analytics Platforms (Hadoop, dashDB, Cognos, Tableau); Security (Forgerock, OpenID, OAUTH, Ping Identity); Public Cloud (Azure, Google Cloud, AliCloud, Amazon Web Services); CI/CD (Harness)
  • Experience with test automation
  • Working knowledge and proven track record of implementing disaster indifferent architecture
  • Experience with CDN and Akamai tools
  • Linux/Unix system administration experience
  • Proficient in scripting and programming languages (like Python, Go, Bash, Shell)
  • Hands-on experience with infrastructure as code (like Terraform), container orchestration (like Kubernetes), and reliability automation
Job Responsibility
  • Define and execute Marriott’s SRE vision, aligning with business objectives and technology roadmaps
  • Build, mentor and lead a high-performing SRE team, fostering a culture of collaboration and innovation
  • Establish reliability, observability and automation goals to improve system uptime, performance and scalability
  • Partner with engineering, operations and security teams to drive best practices and continuous improvement
  • Implement reliability-focused engineering practices, including SLAs, SLOs/SLIs and error budgets
  • Design and maintain resilient, scalable and fault-tolerant architectures across cloud and hybrid environments
  • Develop strategies to proactively identify and mitigate risks to system performance and availability
  • Drive root cause analysis (RCA) and post-mortem processes to prevent recurring incidents
  • Champion automation in monitoring, deployment and incident resolution to reduce toil and enhance efficiency
  • Lead and optimize incident response processes, ensuring rapid detection, diagnosis, and resolution of system failures
What we offer
  • Bonus program
  • comprehensive health care benefits
  • 401(k) plan with up to 5% company match
  • employee stock purchase plan at 15% discount
  • accrued paid time off (including sick leave where applicable)
  • life insurance
  • group disability insurance
  • travel discounts
  • adoption assistance
  • paid parental leave
  • Fulltime

Engineering Manager for Observability/CI/CD and Cloud

Lead the AI-Driven Evolution of Groupon’s Global Engineering Platform. At Groupo...
Location
Dublin; Madrid; Prague; Valencia; Warsaw
Salary:
Not provided
Groupon
Expiration Date
Until further notice
Requirements
  • 5+ years’ experience leading infrastructure, DevOps, or SRE teams (5+ people), ideally in high-change, scale-up environments
  • Deep technical expertise in cloud-native platforms, observability, infrastructure as code, and CI/CD tooling
  • Proven success operationalizing AI tools within engineering workflows
  • Strategic, resilient, and pragmatic approach: ready to own results and thrive under shifting priorities
  • Exceptional communication: able to simplify complexity and effectively partner with C-level and global teams
  • Bachelor’s or Master’s in Computer Science (or similar)—or equivalent industry experience
Job Responsibility
  • Lead & Inspire: Build and mentor a high-performing, globally distributed team of CI/CD and Observability engineers (5-10 direct reports), coaching them in cutting-edge AI-assisted workflows and best practices
  • Modernize Core Infrastructure: Spearhead the migration from legacy platforms (Jenkins, ELK) to cloud-native solutions (GitHub Actions, Google Cloud Logging, GCP Prometheus/Grafana). Eliminate “straggler” pipelines and drive cost-efficient, reliable operations
  • AI-First Engineering: Operationalize AI tools (Claude Code, Copilot, ChatGPT, etc.) for everything from log analysis and incident summaries to automated infrastructure as code, making AI-augmented engineering a daily norm
  • Architect & Optimize: Oversee a hybrid tech stack (Kubernetes, Envoy, Terraform, GCP, AWS), ensuring platforms are fast, scalable, and “self-healing” via LLM integrations
  • Collaborate Globally: Act as a thought leader and cross-functional partner, advocating for AI-driven developer experience and collaborating with leaders in SRE, Product, and Cloud
  • Drive Transformation: Deliver strategic projects with tight deadlines and direct business impact, such as the Jenkins-to-GHA and ELK-to-GCP migrations, while maintaining a high standard of technical excellence and cost efficiency
What we offer
  • Drive real, high-visibility change at the heart of a company undergoing major transformation
  • Work on complex technical and operational challenges in a fast-paced, AI-first environment
  • Accelerate your impact—and your team’s—using industry-leading AI and automation tools
  • Influence engineering practices across a global platform impacting millions of users

Engineering Manager, Platform

We are looking for an engineering manager to help us scale, improve organisation...
Location
Salary:
Not provided
Airalo
Expiration Date
Until further notice
Requirements
  • Minimum 5 years of hands-on technical experience in cloud-native environments, specifically with distributed systems and platform development
  • Minimum 2 years of experience in directly leading and managing platform, DevOps, or SRE teams
  • Expertise in designing, building, refactoring, and operating distributed systems and cloud infrastructure at scale
  • Expertise in event-driven architecture and messaging systems (e.g., Kafka, SQS, RabbitMQ, Pub/Sub)
  • Strong knowledge of both relational (SQL) and NoSQL database technologies and their operational considerations in cloud environments
  • Extensive hands-on experience and deep understanding of core AWS services (e.g., EC2, EKS, Lambda, SQS, Security Groups, IAM, Aurora, DynamoDB, S3, RDS, CloudWatch, CloudTrail)
  • Proven expertise with Infrastructure as Code (e.g., Terraform, CloudFormation)
  • Strong experience with containerisation technologies (Docker) and orchestration platforms (Kubernetes), including Helm and related ecosystem tools
  • Extensive experience with modern monitoring, logging, and observability platforms (e.g., Datadog, Prometheus, Grafana, ELK Stack, Jaeger/OpenTelemetry)
  • Strong familiarity with DevSecOps practices and the implementation of automated security tooling throughout the CI/CD pipeline (e.g., SAST, DAST, secret management, vulnerability scanning)
Job Responsibility
  • Lead the strategy, architecture, and execution of our core platform technologies
  • Extend and improve engineering best practices across the organisation
  • Maintain and improve a collaborative environment, acting as a key bridge between application development teams and the platform team
  • Motivate and instil a strong sense of ownership in your team for the end-to-end lifecycle, stability, scalability, and performance of our core platform services
  • Mentor and guide the professional and technical development of your team members
  • Ensure that the team delivers high-quality products and solutions by following best practices
  • Build and scale teams that are collaborative, inclusive, and respectful of each other
  • Provide continuous, actionable feedback, address underperformance proactively, and recognise the individual strengths and contributions of your team members
  • Work closely with engineers and collaborate with key stakeholders to define and maintain a prioritised backlog and establish clear short-term and long-term goals for the platform roadmap
  • Own your team’s deliverables and ensure the continuous delivery of scalable, highly-available, and cost-efficient platform services and infrastructure
What we offer
  • Health Insurance
  • work-from-anywhere stipend
  • annual wellness & learning credits
  • annual all-expenses-paid company retreat in a gorgeous destination
  • Fulltime

Executive Director – AI and Machine Learning

At CVS Health, we’re building a world of health around every consumer and surrou...
Location
United States, Work At Home, New Jersey
Salary:
175100.00 - 334750.00 USD / Year
CVS Health
Expiration Date
December 31, 2025
Requirements
  • PhD or Master's degree in AI/ML, Computer Science, Statistics, Engineering, or equivalent experience
  • 15+ years leading Enterprise Machine Learning, Infrastructure, Data Science, and/or SRE practices
  • 5+ years applying Machine Learning to optimize technology operations (AIOps)
  • 10+ years at a leadership level or above, within a Fortune 500 company with significant scale
  • Proven experience leading AI governance, establishing and maintaining robust ML Ops environments, leading development of large-scale AI and ML platforms and solutions, and developing strategic partnerships with internal clients, industry experts, and vendors
  • Ability to develop and implement a comprehensive AI/ML strategy that aligns with the organization's business goals
  • Deep understanding of AI/ML technologies, including model development, deployment, MLOps, GenAIOps, and LLMOps practices
  • Demonstrated knowledge of and significant experience building and operating on-premise AI processor (e.g., GPU clusters) and platform architectures for the deployment and management of enterprise AI workloads
  • Experience with and commitment to ensuring AI/ML solutions are developed and deployed ethically, with a focus on fairness, transparency, and accountability
  • Familiarity with industry standards and regulations related to AI and Machine Learning
Job Responsibility
  • Develop, implement, and enhance governance frameworks and policies to ensure effective oversight of operational and security-focused AI and ML solutions
  • Establish and enforce standards for the build, management, governance, and utilization of AI models and model execution platforms
  • Establish and socialize a framework for the documentation, proposal, evaluation, build, delivery, and ongoing value assessment of scalable operations and security-focused AI/ML solutions
  • Evaluate and certify foundational models for use within CVS Health, ensuring alignment with organizational goals and security requirements
  • Regularly assess and enhance the governance model and associated standards to address emerging challenges and opportunities
  • Establish and maintain robust MLOps, GenAIOps, and LLMOps practices
  • Build and manage pipelines to enable teams to design AI-powered applications, develop and experiment with models, and deploy, monitor, and maintain them in production
  • Drive delivery of AI and ML solutions that provide deep insights and reporting on operations and security data
  • Develop proactive AI-driven solutions to measurably reduce time to detect security and operational issues, provide adaptive recommendations, and automate remediation
  • Deliver solutions to enable users to interact with operational data driving measurable improvements in productivity, performance, and innovation
What we offer
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
  • Fulltime