CrawlJobs

Lead SRE

Portugal, Lisbon · Job Posted July 14, 2025

Job Description

We are looking for a Lead SRE to join our Inetum Team and be part of a work culture focused on innovation!

Job Responsibility

  • Train SREs and their managers on SRE practices
  • Co-construct the transformation strategy and the support plan by participating in workshops, brainstorming with the transformation team and producing training content
  • Coach and support

Requirements

  • SRE IT production processes
  • Agile / DevOps mindset
  • Problem solving
  • Scripting: Python, YAML, Shell
  • Monitoring: Dynatrace, Nagios
  • Linux
  • Network administration (DNS, firewalls, switches)
  • DevOps stack: Git & Git Flow, Artifactory, Jenkins or GitLab CI, Ansible Tower, Digital.ai Release
  • Cloud: Kubernetes, Docker, Argo CD, Vault, Helm
  • End-to-end IT organization and processes (from development to run / operate)
  • Technical Architecture


Similar Jobs for Lead SRE

8 matching positions

Lead SRE

About the Role Deliver cloud-native solutions and patterns that are highly elast...
Location: United States of America, Fort Lauderdale
Salary: Not provided
Company: Beacon Hill
Expiration date: Until further notice
Job Responsibility
  • Deliver cloud-native solutions and patterns that are highly elastic
  • Empower stakeholders and reduce toil through self-service pipelines
  • Mentor your team in solving deep technical issues, advanced cloud infrastructure topics, and complex coding problems
  • Set an example of methodical, systematic task execution for your team
  • Work with project managers and stakeholders to provide status and reporting
  • Act as an ambassador to other teams, finding common ground and defining clear agreements
  • Drive projects to schedule
  • Perform code reviews with an eye toward rigor and best practice
  • Apply continuous process improving techniques across the operation
  • Automate everything
Employment type: Full-time

Lead SRE

We have a 6-month contract-to-hire for a senior, hands-on Site Reliability Engin...
Location: United States, St. Louis
Salary: Not provided
Company: Zeektek
Expiration date: Until further notice
Requirements
  • Bachelor's degree
  • AWS Certified DevOps Engineer – Professional
  • Dynatrace Professional
  • At least one SaaS tool certification (Prometheus Certified Associate (PCA), Datadog, or New Relic)
  • 7+ years in SRE/Production Engineering/Platform roles
  • 2+ years leading initiatives or teams
  • Strong in Linux, networking fundamentals (HTTP, TLS, DNS, TCP), and distributed systems concepts
  • Proficiency with Go, Python, shell scripting, SQL, Java or other JVM languages, JavaScript/TypeScript, and YAML/HCL/JSON
  • Hands-on with IaC (Terraform) and CI/CD (GitLab CI, GitHub Actions, AWS/Azure DevOps)
  • Deep experience in AWS Cloud infrastructure
Job Responsibility
  • Drive reliability, scalability, observability (monitoring and alerting), and performance across the production platforms
  • Own the SLO/SLI strategy, modernize observability and incident response, and partner with application teams to deliver resilient systems
  • Define and govern SLOs/SLIs/error budgets for critical services
  • Enforce guardrails and drive reliability roadmaps
  • Lead performance tuning in collaboration with application teams to ensure high availability and low latency
  • Define and own infrastructure tuning to ensure scalability and, in turn, high availability
  • Lead metrics- and automation-driven reliability
  • Debug systems across layers
  • Architect and evolve CI/CD and infrastructure as code (IaC, Terraform)
  • Design and build serverless APIs (Lambda, API Gateway, SQS, SNS, DynamoDB, etc.)
What we offer
  • Weekly Direct Deposit
  • 401K Matching
  • Competitive medical, dental and vision insurance
  • Consistent communication throughout your project
  • ZeekTek Referral Program
Employment type: Full-time

Credit Risk Support Lead- SRE

Join Barclays as a Credit Risk Support Lead- SRE role, where to effectively moni...
Location: India, Pune
Salary: Not provided
Company: Barclays
Expiration date: Until further notice
Requirements
  • 14+ years’ experience in production support
  • High energy, hands-on and results & goal-oriented
  • Expertise in log debugging, root cause analysis and troubleshooting live issues
  • Experience with observability tools such as ESaaS, AppD/ITRS, and Netcool
  • Experience in data analysis to identify underlying themes impacting stability, performance, and customer experience
  • Ensure and promote ITIL best practices for Incident, Problem, Change, and Release management (including managing and running triages, conducting root cause analysis, post-incident reviews, etc.)
  • Strong Credit Risk business knowledge
  • Negotiate SLAs/OLAs with customers and other support teams
  • Business (IT) Continuity Management
  • KPI reporting and monitoring
Job Responsibility
  • Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients. Develop the support model and service offering to improve the service to customers and stakeholders.
  • Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance.
  • Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing.
  • Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes.
  • Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management.
  • Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues.
  • Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency. Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur.
What we offer
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
Employment type: Full-time

Credit Risk Support Lead- SRE

Embark on a transformative journey as a Credit Risk Support Lead-SRE. At Barclay...
Location: United States, Whippany
Salary: USD 150,000 - 215,000 per year
Company: Barclays
Expiration date: Until further notice
Requirements
  • Good domain knowledge with end-to-end responsibility of IT services, including day-to-day operations, incidents and changes
  • Robust understanding of regulatory compliance, risk frameworks, audit, and metric monitoring of service health and control effectiveness
  • Overseeing support teams, effective delegation, and communication with business users and senior stakeholders
  • Ability to prioritize issues, refine support procedures, and drive continuous improvement across RTB and support processes
  • Solid understanding of the software development lifecycle and how application support integrates to enhance delivery, stability, and reliability
Job Responsibility
  • Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients
  • Develop the support model and service offering to improve the service to customers and stakeholders
  • Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance
  • Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing
  • Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues
  • Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management
  • Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues
  • Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency
  • Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur
What we offer
  • Medical coverage
  • Dental coverage
  • Vision coverage
  • 401(k)
  • Life insurance
  • Paid leave
  • Incentive award
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
Employment type: Full-time

Site Reliability Engineering (SRE) / Lead Engineer

We are currently seeking a Site Reliability Engineering (SRE) / Lead Engineer to...
Location: Mexico, Guadalajara
Salary: Not provided
Company: NTT DATA
Expiration date: Until further notice
Requirements
  • 8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
  • Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation
  • Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
  • Strong proficiency in Infrastructure as Code (IaC) using Terraform
  • Solid understanding of cloud platforms including AWS, GCP, or Azure
  • Experience with automation/configuration management tools like Ansible, Chef, or Puppet
  • Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
  • Experience managing Kubernetes and containerized environments (Docker, Helm)
  • Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
  • Excellent leadership, communication, and collaboration skills
Job Responsibility
  • Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
  • Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
  • Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
  • Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
  • Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
  • Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
  • Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
  • Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
  • Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence
Employment type: Full-time

SRE Lead Design & Support Engineer

This is a critical enabler for achieving high resiliency during operations and als...
Location: Mexico, Miguel Hidalgo
Salary: Not provided
Company: PepsiCo
Expiration date: Until further notice
Requirements
  • 8+ years of work experience, progressing into an SRE engineer role
  • 3-5 years of experience in continuously improving and transforming IT operations ways of working
  • Bachelor’s degree in Computer Science, Information Technology or a related field
  • Proven experience as an SRE in designing event diagnostics, performance measures, and alerting solutions to meet SLAs/SLOs/SLIs
  • Highly quantitative, with great judgment, able to connect the dots across ecosystems and work efficiently across teams
  • Strong expertise in Site Reliability Engineering (SRE) and IT Service Management (ITSM) processes
  • Hands-on experience with Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka, and other SRE/Ops toolsets
  • A firm understanding of cloud architecture for distributed environments
  • Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js
  • Back-end technologies: server-side languages (Java, Spring Boot, and related technologies that build the server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase)
Job Responsibility
  • Drive new shift-left activities that apply Site Reliability Engineering (SRE) and quality assurance principles within the application design and project roadmap to enable resilient outcomes
  • Apply a pre-emptive approach in production to minimize business impact, using SRE-driven orchestration that connects all components of the ecosystem to diagnose anomalies before they reach users and to remediate them through automation
  • Ensure ecosystem availability and performance in production environments, proactively preventing P1, P2, and potential P3 incidents
  • Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing events, logging, monitoring, and observability standards across applications
  • Be accountable for ensuring that non-functional requirements (NFRs), including SLAs/SLOs/SLIs and error budgets, are embedded early in the product’s offerings as part of the engineering solution
  • Lead the team in diagnosing anomalies before they affect users and in driving the necessary remediation across the teams involved in end-to-end availability, performance, and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions
  • Collaborate with engineering and support teams, including participating in escalations and blameless postmortems
  • Work closely with customer-facing support teams to empower them with SRE insights and tooling
  • Observe, diagnose, and improve the end-to-end ecosystem performance of the modern-architected application portfolio (i.e., a technical understanding of the interactions within a full-stack application) alongside peer SRE team members
  • Continuously optimize the L2/support operations work via SRE workflow automation
What we offer
  • Opportunities to learn and develop every day through a wide range of programs
  • Internal digital platforms that promote self-learning
  • Leadership skills development programs
  • Specialized training according to the role
  • Learning experiences with internal and external providers
  • Recognition programs for seniority, behavior, leadership, moments of life, among others
  • Financial wellness programs that will help you reach your goals in all stages of life
  • A flexibility program that will allow you to balance your personal and work life, adapting your working day to your lifestyle
  • Wellness Line, thousands of Agreements and Discounts, Scholarship programs for your children, Aid Plans for different moments of life

Lead Mainframe SRE

Location: Greece, Athens
Salary: Not provided
Company: NTT DATA
Expiration date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, Information Systems, or related field (or equivalent experience)
  • 6-8 years of experience in relevant roles
  • Extensive experience managing enterprise middleware and batch-processing platforms within large-scale production environments
  • Strong expertise in re-hosted middleware services and production-control ecosystems, including technologies such as MQ, CICS/TX, Stonebranch UAC, scheduling platforms, and cross-platform operational tooling
  • Strong experience supporting Linux-hosted middleware environments, DB2 LUW interfaces, queue and log monitoring, and disaster recovery operations
  • Strong understanding of operational standards, root cause analysis, change-window governance, service continuity, and platform resilience practices
  • Experience leading modernization, migration, or decommissioning initiatives within critical enterprise platforms
  • Proven ability to lead small technical teams and coordinate multiple stakeholders, vendors, and operational groups
  • Strong stakeholder-management and communication skills, with the ability to gather requirements and align technical solutions with business needs
  • Comfortable operating within highly critical, governance-driven, and cross-functional enterprise environments
Job Responsibility
  • Own the end-to-end technical service for re-platformed middleware and production-control platforms across production, non-production, and disaster recovery environments
  • Lead and mentor a small team of engineers, providing technical leadership, operational guidance, and coordination of day-to-day platform activities
  • Act as the primary technical point of contact for stakeholders, client representatives, and operational teams, gathering requirements and translating business needs into reliable technical solutions
  • Lead Stonebranch UAC and batch operations activities, including scheduling standards, post-batch support, bundle promotion governance, and resolution of cross-domain batch-processing issues
  • Drive service improvements, platform upgrades, cluster and scheduler health initiatives, and operational standards across highly critical enterprise services
  • Act as the command point for high-severity incidents, out-of-hours change windows, root cause analysis activities, and cross-team operational coordination
  • Coordinate closely with IBM, Stonebranch, infrastructure teams, and application owners to ensure operational continuity and service resilience
  • Support disaster recovery readiness, service recovery testing, and operational governance activities across the platform landscape
  • Guide the controlled migration, modernization, or decommissioning of legacy batch-processing services while protecting production stability
  • Define and maintain standards for availability, operational excellence, monitoring, service governance, and platform reliability
What we offer
  • Health insurance for the employee and one dependent family member (100% paid by NTT DATA)
  • Meal vouchers of 120€ per month (x12)
  • Corporate mobile phone: subscription & device
  • Teleworking equipment allowance
  • Internal Trainings Platform Account
  • Access to Open Up mental health service
  • 28 days of paid annual leave consisting of your legal holidays and compensation days

SRE Team Lead (FedRAMP / Security)

Coralogix is a modern, full-stack observability platform transforming how busine...
Location: United States, Los Angeles
Salary: USD 230,000 - 270,000 per year
Company: Coralogix
Expiration date: Until further notice
Requirements
  • 2+ years of experience as a Team Lead / Tech Lead
  • At least 5 years of experience as a DevOps Engineer/SRE in production environments
  • At least 2 years of experience with FedRAMP compliance (High/Moderate levels), vulnerability management, and continuous monitoring, including scanning, patching, and reporting (an advantage)
  • In-depth experience with Kubernetes; operating and monitoring it are key parts of the role
  • High familiarity with monitoring tools such as Coralogix, Grafana, Prometheus
  • Experience in AWS or other cloud providers
  • Experience with infrastructure as code (Terraform, Crossplane, etc.)
  • Understanding of networking, from networking layers to different networking protocols (HTTP, gRPC, SSL)
  • Some software engineering experience, preferably in Golang
  • Experience operating data pipelines (an advantage)
Job Responsibility
  • Lead and mentor a team of engineers, including hiring, onboarding, and performance management
  • Work in high-scale environments: the Coralogix data pipeline processes 55Tb of data each day
  • Adopt cutting-edge technologies with end-to-end responsibility
  • Build internal tools to expand our platform capabilities
  • Collaborate with R&D to improve the stability and reliability of the system
  • Lead the product roadmap: our product is designed for engineers, so our engineers promote, enhance, and play a crucial part in shaping the product roadmap
  • Perform operational duties for FedRAMP cloud products, including deployments, on-call support, and incident management
What we offer
  • Comprehensive and inclusive employee benefits for healthcare, dental, and mental health
  • 401(k) plan and match
  • Paid sick time and paid time off
Employment type: Full-time