
SRE Networking Team Lead

Germany, Berlin · Job Posted March 22, 2026

Job Description

Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of logs, metrics, traces, and security events, with features such as APM, RUM, SIEM, Kubernetes monitoring, and more, all enhancing operational efficiency and reducing observability spend by up to 70%.

The Networking Platform team absorbs networking complexity so product teams can ship safely without thinking about traffic, connectivity, or network failure modes. We are looking for a Team Lead for the Networking Platform team to own the reliability of Coralogix’s network layer end-to-end. This is a reliability-first leadership role, responsible for ingress, egress, and traffic behavior across a large-scale, distributed system. You will lead a high-impact team with a clear mandate to bring structure, standards, and automation, and to reduce firefighting over time.

Job Responsibility

  • Lead and develop a distributed team of engineers
  • End-to-end reliability of the network layer (ingress, egress, service mesh, traffic flows)
  • Final accountability for network-related failure modes and incidents
  • Traffic-related SLOs and reliability outcomes
  • Identifying and addressing network-level cost anomalies
  • Defining and enforcing networking standards and guardrails

Requirements

  • Experience leading DevOps, SRE, or Platform teams
  • Strong background in Kubernetes and knowledge of networking standards (CNI, ingress/egress, network policies)
  • Hands-on experience with service mesh technologies (Istio, Envoy, or similar)
  • Solid understanding of cloud networking (AWS: VPCs, PrivateLink, NAT, multi-AZ)
  • Experience operating systems through incidents, outages, and shifting priorities
  • Strong communication skills - able to explain complex networking topics clearly

Similar Jobs for SRE Networking Team Lead

8 matching positions

SRE Networking Team Lead

Coralogix is a modern, full-stack observability platform transforming how busine...
Location: Germany, Berlin
Salary: Not provided
Company: Coralogix
Expiration Date: Until further notice
Requirements
  • Experience leading DevOps, SRE, or Platform teams
  • Strong background in Kubernetes and knowledge of networking standards (CNI, ingress/egress, network policies)
  • Hands-on experience with service mesh technologies (Istio, Envoy, or similar)
  • Solid understanding of cloud networking (AWS: VPCs, PrivateLink, NAT, multi-AZ)
  • Experience operating systems through incidents, outages, and shifting priorities
  • Strong communication skills - able to explain complex networking topics clearly
Job Responsibility
  • Lead and develop a distributed team of engineers
  • End-to-end reliability of the network layer (ingress, egress, service mesh, traffic flows)
  • Final accountability for network-related failure modes and incidents
  • Traffic-related SLOs and reliability outcomes
  • Identifying and addressing network-level cost anomalies
  • Defining and enforcing networking standards and guardrails
Employment type: Full-time

SRE Team Lead (FedRAMP / Security)

Coralogix is a modern, full-stack observability platform transforming how busine...
Location: United States, Los Angeles
Salary: 230,000 - 270,000 USD / Year
Company: Coralogix
Expiration Date: Until further notice
Requirements
  • 2+ years of experience as a Team Lead / Tech Lead
  • At least 5 years of experience as a DevOps Engineer/ SRE in production environments
  • At least 2 years of experience with FedRAMP compliance (High/Moderate levels), vulnerability management, and continuous monitoring, including scanning, patching, and reporting (an advantage)
  • In-depth experience with Kubernetes - operating & monitoring are key parts
  • High familiarity with monitoring tools such as Coralogix, Grafana, Prometheus
  • Experience in AWS or other cloud providers
  • Experience with infrastructure as code (Terraform, Crossplane, etc.)
  • Understanding of networking, from networking layers to different networking protocols (HTTP, gRPC, SSL)
  • Some software engineering experience, preferably in Golang
  • Experience operating data pipelines (an advantage)
Job Responsibility
  • Lead and mentor a team of engineers, including hiring, onboarding, and performance management
  • Work in high scale environments - Coralogix data pipeline processes 55Tb of data each day
  • Adopt cutting edge technologies with end-to-end responsibility
  • Building internal tools to expand our platform capabilities
  • Collaborate with R&D to improve stability & reliability of the system
  • Lead the product roadmap - our product is designed for engineers. Therefore, our engineers promote, enhance, and take a crucial part in influencing the product roadmap
  • Perform operational duties for FedRAMP cloud products, including deployments, on-call support, and incident management
What we offer
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
  • 401(k) plan and match
  • Paid sick time and paid time off
Employment type: Full-time

SRE Team Lead (FedRAMP / Security)

Coralogix is a modern, full-stack observability platform transforming how busine...
Location: United States, New York
Salary: 230,000 - 270,000 USD / Year
Company: Coralogix
Expiration Date: Until further notice
Requirements
  • 2+ years of experience as a Team Lead / Tech Lead
  • At least 5 years of experience as a DevOps Engineer/ SRE in production environments
  • In-depth experience with Kubernetes - operating & monitoring are key parts
  • High familiarity with monitoring tools such as Coralogix, Grafana, Prometheus
  • Experience in AWS or other cloud providers
  • Experience with infrastructure as code (Terraform, Crossplane, etc.)
  • Understanding of networking, from networking layers to different networking protocols (HTTP, gRPC, SSL)
  • Some software engineering experience, preferably in Golang
Job Responsibility
  • Lead and mentor a team of engineers, including hiring, onboarding, and performance management
  • Work in high scale environments - Coralogix data pipeline processes 55Tb of data each day
  • Adopt cutting edge technologies with end-to-end responsibility
  • Building internal tools to expand our platform capabilities
  • Collaborate with R&D to improve stability & reliability of the system
  • Lead the product roadmap
  • Perform operational duties for FedRAMP cloud products, including deployments, on-call support, and incident management.
What we offer
  • Healthcare benefits
  • Dental benefits
  • Mental health benefits
  • 401(k) plan and match
  • Paid sick time
  • Paid time off
Employment type: Full-time

SRE Team Lead (FedRAMP / Security)

Coralogix is a modern, full-stack observability platform transforming how busine...
Location: United States, New York
Salary: 230,000 - 270,000 USD / Year
Company: Coralogix
Expiration Date: Until further notice
Requirements
  • 2+ years of experience as a Team Lead / Tech Lead
  • At least 5 years of experience as a DevOps Engineer/ SRE in production environments
  • At least 2 years of experience with FedRAMP compliance (High/Moderate levels), vulnerability management, and continuous monitoring, including scanning, patching, and reporting (an advantage)
  • In-depth experience with Kubernetes - operating & monitoring are key parts
  • High familiarity with monitoring tools such as Coralogix, Grafana, Prometheus
  • Experience in AWS or other cloud providers
  • Experience with infrastructure as code (Terraform, Crossplane, etc.)
  • Understanding of networking, from networking layers to different networking protocols (HTTP, gRPC, SSL)
  • Some software engineering experience, preferably in Golang
  • Experience operating data pipelines (an advantage)
Job Responsibility
  • Lead and mentor a team of engineers, including hiring, onboarding, and performance management.
  • Work in high scale environments - Coralogix data pipeline processes 55Tb of data each day
  • Adopt cutting edge technologies with end-to-end responsibility
  • Building internal tools to expand our platform capabilities
  • Collaborate with R&D to improve stability & reliability of the system
  • Lead the product roadmap - our product is designed for engineers. Therefore, our engineers promote, enhance, and take a crucial part in influencing the product roadmap.
  • Perform operational duties for FedRAMP cloud products, including deployments, on-call support, and incident management.
What we offer
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health, plus a 401(k) plan and match, paid sick time, and paid time off
Employment type: Full-time

Site Reliability Engineering (SRE) Team Lead

We are looking for a highly skilled and experienced Site Reliability Engineering...
Location: United States, Irving
Salary: Not provided
Company: OneMain Financial
Expiration Date: Until further notice
Requirements
  • BA/BS in Computer Science, Engineering, related field, or equivalent experience
  • 7+ years of experience in site reliability engineering, systems engineering, or related roles, with at least 2 years in a leadership position
  • Proven experience leading and scaling high-performing engineering teams
  • Deep expertise in cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes, Docker)
  • Strong skills in infrastructure as code tools (Terraform, Ansible, CloudFormation) and CI/CD pipelines
  • Proficiency with monitoring and alerting systems (Prometheus, Grafana, ELK, Datadog)
  • Solid programming and scripting skills (Python, Go, Bash, or similar)
  • Strong understanding of distributed systems, networking, security, and databases
  • Excellent leadership, communication, and collaboration skills
  • Experience managing incident response and on-call rotations
Job Responsibility
  • Lead, mentor, and grow a team of site reliability engineers, promoting a culture of reliability, automation, and continuous improvement
  • Drive the design, implementation, and maintenance of scalable and fault-tolerant infrastructure to support high-availability services
  • Oversee incident management processes, including triage, root cause analysis, and postmortems to improve system reliability and prevent recurrence
  • Collaborate cross-functionally with software engineering, product, and operations teams to integrate reliability best practices into the software development lifecycle
  • Define and implement operational metrics, SLIs/SLOs, and dashboards to monitor system health and drive proactive improvements
  • Manage and assess the observability of critical environments, proactively addressing gaps that may arise
  • Oversee the release management processes, artifacts and tools that drive a repeatable software delivery lifecycle
  • Champion automation efforts to reduce manual intervention, improve deployment pipelines, and optimize infrastructure management
  • Lead capacity planning, disaster recovery, and performance tuning efforts
  • Ensure security and compliance standards are upheld across infrastructure and operations
What we offer
  • Health and wellbeing options including medical, prescription, dental, vision, hearing, accident, hospital indemnity, and life insurances
  • Up to 4% matching 401(k)
  • Employee Stock Purchase Plan (10% share discount)
  • Tuition reimbursement
  • Paid time off (15 days’ vacation per year, plus 2 personal days, prorated based on start date)
  • Paid sick leave as determined by state or local ordinance, prorated based on start date
  • Paid holidays (7 days per year, based on start date)
  • Paid volunteer time (3 days per year, prorated based on start date)
  • Access to Talkspace and Hinge for on-demand physical therapy via an app
  • Family back-up care
Employment type: Full-time

Credit Risk Support Lead - SRE

Join Barclays as a Credit Risk Support Lead- SRE role, where to effectively moni...
Location: India, Pune
Salary: Not provided
Company: Barclays
Expiration Date: Until further notice
Requirements
  • 14+ years’ experience in production support
  • High energy, hands-on and results & goal-oriented
  • Expertise in log debugging, root cause analysis and troubleshooting live issues
  • Experience with observability tools such as ESaaS, AppD / ITRS, Netcool
  • Experience in data analysis to identify underlying themes impacting stability, performance, and customer experience
  • Ensuring and promoting ITIL best practices for Incident, Problem, Change, and Release management (including managing and running triages, conducting root cause analysis, post-incident reviews, etc.)
  • Strong Credit Risk business knowledge
  • Negotiate SLAs/OLAs with customer and other support elements
  • Business (IT) Continuity Management
  • KPI reporting and monitoring
Job Responsibility
  • Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients. Develop the support model and service offering to improve the service to customers and stakeholders.
  • Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance.
  • Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing.
  • Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes.
  • Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management.
  • Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues.
  • Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency. Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur.
What we offer
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
Employment type: Full-time

Credit Risk Support Lead - SRE

Embark on a transformative journey as a Credit Risk Support Lead-SRE. At Barclay...
Location: United States, Whippany
Salary: 150,000 - 215,000 USD / Year
Company: Barclays
Expiration Date: Until further notice
Requirements
  • Good domain knowledge with end-to-end responsibility of IT services, including day-to-day operations, incidents and changes
  • Robust understanding of regulatory compliance, risk frameworks, audit, and metric monitoring of service health and control effectiveness
  • Overseeing support teams, effective delegation, and communication with business users and senior stakeholders
  • Ability to prioritize issues, refine support procedures, and drive continuous improvement across RTB and support processes
  • Solid understanding of the software development lifecycle and how application support integrates to enhance delivery, stability, and reliability
Job Responsibility
  • Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients
  • Develop the support model and service offering to improve the service to customers and stakeholders
  • Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance
  • Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing
  • Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues
  • Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management
  • Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues
  • Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency
  • Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur
What we offer
  • Medical coverage
  • Dental coverage
  • Vision coverage
  • 401(k)
  • Life insurance
  • Paid leave
  • Incentive award
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
Employment type: Full-time

Lead SRE

We have a 6 month contract to hire for a senior, hands-on Site Reliability Engin...
Location: United States, St Louis
Salary: Not provided
Company: Zeektek
Expiration Date: Until further notice
Requirements
  • Bachelor's degree
  • AWS Certified DevOps Engineer – Professional
  • Dynatrace Professional
  • One SaaS tool certification (Prometheus Certified Associate (PCA), Datadog, New Relic)
  • 7+ years in SRE/Production Engineering/Platform roles
  • 2+ years leading initiatives or teams
  • Strong in Linux, networking fundamentals (HTTP, TLS, DNS, TCP), and distributed systems concepts
  • Proficiency with Go, Python, Shell Scripting, SQL, Java or JVM, JavaScript/TypeScript, YAML/HCL/JSON
  • Hands-on with IaC (Terraform) and CI/CD (GitLab CI, GitHub Actions, AWS/Azure DevOps)
  • Deep experience in AWS Cloud infrastructure
Job Responsibility
  • Lead SRE to drive reliability, scalability, observability (monitoring & alerts) and performance across the production platforms
  • Own the SLO/SLI strategy, modernize observability and incident response, and partner with application teams to deliver resilient systems
  • Define and govern SLOs/SLIs/Error Budgets for critical services
  • Enforce guardrails and drive reliability roadmaps
  • Lead performance tuning collaboration with application teams to ensure high availability and low latency
  • Define and own infrastructure tuning to ensure scalability leading to high availability
  • Lead metrics- and automation-driven reliability
  • Debug systems across layers
  • Architect and evolve CI/CD and infrastructure as code (IaC, Terraform)
  • Design and build serverless APIs (Lambda, API Gateway, SQS, SNS, DynamoDB, etc.)
What we offer
  • Weekly Direct Deposit
  • 401K Matching
  • Competitive medical, dental and vision insurance
  • Consistent communication throughout your project
  • ZeekTek Referral Program
Employment type: Full-time