SRE Team Lead Job at Venzo Technologies (Chennai)

SRE Team Lead (FedRAMP / Security)

Coralogix is a modern, full-stack observability platform transforming how busine...

Location

United States, Los Angeles

Salary:

230,000 - 270,000 USD / year

Coralogix

Expiration Date

Until further notice

Requirements

2+ years of experience as a Team Lead / Tech Lead
At least 5 years of experience as a DevOps Engineer / SRE in production environments
Advantage: at least 2 years of experience with FedRAMP compliance (High/Moderate levels), vulnerability management, and continuous monitoring, including scanning, patching, and reporting
In-depth experience with Kubernetes; operating and monitoring clusters are key parts of the role
High familiarity with monitoring tools such as Coralogix, Grafana, and Prometheus
Experience with AWS or other cloud providers
Experience with infrastructure as code (Terraform, Crossplane, etc.)
Understanding of networking, from the networking layers to specific protocols (HTTP, gRPC, SSL)
Some software engineering experience, preferably in Go (Golang)
Advantage: experience operating data pipelines

Job Responsibility

Lead and mentor a team of engineers, including hiring, onboarding, and performance management
Work in high-scale environments: the Coralogix data pipeline processes 55 TB of data each day
Adopt cutting-edge technologies with end-to-end responsibility
Build internal tools to expand our platform capabilities
Collaborate with R&D to improve the stability and reliability of the system
Lead the product roadmap: our product is designed for engineers, so our engineers promote, enhance, and play a crucial part in shaping the product roadmap
Perform operational duties for FedRAMP cloud products, including deployments, on-call support, and incident management

What we offer

Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
401(k) plan and match
Paid sick time and paid time off

Full-time

SRE Team Lead (FedRAMP / Security)

Coralogix is a modern, full-stack observability platform transforming how busine...

Location

United States, New York

Salary:

230,000 - 270,000 USD / year

Coralogix

Expiration Date

Until further notice

Requirements

2+ years of experience as a Team Lead / Tech Lead
At least 5 years of experience as a DevOps Engineer / SRE in production environments
In-depth experience with Kubernetes; operating and monitoring clusters are key parts of the role
High familiarity with monitoring tools such as Coralogix, Grafana, and Prometheus
Experience with AWS or other cloud providers
Experience with infrastructure as code (Terraform, Crossplane, etc.)
Understanding of networking, from the networking layers to specific protocols (HTTP, gRPC, SSL)
Some software engineering experience, preferably in Go (Golang)

Job Responsibility

Lead and mentor a team of engineers, including hiring, onboarding, and performance management
Work in high-scale environments: the Coralogix data pipeline processes 55 TB of data each day
Adopt cutting-edge technologies with end-to-end responsibility
Build internal tools to expand our platform capabilities
Collaborate with R&D to improve the stability and reliability of the system
Lead the product roadmap
Perform operational duties for FedRAMP cloud products, including deployments, on-call support, and incident management

What we offer

Healthcare benefits
Dental benefits
Mental health benefits
401(k) plan and match
Paid sick time
Paid time off

Full-time

SRE Team Lead (FedRAMP / Security)

Coralogix is a modern, full-stack observability platform transforming how busine...

Location

United States, New York

Salary:

230,000 - 270,000 USD / year

Coralogix

Expiration Date

Until further notice

Requirements

2+ years of experience as a Team Lead / Tech Lead
At least 5 years of experience as a DevOps Engineer / SRE in production environments
Advantage: at least 2 years of experience with FedRAMP compliance (High/Moderate levels), vulnerability management, and continuous monitoring, including scanning, patching, and reporting
In-depth experience with Kubernetes; operating and monitoring clusters are key parts of the role
High familiarity with monitoring tools such as Coralogix, Grafana, and Prometheus
Experience with AWS or other cloud providers
Experience with infrastructure as code (Terraform, Crossplane, etc.)
Understanding of networking, from the networking layers to specific protocols (HTTP, gRPC, SSL)
Some software engineering experience, preferably in Go (Golang)
Advantage: experience operating data pipelines

Job Responsibility

Lead and mentor a team of engineers, including hiring, onboarding, and performance management
Work in high-scale environments: the Coralogix data pipeline processes 55 TB of data each day
Adopt cutting-edge technologies with end-to-end responsibility
Build internal tools to expand our platform capabilities
Collaborate with R&D to improve the stability and reliability of the system
Lead the product roadmap: our product is designed for engineers, so our engineers promote, enhance, and play a crucial part in shaping the product roadmap
Perform operational duties for FedRAMP cloud products, including deployments, on-call support, and incident management

What we offer

Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
401(k) plan and match
Paid sick time and paid time off

Full-time

Site Reliability Engineering (SRE) Team Lead

We are looking for a highly skilled and experienced Site Reliability Engineering...

Location

United States, Irving

Salary:

Not provided

OneMain Financial

Expiration Date

Until further notice

Requirements

BA/BS in Computer Science, Engineering, or a related field, or equivalent experience
7+ years of experience in site reliability engineering, systems engineering, or related roles, with at least 2 years in a leadership position
Proven experience leading and scaling high-performing engineering teams
Deep expertise in cloud platforms (AWS, GCP, Azure) and in containerization and orchestration (Docker, Kubernetes)
Strong skills in infrastructure as code tools (Terraform, Ansible, CloudFormation) and CI/CD pipelines
Proficiency with monitoring and alerting systems (Prometheus, Grafana, ELK, Datadog)
Solid programming and scripting skills (Python, Go, Bash, or similar)
Strong understanding of distributed systems, networking, security, and databases
Excellent leadership, communication, and collaboration skills
Experience managing incident response and on-call rotations

Job Responsibility

Lead, mentor, and grow a team of site reliability engineers, promoting a culture of reliability, automation, and continuous improvement
Drive the design, implementation, and maintenance of scalable and fault-tolerant infrastructure to support high-availability services
Oversee incident management processes, including triage, root cause analysis, and postmortems to improve system reliability and prevent recurrence
Collaborate cross-functionally with software engineering, product, and operations teams to integrate reliability best practices into the software development lifecycle
Define and implement operational metrics, SLIs/SLOs, and dashboards to monitor system health and drive proactive improvements
Manage and assess the observability of critical environments, proactively addressing any gaps that arise
Oversee the release management processes, artifacts, and tools that drive a repeatable software delivery lifecycle
Champion automation efforts to reduce manual intervention, improve deployment pipelines, and optimize infrastructure management
Lead capacity planning, disaster recovery, and performance tuning efforts
Ensure security and compliance standards are upheld across infrastructure and operations

What we offer

Health and wellbeing options, including medical, prescription, dental, vision, hearing, accident, hospital indemnity, and life insurance
Up to 4% matching 401(k)
Employee Stock Purchase Plan (10% share discount)
Tuition reimbursement
Paid time off (15 days’ vacation per year, plus 2 personal days, prorated based on start date)
Paid sick leave as determined by state or local ordinance, prorated based on start date
Paid holidays (7 days per year, based on start date)
Paid volunteer time (3 days per year, prorated based on start date)
Access to Talkspace (online therapy) and Hinge Health (on-demand physical therapy via an app)
Family back-up care

Full-time

SRE Networking Team Lead

Coralogix is a modern, full-stack observability platform transforming how busine...

Location

Germany, Berlin

Salary:

Not provided

Coralogix

Expiration Date

Until further notice

Requirements

Experience leading DevOps, SRE, or Platform teams
Strong background in Kubernetes and knowledge of networking standards (CNI, ingress/egress, network policies)
Hands-on experience with service mesh technologies (Istio, Envoy, or similar)
Solid understanding of cloud networking (AWS: VPCs, PrivateLink, NAT, multi-AZ)
Experience operating systems through incidents, outages, and shifting priorities
Strong communication skills, with the ability to explain complex networking topics clearly

Job Responsibility

Lead and develop a distributed team of engineers
Own end-to-end reliability of the network layer (ingress, egress, service mesh, traffic flows)
Hold final accountability for network-related failure modes and incidents
Own traffic-related SLOs and reliability outcomes
Identify and address network-level cost anomalies
Define and enforce networking standards and guardrails

Full-time

Credit Risk Support Lead - SRE

Join Barclays in the Credit Risk Support Lead - SRE role, where to effectively moni...

Location

India, Pune

Salary:

Not provided

Barclays

Expiration Date

Until further notice

Requirements

14+ years’ experience in production support
High-energy, hands-on, and results- and goal-oriented
Expertise in log debugging, root cause analysis, and troubleshooting live issues
Experience with observability tools such as ESaaS, AppD (AppDynamics), ITRS, and Netcool
Experience in data analysis to identify underlying themes impacting stability, performance, and customer experience
Ability to ensure and promote ITIL best practices for incident, problem, change, and release management (including managing and running triages, conducting root cause analysis, and leading post-incident reviews)
Strong Credit Risk business knowledge
Experience negotiating SLAs/OLAs with customers and other support functions
Business (IT) continuity management
KPI reporting and monitoring

Job Responsibility

Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients. Development of the support model and service offering to improve the service to customers and stakeholders.
Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance.
Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing.
Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes.
Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management.
Identification of potential service-impacting risks and issues, and their remediation or escalation through the appropriate process.
Proactive assessment of support activities, implementing automation where appropriate to maintain stability and drive efficiency. Active tuning of monitoring tools, thresholds, and alerting to ensure issues are detected when they occur.

What we offer

Competitive holiday allowance
Life assurance
Private medical care
Pension contribution

Full-time

Credit Risk Support Lead - SRE

Embark on a transformative journey as a Credit Risk Support Lead - SRE. At Barclay...

Location

United States, Whippany

Salary:

150,000 - 215,000 USD / year

Barclays

Expiration Date

Until further notice

Requirements

Good domain knowledge, with end-to-end responsibility for IT services, including day-to-day operations, incidents, and changes
Robust understanding of regulatory compliance, risk frameworks, audit, and metric monitoring of service health and control effectiveness
Experience overseeing support teams, delegating effectively, and communicating with business users and senior stakeholders
Ability to prioritize issues, refine support procedures, and drive continuous improvement across RTB and support processes
Solid understanding of the software development lifecycle and how application support integrates to enhance delivery, stability, and reliability

Job Responsibility

Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients
Development of the support model and service offering to improve the service to customers and stakeholders
Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance
Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing
Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues
Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management
Identification of potential service-impacting risks and issues, and their remediation or escalation through the appropriate process
Proactive assessment of support activities, implementing automation where appropriate to maintain stability and drive efficiency
Active tuning of monitoring tools, thresholds, and alerting to ensure issues are detected when they occur

What we offer

Medical coverage
Dental coverage
Vision coverage
401(k)
Life insurance
Paid leave
Incentive award
Competitive holiday allowance
Life assurance
Private medical care

Full-time
