
Senior Service Reliability Engineer

Plusnet


Location:
India, Gurugram

Category:
IT - Administration


Contract Type:
Not provided


Salary:

Not provided

Job Description:

The role holder will support the Incident Management data team for Lloyds Banking Group, working as a 2nd line support engineer responsible for diagnosing break-fix incidents and resolving medium and complex incidents right the first time. If an incident cannot be fixed, the engineer is responsible for coordinating with the 3rd line team for a resolution. The engineer also works with third-party vendors such as Cisco, Phoenix, and Check Point, raising TAC cases to troubleshoot complex technical and hardware issues.

Job Responsibility:

  • Support the Incident Management data team for Lloyds Banking Group
  • Work as a 2nd line support engineer responsible for diagnosing break-fix incidents and resolving medium and complex incidents right the first time
  • If an incident cannot be fixed, coordinate with the 3rd line team for a resolution
  • Work with third-party vendors such as Cisco, Phoenix, and Check Point, raising TAC cases to troubleshoot complex technical and hardware issues
  • Troubleshoot and maintain data, with a good understanding of database functionality and data management processes
  • Enhance and modify existing programs and initiatives so that effectiveness improves with every cycle
  • Attempt a first-time fix, dealing with customers and suppliers as necessary
  • Escalate support calls to the appropriate 3rd line team if unable to resolve the incident
  • Demonstrate high levels of customer service when dealing with clients, and look to exceed expectations
  • Update customers by telephone or e-mail on the progress of a support call or to ask for additional information
  • Liaise with vendors and third parties on circuit or hardware issues, and escalate per the defined SLAs to restore services in the shortest possible time
  • Handle incoming emails from customers ensuring they are acted upon in a timely manner
  • Escalate service exceptions and high priority incident tickets appropriately within the business
  • Liaise effectively with colleagues and stakeholders to meet customer requirements

Requirements:

  • 5-9 years of experience in network operations and data networking, especially in enterprise network environments
  • Excellent hands-on experience with data networking technologies and products
  • Experience in implementation, fault-fix restoration, remote support for onsite engineers, and fault analysis
  • Exposure to an ITIL-based ticketing tool and a good understanding of maintaining network uptime and SLAs for LAN, WAN, and WLAN
  • Troubleshooting routing protocols: EIGRP, RIP, OSPF, and BGP
  • Carrying out re-engineering of network designs, troubleshooting, and problem resolution of elusive customer network difficulties
  • Implementing pre-approved and T1 changes
  • Good hands-on experience with the following technologies:
    • Routing protocols - EIGRP, OSPF, BGP, RIP
    • LAN Switching - VLAN, VTP, STP, EtherChannel, etc.
    • WAN - MPLS, MPLS/VPN, MP-BGP
    • TCP/IP protocol suite, Quality of Service
    • Application protocols - HTTP, HTTPS, FTP, SMTP, SNMP, SSL, etc.
    • Network Security - IPsec VPN, AAA architecture, TACACS+, RADIUS
    • WLAN - Cisco Prime, access points, and WLC
    • SD-WAN

Nice to have:

  • Desirable certifications: CCNA, CCNP, ITIL Foundation

Additional Information:

Job Posted:
December 26, 2025

Employment Type:
Fulltime

Similar Jobs for Senior Service Reliability Engineer

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...
Location:
United States, Deerfield
Salary:
96000.00 - 132000.00 USD / Year
Baxter (https://www.baxter.com/)
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, IT, or related field (or equivalent experience)
  • Prior experience in Site Reliability Engineering and cloud-based infrastructure management
  • Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
  • Azure administration and operations experience, with certifications a plus
  • Knowledge of related technologies, including cloud, encryption, and security protocols
  • Systems administration experience in Windows and Linux environments
  • Proven problem-solving skills and experience with scripting and automation tools
  • Ability to create accurate documentation and reports, with excellent communication skills
  • Applicants must be authorized to work for any employer in the U.S.
  • Unable to sponsor or take over sponsorship of an employment visa at this time.
Job Responsibility:
  • Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
  • Manage and administer Azure resources, including virtual machines, databases, and networking components
  • Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in cloud
  • Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
  • Define and refine Operations SLAs to maintain high level of Customer Satisfaction
  • Establish non-functional requirements to meet SLAs
  • Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
  • Define key performance indicators that can be monitored, measured, and used to derive opportunities
  • Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
  • Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes.
What we offer:
  • Support for Parents
  • Continuing Education/Professional Development
  • Employee Health & Well-Being Benefits
  • Paid Time Off
  • 2 Days a Year to Volunteer
  • Medical and dental coverage starting day one
  • Insurance coverage for basic life, accident, short-term and long-term disability
  • Business travel accident insurance
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan
Employment Type: Fulltime

Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer who is passionate about sc...
Location:
Salary:
Not provided
Atlassian (https://www.atlassian.com)
Expiration Date:
Until further notice
Requirements:
  • 5+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring, tweaking dashboards, defining alerts, writing runbooks, etc.
  • 5+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Familiarity with Unix / Linux operating systems
  • Strong emphasis on debugging, improving code, and automating routine tasks
  • Strong backend engineering experience in one or more prominent languages such as Java, Go or Python
  • Excellent communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
  • An ability and desire to mentor and coach engineers
What we offer:
  • health coverage
  • paid volunteer days
  • wellness resources
Employment Type: Fulltime

Senior Site Reliability Engineer

Architect, develop, and troubleshoot large-scale infrastructure, maintain and im...
Location:
United States, San Francisco
Salary:
180960.00 - 230900.00 USD / Year
Atlassian (https://www.atlassian.com)
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Software Engineering, Information Technology or a closely related field
  • four years of experience as a Site Reliability Engineer architecting, developing, and troubleshooting large scale infrastructure utilizing programming languages such as PowerShell, Python, or Bash
  • networking technologies such as TCP/IP or security
  • four years of experience in automation development and infrastructure as code implementation using tools such as Terraform, AWS CloudFormation, Ansible, or Salt
  • knowledge of Linux and Windows systems
  • cloud technologies within AWS, GCP, Azure
  • continuous integration and continuous delivery/deployment (CI/CD) practices, and monitoring and observability practices
  • must pass technical interview
Job Responsibility:
  • Architect, develop, and troubleshoot large scale infrastructure utilizing programming languages such as PowerShell, Python, or Bash and networking technologies such as TCP/IP or security
  • provide real-time feedback on production systems
  • work with product family and platform developers to maintain and improve services and performance with a strong customer focus
  • utilize a variety of data collection, enrichment, analytics, and visualizations to support our complex systems
  • responsible for automation development and infrastructure-as-code implementation using tools such as Terraform, AWS CloudFormation, Ansible, and/or Salt
  • build solutions to enhance availability, performance, and stability for hundreds of Atlassian enterprise customers in the cloud as well as automate repetitive work
  • help secure the cloud architecture with penetration testing, vulnerability resolution, and compliance audit responses
  • responsible for continuous integration and continuous delivery/deployment (CI/CD) practices, and monitoring and observability practices
What we offer:
  • Health and wellbeing resources
  • paid volunteer days
Employment Type: Fulltime

Senior Site Reliability Engineer

Baxter International is seeking a skilled Senior Principal Site Reliability Engi...
Location:
United States, Deerfield
Salary:
96000.00 - 132000.00 USD / Year
Baxter (https://www.baxter.com/)
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, IT, or related field (or equivalent experience)
  • Prior experience in Site Reliability Engineering and cloud-based infrastructure management
  • Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
  • Azure administration and operations experience, with certifications a plus
  • Knowledge of related technologies, including cloud, encryption, and security protocols
  • Systems administration experience in Windows and Linux environments
  • Proven problem-solving skills and experience with scripting and automation tools
  • Ability to create accurate documentation and reports, with excellent communication skills
Job Responsibility:
  • Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
  • Manage and administer Azure resources, including virtual machines, databases, and networking components
  • Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in cloud
  • Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
  • Define and refine Operations SLAs to maintain high level of Customer Satisfaction
  • Establish non-functional requirements to meet SLAs
  • Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
  • Define key performance indicators that can be monitored, measured, and used to derive opportunities
  • Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
  • Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes
What we offer:
  • Healthcare benefits
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan
  • Flexible Spending Accounts
  • Educational assistance programs
  • Paid holidays
  • Paid time off
  • Paid parental leave
  • Commuting benefits
  • Employee Discount Program
Employment Type: Fulltime

Senior Software Engineer, Site Reliability

Babylist is looking for a Senior Software Engineer, Site Reliability to join our...
Location:
United States; Canada
Salary:
186818.00 - 224183.00 USD; CAD / Year
Babylist (babylist.com)
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience as a Site Reliability Engineer or similar role
  • Experience supporting high-traffic consumer-facing websites
  • Proficiency with Terraform
  • Strong experience working with AWS cloud-based infrastructure and services
  • Proficiency with Docker and Kubernetes
  • Solid understanding of cloud-native systems design
  • Troubleshooting and debugging skills
  • Experience designing and supporting CI systems
  • Familiar with monitoring and alerting best practices
  • Proven experience in on-call management best practices
Job Responsibility:
  • Manage and build our AWS infrastructure using Infrastructure as Code (IaC) tools like Terraform
  • Improve the speed and reliability of our Continuous Integration (CI) systems
  • Provide support to developers in troubleshooting issues
  • Establish, communicate, and support best practices for monitoring and alerting
What we offer:
  • Company-paid medical, dental, and vision insurance
  • Retirement savings plan with company matching and flexible spending accounts
  • Generous paid parental leave and PTO
  • Remote work stipend
  • Perks for physical, mental, and emotional health, parenting, childcare, and financial planning
Employment Type: Fulltime

Senior Software Engineer - Observability and Reliability

We are growing the engineering team and looking for engineers who have the chops...
Location:
United States, New York City
Salary:
150000.00 - 220000.00 USD / Year
Sigma Computing (sigmacomputing.com)
Expiration Date:
Until further notice
Requirements:
  • Strong Computer Science fundamentals
  • 5+ years industry experience building and maintaining high-quality software, especially software other engineers use
  • You apply a product mindset to infrastructure systems and feel accomplished enabling others
  • Desire to be a great teammate and have fun at work
  • Strong sense of craftsmanship, and a healthy academic curiosity
Job Responsibility:
  • Build observability tools and platforms, including: metrics, logging, distributed tracing, dashboarding, alerting, application performance management
  • Build with modern tools and languages like Go, OpenTelemetry and Kubernetes
  • Participate in on-call rotation and ensure uptime of services
  • Create runtime tools/processes that optimize cloud triaging and limit downtime
  • Define best practices around making our systems and services measurable
  • Collaborate with peers and stakeholders through design and code reviews to ensure best practices amongst available technologies. We expect successful candidates to be coding a majority of their time
What we offer:
  • Equity
  • Generous health benefits
  • Flexible time off policy. Take the time off you need!
  • Paid bonding time for all new parents
  • Traditional and Roth 401k
  • Commuter and FSA benefits
  • Lunch Program
  • Dog friendly office
Employment Type: Fulltime

Senior Site Reliability Engineer

Senior Site Reliability Engineer role in HSBC's Digital Business Services organi...
Location:
China, Shanghai
Salary:
Not provided
HSBC (https://www.hsbc.com)
Expiration Date:
December 31, 2025
Requirements:
  • Bachelor's degree in computer science, Information Technology, or a related field
  • Minimum of 5 years of experience in site reliability engineering, software development, or systems engineering, preferably in financial services
  • Proficiency in Python, Go, Java, or Ruby for automation
  • Deep knowledge of Linux/Unix systems
  • Expertise in AWS, Azure, or GCP, and Infrastructure as Code tools like Terraform or Ansible
  • Experience with Docker and Kubernetes
  • Proficiency in Prometheus, Grafana, Splunk, or Datadog
  • Familiarity with Jenkins, GitLab CI, or GitHub Actions
  • Knowledge of TCP/IP, DNS, and load balancing
  • Experience with chaos engineering tools
Job Responsibility:
  • Design, develop, and implement automation tools and scripts to reduce manual operational tasks
  • Ensure high availability (e.g., 99.99% uptime) of critical banking applications
  • Conduct capacity planning and chaos engineering
  • Participate in on-call rotations to respond to production incidents
  • Collaborate with production support teams for rapid incident resolution
  • Work with application development teams to embed reliability practices into SDLC
  • Engage with operation resilience project team for regulatory compliance
  • Coordinate with global and regional SRE and DevOps teams
  • Implement and maintain monitoring solutions
  • Drive continuous improvement in reliability practices
What we offer:
  • Continuous professional development
  • Flexible working
  • Opportunities to grow within inclusive and diverse environment
Employment Type: Fulltime

Senior Site Reliability Engineer

Digital Business Services (DBS) Our GCIO organisation plays a critical role for ...
Location:
China, Shanghai
Salary:
Not provided
HSBC (https://www.hsbc.com)
Expiration Date:
December 31, 2025
Requirements:
  • Bachelor's degree in computer science, Information Technology, or a related field. Advanced degrees or certifications (e.g., ITIL, AWS Certified Solutions Architect, Google SRE) are a plus
  • Minimum of 5 years of experience in site reliability engineering, software development, or systems engineering, preferably in a financial services environment
  • Proven experience in automating operational processes and managing high-availability systems
  • Experience collaborating with production support, application development, and global teams in a distributed environment
  • Programming: Proficiency in Python, Go, Java, or Ruby for automation and tool development
  • Systems: Deep knowledge of Linux/Unix systems for administration, performance tuning, and debugging
  • Cloud and Infrastructure: Expertise in AWS, Azure, or GCP, and Infrastructure as Code (IaC) tools like Terraform or Ansible
  • Containerization: Experience with Docker and Kubernetes for managing containerized banking applications
  • Monitoring: Proficiency in Prometheus, Grafana, Splunk, or Datadog for observability and performance monitoring
  • CI/CD: Familiarity with Jenkins, GitLab CI, or GitHub Actions for integrating reliability into deployment pipelines
Job Responsibility:
  • Design, develop, and implement automation tools and scripts to reduce manual operational tasks ("toil") and enhance system resilience
  • Ensure high availability (e.g., 99.99% uptime) of critical banking applications, including core banking, payment systems, and global platforms/local system
  • Conduct capacity planning and chaos engineering to test and improve system resilience under failure conditions
  • Participate in on-call rotations to respond to production incidents, troubleshoot issues, and conduct post-mortems to prevent recurrence
  • Collaborate with production support teams for rapid incident resolution and escalate complex issues to application teams or vendors as needed
  • Work closely with production support teams to streamline incident handling and integrate automated solutions into support processes
  • Partner with application development teams to embed reliability practices into the software development lifecycle (SDLC)
  • Engage with the bank's operation resilience project team to align on initiatives for regulatory compliance, disaster recovery, and system robustness
  • Coordinate with global and regional SRE and DevOps teams to ensure consistency in tools, processes, and standards across distributed banking systems
  • Implement and maintain monitoring solutions to track service-level indicators (SLIs) and ensure service-level objectives (SLOs) are met
Employment Type: Fulltime