Site Reliability Engineering Specialist Job at Plusnet (Bengaluru)

Site Reliability Engineering Specialist

Professional Services was formed as a progressive development towards the conver...

Location

United Kingdom , Snowhill, Birmingham; Ipswich; London

Salary:

Not provided

Plusnet

Expiration Date

Until further notice

Requirements

A strong understanding of multi-vendor IP/MPLS networks (Nokia, Cisco, Juniper etc)
A strong understanding of network routing protocols such as IS-IS, LDP, RSVP, segment routing, OSPF, eBGP, iBGP, MP-BGP
A strong understanding of fundamental protocols such as DNS, DHCP & NTP
A strong understanding of network change & incident management best practice
A good understanding of Linux operating systems
An intermediate level of proficiency in at least one programming language, preferably Python
You will be confident and professional in communicating with all stakeholders, both locally and with members of the Senior Management Team.
You will have the ability to work in a high-pressure environment.

Job Responsibility

Builds network engineering change processes for complex end-to-end technology introduction in the live network, utilising automation & CI/CD pipelines.
Leads on major incident resolution acting as a final technical escalation point within BT.
Leads blameless post‑incident reviews to uncover systemic root causes and convert learnings into concrete reliability, automation, and process improvements.
Champions a reliability‑first change culture, promoting safe deployment patterns, blameless learning, and continuous improvement across engineering teams.
Collaborates with design & platform teams to support the implementation of flawless change into the live network.
Acts as a subject matter expert within the network engineering domain, applying this expertise to troubleshoot faults on our infrastructure across multiple platform domains.
Embeds secure by design principles when building new change processes and solutions.
Champions and builds effective working relationships, both internally and externally, to deliver business outcomes.
Champions the adoption of Site Reliability Engineering practices within Professional Services, driving cultural change towards automation, observability, and reduced operational toil.

What we offer

Tailored training and development opportunities to continue to build your career
10% on target bonus
25 days’ annual leave (not including bank holidays), increasing with service
Life Assurance
Pension scheme - If you pay in a minimum of 5% of your pensionable salary every month we will pay in 10%
Direct Share scheme
Option to join the Healthcare Cash Plan or other benefits such as dental insurance, gym memberships etc.
50% off EE mobile pay monthly or SIM only plans
Exclusive colleague discounts on our latest and greatest BT broadband packages and BT TV, including TNT Sports and NOW entertainment
Shared Parental leave - maximum amount of leave you can share with your partner is 50 weeks

Fulltime

Site Reliability Engineering Specialist

BTI Professionals provide expert third-line reliability and operational support ...

Location

Hungary , Budapest

Salary:

Not provided

Plusnet

Expiration Date

Until further notice

Requirements

Experience supporting large-scale, high-availability services in an ISP / NaaS / network-centric environment
Strong Linux troubleshooting and systems knowledge
Hands-on Kubernetes experience operating applications in production
Experience delivering changes using GitOps and CI/CD pipelines (including release validation and rollback awareness)
Working knowledge of incident/problem management in ServiceNow and delivery tracking in Jira (Scrum / PI planning)
Experience with observability tooling: Dynatrace, Prometheus, Elasticsearch, plus event/messaging platforms such as Kafka
Solid networking fundamentals to support effective troubleshooting
Automation experience with Ansible and at least one of Python / Go / Bash
Experience integrating or operating services with LDAP (authentication/authorisation, troubleshooting access issues)

Job Responsibility

Provide SRE ownership for the Global Fabric NaaS service, ensuring availability, performance, and resilience
Support safe, automated change into production using CI/CD, GitOps, and automated testing
Operate and improve monitoring and observability using Dynatrace, Prometheus, and Elasticsearch
Troubleshoot incidents across Kubernetes-hosted applications, Linux systems, networking, and service integrations
Act as a third-line escalation point, participating in a 24x7 on-call rota
Manage incidents via ServiceNow and track defects and improvements in Jira
Contribute to Scrum ceremonies and PI planning, supporting Agile delivery
Drive automation using Ansible and scripting to reduce operational toil
Mentor and support L2 engineers, improving runbooks, troubleshooting practices, and operational readiness

What we offer

Cafeteria package - HUF 600,000/ year
Performance-based bonus
Comprehensive private health care package for all employees, which can be extended to family members
Nursery support for mothers returning from maternity leave
Extended paternity leave: 10+10 fully paid days
Commuting allowance
Home office allowance
Employee discount opportunities
Highly affordable mobile packages for the family as well
Car allowance

Fulltime

Site Reliability Engineering Specialist

This role will specialise in system administration and server management with a ...

Location

United Kingdom , Birmingham

Salary:

Not provided

Plusnet

Expiration Date

Until further notice

Requirements

Experience in an ISP Environment: Proven experience in a fast-paced ISP setting, managing and troubleshooting large-scale networks
Sysadmin/Server Management: Strong skills in system administration, server management, and compute resources with experience in deploying and managing containerised applications using orchestration tools such as Kubernetes
Technical Proficiency: Strong understanding of network architecture, design, and implementation
Monitoring and Logging Solutions: Familiarity with monitoring and logging solutions such as Elasticsearch, Apache Kafka, and Prometheus
Programming Proficiency: Proficiency in at least one programming language, such as Python, Ansible or Go
Growth Mindset: Self-driven attitude towards learning new skills and aiding the development of others

Job Responsibility

Network Delivery: Support the implementation of flawless change into the live network, utilising automation and CI/CD pipelines
Network Monitoring: Configure, maintain, and monitor systems and network infrastructure to ensure optimal health, performance, and reliability
Automation Tools: Utilise tools such as Ansible to provision and manage infrastructure resources in a scalable and efficient manner
Technical Acumen: Apply your understanding of network principles to troubleshoot network faults within our systems and look at how you can optimise performance and enhance security across our infrastructure
Incident Management and Resolution: Be prepared to support a 24/7, 365-day callout rota, providing third-line technical resolution covering an extensive range of technologies
Customer Focus: Be a technical expert who understands the end-to-end journey of our customers
Growth and Development: As a technically talented expert you should enhance the brand of the team and support those around you to be accountable and perform at their best

What we offer

Competitive salary
10% on target bonus
BT Pension scheme, minimum 5% Employee contribution, BT contribution 10%
25 days annual leave (not including bank holidays), increasing with service
Huge range of flexible benefits including cycle to work, healthcare, season ticket loan
World-class training and development opportunities
Option to join BT Shares Saving schemes
Discounted broadband, mobile and TV packages
Access to hundreds of retail discounts, including the BT shop
On call allowances and overtime

Fulltime

AI Platform Site Reliability Engineering Specialist

The AI Platform Site Reliability Engineering Specialist will operate and maintai...

Location

India , Bengaluru

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Science or related field, or equivalent job experience
5 years of production experience in SRE / Infrastructure / ops for large-scale systems
Strong programming/scripting skills (Python, Go, Java, or equivalent)
Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
Networking and systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
Solid experience in capacity planning, performance tuning, scaling, and incident response
Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements

Job Responsibility

Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
Design and build automation for core platform capabilities, reducing manual toil
Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
Establish, monitor and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
Perform capacity planning, scaling strategies, workload scheduling and resource forecasting
Optimize cost vs. performance trade-offs in large-scale compute environments
Harden systems for security, compliance, auditability, and data governance
Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
Define disaster recovery (DR) strategies, backup/restore practices, and fault tolerance mechanisms

Sr Engineering Specialist

The Automations and Electrical Controls Engineer has responsibility for overall ...

Location

United States , Aiken

Salary:

Not provided

Owens Corning

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Engineering
Five years of automation and control engineering experience in a manufacturing environment
Strong safety awareness, commitment and safety leadership
Experience working with 480 V
Experience leading projects (Capital, Focused Improvement)
Strong knowledge of PLC based controls, HMI applications, and programming (Siemens)
Availability to work nights, weekends, and holidays as required by operational support needs

Job Responsibility

Lead Safety for an injury free work environment
Educates team members on safe maintenance work processes and procedures
Adheres to, and continuously improves, all Plant and position-specific safety policies, procedures, and standards
Ensures a safe, clean and environmentally compliant work environment and builds a culture where safety is a first priority
Effectively communicates Owens Corning’s stance on safety to external parties and ensures that they work according to our safety standards
Good knowledge of NEC NFPA 70 and 70E, including Arc Flash
Developing Talent
Develops and executes training plans for maintenance personnel and creates a continuous learning environment for employees
Co-leads and coaches the primary maintenance workforce and drives their engagement
Promotes a work environment characterized by mutual trust and respect, open and honest communications, teamwork and a passion for winning

Fulltime

Asset Health and Reliability Specialist

Reporting to the Mobile Maintenance Manager, the Asset Health and Reliability Sp...

Location

Australia , Pilbara

Salary:

Not provided

PLS

Expiration Date

Until further notice

Requirements

Relevant nationally recognized trade qualification
Current Driver’s Licence (C Class minimum)
Minimum 5 years of experience in a mining or heavy industry environment, with a focus on HME reliability or maintenance
Strong knowledge of heavy mobile equipment systems and components (e.g., engines, hydraulics, powertrain, electrical, undercarriage)
Experience conducting root cause analysis and implementing reliability improvements
Excellent communication and interpersonal skills for cross-functional collaboration
Strong commitment to safety and continuous improvement
Highly developed attention to detail, with the ability to analyse condition monitoring information and provide accurate, well-reported recommendations
Data exploration skills and capability to review datasets across systems, inclusive of time-series VIMS, KOMTRAX and equivalent data

Job Responsibility

Monitor and analyse the reliability performance of Heavy Mobile Equipment (HME) fleet
Develop and maintain equipment health strategies using reliability tools such as RCM, FMEA, and condition monitoring techniques
Collaborate with maintenance, operations, and OEMs to drive continuous improvement in equipment performance and availability
Identify and implement initiatives to improve mean time between failures (MTBF) and reduce mean time to repair (MTTR)
Review and optimise preventative and predictive maintenance strategies and schedules
Prepare and present reliability reports, KPIs, and improvement plans to senior management
Support the implementation and usage of reliability software systems (e.g., Pronto, AMT or similar CMMS tools)
Ensure all activities comply with site safety standards, environmental policies, and legislative requirements

What we offer

Quarterly short-term incentive bonus recognising individual and business performance
PLS employee share scheme
Access to newly refurbished facilities at Pilgangoora, including gym, tennis, pickleball and volleyball courts, sports oval and scenic walking tracks
18 weeks parental leave for primary carers and 4 weeks for secondary carers
Health and wellbeing allowance
Novated leasing through salary sacrifice
Paid community leave
Monthly employee recognition awards
Access to PLS’ KidsCo School Holiday Program
Access to our Employee Assistance Program and Company Chaplains

Fulltime

Sr Reliability Specialist

The Reliability Specialist is responsible for providing technical guidance on op...

Location

United States , Tyler

Salary:

Not provided

Delek US

Expiration Date

Until further notice

Requirements

2 year / Associate Degree (Required)
Six (6) or more years Oil & Gas or related experience (Required)
No Licensure or Certification Required
Reliability Management
Asset Management
Fixed Equipment
Rotating Equipment
Oil & Gas Refining
Pipeline Knowledge (DKL)
Pressure Control Devices

Job Responsibility

Provide technical guidance on operating units and equipment that maintains and improves the safety, environmental standards, overall reliability, and operating cost
Demonstrate accountability for increasing equipment reliability by improving time between failures of industrial equipment while reducing equipment downtime and manufacturing costs
Work collaboratively with the engineering functions as well as other departments to develop, implement and maintain standard mechanical and/or electrical processes incorporating industry best practices
Create maintenance technical standards and standardized work practices in collaboration with subject matter experts
Develop strategies to manage assets at peak performance, optimize lifetime return on investment, mitigate reliability risk, and support capital improvements for long-term sustainable performance

What we offer

Up to a 10% match on 401K from your hire start date, with a vesting timeline of only one year
Medical benefits that start on day one with a 30% premium rebate annually
Access to the Calm app for FREE
Performance management program to earn additional annual incentives

Fulltime

Cloud Engineering Specialist- VMware

Network Cloud is responsible for delivering BT’s strategic private cloud platfor...

Location

United Kingdom , Manchester; Ipswich; London

Salary:

Not provided

Plusnet

Expiration Date

Until further notice

Requirements

VMware vSphere and vCenter
VMware Cloud Foundation
Compute and virtualisation platform design and operation
Storage technologies including vSAN or equivalent
Experience producing High Level and Low Level Designs
Strong troubleshooting and diagnostic skills
Solid understanding of IP networking fundamentals
Ability to work across both design and operational activities
3 or more years in a hands-on infrastructure role, including 2nd and 3rd line support, and 3 or more years of design experience
Solid understanding of IP networking fundamentals and data centre infrastructure

Job Responsibility

Design, build and support VMware-based private cloud infrastructure
Contribute to High Level and Low Level Designs aligned to BT architecture and engineering standards
Support the deployment and lifecycle management of VMware Cloud Foundation environments
Operate and optimise compute and storage platforms, including performance, availability and capacity planning
Troubleshoot and resolve complex, high-severity infrastructure issues
Work closely with network and security teams to integrate compute platforms with NSX and underlay networks
Contribute to automation and infrastructure-as-code initiatives to improve consistency and efficiency
Produce and maintain technical documentation, diagrams, runbooks and operational procedures
Drive standardisation across platforms to ensure operational consistency and reliability
Provide technical guidance and support to other engineering teams and stakeholders

What we offer

10% on target bonus
BT Pension scheme, minimum 5% Employee contribution, BT contribution 10%
From January 2025, equal family leave: receive 18 weeks at full pay, 8 weeks at half pay and 26 weeks at the statutory rate. It’s for all parents, no matter how your family is made up
Enhanced women’s health support: including help with menopause symptoms, cancer screenings, period care and more
25 days annual leave (not including bank holidays), increasing with service
24/7 private virtual GP appointments for UK colleagues
2 weeks carer’s leave
World-class training and development opportunities
Option to join BT Shares Saving schemes

Fulltime
