Site Reliability Engineering Specialist

Plusnet

Location:
India, Bengaluru

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The Site Reliability Engineering Specialist independently executes activities that help ensure BT is in the best position to deliver the service performance, reliability and availability that internal and external customers expect, by enabling cross-team engineering discussions to achieve scalable, measurable, fault-tolerant and cost-effective cloud services.

Job Responsibility:

  • Executes the implementation of new software development life cycle automation tools, frameworks, and code pipelines
  • Coordinates a diverse team and creates the initial test schedule
  • Executes the implementation of automation technologies
  • Proactively identifies and manages risk
  • Leads scale testing to measure, tune and optimise system performance
  • Executes metric/monitoring analysis
  • Designs, analyses, develops and troubleshoots highly distributed large-scale production systems
  • Executes approaches that scale systems sustainably
  • Writes and delivers infrastructure as code software
  • Implements robust monitoring and alerting systems and performs root cause analysis
  • Inspects queue and support processing
  • Executes retrospective and preventive actions after each high severity production incident
  • Analyses complex systems from a reliability and resilience perspective
  • Champions, continuously develops and shares knowledge of emerging trends with the team
  • Mentors other site reliability engineers
  • Uses the network of site reliability engineers, removing BT's organisational boundaries

Requirements:

  • A degree in IT, Maths or Science
  • A deep understanding of full stack monitoring solutions such as Dynatrace
  • Strong proficiency in one or more programming languages (e.g. Java, Python)
  • Experience with cloud platforms (AWS, Azure, or GCP)
  • Solid understanding of software architecture, design patterns, and microservices
  • Familiarity with CI/CD tools and DevOps practices
  • High levels of quality presentation and reporting capabilities
  • Resilience to ensure support teams are engaged 24x7x365
  • Ability to adapt to latest industry trends
  • CI/CD/CT Pipeline management
  • Micro-Service functionality
  • Business Process Improvement
  • Growth mindset
  • AI driven Observability & AIOps
  • Incident Response with AI
  • ML Ops for Reliability
  • AI enhanced Automation & CI/CD
  • AI + Chaos Engineering (Resilience)
  • Platform & Tool Literacy (AI ready)
  • Governance, Safety & Measurement

Additional Information:

Job Posted:
March 19, 2026

Employment Type:
Fulltime
Similar Jobs for Site Reliability Engineering Specialist

Manager, Reliability

Responsible for sustaining and continuously improving various mechanical compone...
Location:
United States, Big Spring
Salary:
Not provided
Delek US
Expiration Date
Until further notice
Requirements
  • 4 year / Bachelor's Degree (Required)
  • Four (4) or more years Experience in a related field (Required)
  • No Licensure or Certification Required
  • Manages and leads the activities of the Reliability engineers and specialists
  • Ensures compliance to Engineering Practices/Mechanical Integrity at the site level
  • Champions initiatives, projects, and programs that support the reliability vision
  • Guides Reliability Engineers to grow their technical and leadership skills
  • Develops working relationships with site leaders to guide teams on reliability centered processes and investigations
  • SPOC between Corporate Reliability and site activities
  • Reliability Department budget owner
Job Responsibility
  • Responsible for sustaining and continuously improving various mechanical components for equipment and tools
  • Ensures the safe, effective operations of the organization's production and supports continuous improvement
  • Manages reliability engineering projects
  • Performs analytical verification
  • Evaluates, tests and tracks results of reliability interventions
  • Initiates reporting for internal or third-party reported incidents
  • Creates, documents, and follows up on corrective actions
  • Prepares routine reports and memos and coordinates communications across all necessary functional groups of the organization
What we offer
  • up to a 10% match on 401K on your hire start, with a vesting timeline of only one year
  • medical benefits that start on day one with a 30% premium rebate annually
  • access to the Calm app for FREE
  • additional annual incentives through performance management program
  • Fulltime

Construction Maintenance Specialist

Join Galp and bring your curiosity and passion every day. With a customer-centri...
Location:
Portugal, Madeira
Salary:
Not provided
Galp
Expiration Date
Until further notice
Requirements
  • Bachelor’s/Master’s degree in Mechanical or Civil Engineering
  • Minimum of 3 years of prior experience in similar roles
  • Professional Gas Technician certification
  • Membership in the Order of Engineers
  • Experience in Project Management
  • Proficiency in English
  • Strong computer skills, including MS Office 365, Power BI, and SharePoint
  • Excellent written and verbal communication skills
  • Customer and results-oriented mindset
  • Strong leadership skills and experience in managing service providers and teams
Job Responsibility
  • Lead and coordinate multidisciplinary teams of service providers and collaborate with other departments in executing engineering projects for Galp LPG clients or potential clients
  • Promote and manage LPG construction projects to support Residential and Enterprise business development
  • Ensure the maintenance and requalification of LPG assets in the Madeira archipelago, including networks, parks, and gas cabins
  • Participate and collaborate in the licensing process for LPG assets in Madeira
  • Manage Galp Madeira’s internal installation teams
  • Assume Technical Responsibility for the Operating Entity
  • Participate in the emergency and urgent maintenance response team of Galp Madeira
  • Ensure compliance with Health, Safety, and Environmental (HSE) standards and procedures on-site
  • Understand and follow technological development trends, acting as an agent for change management, business challenges, and requirements
  • Contribute to continuous improvement in the processes of the Construction and Renovation unit within the Technical Operations area
  • Fulltime

Senior Applications Specialist

Location:
Canada, Mississauga
Salary:
Not provided
Advanced Technology Search Group
Expiration Date
Until further notice
Requirements
  • A Degree in Electrical Engineering, Computer Science, or related technical discipline or equivalent experience
  • At least 3 years of experience in advanced systems engineering support, focusing on complex technical problem resolution
  • Proven generalized understanding of computer networking (LAN, WAN, NAT, DNS, Basic Firewalls, etc.)
  • Hands-on experience with Linux and/or Windows (CMD, Bash, PS, regedit, etc.)
  • Demonstrated ability to diagnose sophisticated technical issues and implement effective solutions
  • Ability to work collaboratively with cross-functional teams
  • Willing to adapt to evolving technologies and industry standards
Job Responsibility
  • Analyze complex technical issues and system integrations to identify root causes and develop effective solutions
  • Conduct systematic analysis to diagnose customer system issues and implement effective technical solutions
  • Travel to customer sites in Canada and the US for advanced troubleshooting and customer support
  • Collaborate with designers, developers, and stakeholders as well as the technical support team to ensure seamless product integration and customer satisfaction
  • Manage the deployment and configuration of integrated systems, ensuring optimal performance and reliability
  • Develop detailed Product Support Documents, and train internal technical support as appropriate
  • Develop comprehensive technical manuals and field installation guides to support customers during product installation, commissioning, and troubleshooting
  • Investigate and review recurring product issues to drive product improvements
  • Equip and support the team with in-depth product knowledge and configuration strategies
  • Provide post-sales customer support, including consultation on product configuration, installation, and usage
  • Fulltime

Sr. Specialist - Site Reliability Engineer

The Production Support SRE Engineer is responsible for ensuring the reliability,...
Location:
United States, Southlake; Austin
Salary:
115000.00 - 131000.00 USD / Year
Charles Schwab
Expiration Date
March 23, 2026
Requirements
  • 2+ yrs experience in production support, incident management, and real‑time troubleshooting for high‑availability systems
  • Solid understanding of SRE principles, including SLIs, SLOs, error budgets, and incident response frameworks
  • Hands-on experience with observability and monitoring tools such as Splunk, Grafana, Moogsoft, or xMatters
  • Proficiency with structured logging, log analysis, and alert tuning
  • Ability to create and maintain runbooks, operational guides, and incident playbooks
  • Familiarity with automation concepts and ability to identify and reduce operational toil through scripts, tooling, or process improvements
  • Strong communication skills with the ability to translate complex technical issues into clear, business-friendly language
  • Ability to partner with product, engineering, and delivery teams to embed reliability into the development lifecycle
  • Experience participating in on-call rotations, including market‑hours support and after‑hours escalations
  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or equivalent practical experience
Job Responsibility
  • Serve as the primary production support engineer for assigned Workplace Services applications, ensuring high availability, rapid incident response, and effective participation in both market‑hour and after‑hours on‑call rotations
  • Lead root‑cause analysis, support SLO breach investigations, and partner with product and delivery teams to restore and maintain service health
  • Champion Schwab’s SRE principles by improving observability, structured ELI logging, meaningful alerting, automation, and standardized dashboard/reporting patterns
  • Ensure new features, releases, and operational changes meet reliability, monitoring, and readiness expectations
  • Develop and maintain runbooks, operational guides, incident playbooks, and service documentation
  • Identify sources of operational toil, drive automation efforts, rationalize alerts, and deliver data‑driven insights and trends to product and engineering teams for proactive reliability improvements
  • Act as the embedded SRE partner for your service area—attending key ceremonies, advising teams on operational risks, and promoting best practices in reliability engineering
  • Foster a culture of blameless postmortems, continuous learning, and cross‑team enablement
What we offer
  • 401(k) with company match and Employee stock purchase plan
  • Paid time for vacation, volunteering, and 28-day sabbatical after every 5 years of service for eligible positions
  • Paid parental leave and family building benefits
  • Tuition reimbursement
  • Health, dental, and vision insurance
  • Fulltime

Site Reliability Engineering Specialist

This role will specialise in system administration and server management with a ...
Location:
United Kingdom, Birmingham
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Experience in an ISP Environment: Proven experience in a fast-paced ISP setting, managing and troubleshooting large-scale networks
  • Sysadmin/Server Management: Strong skills in system administration, server management, and compute resources with experience in deploying and managing containerised applications using orchestration tools such as Kubernetes
  • Technical Proficiency: Strong understanding of network architecture, design, and implementation
  • Monitoring and Logging Solutions: Familiarity with monitoring and logging solutions such as Elasticsearch, Apache Kafka, and Prometheus
  • Programming Proficiency: Proficiency in at least one programming language, such as Python, Ansible or Go
  • Growth Mindset: Self-driven attitude towards learning new skills and aiding the development of others
Job Responsibility
  • Network Delivery: Support the implementation of flawless change into the live network, utilising automation and CI/CD pipelines
  • Network Monitoring: Configure, maintain, and monitor systems and network infrastructure to ensure optimal health, performance, and reliability
  • Automation Tools: Utilise tools such as Ansible to provision and manage infrastructure resources in a scalable and efficient manner
  • Technical Acumen: Apply your understanding of network principles to troubleshoot network faults within our systems and look at how you can optimise performance and enhance security across our infrastructure
  • Incident Management and Resolution: Be prepared to support a 24x7x365 callout, providing third-line technical resolution covering an extensive range of technologies
  • Customer Focus: Be a technical expert who understands the end-to-end journey of our customers
  • Growth and Development: As a technically talented expert you should enhance the brand of the team and support those around you to be accountable and perform at their best
What we offer
  • Competitive salary
  • 10% on target bonus
  • BT Pension scheme, minimum 5% Employee contribution, BT contribution 10%
  • 25 days annual leave (not including bank holidays), increasing with service
  • Huge range of flexible benefits including cycle to work, healthcare, season ticket loan
  • World-class training and development opportunities
  • Option to join BT Shares Saving schemes
  • Discounted broadband, mobile and TV packages
  • Access to hundreds of retail discounts including the BT shop
  • On call allowances and overtime
  • Fulltime

Platform Specialist

Hermeus is a high-speed aircraft manufacturer focused on the rapid design, build...
Location:
United States, Atlanta
Salary:
105750.00 - 129250.00 USD / Year
Hermeus
Expiration Date
Until further notice
Requirements
  • Associate's degree in information technology/systems, Computer Science, Engineering, or related STEM field
  • 5+ years of experience as a Network Engineer or in a similar role (Site Reliability Engineer, Platform Engineer, Cybersecurity Engineer, Software Engineer)
  • Proficient in Linux
  • Proficient in basic electronics maintenance and repair
  • Proficient in the maintenance/termination of serial port connectors (RS232, etc.), and CAT5/6
  • Proficient in basic electronics packaging (design, assembly, cable routing, cooling, etc.)
  • Proficiency working at all layers of the OSI network model
  • Proficiency designing and maintaining layer 2 networks
  • Comfortable reading and maintaining low voltage wiring diagrams
  • Experience creating, explaining, and maintaining network architectures, network topology diagrams, and other interface diagrams/specifications
Job Responsibility
  • Build, maintain, troubleshoot, and repair Ground Control Stations
  • Install, configure, and maintain flight critical IT systems, communication tools, and mission-enhancing hardware/software
  • Install, configure, and maintain pilot-in-the-loop cockpits for Remotely Piloted Aircraft (RPAs)
  • Collaborate closely with other engineering teams to ensure seamless integration of mission systems
  • Design, configure, and maintain network hardware and software, including routers, switches, firewalls, Access Points (wireless), and VPNs
  • Monitor network performance and ensure system availability and reliability
  • Perform network troubleshooting to isolate and diagnose common network problems
  • Implement and maintain network security, including access controls, intrusion detection systems, and threat prevention
  • Manage and administer network services such as DNS, DHCP, and IP address management (IPAM)
  • Develop and maintain comprehensive documentation for network configurations, processes, and procedures
What we offer
  • 100% employer-paid health care
  • 401k & retirement plans
  • Unlimited PTO
  • Weekly paid office lunches
  • Fully stocked breakrooms
  • Stock options
  • Paid Parental Leave
  • Fulltime

Platform Specialist

Location:
United States, Atlanta
Salary:
Not provided
Hermeus
Expiration Date
Until further notice
Requirements
  • Associate's degree in information technology/systems, Computer Science, Engineering, or related STEM field
  • 5+ years of experience as a Network Engineer or in a similar role (Site Reliability Engineer, Platform Engineer, Cybersecurity Engineer, Software Engineer)
  • Proficient in Linux
  • Proficient in basic electronics maintenance and repair
  • Proficient in the maintenance/termination of serial port connectors (RS232, etc.), and CAT5/6
  • Proficient in basic electronics packaging (design, assembly, cable routing, cooling, etc.)
  • Proficient in working at all layers of the OSI network model
  • Proficient in designing and maintaining layer 2 networks
  • Legally authorized to work for any employer in the United States
  • Will not require employment visa sponsorship
  • Fulltime

Site Reliability Engineering Specialist

BTI Professionals provide expert third-line reliability and operational support ...
Location:
Hungary, Budapest
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Experience supporting large-scale, high-availability services in an ISP / NaaS / network-centric environment
  • Strong Linux troubleshooting and systems knowledge
  • Hands-on Kubernetes experience operating applications in production
  • Experience delivering changes using GitOps and CI/CD pipelines (including release validation and rollback awareness)
  • Working knowledge of incident/problem management in ServiceNow and delivery tracking in Jira (Scrum / PI planning)
  • Experience with observability tooling: Dynatrace, Prometheus, Elasticsearch, plus event/messaging platforms such as Kafka
  • Solid networking fundamentals to support effective troubleshooting
  • Automation experience with Ansible and at least one of Python / Go / Bash
  • Experience integrating or operating services with LDAP (authentication/authorisation, troubleshooting access issues)
Job Responsibility
  • Provide SRE ownership for the Global Fabric NaaS service, ensuring availability, performance, and resilience
  • Support safe, automated change into production using CI/CD, GitOps, and automated testing
  • Operate and improve monitoring and observability using Dynatrace, Prometheus, and Elasticsearch
  • Troubleshoot incidents across Kubernetes-hosted applications, Linux systems, networking, and service integrations
  • Act as a third-line escalation point, participating in a 24x7 on-call rota
  • Manage incidents via ServiceNow and track defects and improvements in Jira
  • Contribute to Scrum ceremonies and PI planning, supporting Agile delivery
  • Drive automation using Ansible and scripting to reduce operational toil
  • Mentor and support L2 engineers, improving runbooks, troubleshooting practices, and operational readiness
What we offer
  • Cafeteria package - HUF 600,000/ year
  • Performance-based bonus
  • Comprehensive private health care package for all the employees, which can be extended to family members
  • Nursery support for mothers returning from maternity leave
  • Extended paternity leave: 10+10 fully paid days
  • Commuting allowance
  • Home office allowance
  • Employee discount opportunities
  • Highly affordable mobile packages for the family as well
  • Car allowance
  • Fulltime