SRE/ Observability Engineer Job at Realign (Toronto)

Site Reliability Engineer / Observability Engineer

Rackspace is building up its Professional Services Center of Excellence on Appli...

Location

Egypt, Giza

Salary:

Not provided

Rackspace

Expiration Date

Until further notice

Requirements

Bachelor’s degree in engineering/computer science or equivalent
Senior-level experience with Site Reliability Engineering; DevOps; code-level application support and troubleshooting; AWS infrastructure design, implementation, and optimization; and automation for deployment, scaling, and reliability
Experience with observability tools such as Splunk, Datadog, SignalFx, etc.
Experience deploying, maintaining and supporting software applications/services in the AWS ecosystem
Proactive approach to identifying problems and solutions
Experience writing code in one or more interpreted languages such as Python, PHP, Perl, Ruby, or Linux shell
Experience with Terraform or CloudFormation scripting
Experience with configuration management tools like Ansible, Chef or Puppet
Experience with standard software development best practices and tools such as code repositories (Git preferred)
Experience executing in an agile software development environment

Job Responsibility

Work with customers and implement Observability solutions
Build and maintain scalable systems and robust automation that supports engineering goals
Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance
Proactively gather and analyze both metric and log data from systems and applications to perform anomaly detection, performance tuning, capacity planning and fault isolation
Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, security and performance standards
Collaborate with team members to document and share solutions
Maintain a deep understanding of the customer’s business as well as their technical environment
Identify performance bottlenecks and anomalous system behavior, and resolve the root cause of service issues

Fulltime

Senior Systems Operations Engineer - SRE and AIOps

Wells Fargo is seeking a Senior Systems Operations Engineer within the Enterpris...

Location

India, Hyderabad

Salary:

Not provided

Wells Fargo

Expiration Date

June 22, 2026

Requirements

4+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Strong Java / backend service development experience
Distributed systems and API-based service design
CI/CD pipelines and Git-based workflows
3+ years of experience with scripting and infrastructure automation using Terraform
3+ years of hands-on experience with OpenShift, GCP, or Azure platform enablement and application migrations, and with building out complex, programmable infrastructure patterns using Infrastructure as Code (IaC)
2+ years of knowledge and understanding of cloud service offerings such as data, analytics, and AI/ML on GCP or Azure
2+ years of experience with key services provided by Azure and/or GCP such as BigQuery, Vertex AI, Dataproc, Functions, AKS, Service Fabric
2+ years working in a globally distributed team to provide innovative and robust cloud centric solutions
2+ years gathering and analyzing data to diagnose the root cause of cloud workload issues, and recommending and implementing solutions to resolve issues in a timely manner

Job Responsibility

Lead or participate in managing all installed systems and infrastructure within the Systems Operations functional area
Contribute to increasing system efficiency and reducing the human intervention time on related tasks
Review and analyze moderately complex operational support systems, application software, and system management tools to ensure the highest levels of systems and infrastructure availability
Work with vendors and other technical personnel for problem resolution
Lead team to meet technical deliverables while leveraging solid understanding of technical process controls or standards
Collaborate with vendors and other technical personnel to resolve technical issues and achieve highest levels of systems and infrastructure availability

Fulltime

Senior Site Reliability Engineer (SRE)

The Senior SRE is responsible for deployment, updates, and operational support f...

Location

India, Chennai

Salary:

Not provided

Dalet

Expiration Date

Until further notice

Requirements

Cloud platforms: AWS, Azure
Containerisation & Orchestration: Kubernetes
Infrastructure as Code: Terraform
Configuration Management: Ansible
Packaging & Deployment: Helm
Databases: MariaDB, MongoDB
Monitoring, observability, networking, and cloud security.

Job Responsibility

Act as a senior technical authority for APAC Site Reliability Engineering activities
Drive best practices in reliability, operations, and engineering standards
Promote technical excellence, collaboration, and accountability across stakeholders
Make infrastructure complexity transparent to both internal teams and customers, ensuring a consistently excellent client experience
Implement, track, and evolve service performance measures such as SLAs, SLOs, and SLIs
Anticipate risks related to service availability, capacity, performance regressions, and security vulnerabilities
Drive continuous improvement, including leading and facilitating Root Cause Analysis (RCA) activities
Ensure timely execution of deployments, upgrades, maintenance activities, and change requests
Anticipate workload, plan deliverables, and ensure qualification/validation of upcoming tasks
Collaborate closely with engineering to improve platform components, automation, and operational processes

What we offer

Great career opportunities around the world
Truly collaborative environment with supportive leadership
Cutting edge technologies (AI, Cloud, Cybersecurity...)
Talented and passionate team members
Fun working environment

Fulltime

Senior Software Engineer - SRE

Hybrid: This role is categorized as hybrid and is expected to report to Austin ...

Location

United States, Austin; Warren

Salary:

Not provided

General Motors

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science or a related field, or equivalent work experience
7-10 years of software experience with strong proficiency in PostgreSQL and at least one other database technology (Oracle, SQL Server)
Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems
Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures
Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures
Ability to optimize or troubleshoot code by understanding its execution and the impact on system resources
Experience handling production incidents, including root cause analysis, mitigation, and working through complex system failures
Strong communication skills, with an ability to explain technical concepts to both engineering and business stakeholders
Commitment to collaborative problem-solving and shared ownership of services
Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems

Job Responsibility

Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention
Lead, implement, and improve monitoring and observability frameworks, enabling proactive detection and resolution of incidents
Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution
Work alongside developers to ensure the quality, scalability, and reliability of our database services
Practice shared ownership of services in production, fostering a "You build it, you run it" culture
Use Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to manage reliability expectations effectively
Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence
Champion a culture of continuous improvement
Evaluate system performance and advocate for optimizations that reduce infrastructure costs while maintaining service reliability

Fulltime

Network Automation Observability Engineer

Piper Companies is seeking a Network Automation Observability Engineer for a wor...

Location

United States, Raleigh-Durham

Salary:

140000.00 - 180000.00 USD / Year

Piper Companies

Expiration Date

Until further notice

Requirements

5+ years of experience in network engineering, network automation, or related roles
Strong hands-on experience with Python for automation and scripting
Proven experience using Ansible or similar automation and configuration management tools
Deep understanding of core network protocols and enterprise network architectures
Experience with network observability platforms and concepts such as telemetry, monitoring, alerting, and logging
Familiarity with APIs, data models (YANG), and modern network operating systems is a plus
Strong problem-solving skills with the ability to collaborate in a fast-paced environment

Job Responsibility

Design, develop, and maintain network automation solutions using Python, Ansible, and related frameworks
Build automated workflows for network provisioning, configuration management, validation, and remediation
Apply strong expertise in network protocols including TCP/IP, BGP, OSPF, routing, switching, and VLANs
Implement and enhance network observability solutions using telemetry, SNMP, streaming data, logs, and metrics
Integrate network automation and observability tooling with CI/CD pipelines and source control systems
Partner with network, systems, and SRE teams to improve network reliability, performance, and scalability
Troubleshoot complex network, automation, and observability issues in production environments

What we offer

Medical
Dental
Vision
401(k)
PTO
Sick Leave as required by law

Fulltime

DevOps Engineer / SRE

As a DevOps Engineer / SRE, you will be a generalist with a broad impact on our ...

Location

Serbia

Salary:

Not provided

Fundraise Up

Expiration Date

Until further notice

Requirements

4+ years of experience as a DevOps Engineer, SRE, or Linux Systems Administrator
A strong foundation in Linux (we use Ubuntu), including core CLI troubleshooting tools
Solid experience with configuration management tools, particularly Ansible
Experience working with servers (VMs and/or bare metal), including setup and troubleshooting at the OS level
Proficiency in building and maintaining complex CI/CD pipelines (Jenkins experience is a major plus)
A good understanding of networking fundamentals, including TCP/IP and firewall configuration (iptables)
Experience with monitoring and observability principles (Prometheus/VictoriaMetrics stack preferred)
Experience working with Git
Scripting ability in Bash or Python
A high sense of ownership, responsibility, and attention to detail. We value professionals who are proactive and reliable

Job Responsibility

Work with servers (VMs and bare metal) at the OS level and below: configuration, maintenance, and troubleshooting
Automate infrastructure and routine operational tasks using Ansible and custom scripting (Bash / Python)
Build, maintain, and support complex CI/CD pipelines. We use scripted pipelines in Jenkins
Develop and support our monitoring and observability stack (Prometheus-style metrics, VictoriaMetrics, Grafana, Graylog)
Work with databases and data systems, including ClickHouse and MongoDB, with a focus on monitoring and operational stability
Investigate and resolve issues across Linux OS, networking, and application layers
Collaborate with engineers across teams to improve system reliability and automation
Take ownership of production systems and ensure stability and predictability in day-to-day operations

What we offer

31 days off
100% paid telemedicine plan
Home Office Setup Assistance: the company offers assistance with purchasing furniture (office chair, office desk, monitor) and other items to create a comfortable workspace
English learning courses
Relevant professional education
Gym or swimming pool
Co-working
Remote working
Stock options

Fulltime

Lead Software Engineer - SRE

Wells Fargo is seeking a Lead Site Reliability Engineer (SRE) to join the WIMT P...

Location

United States, Charlotte; Saint Louis

Salary:

119000.00 - 187000.00 USD / Year

Wells Fargo

Expiration Date

Until further notice

Requirements

5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
5+ years of experience leading observability and monitoring tooling: Splunk, AppDynamics, Splunk Observability, Grafana, OpenTelemetry
5+ years in infrastructure (Windows and Linux) support
5+ years proven success in toil reduction initiatives
5+ years in cloud application management, especially OpenShift Container Platform

Job Responsibility

Design and implement scalability, reliability, and observability strategies for cloud and on-premise environments
Define SLIs (Service Level Indicators), SLOs (Service Level Objectives), and Error Budgets to improve system reliability
Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
Drive adoption of NFRs, best practices, quality, and compliance across observability and performance engineering
Ensure high availability and performance of production systems through proactive monitoring and incident response
Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
Lead projects, teams, or serve as a peer mentor

What we offer

Health benefits
401(k) Plan
Paid time off
Disability benefits
Life insurance, critical illness insurance, and accident insurance
Parental leave
Critical caregiving leave
Discounts and savings
Commuter benefits
Tuition reimbursement

Fulltime

DevOps SRE Engineer

We are looking for a mid-senior SRE/DevOps Engineer (5–8 years) to build and sca...

Location

India, Bengaluru

Salary:

Not provided

Acuver Consulting

Expiration Date

Until further notice

Requirements

5–8 years of experience in DevOps / SRE roles
Strong hands-on experience with AWS (preferred) and/or GCP
Expertise in: Kubernetes & Docker; Terraform (Infrastructure as Code); CI/CD tools (GitLab, Jenkins, or similar)
Experience with: event-driven / asynchronous architectures (Kafka, Pub/Sub, etc.); monitoring & logging tools (Prometheus, Grafana, ELK, etc.); microservices and distributed systems
Solid understanding of: networking, load balancing, and scaling strategies; high availability and fault-tolerant systems

Job Responsibility

Design and implement robust CI/CD pipelines (GitLab CI, Jenkins, or similar)
Enable automated build, test, and deployment workflows
Implement blue-green / canary deployments for zero-downtime releases
Ensure release traceability, rollback mechanisms, and deployment governance
Design, provision, and manage infrastructure on AWS (primary) and/or GCP
Build infrastructure using Infrastructure as Code (Terraform preferred)
Create reusable modules for scalable, secure, and standardized environments
Optimize cost, performance, and scalability of cloud resources
Deploy and manage applications using Docker & Kubernetes
Manage Kubernetes workloads using Helm charts

Fulltime
