Azure Site Reliability Engineer Job at Myticas Consulting (Markham)

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...

Location

India, Bangalore

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Minimum 2 years of experience managing or leading cloud operations teams
Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
Familiarity with modern CI/CD automation and tools
Excellent communication, stakeholder management, and team-building skills
Experience scaling SRE practices in high-growth or large-scale environments
Ability to balance long-term reliability initiatives with short-term delivery needs.

Job Responsibility

Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
Define and track key reliability metrics, and report on team performance and system health to leadership
Contribute to hiring, onboarding, and career development for SREs.

What we offer

Health & Wellbeing benefits for physical, financial, and emotional wellbeing
Personal & Professional Development programs
Unconditional inclusion in the workplace.

Fulltime

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team responsible for Private and Public...

Location

Singapore, Singapore

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor’s degree or equivalent work experience
6+ years of relevant work experience
Highly motivated self-starter with excellent interpersonal and communication skills
Certification or formal training in site reliability engineering concepts and practices
Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
4+ years of experience in Python (preferred) or Java on large-scale systems, alongside Linux-based scripting languages
Experience working on observability, logging and metrics toolsets
Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift and EKS
Experience with public cloud technologies such as AWS, GCP or Azure
Experience with Secrets products such as HashiCorp Vault or CyberArk

Job Responsibility

Working across container and secrets products in public and private cloud, as well as cloud-native products
Architecting and building tools and platforms that provide capabilities for SRE
Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
Actively owning production-level incidents until resolution.

What we offer

Equal opportunity employer
Accessibility support for persons with disabilities.

Fulltime

Staff Site Reliability Engineer

We are looking for a Site Reliability Engineer to own our internal systems infra...

Location

United States, Sunnyvale

Salary:

175000.00 - 250000.00 USD / Year

Figure

Expiration Date

Until further notice

Requirements

Strong experience with Linux/Unix systems administration
Proficiency in programming/scripting
Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
Experience defining Service Level Objectives (SLOs), developing runbooks and incident response plans, facilitating post-mortems, and managing system assets
Ability to work in cross-functional teams with developers, infra, and product teams
Excellent verbal and written communication skills

Job Responsibility

Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing and more
Migrate SaaS to self-hosted solutions to enhance security and reliability
Implement monitoring and alerting systems, and define incident response plans and runbooks
Reduce human workload by automating deployment and scaling
Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives
Use a data-driven approach to demonstrate service robustness and track optimization work
Partner with the security team to ensure that security remediations and updates are applied in a timely manner

Fulltime

Site Reliability Engineer

As a highly skilled Site Reliability Engineer (SRE), you will contribute to buil...

Location

United States, New York City; San Francisco

Salary:

160000.00 - 300000.00 USD / Year

Hebbia

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
5+ years software development experience at a venture-backed startup or top technology firm
Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role
Strong expertise in managing CI/CD pipelines and deployment automation
Proficiency in cloud platforms such as AWS, Azure, or Google Cloud (we are an AWS shop)
Solid understanding of containerization and orchestration technologies such as Docker and Kubernetes
Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar
Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
Familiarity with security best practices and tools for infrastructure and application security
Excellent problem-solving skills and the ability to troubleshoot complex issues

Job Responsibility

Assist in managing deployment pipelines to facilitate smooth and efficient software releases
Help implement and maintain observability solutions for monitoring system performance and reliability
Support local development environments to optimize developer workflows
Work with development teams to ensure infrastructure aligns with project requirements
Contribute to improving the security of our infrastructure by assisting with proactive measures and audits
Assist in developing and maintaining automation scripts and tools to enhance operational efficiency
Help troubleshoot and resolve infrastructure and application issues to minimize downtime and maintain smooth operations
Participate in evaluating and integrating new technologies to enhance the scalability, reliability, and security of our infrastructure

What we offer

PTO: Unlimited
Insurance: Medical + Dental + Vision + 401K
Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late
Parental leave policy: 3 months non-birthing parent, 4 months for birthing parent
Fertility benefits: $15k lifetime benefit
New hire equity grant: competitive equity package with unmatched upside potential

Fulltime

Site Reliability Engineer

You develop cloud platform according to modern principles. You advise our custom...

Location

Spain, Valencia

Salary:

Not provided

MaibornWolff GmbH

Expiration Date

Until further notice

Requirements

Ideally, a degree in computer science or comparable training
Sound technical understanding
An understanding of how to build and run a secure application in the cloud
Experience with container orchestration, ideally with Kubernetes
Experience with Infrastructure-as-Code tools such as Terraform, Helm, Ansible, or CDK
Experience in setting up the release management process using modern CI/CD systems
Knowledge of a cloud provider (AWS, Azure, Google Cloud), ideally with certification
Development skills in at least one object-oriented, functional or scripting language
Very good English and good German skills

Job Responsibility

Develop cloud platforms according to modern principles
Advise customers on the sensible use of services in the cloud with regard to effort, costs and maintenance
Live a vibrant DevOps culture internally and carry it to customers
Help customers introduce the correct release processes and implement them with modern CI/CD tools (Azure DevOps, GitLab, GitHub)
Develop and integrate monitoring and logging infrastructure to improve application maintainability
Design and develop scalable and fail-safe IT architectures

What we offer

Home Office & Office
Flexible Working Hours
Part-Time Models
Working Time Account
Sabbatical
30 days of paid vacation
An annual training budget of 1.5 gross monthly salaries for training, certifications, conferences, and more
Corporate seminars
Christmas parties
Private health and dental insurance

Senior Site Reliability Engineer

As a Senior Site Reliability Engineer on the Platform team, you will identify is...

Location

United States, Denver; San Francisco

Salary:

138000.00 - 191000.00 USD / Year

Checkr

Expiration Date

Until further notice

Requirements

Degree in Computer Science (or related field)
6+ years of experience in building tools with Python (preferred), Go, or Ruby
6+ years of experience in maintaining and observing production customer-facing environments in AWS or Azure
6+ years of experience as a member of an incident response team
Deep understanding of the fundamental infrastructure and platform concepts behind a micro-service architecture, REST APIs, and asynchronous queueing models
Experience with observability platforms and frameworks like Datadog, Splunk, Grafana, Prometheus, or OpenTelemetry
Strong collaboration, documentation, communication, and project management skills
Experience with container orchestration using Kubernetes/Docker/Terraform
Experience driving platform adoption across engineering teams, guided by a self-service and product-first approach
A passion for customer-centricity and building relationships with other teams

Job Responsibility

Collaborate, drive, and execute architectural discussions with cross-functional teams
Lead cross-team projects and SREs' technical roadmap to enable engineering and help Checkr customers
Design, build, ship, and maintain the core observability libraries, tools, and patterns used by all of Checkr’s engineering teams
Proactively engage across teams to foster service reliability, efficiency, and scalability
Troubleshoot complex production issues across the stack, with respect to performance, availability, and data quality
Present detailed technical information and benefits of the Checkr platform to a wide array of customers, including operations, developers, technical architects, and executives

What we offer

A fast-paced and collaborative environment
Learning and development allowance
Competitive cash and equity compensation and opportunities for advancement
100% medical, dental, and vision coverage
Up to $25K reimbursement for fertility, adoption, and parental planning services
Flexible PTO policy
Monthly wellness stipend, home office stipend
In-office perks such as lunch four times a week, commuter stipend, and an abundance of snacks and beverages

Fulltime

Junior Site Reliability Engineer

As a Jr. Site Reliability Engineer, you will 'make things scale' which includes ...

Location

United Kingdom

Salary:

Not provided

accesso

Expiration Date

Until further notice

Requirements

Some practical exposure to cloud platforms (AWS/Azure/GCP)—coursework, internships, or self-led projects
Ability to self-learn with assistance from Senior Engineers
Basic scripting ability using Python or Bash
Familiarity with basic Linux systems and general command-line use
Understanding of Git and basic CI/CD concepts
Good written and verbal communication
Customer-focused approach
Ability to work with minimal direction
Willingness to learn, take direction and work within a team

Job Responsibility

Assist with provisioning and deploying accesso Horizon components to customer cloud accounts using Infrastructure as Code (Terraform)
Help maintain CI/CD pipelines (GitHub Actions) for application and infrastructure deployments
Support monitoring, logging and alerting (Prometheus, Grafana & Coralogix) and respond to basic alerts with supervision
Implement and improve basic automation and scripting
Participate in incident triage, root cause investigation and follow-up tasks
Follow security and compliance requirements for customer cloud environments (identity, secrets, network controls)
Produce and maintain operational runbooks, deployment guides and change notes
Participate in the on-call rotation as an L1 responder
The role may require work outside normal working hours
Learn and apply accesso Horizon product architecture and configuration

What we offer

Competitive compensation package including an annual bonus opportunity
8 days of paid bank holiday leave and 26 days of paid annual leave (paid leave increases with tenure)
8 hours of paid Volunteer Time Off
Inclusive Family Benefits, including a $7,500 benefit for surrogacy, adoption, and fertility
Robust health insurance scheme with the opportunity to participate in a private medical scheme after satisfactory performance
Matching pension scheme (up to 8%)
Unlimited access to Udemy for Business
Flexible work schedule

Fulltime

Site Reliability Engineer

Corporate Tools is looking for a Site Reliability Engineer. You will be a tradit...

Location

United States

Salary:

175000.00 USD / Year

Corporate Tools

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience
5+ years of experience in software engineering
2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles
Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi
Strong proficiency with Kubernetes, Docker, and container orchestration in production environments
Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic
Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts
Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached)
Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement
Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis

Job Responsibility

Stop problems before they start
Fix issues quickly and learn from them
Help keep systems steady, secure, and running
Work closely with DevOps engineers to build out tools and automation
Take ownership

What we offer

100% employer-paid medical, dental and vision for employees
Annual review with raise option
22 days Paid Time Off accrued annually, and 4 holidays
After 3 years, PTO increases to 29 days
Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
Paid Parental Leave
Up to 6% company matching 401(k) with no vesting period
Quarterly allowance
Open concept office with friendly coworkers
Creative environment where you can make a difference

Fulltime
