CrawlJobs Logo

Site Reliability Engineer II

Canada Employment contract 170000.00 - 200000.00 CAD / Year · Job Posted May 27, 2026
Apply Position
Job Link Share

Job Responsibility

  • Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, securely, and cost-effective
  • Exemplify cloud-native site reliability best practices
  • Write code that is performant, maintainable, clear, and concise
  • Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems
  • Influence and educate the engineering organization to adopt new and improved architectural patterns
  • Provide robust documentation for use by engineers to promote self-service
  • Take calculated risks, champion new ideas, and cultivate your craft

Requirements

  • 5+ years of applicable experience
  • Experience managing cloud platforms such as Azure, AWS, or similar
  • Experience using managed languages such as Python, Go, C#, Java, or similar
  • Experience operating in Kubernetes platforms like AKS, EKS, or similar
  • Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases
  • Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
  • Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, AWS CloudFormation, or similar
  • Builder-operator mindset with proven production ownership (uptime, SLOs, on-call, incident leadership)
  • Empathy to support the needs of software engineers

Looking for more opportunities?

Search for other job offers that match your skills and interests.

Similar Jobs for

Site Reliability Engineer II

8 matching positions

Site Reliability Engineer II

Microsoft is a company where passionate innovators come to collaborate, envision...
Location
Location
India , Hyderabad
Salary
Salary:
Not provided
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. These requirements include, but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
Job Responsibility
  • Work with all aspects of a high throughput and multi-tenant service
  • Collaborate effectively within the team and with partner teams across Microsoft
  • Be part of the on-call rotation for maintaining service health
  • Design, implement, and refine chosen solutions in close partnership with Product Management and partner teams
  • Champion operational excellence via established metrics, process governance, and policy controls for regular assessment and improvement
  • Document and define existing data engineering processes, data and technology, while evaluating them for optimization
  • System Reliability & Uptime – Ensuring high availability of services
  • Incident Management – Detecting, responding to, and mitigating system failures
  • Performance Monitoring – Tracking system health and resolving bottlenecks
  • Automation & Tooling – Reducing manual work through scripts and automation
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

Location
Location
United States , Exton
Salary
Salary:
Not provided
bentley.com Logo
Bentley Systems
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • U.S. Master of Science degree, or foreign equivalent in Information Quality,Computer and Information Science, or a closely related field, and 3 years of DevOps Engineering experience
  • 3 years’ experience with Site Reliability Engineering and DevOps automation including designing, implementing and maintaining CI/CD pipelines for cloud-based production systems
Job Responsibility
Job Responsibility
  • Responsible for designing, implementing, and maintaining automated cloud infrastructure and CI/CD pipelines to support enterprise software applications
  • Perform DevOps automation, Infrastructure as Code, and containerized deployments to improve system reliability, scalability, and operational efficiency while reducing manual intervention
  • Cloud platforms Azure and Amazon Web Services (AWS), including infrastructure provisioning, networking architecture, identity management and security configuration
  • Developing and maintaining IaC using Terraform, along with automation and scripting using Python or PowerShell, and configuration management using Ansible to support scalable and reliable cloud environments
  • Containerization and orchestration technologies, including Docker, Kubernetes and Helm for deploying, scaling, and managing distributed cloud-native applications
  • Build and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) and participate in a rotating on-call schedule for production support
Read More
Arrow Right

Site Reliability Engineer II

The IDEAS organization’s mission is to unlock the power of data to deliver actio...
Location
Location
United States , Redmond
Salary
Salary:
100600.00 - 199000.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience with automation, live site operations, and incident response in large-scale cloud or distributed systems
  • Proficiency in at least one programming or scripting language (for example: C#, Java, Python, or PowerShell)
  • Strong analytical and problem-solving skills, including experience using telemetry and operational data to inform decisions
  • Effective written and verbal communication skills, and experience collaborating across teams and disciplines
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including passing the Microsoft Cloud Background Check upon hire and periodically thereafter
  • The successful candidate must have an active U.S. Government Secret Security Clearance
Job Responsibility
Job Responsibility
  • Participate as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within defined SLAs, and contributing to post-incident reviews and learning
  • Design, build, and maintain automation for deployment, operations, and incident mitigation to improve reliability and reduce manual effort
  • Instrument services for observability
  • collect and analyze telemetry and health signals
  • and use data to guide reliability and performance improvements
  • Collaborate with engineering partners and stakeholders to align on goals, share operational insights, and deliver user-focused solutions
  • Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements
  • Support compliance with security, privacy, and accessibility requirements throughout service onboarding and ongoing operations
  • Continuously learn and adopt industry practices and internal tools to improve reliability, performance, and observability
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...
Location
Location
United States , Multiple Locations
Salary
Salary:
100600.00 - 199000.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Active U.S. Government Top Secret Security Clearance
  • Ability to pass Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
  • Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
  • Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
  • Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
  • Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
  • Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
  • Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...
Location
Location
Canada , Toronto
Salary
Salary:
115000.00 - 165000.00 CAD / Year
https://www.pagerduty.com Logo
PagerDuty
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Hands-on experience operating Linux-based systems in production environments
  • Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
  • Experience with container orchestration (e.g., EKS, Kubernetes)
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
  • Proficiency in at least one programming language (e.g., Python, Ruby, Go, etc.)
  • Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)
Job Responsibility
Job Responsibility
  • Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
  • Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
  • Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents
What we offer
What we offer
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

We are seeking an experienced Site Reliability Engineer II to help build, mainta...
Location
Location
United States , Alpharetta
Salary
Salary:
Not provided
https://www.roberthalf.com Logo
Robert Half
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years experience in SRE, DevOps, or Cloud Infrastructure roles
  • Strong hands-on experience with Microsoft Azure services
  • Advanced experience with Terraform and Terragrunt
  • Proficiency with Kubernetes/AKS and container orchestration
  • Experience with CI/CD tools including GitHub Actions and ArgoCD
  • Solid understanding of observability tooling, especially Grafana
  • Hands-on experience with Java environments (for app debugging/support)
Job Responsibility
Job Responsibility
  • Design, implement, and manage Azure cloud infrastructure using Terraform and Terragrunt
  • Maintain, monitor, and optimize Kubernetes clusters (AKS)
  • Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD in a GitOps model
  • Enhance reliability through monitoring, alerting, and observability using Grafana (Prometheus, Loki, Tempo is a plus)
  • Automate operational tasks to reduce manual toil
  • Participate in on-call rotations, incident response, and post-mortem reviews
  • Collaborate with development teams to improve application reliability, performance, and scalability
  • Implement and advocate for SRE practices including SLIs, SLOs, and error budgets
  • Continuously improve infrastructure performance, cost efficiency, and security posture
What we offer
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • company 401(k) plan
Read More
Arrow Right

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location
Location
United States , Birmingham
Salary
Salary:
Not provided
allianceautomotive.co.uk Logo
Alliance Automotive UK LV Ltd
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Typically requires a bachelor's degree and three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility
Job Responsibility
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer
What we offer
  • options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

As an SRE Engineer II, you will be responsible for managing our multi-cloud infr...
Location
Location
United States , Sunnyvale
Salary
Salary:
138000.00 - 159000.00 USD / Year
illumio.com Logo
Illumio
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field
  • or equivalent work experience
  • 2+ years of experience working as an SRE, DevOps Engineer, or similar role, with hands-on experience in Azure cloud platform in a production environment setting
  • Proficiency in scripting and programming languages such as PowerShell, Python, or Go for automation and infrastructure management tasks
  • Experience with CI/CD tools and methodologies, containerization technologies, and microservices architecture in cloud environments
  • Strong analytical, problem-solving, and communication skills, with the ability to collaborate effectively with cross-functional teams
Job Responsibility
Job Responsibility
  • Design, deploy, and maintain cloud infrastructure solutions on Azure, AWS, and/or GCP to support our applications and services
  • Implement infrastructure as code (IaC) principles using tools such as Terraform, ARM templates, or CloudFormation to automate provisioning and configuration management
  • Develop and maintain CI/CD pipelines for automated software delivery and deployment, leveraging tools such as Azure DevOps, AWS CodePipeline, or Jenkins
  • Monitor system performance, application health, and infrastructure metrics using cloud monitoring and logging services, and implement proactive measures to optimize performance and availability
  • Support incident response and resolution efforts, conduct root cause analysis, implement corrective actions, and document post-incident reviews
  • Collaborate with Engineering teams to design and implement scalable and reliable architectures, providing guidance on best practices for cloud-native application development
  • Implement security best practices and controls in cloud environments to protect data, applications, and infrastructure, and ensure compliance with regulatory requirements
  • Drive automation initiatives to streamline operational tasks, reduce manual effort, and improve overall efficiency in cloud operations
  • Stay current with cloud platform updates, trends, and best practices, and evaluate emerging technologies for potential adoption to drive innovation and efficiency
  • Provide support and guidance to junior team members, fostering a culture of learning, collaboration, and continuous improvement within the SRE/DevOps team
What we offer
What we offer
  • Medical, Dental, Vision Coverage
  • Health and Dependent Savings Accounts
  • Life and Disability Programs
  • Paid Parental Leave
  • Voluntary Benefit Programs
  • Company Sponsored Wellness Program
  • Wellness Reimbursement Program
  • Retirement Savings
  • Equity Opportunities
  • Paid time off and Paid Holidays
  • Fulltime
Read More
Arrow Right