Site Reliability Engineer II

United States, Exton · Job Posted May 29, 2026

Job Responsibility

  • Design, implement, and maintain automated cloud infrastructure and CI/CD pipelines to support enterprise software applications
  • Perform DevOps automation, Infrastructure as Code (IaC), and containerized deployments to improve system reliability, scalability, and operational efficiency while reducing manual intervention
  • Work with the Azure and Amazon Web Services (AWS) cloud platforms, including infrastructure provisioning, networking architecture, identity management, and security configuration
  • Develop and maintain IaC using Terraform, automate and script using Python or PowerShell, and manage configuration using Ansible to support scalable and reliable cloud environments
  • Use containerization and orchestration technologies, including Docker, Kubernetes, and Helm, to deploy, scale, and manage distributed cloud-native applications
  • Build and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) and participate in a rotating on-call schedule for production support

Requirements

  • U.S. Master of Science degree, or foreign equivalent, in Information Quality, Computer and Information Science, or a closely related field, and 3 years of DevOps Engineering experience
  • 3 years of experience with Site Reliability Engineering and DevOps automation, including designing, implementing, and maintaining CI/CD pipelines for cloud-based production systems

Similar Jobs for Site Reliability Engineer II

8 matching positions

Site Reliability Engineer II

Microsoft is a company where passionate innovators come to collaborate, envision...
Location
India, Hyderabad
Salary
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Work with all aspects of a high throughput and multi-tenant service
  • Collaborate effectively within the team and with partner teams across Microsoft
  • Be part of the on-call rotation for maintaining service health
  • Design, implement, and refine chosen solutions in close partnership with Product Management and partner teams
  • Champion operational excellence via established metrics, process governance, and policy controls for regular assessment and improvement
  • Document and define existing data engineering processes, data and technology, while evaluating them for optimization
  • System Reliability & Uptime – Ensuring high availability of services
  • Incident Management – Detecting, responding to, and mitigating system failures
  • Performance Monitoring – Tracking system health and resolving bottlenecks
  • Automation & Tooling – Reducing manual work through scripts and automation
  • Fulltime

Site Reliability Engineer II

Location
Canada
Salary
170000.00 - 200000.00 CAD / Year
Axon
Expiration Date
Until further notice
Requirements
  • 5+ years of applicable experience
  • Experience managing cloud platforms such as Azure, AWS, or similar
  • Experience using managed languages such as Python, Go, C#, Java, or similar
  • Experience operating in Kubernetes platforms like AKS, EKS, or similar
  • Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases
  • Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
  • Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, AWS CloudFormation, or similar
  • Builder-operator mindset with proven production ownership (uptime, SLOs, on-call, incident leadership)
  • Empathy to support the needs of software engineers
Job Responsibility
  • Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, securely, and cost-effectively
  • Exemplify cloud-native site reliability best practices
  • Write code that is performant, maintainable, clear, and concise
  • Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems
  • Influence and educate the engineering organization to adopt new and improved architectural patterns
  • Provide robust documentation for use by engineers to promote self-service
  • Take calculated risks, champion new ideas, and cultivate your craft
  • Fulltime

Site Reliability Engineer II

The IDEAS organization’s mission is to unlock the power of data to deliver actio...
Location
United States, Redmond
Salary
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience with automation, live site operations, and incident response in large-scale cloud or distributed systems
  • Proficiency in at least one programming or scripting language (for example: C#, Java, Python, or PowerShell)
  • Strong analytical and problem-solving skills, including experience using telemetry and operational data to inform decisions
  • Effective written and verbal communication skills, and experience collaborating across teams and disciplines
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including passing the Microsoft Cloud Background Check upon hire and periodically thereafter
  • The successful candidate must have an active U.S. Government Secret Security Clearance
Job Responsibility
  • Participate as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within defined SLAs, and contributing to post-incident reviews and learning
  • Design, build, and maintain automation for deployment, operations, and incident mitigation to improve reliability and reduce manual effort
  • Instrument services for observability; collect and analyze telemetry and health signals; and use data to guide reliability and performance improvements
  • Collaborate with engineering partners and stakeholders to align on goals, share operational insights, and deliver user-focused solutions
  • Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements
  • Support compliance with security, privacy, and accessibility requirements throughout service onboarding and ongoing operations
  • Continuously learn and adopt industry practices and internal tools to improve reliability, performance, and observability
  • Fulltime

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...
Location
United States, Multiple Locations
Salary
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Active U.S. Government Top Secret Security Clearance
  • Ability to pass Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
  • Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
  • Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
  • Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
  • Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
  • Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
  • Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems
  • Fulltime

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...
Location
Canada, Toronto
Salary
115000.00 - 165000.00 CAD / Year
PagerDuty
Expiration Date
Until further notice
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Hands-on experience operating Linux-based systems in production environments
  • Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
  • Experience with container orchestration (e.g., EKS, Kubernetes)
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
  • Proficiency in at least one programming language (e.g., Python, Ruby, Go, etc.)
  • Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)
Job Responsibility
  • Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
  • Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
  • Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents
What we offer
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime

Site Reliability Engineer II

We are seeking an experienced Site Reliability Engineer II to help build, mainta...
Location
United States, Alpharetta
Salary
Not provided
Robert Half
Expiration Date
Until further notice
Requirements
  • 3+ years of experience in SRE, DevOps, or Cloud Infrastructure roles
  • Strong hands-on experience with Microsoft Azure services
  • Advanced experience with Terraform and Terragrunt
  • Proficiency with Kubernetes/AKS and container orchestration
  • Experience with CI/CD tools including GitHub Actions and ArgoCD
  • Solid understanding of observability tooling, especially Grafana
  • Hands-on experience with Java environments (for app debugging/support)
Job Responsibility
  • Design, implement, and manage Azure cloud infrastructure using Terraform and Terragrunt
  • Maintain, monitor, and optimize Kubernetes clusters (AKS)
  • Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD in a GitOps model
  • Enhance reliability through monitoring, alerting, and observability using Grafana (experience with Prometheus, Loki, and Tempo is a plus)
  • Automate operational tasks to reduce manual toil
  • Participate in on-call rotations, incident response, and post-mortem reviews
  • Collaborate with development teams to improve application reliability, performance, and scalability
  • Implement and advocate for SRE practices including SLIs, SLOs, and error budgets
  • Continuously improve infrastructure performance, cost efficiency, and security posture
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • company 401(k) plan

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location
United States, Birmingham
Salary
Not provided
Alliance Automotive UK LV Ltd
Expiration Date
Until further notice
Requirements
  • Typically requires a bachelor's degree and three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer
  • Options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay
  • Fulltime

Site Reliability Engineer II

As an SRE Engineer II, you will be responsible for managing our multi-cloud infr...
Location
United States, Sunnyvale
Salary
138000.00 - 159000.00 USD / Year
Illumio
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience
  • 2+ years of experience working as an SRE, DevOps Engineer, or similar role, with hands-on experience in Azure cloud platform in a production environment setting
  • Proficiency in scripting and programming languages such as PowerShell, Python, or Go for automation and infrastructure management tasks
  • Experience with CI/CD tools and methodologies, containerization technologies, and microservices architecture in cloud environments
  • Strong analytical, problem-solving, and communication skills, with the ability to collaborate effectively with cross-functional teams
Job Responsibility
  • Design, deploy, and maintain cloud infrastructure solutions on Azure, AWS, and/or GCP to support our applications and services
  • Implement infrastructure as code (IaC) principles using tools such as Terraform, ARM templates, or CloudFormation to automate provisioning and configuration management
  • Develop and maintain CI/CD pipelines for automated software delivery and deployment, leveraging tools such as Azure DevOps, AWS CodePipeline, or Jenkins
  • Monitor system performance, application health, and infrastructure metrics using cloud monitoring and logging services, and implement proactive measures to optimize performance and availability
  • Support incident response and resolution efforts, conduct root cause analysis, implement corrective actions, and document post-incident reviews
  • Collaborate with Engineering teams to design and implement scalable and reliable architectures, providing guidance on best practices for cloud-native application development
  • Implement security best practices and controls in cloud environments to protect data, applications, and infrastructure, and ensure compliance with regulatory requirements
  • Drive automation initiatives to streamline operational tasks, reduce manual effort, and improve overall efficiency in cloud operations
  • Stay current with cloud platform updates, trends, and best practices, and evaluate emerging technologies for potential adoption to drive innovation and efficiency
  • Provide support and guidance to junior team members, fostering a culture of learning, collaboration, and continuous improvement within the SRE/DevOps team
What we offer
  • Medical, Dental, Vision Coverage
  • Health and Dependent Savings Accounts
  • Life and Disability Programs
  • Paid Parental Leave
  • Voluntary Benefit Programs
  • Company Sponsored Wellness Program
  • Wellness Reimbursement Program
  • Retirement Savings
  • Equity Opportunities
  • Paid time off and Paid Holidays
  • Fulltime