Site Reliability Engineer II

Location: United States, Redmond
Employment type: Employment contract
Salary: 102100.00 USD / Year
Job Posted: June 03, 2026

Job Description

Are you interested in working on cutting-edge cloud security products? Would you like to be part of one of the world’s most advanced cyber-security solutions and protect millions of computers from thousands of active attack attempts every month? Look no further than the Microsoft Defender engineering team.

We are looking for a Site Reliability Engineer II who will build and deliver cloud solutions at a scale that few companies in the industry are required to support. Leveraging state-of-the-art technologies, you will be instrumental in delivering holistic protection within highly sensitive and secure government environments.

The Microsoft Defender team is responsible for delivering a constantly evolving set of services and solutions to meet the challenging landscape of our ever-evolving attackers. This team provides on-call operational support and improvements to the operational posture of the Microsoft Defender products within US Government clouds. You will operate our production services and work closely with other engineering teams to ensure services and systems are highly stable, meet performance SLAs, and meet the expectations of internal and external customers and users.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Job Responsibility

  • Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
  • Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
  • Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
  • Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
  • Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
  • Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
  • Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems

Requirements

  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Candidates must be able to meet Microsoft, customer and/or government security screening requirements
  • Candidates must have an active TS clearance or an active TS/SCI clearance, and be willing and eligible to upgrade to TS/SCI with polygraph
  • Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • 2+ years technical experience working with large-scale cloud or distributed systems
  • Demonstrated experience applying software engineering principles to production systems, including designing, building, or improving services and platforms
  • Proficiency in one or more programming languages such as C#, Go, Java, or Python, with the ability to develop and maintain production-quality code
  • Experience with automation that results in measurable improvements (e.g., reduced toil, fewer manual steps, improved system reliability)
  • Experience with debugging and troubleshooting complex distributed systems in production environments
  • Ability to independently identify problems and implement solutions that improve system reliability and operational efficiency
  • Hands-on experience with CI/CD pipelines, testing, deployment, and reliability tooling

Nice to have

  • Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • 2+ years technical experience working with large-scale cloud or distributed systems
  • Demonstrated experience applying software engineering principles to production systems, including designing, building, or improving services and platforms
  • Proficiency in one or more programming languages such as C#, Go, Java, or Python
  • Experience with automation that results in measurable improvements
  • Experience with debugging and troubleshooting complex distributed systems in production environments
  • Ability to independently identify problems and implement solutions that improve system reliability and operational efficiency
  • Hands-on experience with CI/CD pipelines, testing, deployment, and reliability tooling

Similar Jobs for Site Reliability Engineer II

8 matching positions

Site Reliability Engineer II

Location: United States, Exton
Salary: Not provided
Company: Bentley Systems (bentley.com)
Expiration Date: Until further notice
Requirements
  • U.S. Master of Science degree, or foreign equivalent, in Information Quality, Computer and Information Science, or a closely related field, and 3 years of DevOps Engineering experience
  • 3 years’ experience with Site Reliability Engineering and DevOps automation including designing, implementing and maintaining CI/CD pipelines for cloud-based production systems
Job Responsibility
  • Responsible for designing, implementing, and maintaining automated cloud infrastructure and CI/CD pipelines to support enterprise software applications
  • Perform DevOps automation, Infrastructure as Code, and containerized deployments to improve system reliability, scalability, and operational efficiency while reducing manual intervention
  • Work with the Azure and Amazon Web Services (AWS) cloud platforms, including infrastructure provisioning, networking architecture, identity management, and security configuration
  • Develop and maintain IaC using Terraform, along with automation and scripting using Python or PowerShell, and configuration management using Ansible to support scalable and reliable cloud environments
  • Use containerization and orchestration technologies, including Docker, Kubernetes, and Helm, for deploying, scaling, and managing distributed cloud-native applications
  • Build and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) and participate in a rotating on-call schedule for production support

Site Reliability Engineer II

Microsoft is a company where passionate innovators come to collaborate, envision...
Location: India, Hyderabad
Salary: Not provided
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Work with all aspects of a high throughput and multi-tenant service
  • Collaborate effectively within the team and with partner teams across Microsoft
  • Be part of the on-call rotation for maintaining service health
  • Design, implement, and refine chosen solutions in close partnership with Product Management and partner teams
  • Champion operational excellence via established metrics, process governance, and policy controls for regular assessment and improvement
  • Document and define existing data engineering processes, data and technology, while evaluating them for optimization
  • System Reliability & Uptime – Ensuring high availability of services
  • Incident Management – Detecting, responding to, and mitigating system failures
  • Performance Monitoring – Tracking system health and resolving bottlenecks
  • Automation & Tooling – Reducing manual work through scripts and automation
Employment type: Full-time

Site Reliability Engineer II

Location: Canada
Salary: 170000.00 - 200000.00 CAD / Year
Company: Axon (axon.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of applicable experience
  • Experience managing cloud platforms such as Azure, AWS, or similar
  • Experience using managed languages such as Python, Go, C#, Java, or similar
  • Experience operating in Kubernetes platforms like AKS, EKS, or similar
  • Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases
  • Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
  • Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, AWS CloudFormation, or similar
  • Builder-operator mindset with proven production ownership (uptime, SLOs, on-call, incident leadership)
  • Empathy to support the needs of software engineers
Job Responsibility
  • Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, securely, and cost-effectively
  • Exemplify cloud-native site reliability best practices
  • Write code that is performant, maintainable, clear, and concise
  • Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems
  • Influence and educate the engineering organization to adopt new and improved architectural patterns
  • Provide robust documentation for use by engineers to promote self-service
  • Take calculated risks, champion new ideas, and cultivate your craft
Employment type: Full-time

Site Reliability Engineer II

The IDEAS organization’s mission is to unlock the power of data to deliver actio...
Location: United States, Redmond
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience with automation, live site operations, and incident response in large-scale cloud or distributed systems
  • Proficiency in at least one programming or scripting language (for example: C#, Java, Python, or PowerShell)
  • Strong analytical and problem-solving skills, including experience using telemetry and operational data to inform decisions
  • Effective written and verbal communication skills, and experience collaborating across teams and disciplines
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including passing the Microsoft Cloud Background Check upon hire and periodically thereafter
  • The successful candidate must have an active U.S. Government Secret Security Clearance
Job Responsibility
  • Participate as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within defined SLAs, and contributing to post-incident reviews and learning
  • Design, build, and maintain automation for deployment, operations, and incident mitigation to improve reliability and reduce manual effort
  • Instrument services for observability; collect and analyze telemetry and health signals; and use data to guide reliability and performance improvements
  • Collaborate with engineering partners and stakeholders to align on goals, share operational insights, and deliver user-focused solutions
  • Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements
  • Support compliance with security, privacy, and accessibility requirements throughout service onboarding and ongoing operations
  • Continuously learn and adopt industry practices and internal tools to improve reliability, performance, and observability
Employment type: Full-time

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...
Location: United States, Multiple Locations
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Active U.S. Government Top Secret Security Clearance
  • Ability to pass Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
  • Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
  • Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
  • Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
  • Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
  • Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
  • Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems
Employment type: Full-time

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...
Location: Canada, Toronto
Salary: 115000.00 - 165000.00 CAD / Year
Company: PagerDuty (https://www.pagerduty.com)
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Hands-on experience operating Linux-based systems in production environments
  • Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
  • Experience with container orchestration (e.g., EKS, Kubernetes)
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
  • Proficiency in at least one programming language (e.g., Python, Ruby, Go)
  • Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)
Job Responsibility
  • Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
  • Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
  • Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents
What we offer
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
Employment type: Full-time

Site Reliability Engineer II

We are seeking an experienced Site Reliability Engineer II to help build, mainta...
Location: United States, Alpharetta
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • 3+ years experience in SRE, DevOps, or Cloud Infrastructure roles
  • Strong hands-on experience with Microsoft Azure services
  • Advanced experience with Terraform and Terragrunt
  • Proficiency with Kubernetes/AKS and container orchestration
  • Experience with CI/CD tools including GitHub Actions and ArgoCD
  • Solid understanding of observability tooling, especially Grafana
  • Hands-on experience with Java environments (for app debugging/support)
Job Responsibility
  • Design, implement, and manage Azure cloud infrastructure using Terraform and Terragrunt
  • Maintain, monitor, and optimize Kubernetes clusters (AKS)
  • Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD in a GitOps model
  • Enhance reliability through monitoring, alerting, and observability using Grafana (experience with Prometheus, Loki, and Tempo is a plus)
  • Automate operational tasks to reduce manual toil
  • Participate in on-call rotations, incident response, and post-mortem reviews
  • Collaborate with development teams to improve application reliability, performance, and scalability
  • Implement and advocate for SRE practices including SLIs, SLOs, and error budgets
  • Continuously improve infrastructure performance, cost efficiency, and security posture
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • company 401(k) plan

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location: United States, Birmingham
Salary: Not provided
Company: Alliance Automotive UK LV Ltd (allianceautomotive.co.uk)
Expiration Date: Until further notice
Requirements
  • Typically requires a bachelor's degree and three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer
  • options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay
Employment type: Full-time