CrawlJobs Logo

Site Reliability Engineer II

India, Hyderabad Employment contract · Job Posted May 28, 2026
Apply Position
Job Link Share

Job Description

Microsoft is a company where passionate innovators come to collaborate, envision what can be and take their careers further. This is a world of more possibilities, more innovation, more openness, and the sky is the limit thinking in a cloud-enabled world. Microsoft’s Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture. Within Azure Data, the business intelligence team builds and maintains Power BI which is the undisputed leader in BI tool in the industry today and serves as the bridge between data professionals & Office users. Power BI’s core value proposition is to enable end users to get answers & insights from their data through data visualization. Within this organization, the Analysis Services Engine (AS) team builds and evolves the analytical database engine that powers Power BI Semantic Models (Datasets) as well as SQL Server Analysis Services (SSAS) and Azure Analysis Services (AAS). These platforms deliver fast, enterprise-grade analytics at global scale and serve as a critical foundation for customers turning data into decisions. As a Site Reliability Engineer II, you will help ensure the reliability, scalability, availability, and performance of Analysis Services. You will apply a software engineering approach to operations by building automation, improving observability, strengthening incident response, and enhancing system resilience. This role is ideal for an engineer who is passionate about operating large-scale services with high quality and efficiency while enabling fast and safe delivery. We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Job Responsibility

  • Work with all aspects of a high throughput and multi-tenant service
  • Collaborate effectively within the team and with partner teams across Microsoft
  • Be part of the on-call rotation for maintaining service health
  • Design, implement, and refine chosen solutions in close partnership with Product Management and partner teams
  • Champion operational excellence via established metrics, process governance, and policy controls for regular assessment and improvement
  • Document and define existing data engineering processes, data and technology, while evaluating them for optimization
  • System Reliability & Uptime – Ensuring high availability of services
  • Incident Management – Detecting, responding to, and mitigating system failures
  • Performance Monitoring – Tracking system health and resolving bottlenecks
  • Automation & Tooling – Reducing manual work through scripts and automation
  • Capacity Planning – Scaling infrastructure efficiently to handle demand
  • Postmortems & Continuous Improvement – Analyzing failures to prevent recurrence

Requirements

  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. These requirements include, but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice to have

  • Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • 2+ years technical experience working with large-scale cloud or distributed systems

Looking for more opportunities?

Search for other job offers that match your skills and interests.

Similar Jobs for

Site Reliability Engineer II

8 matching positions

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location
Location
United States , Birmingham
Salary
Salary:
Not provided
genpt.com Logo
Genuine Parts Company
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility
Job Responsibility
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer
What we offer
  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay
  • Fulltime
Read More
Arrow Right

Senior Security Operations Engineer II

As a Senior Security Operations Engineer, you’ll play a key role in ensuring the...
Location
Location
United States , Scottsdale
Salary
Salary:
Not provided
axon.com Logo
Axon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years of experience in operations, site reliability, or infrastructure engineering roles
  • Strong experience securing and managing cloud environments (e.g., AWS, Azure) and containerized workloads
  • Deep understanding of Linux systems, networking, distributed systems, and their associated security controls
  • Proficiency in automation, scripting, and security tooling integration to streamline operations and enforcement
  • Experience with security monitoring, alerting, SIEM platforms, and observability tools
  • Solid grasp of CI/CD practices with integrated security testing and compliance checks
  • Experience managing Kubernetes clusters and running containerized workloads in production
  • Experience with deploying and administrating any of the following: scalable cloud native secrets solutions such as AWS KMS, Azure KeyVault
  • PKI solutions such as EJBCA, Smallstep, Venafi
  • or vaulting solutions such as Hashicorp Vault
Job Responsibility
Job Responsibility
  • Implementing and improving automated security checks in CI/CD pipelines to prevent vulnerabilities from reaching production
  • Writing, reviewing, and maintaining security-focused infrastructure-as-code for scalable and compliant deployments
  • Investigating security incidents, performing root cause analysis, and implementing long-term mitigation strategies
  • Collaborating with developers to develop new features, services, and infrastructure requirements
  • Enhancing security observability through improved log collection, metrics, and alerting configurations
  • Maintaining and improving security runbooks, incident response playbooks, and internal security tooling for operational efficiency
  • Resolve security/infrastructure incidents by participating in high impact/high visibility incidents as a participant and ideally as an incident commander
  • Maintain and secure critical infrastructure components such as PKI (Public Key Infrastructure) and IAM ( Identity & Access Management) systems, ensuring reliability, scalability, and compliance with organizational and industry security standards
  • Build and maintain secure, reliable, and scalable infrastructure that protects core services and sensitive data
  • Troubleshoot and resolve complex operational and system-level issues across environments
What we offer
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
  • Fulltime
Read More
Arrow Right

Senior Security Operations Engineer II

As a Senior Security Operations Engineer, you’ll play a key role in ensuring the...
Location
Location
United States , Scottsdale
Salary
Salary:
Not provided
axon.com Logo
Axon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years of experience in operations, site reliability, or infrastructure engineering roles
  • Strong experience securing and managing cloud environments (e.g., AWS, Azure) and containerized workloads
  • Deep understanding of Linux systems, networking, distributed systems, and their associated security controls
  • Proficiency in automation, scripting, and security tooling integration to streamline operations and enforcement
  • Experience with security monitoring, alerting, SIEM platforms, and observability tools
  • Solid grasp of CI/CD practices with integrated security testing and compliance checks
  • Experience managing Kubernetes clusters and running containerized workloads in production
  • Experience with deploying and administrating any of the following: scalable cloud native secrets solutions such as AWS KMS, Azure KeyVault
  • PKI solutions such as EJBCA, Smallstep, Venafi
  • or vaulting solutions such as Hashicorp Vault
Job Responsibility
Job Responsibility
  • Implementing and improving automated security checks in CI/CD pipelines to prevent vulnerabilities from reaching production
  • Writing, reviewing, and maintaining security-focused infrastructure-as-code for scalable and compliant deployments
  • Investigating security incidents, performing root cause analysis, and implementing long-term mitigation strategies
  • Collaborating with developers to develop new features, services, and infrastructure requirements
  • Enhancing security observability through improved log collection, metrics, and alerting configurations
  • Maintaining and improving security runbooks, incident response playbooks, and internal security tooling for operational efficiency
  • Resolve security/infrastructure incidents by participating in high impact/high visibility incidents as a participant and ideally as an incident commander
  • Maintain and secure critical infrastructure components such as PKI (Public Key Infrastructure) and IAM ( Identity & Access Management) systems, ensuring reliability, scalability, and compliance with organizational and industry security standards
  • Build and maintain secure, reliable, and scalable infrastructure that protects core services and sensitive data
  • Troubleshoot and resolve complex operational and system-level issues across environments
What we offer
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...
Location
Location
Canada , Toronto
Salary
Salary:
115000.00 - 165000.00 CAD / Year
https://www.pagerduty.com Logo
PagerDuty
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Hands-on experience operating Linux-based systems in production environments
  • Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
  • Experience with container orchestration (e.g., EKS, Kubernetes)
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
  • Proficiency in at least one programming language (e.g., Python, Ruby, Go, etc.)
  • Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)
Job Responsibility
Job Responsibility
  • Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
  • Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
  • Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents
What we offer
What we offer
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...
Location
Location
Canada , Toronto
Salary
Salary:
115000.00 - 165000.00 CAD / Year
https://www.pagerduty.com Logo
PagerDuty
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Hands-on experience operating Linux-based systems in production environments
  • Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
  • Experience with container orchestration (e.g., EKS, Kubernetes)
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
  • Proficiency in at least one programming language (e.g., Python, Ruby, Go, etc.)
  • Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)
Job Responsibility
Job Responsibility
  • Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
  • Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
  • Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents
What we offer
What we offer
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

The IDEAS organization’s mission is to unlock the power of data to deliver actio...
Location
Location
United States , Redmond
Salary
Salary:
100600.00 - 199000.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience with automation, live site operations, and incident response in large-scale cloud or distributed systems
  • Proficiency in at least one programming or scripting language (for example: C#, Java, Python, or PowerShell)
  • Strong analytical and problem-solving skills, including experience using telemetry and operational data to inform decisions
  • Effective written and verbal communication skills, and experience collaborating across teams and disciplines
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including passing the Microsoft Cloud Background Check upon hire and periodically thereafter
  • The successful candidate must have an active U.S. Government Secret Security Clearance
Job Responsibility
Job Responsibility
  • Participate as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within defined SLAs, and contributing to post-incident reviews and learning
  • Design, build, and maintain automation for deployment, operations, and incident mitigation to improve reliability and reduce manual effort
  • Instrument services for observability
  • collect and analyze telemetry and health signals
  • and use data to guide reliability and performance improvements
  • Collaborate with engineering partners and stakeholders to align on goals, share operational insights, and deliver user-focused solutions
  • Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements
  • Support compliance with security, privacy, and accessibility requirements throughout service onboarding and ongoing operations
  • Continuously learn and adopt industry practices and internal tools to improve reliability, performance, and observability
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...
Location
Location
United States , Multiple Locations
Salary
Salary:
100600.00 - 199000.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Active U.S. Government Top Secret Security Clearance
  • Ability to pass Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
  • Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
  • Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
  • Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
  • Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
  • Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
  • Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer II

Site Reliability Engineer II - (Microsoft 365 Enterprise + Cloud). We are lookin...
Location
Location
Ireland , Dublin
Salary
Salary:
Not provided
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Mid-level years of software development: automation-related experience is most valued
  • Scripting languages such as bash, python, and PowerShell, or compiled languages such as C, C# are most relevant, but others are acceptable
  • Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes, microservices, and so on
  • Associated troubleshooting skills, including the ability to follow RPC (Remote Procedure Call) call-chains across arbitrary network steps
  • Consequent understanding of monitoring in distributed systems
  • Deep understanding of operating system level concepts such as processes, memory allocation, and the network stack
  • understanding of how applications are affected by the above, and ability to debug same
  • Experience with working in a team, including coordinating large projects, communicating well, and exercising initiative when presented with problems
  • Practical experience running large scale online systems is always an advantage
Job Responsibility
Job Responsibility
  • Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies
  • identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance
  • Drives the adoption of innovative solutions across engineering teams working with related products within an organization
  • Apply advanced statistical and machine learning techniques to analyze large datasets and extract meaningful insights
  • Experience working with all service aspects of high throughput and multi-tenant services, ability to understand and design workflows carefully, properly handle errors, write clean and well-factored code with good tests and good maintainability
  • Engages with product engineering teams by partaking in code/design reviews, participating in on-call rotations and incident responses throughout product development and operations cycles
  • leverages end-to-end technical expertise on underlying systems/platforms and insights from engagements with product engineering teams and telemetry analyses to propose scalable improvements in code and designs with attention to customer/business objectives and incident prevention
  • Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale
  • reviews existing automation code and scripts to evaluate reusability, extendibility, and scalability within an organization
  • Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale
  • Fulltime
Read More
Arrow Right