Azure Specialized is chartered to integrate mission-critical workloads with Azure. We have shipped Oracle Databases@Azure, Azure NetApp Files, and Cray Supercomputer, with more in the pipeline. We deliver end-to-end networking solutions that connect non-cloud-ready hardware to Azure, enabling new services for our customers. It is a highly impactful team with strong growth potential. We are seeking a strong Infrastructure/Site Reliability Engineer with proven experience running large-scale production systems, managing incidents, and building resilient, automated infrastructure. If you are passionate about cutting-edge work that builds skills in networking and cloud platforms, this is the team for you! You will gain experience in service-oriented network architectures and large-scale datacenter networking. Our work ranges from designing high-performance, secure networking stacks to delivering automation and monitoring systems that keep Azure running at scale. You will partner with teams across Azure to deliver innovative solutions that solve real-world challenges while shaping the networking foundation for some of the most demanding cloud workloads.
Job Responsibility:
Design, build, and deliver software that improves the usability, reliability, scalability, performance, security, and availability of infrastructure built on Azure Networking services, working independently and with a strong sense of ownership and drive in your areas of responsibility.
Own and troubleshoot end-to-end hardware and software issues across the L2/L3 networking stack, device OS, telemetry, and infrastructure dependencies.
Innovate on the Software-Defined Networking (SDN) platform to provide consistent connectivity for a heterogeneous mix of workloads.
Proactively identify and resolve production issues, performing root cause analysis and implementing long-term fixes to reduce operational toil.
Collaborate with multiple partner teams and gain broad exposure to core networking technologies end-to-end.
Develop, set up, and execute tests before feature releases.
Create monitoring and diagnostic tools to ensure quality of service.
Participate in on-call rotations, incident response, and postmortem analysis to ensure continuous learning and resilience improvement.
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Nice to have:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Master's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Prior experience as an infrastructure, systems, or site reliability engineer in large-scale or high-availability environments.
Proficiency in cloud infrastructure concepts (compute, networking, storage, security) and in automating cluster or data center operations.
Experience working with large-scale infrastructure, data centers, or cloud platforms and a good understanding of L2/L3 networking.
Solid understanding of datacenter networking and cloud environments.
Experience with monitoring/logging platforms and strong problem solving and software troubleshooting skills.
Understanding of system performance, incident response, and troubleshooting in production environments.
Strong experience with databases and query languages such as SQL and KQL.