The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is seeking a passionate Software Engineer 2 to help build, operate, and support hyperscale cloud infrastructure for some of the world’s largest supercomputing deployments. You’ll work alongside experienced engineers to develop, monitor, and troubleshoot cloud-native supercomputing systems, contributing to the reliability and performance of Azure’s AI infrastructure offerings.
Job Responsibilities:
Proactively and innovatively add new metrics for monitoring the health of the supercomputers
Collaborate with team members and stakeholders to understand requirements and produce detailed, data-driven designs for assigned features
Independently use appropriate artificial intelligence tools and practices across the software development lifecycle to develop, test, debug, and maintain code for supercomputer health monitoring systems
Keep skills current by staying abreast of developments that improve the availability, reliability, efficiency, observability, and performance of products, while also driving consistency in monitoring and operations at scale
Act as a Designated Responsible Individual (DRI) on call, monitoring systems, product features, and services for degradation, downtime, or interruptions, and obtaining approval to restore service for simple problems
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements, including the Microsoft Cloud Background Check
Nice to have:
Bachelor’s Degree in Computer Science or related technical field AND 4+ years technical engineering experience OR Master’s degree in Computer Science or related technical field AND 3+ years technical engineering experience
Experience with monitoring, profiling, or debugging distributed systems or cloud applications
Familiarity with AI/HPC workloads, GPU-based systems, AI-assisted software development, and secure software design practices
Familiarity with the IaaS operating model and SLA commitments