Want to impact the foundation for future AI storage development in Azure, the world's computer? The Azure Managed Lustre File System (AMLFS) team leads development, deployment, and monitoring of the most popular High-Performance Computing (HPC) parallel file system in the world: Lustre, the Azure storage solution of choice for AI training and fine-tuning. The Pittsburgh-based AMLFS Platform Team is responsible for end-to-end delivery of AMLFS images, cluster deployment, logs and metrics, and configuration compliance. An ideal candidate will also have opportunities to impact cluster architecture and design of Lustre in the Azure ecosystem, performance analysis and optimization of AMLFS, and customer support for the most challenging parallel filesystem bugs or performance anomalies that arise within our product.
Job Responsibilities:
Partners with appropriate stakeholders to determine user requirements for a set of scenarios
Leads identification of dependencies and the development of design documents for a product, application, service, or platform
Leads by example and mentors others to produce extensible and maintainable code used across products
Leverages subject-matter expertise in cross-product features with appropriate stakeholders (e.g., project managers) to drive multiple groups' project plans, release plans, and work items
Holds accountability as a Designated Responsible Individual (DRI), mentoring engineers across products/solutions and working on-call to monitor systems, products, and services for degradation, downtime, or interruptions
Proactively seeks new knowledge, adapts to new trends, technical solutions, and patterns that improve the availability, reliability, efficiency, observability, and performance of products, drives consistency in monitoring and operations at scale, and shares knowledge with other engineers
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements, including the Microsoft Cloud Background Check
Nice to have:
Master's Degree in Computer Science or related technical field AND 8+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR equivalent experience
3+ years of experience: working, developing, and debugging in a Linux operating system environment, with at least a broad understanding of Linux kernel fundamentals; AND working with filesystem design, development, and debugging; AND working with high-performance computing OR distributed systems in an industry or academic setting
6+ years of experience: with high-performance computing OR distributed systems in an industry or academic setting, AND with the Lustre parallel file system OR an equivalent parallel or distributed file system
Experience performing performance analysis and root-cause analysis of a distributed or complex system