We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations, the fundamental operations that enable AI workloads to scale across multiple accelerators and servers. Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly. If you like solving hard problems, want to work with HPC and ML customers, and want to iterate fast and deliver meaningful solutions at scale, come join us! This role is truly at the forefront of AI/ML: you'll be working on features for the largest clusters, with the largest customers, for the largest AI models.

The org you would be joining is Annapurna Labs, an integral part of AWS that develops the hardware and software components serving as critical building blocks for EC2 infrastructure. Every EC2 instance runs some type of hardware designed by Annapurna Labs. We specialize in designing software, systems, and chips that optimize the AWS customer experience.
Job Responsibilities:
Work on distributed AI/ML systems
Work on collective operations, the fundamental operations that enable AI workloads to scale across multiple accelerators and servers
Work with HPC and ML customers, iterate fast, and deliver meaningful solutions at scale
Work on features for the largest clusters, with the largest customers, for the largest AI models
Work side by side with infrastructure experts, hardware engineers, RTL engineers, scientists, and architects
Mentor new and junior engineers
Requirements:
3+ years of non-internship professional software development experience
2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience programming in at least one software programming language
Solid knowledge of Linux, kernels, and performant code
Experience with embedded systems is valued
Experience with high-speed networking or HPC interconnects is valued highly
Nice to have:
3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
Bachelor's degree in computer science or equivalent
What we offer:
Sign-on payments
Restricted stock units (RSUs)
Health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance, and an option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, and Adoption and Surrogacy Reimbursement coverage)