We are looking for software engineers to help scale and improve the efficiency of large AI training and inference on hardware accelerators. A core part of this is optimising collective operations to make the best use of the network for data sharing. This is an opportunity to work within a highly skilled team, collaborating with a large set of cross-functional partners to help bring next-generation large-cluster architectures to life.
Job Responsibilities:
Work on collective communications stacks to optimise networking operations, leading to improved AI inference and training model performance
Drive implementation of latency and bandwidth critical networking operations, as well as out-of-band signalling
Debug custom and third-party multi-host, accelerator-enabled AI platforms
Software development using C++/C and Python
Work closely with other teams to deliver impact
Develop and improve features and innovations
Extend and optimise large-scale learning collective operations
Requirements:
3+ years of experience developing in C++/C and Python
Experience with High Performance Computing/Networking or AI systems application frameworks
Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
Specialized experience in one or more of the following machine learning/deep learning domains: Hardware accelerators, AI Infrastructure, or high performance networking
Solid experience debugging distributed systems, and with revision control systems, testing, and CI pipelines
Nice to have:
Experience and understanding of AI/HPC systems
Deep understanding of the transport stack (e.g. RDMA/RoCE, InfiniBand, TCP/IP), its constraints and performance measures, and how transport considerations enable the collective communications stack
Experience in one or more of the following machine learning/deep learning domains: hardware accelerators, AI Infrastructure, and/or high performance computing (HPC), particularly pertaining to interconnect and collective communications stacks
Familiarity with relevant tools, libraries, and frameworks (e.g. PyTorch, NCCL, MPI, CUDA)
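For candidates less familiar with the domain, the "collective operations" this role centres on are primitives such as all-reduce, which libraries like NCCL and MPI implement over real interconnects. A minimal pure-Python simulation of the ring all-reduce algorithm (illustrative only; `ring_allreduce` and its structure are our own sketch, not any library's API) gives a flavour of the bandwidth reasoning involved:

```python
def ring_allreduce(buffers):
    """Sum-all-reduce across ranks using the ring algorithm: a
    reduce-scatter phase followed by an all-gather phase.

    `buffers` is a list of per-rank lists; for simplicity this demo
    assumes one chunk per rank. Each rank sends and receives exactly
    one chunk per step, so per-rank traffic is ~2*(n-1)/n of the data,
    roughly independent of the number of ranks -- the property that
    makes ring all-reduce attractive on bandwidth-limited networks.
    """
    n = len(buffers)
    assert all(len(b) == n for b in buffers), "demo assumes one chunk per rank"
    bufs = [list(b) for b in buffers]

    # Phase 1: reduce-scatter. After n-1 steps, rank r holds the fully
    # reduced sum for chunk (r+1) % n. Sends are snapshotted first to
    # model all ranks communicating simultaneously.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, bufs[r][(r - step) % n]) for r in range(n)]
        for r, chunk, val in sends:
            bufs[(r + 1) % n][chunk] += val

    # Phase 2: all-gather. Each rank forwards its completed chunk around
    # the ring until every rank holds every reduced chunk.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, bufs[r][(r + 1 - step) % n]) for r in range(n)]
        for r, chunk, val in sends:
            bufs[(r + 1) % n][chunk] = val

    return bufs


result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Every rank ends with the element-wise sum [12, 15, 18].
```

Production stacks such as NCCL apply the same decomposition to GPU buffers over RDMA/NVLink, overlapping chunk transfers with the reduction arithmetic.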