We are seeking skilled engineers to join our Training Tech team, which works on optimising large-scale training jobs as we aim to scale our models through the next order of magnitude. A successful candidate will increase the efficiency of training jobs so that Wayve can train larger models faster.
Job Responsibilities:
Profile training jobs to identify their bottlenecks, e.g. using NVIDIA Nsight Systems
Design and implement efficiency improvements to maximise Model FLOPs Utilization (MFU), e.g. tensor parallelism, model compilation, mixed precision
Design and implement observability tools, e.g. to track MFU
Collaborate closely with Research teams to integrate training efficiency improvements and create a culture of performance optimisation
Requirements:
Experience optimising large-scale training jobs on GPU compute clusters
Experience working in platform teams and collaborating with research teams
Experience tracking and reporting benchmarked performance over time in an open and accessible way
Ability to write high-quality, well-structured and tested Python code
BS or MS in Machine Learning, Computer Science, Engineering, or a related technical discipline, or equivalent experience
Nice to have:
Solid experience working with concurrent, parallel and distributed computing
Experience using NVIDIA Nsight Systems
Experience implementing GPU kernels
Knowledge of computing fundamentals: what makes code fast, secure and reliable