We are looking for Machine Learning Systems Engineers who can help us build the world's largest end-to-end 3D-native machine learning systems. You will help us build our end-to-end ML framework dedicated to 3D, spanning pretraining, finetuning, inference, and beyond. We expect strong hands-on engineering skills, an eagerness to learn new things, and the ability to thrive in a fast-paced, high-ownership environment.
Job Responsibilities:
Work closely with researchers to co-design the next frontier of 3D & Spatial AI
Build and debug on top of modern PyTorch for maximum parallelism and efficiency, and build clean, intuitive training infrastructure for our in-house foundation models
Identify bottlenecks and optimize for high-throughput, efficient distributed model training across hundreds to thousands of GPUs
Implement and maintain 3D-specific custom operators in Triton or CUDA (see the sketch after this list)
Implement and maintain novel data-loading frameworks and libraries
Build efficient inference endpoints with complex multi-stage model pipelines
Optimize models through compilation, fusion, quantization, etc.
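To give candidates a flavor of the kernel work involved, below is a minimal illustrative sketch of a Triton kernel. It is a generic elementwise add in the style of the Triton tutorials, not one of our actual 3D operators, and all names in it (add_kernel, add) are hypothetical:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE chunk of the tensors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard against out-of-bounds accesses
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n_elements = out.numel()
        # Launch a 1D grid with enough programs to cover every element.
        grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
        return out

The same overall structure, a @triton.jit kernel plus a Python launcher that computes the grid, carries over to the more involved 3D operators this role covers.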
Requirements:
Experience in machine learning or high-performance graphics
Solid practical understanding of at least one machine learning framework (e.g. PyTorch, JAX)
Strong ability to write beautiful and maintainable code in Python and/or C++
Ability to learn fast and dive into new concepts or complex codebases
Performance- and efficiency-oriented mindset, with strong attention to even the tiniest details
Strong communication skills for working in a globally distributed team
Nice to have:
A strong passion for navigating PyTorch internals, with hands-on experience in areas like the torch.compile and fully_shard (FSDP2) APIs (see the sketch after this list)
Experience building Triton kernels
Experience with large-scale distributed training and familiarity with modern parallelization techniques: DP, TP, CP, PP, zero-redundancy optimizers, etc.
Experience with diffusion models in 3D or video
Experience with low-precision (bf16 or fp8) training
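For reference, here is a minimal sketch of the fully_shard (FSDP2) plus torch.compile workflow mentioned above. It uses a toy model with arbitrary dimensions, assumes a recent PyTorch (2.6+, where fully_shard is exposed under torch.distributed.fsdp) and a single-node torchrun launch, and is purely illustrative rather than our actual training code:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import fully_shard

    # Assumes launch via: torchrun --nproc_per_node=<num_gpus> this_script.py
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank())  # single node: rank == local rank

    # Toy stand-in for a real model; sizes are arbitrary.
    model = nn.Sequential(
        *[nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(4)]
    ).cuda()

    # FSDP2 pattern: shard each layer so its parameters are gathered and freed
    # one layer at a time, then shard the root to cover remaining parameters.
    for layer in model:
        fully_shard(layer)
    fully_shard(model)

    compiled = torch.compile(model)  # compile the sharded model

    x = torch.randn(8, 16, 512, device="cuda")  # (seq, batch, d_model)
    compiled(x).sum().backward()
    dist.destroy_process_group()

Sharding per layer before sharding the root is the standard FSDP2 idiom: it bounds peak memory by materializing only one layer's full parameters at a time.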