As a Framework Engineer for Diffusion Model Inference, you will design, build, and evolve a production-grade inference framework for Diffusion Transformers (DiTs) powering state-of-the-art image and video generation. You will focus on framework-level engineering—model integration, scalable parallel inference, kernel plumbing, packaging, testing, and release management—ensuring diffusion workloads run out-of-the-box with exceptional performance on modern GPU systems.
Job Responsibilities:
Develop and maintain a diffusion inference framework for image/video generation with clean APIs and strong compatibility with widely used diffusion ecosystems
Own scalable parallel inference features for DiT workloads—single-node and multi-node
Integrate optimized operator backends (attention, GEMM, quantized paths) by bridging the Python and C++ layers, ensuring both correctness and high performance
Ship production-grade packaging and releases, including containers, versioned artifacts, dependency hygiene, and pip-installable distributions
Collaborate across the GPU software stack and translate framework needs into actionable upstream improvements
Support strategic customers by mapping real-world inference constraints into framework features, reference configurations, and reproducible deployment recipes
Communicate clearly about technical tradeoffs, performance bottlenecks, and roadmap decisions