We are looking for a hands‑on Engineer to design, implement, and optimize AI model training and inference solutions for AMD platforms. The role focuses on end‑to‑end performance and accuracy improvements at the framework, model, and operator levels, with strong emphasis on low‑bitwidth quantization, model compression, and real‑world deployment. You will work closely with AMD hardware and software teams, support customers, and contribute to open‑source projects and inference/training frameworks.
Job Responsibilities:
Design, implement, and optimize inference and training pipelines for AMD GPUs/accelerators at the framework, model, and operator levels
Lead research and development of model optimization algorithms: low‑bitwidth quantization, pruning/sparsity, compression, efficient attention mechanisms, and lightweight architectures
Implement and tune CUDA/ROCm/Triton kernels for critical operators; profile and eliminate performance bottlenecks
Integrate and optimize models for PyTorch/JAX and common distributed training/inference stacks (Torchtitan, Megatron, DeepSpeed, HF Transformers, etc.)
Reduce latency and increase throughput for large‑model inference (e.g., batching strategies, caching, speculative decoding)
Contribute to and/or maintain open‑source inference/training tools, ensuring production readiness and community adoption
Provide technical support and guidance to customers and internal teams to achieve target accuracy and performance on AMD platforms
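To make the low‑bitwidth quantization work mentioned above concrete, here is a minimal, hypothetical sketch of symmetric per‑tensor int8 weight quantization in plain Python. It is illustrative only; production tooling of the kind this role describes would operate on framework tensors with fused GPU kernels, and the function names are this sketch's own, not any AMD API.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]
    using a single scale derived from the largest-magnitude weight."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.013, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Round-to-nearest bounds the per-weight error by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The accuracy work the posting describes is largely about managing exactly this kind of quantization error at scale, e.g. via per‑channel scales or calibration.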
Requirements:
Strong software engineering in Python and C/C++
Practical experience with PyTorch/JAX and building/extending deep learning frameworks
Hands‑on CUDA and/or ROCm development; experience writing or optimizing GPU kernels
Experience with Triton (kernel development/optimization) is highly desired
Proven experience with model optimization techniques, especially low‑bitwidth quantization and other compression methods
Familiarity with GenAI inference engines and optimizations (e.g., vLLM, SGLang, xDiT, continuous batching, speculative decoding)
Skilled at profiling and performance debugging across stack layers (operator → model → framework → hardware)
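As an illustration of the inference optimizations listed above (speculative decoding in particular), here is a toy greedy‑decoding sketch in plain Python. `target_model` and `draft_model` are hypothetical stand‑ins for real models; a real engine (e.g., vLLM) verifies the whole draft block in a single batched target forward pass, which is where the speedup comes from.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Toy greedy speculative decoding.

    A cheap draft model proposes k tokens; the target model verifies them,
    the longest prefix the target agrees with is accepted, and one token
    from the target itself is always emitted (correction or continuation).
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # Draft proposes k tokens autoregressively.
        ctx = list(tokens)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies the proposal position by position.
        ctx = list(tokens)
        for t in proposal:
            if target(ctx) != t:
                break
            ctx.append(t)
            tokens.append(t)
        # Always emit one target token.
        tokens.append(target(tokens))
    return tokens[len(prompt):][:max_new]

# Hypothetical "models": the next token is the previous one plus a step.
target_model = lambda ctx: (ctx[-1] + 1) % 100
good_draft = lambda ctx: (ctx[-1] + 1) % 100   # always agrees with target
bad_draft = lambda ctx: (ctx[-1] + 2) % 100    # never agrees

out_fast = speculative_decode(target_model, good_draft, [0])
out_slow = speculative_decode(target_model, bad_draft, [0])
```

The invariant worth noting: greedy speculative decoding reproduces the target model's greedy output exactly regardless of draft quality; the draft only affects how many target verification steps are needed.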
Nice to have:
Publications or contributions in model optimization / ML systems are a strong plus