WHAT YOU DO AT AMD CHANGES EVERYTHING.
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
Job Responsibilities:
High-Performance Kernel Development: Design, implement, and optimize high-performance GPU kernels for AI/ML workloads to maximize hardware utilization
Performance Optimization: Analyze and optimize kernel execution for latency and throughput, addressing bottlenecks in memory bandwidth, instruction latency, and thread divergence
Workload Analysis: Evaluate the end-to-end performance impact of individual kernels on full-stack AI models, ensuring that micro-optimizations translate to application-level speedups
Architecture Adaptation: Tailor implementation strategies to leverage specific features of modern GPU architectures (e.g., Matrix Cores, HBM characteristics)
Framework Integration: Collaborate with software stack teams to expose optimized kernels within high-level frameworks and inference engines
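To illustrate the kind of bandwidth-bottleneck reasoning the performance-optimization responsibility involves, here is a minimal roofline-style sketch in Python. All hardware figures are hypothetical placeholders, not specifications of any AMD product.

```python
# Roofline-style back-of-the-envelope check: is a kernel memory-bound
# or compute-bound? All hardware numbers below are hypothetical.

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of DRAM traffic."""
    return flops / bytes_moved

def attainable_tflops(intensity: float, peak_tflops: float, mem_bw_tbps: float) -> float:
    """Roofline model: performance is capped either by peak compute
    or by memory bandwidth times arithmetic intensity."""
    return min(peak_tflops, mem_bw_tbps * intensity)

# Example: elementwise add on float32 tensors.
# 1 FLOP per element; 12 bytes moved per element (two 4-byte loads, one store).
ai = arithmetic_intensity(flops=1.0, bytes_moved=12.0)

# Hypothetical accelerator: 100 TFLOP/s peak compute, 3 TB/s HBM bandwidth.
perf = attainable_tflops(ai, peak_tflops=100.0, mem_bw_tbps=3.0)

print(f"arithmetic intensity: {ai:.3f} FLOP/byte")
print(f"attainable: {perf:.2f} TFLOP/s")
```

Since the attainable rate (0.25 TFLOP/s here) is far below the compute peak, such a kernel is memory-bound, and the productive optimizations target data movement (fusion, vectorized loads, layout) rather than arithmetic.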
Requirements:
Deep knowledge of data center AI workloads such as LLMs, generative AI, recommendation, NLP, video analytics, and/or transformer-based models
Hands-on experience with a variety of AI models, end-to-end pipelines, and industry frameworks/SDKs and solutions
GPU Architecture Mastery: Deep understanding of modern GPU architecture, including the memory hierarchy and features such as Matrix Cores and HBM
Kernel Programming Expertise: Strong proficiency in C++ and parallel computing, with extensive hands-on experience in NVIDIA CUDA or AMD HIP kernel programming
Performance Engineering: Demonstrated ability to debug and profile complex GPU workloads
Systems Knowledge: Familiarity with asynchronous execution, stream management, and host-device memory transfers
Python DSLs & Triton: Experience implementing kernels using OpenAI Triton or other Python-based DSLs
Inference Engine Experience: Hands-on experience integrating custom kernels into large-scale inference frameworks such as vLLM, SGLang, or TensorRT-LLM
Deep Learning Frameworks: Familiarity with writing custom extensions or operators for PyTorch (C++/CUDA extensions)
Hardware Agnosticism: Experience porting kernels between NVIDIA and AMD architectures or working with cross-platform HPC libraries
BS required
MS preferred, with several years of relevant industry experience