As an AI Performance Engineer, you will focus on pushing machine learning workloads to peak hardware efficiency. The emphasis of this role is on analysis, profiling, debugging, and optimization at the application/workload level; however, a broad understanding of low-level GPU execution and kernel optimization is a major advantage.
Job Responsibilities:
Explore and benchmark ML models and workloads (including diffusion models, LLMs, and multimodal systems) to identify bottlenecks across compute, memory, and networking layers
Optimize performance for inference and training on AMD GPUs, including parallelization strategies, quantization techniques, serving orchestration, network communication, and distributed execution
Perform deep profiling to uncover inefficiencies in ML frameworks, data pipelines, compiler tools, and key tensor operations such as GEMMs, convolutions, and attention
Support AMD's top-tier customers in improving model throughput, reducing latency, and optimizing resource utilization across multi-GPU and cluster environments
Work closely with hardware, compiler, and software teams to drive improvements across the full ROCm stack
Communicate performance bottlenecks, solutions, and optimization strategies to stakeholders
Work with international teams located across Europe, the US, and Asia
Requirements:
Experience with profiling, debugging, benchmarking, and optimization tools
Familiarity with ML frameworks (e.g., PyTorch, JAX, TF) and inference serving frameworks (e.g., vLLM, SGLang)
Strong C++ and/or Python skills, along with the basics: Unix, Git, terminal, debugging, testing, thinking
Experience with Docker, container orchestration (Kubernetes), and job schedulers (Slurm)
Ability to work independently and collaboratively in a multi-cultural team
Excellent communication skills in a fast-moving environment
BSc, MSc, PhD or equivalent experience in Computer Science, Electrical Engineering or a related field
Nice to have:
Experience with AMD tooling (not mandatory if you have strong fundamentals)
GPU kernel development experience with HIP, CUDA, or OpenCL