AMD is looking for a software engineer who is passionate about Distributed Inferencing on AMD GPUs and improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.
Job Responsibilities:
Enable and benchmark AI models on large-scale distributed systems to evaluate performance, accuracy, and scalability
Optimize AI workloads across scale-up (multi-GPU), scale-out (multi-node), and scale-across distributed system configurations
Collaborate closely with internal GPU library teams to analyze and optimize distributed workloads for high throughput and low latency
Develop and apply optimal parallelization strategies for AI workloads to achieve best-in-class performance across diverse system configurations
Contribute to distributed model management systems, model zoos, monitoring frameworks, benchmarking pipelines, and technical documentation
Build and maintain real-time dashboards reporting performance, accuracy, and reliability metrics for internal stakeholders and external users
Requirements:
Bachelor’s, Master’s, or PhD degree in Computer Science, Computer Engineering, or a related field, or equivalent practical experience
Strong technical expertise in C++/Python development
Experience solving performance issues and investigating scalability on multi-GPU, multi-node clusters
Passion for quality assurance, benchmarking, and automation in the AI/ML space
Strong C/C++ and Python skills, with experience in software design, debugging, performance analysis, and test development
Experience running AI workloads on large-scale, heterogeneous compute clusters
Familiarity with cluster management and orchestration platforms such as SLURM and Kubernetes (K8s)
Experience with GitHub, Jenkins, or similar CI/CD tools and modern development workflows
Nice to have:
Hands-on experience with AI inference or serving frameworks such as vLLM, SGLang, or Llama.cpp
Understanding of KV cache transfer mechanisms and technologies (e.g., Mooncake, NIXL/RIXL) and expert parallelization approaches (e.g., DeepEP, MORI, PPLX-Garden)