The successful candidate will assume responsibility for post-silicon activities related to performance characterization and optimization of AMD Datacenter products, spanning both single-node and multi-node deployments.
Job Responsibilities
Develop and maintain automation frameworks for workload execution and performance data collection, enabling scalable and repeatable characterization across configurations
Become a key stakeholder in the product power and performance definition process, ensuring alignment between architectural goals and measured silicon performance
Develop, execute, and evolve performance characterization and optimization test plans across diverse usage scenarios, including High Performance Computing (HPC) and Machine Learning (ML) workloads
Drive performance attainment for both scale-up (intra-node) and scale-out (multi-node) configurations, including multi-GPU scaling efficiency within a node; interconnect bandwidth utilization (e.g., XGMI / Infinity Fabric); collective communication efficiency and communication-compute overlap; workload scaling behavior (strong and weak scaling); and identification and mitigation of system-level bottlenecks across distributed environments
Analyze interactions between power management features and performance behavior, optimizing configurations to achieve the best performance and performance-per-watt tradeoffs
Identify architectural and system-level bottlenecks and develop strategies to stress, expose, and mitigate worst-case performance scenarios
Support prototyping and experimentation efforts to evaluate enhancements and new features that impact performance
Debug and troubleshoot system-level issues across hardware, firmware, and software stacks observed in lab and production test environments
Collaborate with cross-functional teams (architecture, firmware, drivers, platform, and workload teams) to drive root-cause analysis through to resolution and performance closure
Proactively drive continuous improvement of post-silicon performance methodologies, tools, and workflows
Requirements
Proven leadership skills with experience mentoring junior engineers, coordinating cross-functional teams, and driving complex performance characterization and optimization efforts across multiple locations
Strong programming skills, preferably in Python, and experience with ML frameworks (e.g., TensorFlow or PyTorch)
Proficiency in C/C++, scripting (Shell), and familiarity with performance tooling and automation workflows
Strong understanding of computer architecture and system organization
Deep knowledge of HPC and ML workloads, including scaling behavior and performance bottlenecks
Experience with scale-up and scale-out performance analysis in rack-level and cluster-level deployments
Strong analytical and problem-solving skills, with a high level of attention to detail
Excellent interpersonal, collaboration, and communication skills
Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, Computer Science, or a related field