AMD is looking for a skilled and motivated software engineer to join the Model Automation and Dashboarding (Framework MAD) team — a group focused on ensuring the reliability, performance, and scalability of AI models running on AMD hardware. As part of this team, you’ll build and maintain tools and infrastructure that automate functional and performance validation of deep learning models across ROCm and GPU platforms. Your contributions will directly impact developer confidence, model portability, and transparent benchmarking for internal teams and the open-source community.
Job Responsibilities:
Model Testing & Validation: Automate functional and performance testing of AI models across ROCm-supported hardware using scalable tools and pipelines
Software Engineering Excellence: Apply proficiency in Python and C++, along with deep experience in performance tuning, debugging, and robust test design, to keep codebases reliable, maintainable, and high-performance
Benchmarking Infrastructure: Develop tools for continuous benchmarking and regression tracking across hardware generations and ROCm releases
Dashboard & Metrics Development: Build and maintain real-time dashboards that report relevant performance, accuracy, and reliability metrics for both internal and public users
Ecosystem Integration: Collaborate with teams like Deep Learning Models (DLM) and MADengine to support a wide range of models, including public and private/NDA workloads
Client Enablement: Ensure out-of-the-box confidence for ROCm clients by validating model performance and functionality in standardized and reproducible environments
Scalable Tooling: Contribute to the design of portable, easy-to-use Python interfaces that support multi-node profiling, distributed workloads, and containerized deployments
Open-Source Contributions: Support public-facing MAD GitHub repositories and Docker releases, enabling the community to run and validate models on ROCm
Requirements:
Undergraduate and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
Strong Python development skills, with experience in test automation, CI/CD, and Linux scripting
Familiarity with AI frameworks (e.g., PyTorch, TensorFlow), model benchmarking, and ML model lifecycles
Strong experience with profiling tools, system monitoring, or regression tracking systems for deep learning models
Solid experience in performance dashboards, visualization tools (e.g., Grafana, Plotly), and metrics collection pipelines
Proficiency with version control (GitHub), testing strategies, code reviews, and collaborative software development
Strong written and verbal communication skills with a proactive approach to defining and driving development efforts