As a senior member of the pre-silicon performance attainment team, you will be a technical contributor who drives end-to-end delivery of solutions, directly contributing to and coordinating workload optimization across multiple teams for inference and training of machine learning models. You will collaborate closely with software and hardware teams to plan, develop, and optimize use cases. This is an exciting opportunity to work at the cutting edge of GPU computing, influencing design and software strategies that power future datacenter AI and ML deployments.
Job Responsibilities:
Debug performance issues and analyze data from the full-chip Emulation Platform, RTL Simulator, and Architecture and Roofline Models
Analyze model projection results and identify algorithmic issues, finding novel solutions that improve projection accuracy across product families and over multiple generations
Generate performance projections for kernels using an analytical model
Identify technical problems, break them down, summarize multiple possible solutions, and help the team make progress
Automate performance infrastructure and data collection tasks to enhance productivity and efficiency
Engage with the workloads team to acquire and align on required workloads; run the selected workload traces on the performance simulator and analyze the resulting performance metrics to root-cause any anomalies
Collaborate with the simulator team to bridge gaps between measured performance numbers and performance targets
Influence design trade-offs and optimizations by working closely with compiler, driver, library, and hardware engineers to achieve the highest performance for selected workloads
Innovate new algorithmic improvements that exploit the strengths of the hardware architecture to deliver the best possible machine learning performance
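The roofline and analytical projection models referenced above can be illustrated with a minimal sketch. The hardware numbers below (peak FLOP/s, peak memory bandwidth) are illustrative assumptions only, not figures for any specific product:

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bw):
    """Project kernel time with a simple roofline model (illustrative sketch).

    Attainable throughput = min(peak compute, bandwidth * arithmetic intensity).
    """
    ai = flops / bytes_moved                 # arithmetic intensity (FLOP/byte)
    attainable = min(peak_flops, peak_bw * ai)  # FLOP/s the kernel can sustain
    return flops / attainable                # projected execution time (s)

# Assumed hardware: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth
PEAK_FLOPS = 100e12
PEAK_BW = 2e12

# Compute-bound kernel: high arithmetic intensity, limited by peak FLOP/s
t_compute = roofline_time(1e12, 1e9, PEAK_FLOPS, PEAK_BW)

# Memory-bound kernel: 1 FLOP/byte, limited by bandwidth
t_memory = roofline_time(1e9, 1e9, PEAK_FLOPS, PEAK_BW)
```

In practice such a projection is compared against emulation or simulator results, and discrepancies are root-caused as described in the responsibilities above.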
Requirements:
Several years of experience in GPU pre-silicon performance analysis and debug
Proficiency with performance modeling and simulation tools
Strong understanding of GPGPU programming APIs and Machine Learning workloads
Expertise in C/C++ and scripting languages (Python, Perl, shell, etc.)
Bachelor's or higher degree in Computer Science, Electrical Engineering, or a closely related field
Nice to have:
Experience with hardware description languages such as Verilog
Familiarity with the software stack, preferably related to GPUs, such as applications, drivers, compilers, and firmware