We are looking for an AI Compiler Engineer to join this high-impact team working in the growing field of on-device AI inference acceleration, as an individual contributor or as a technical lead in the AI Group (AIG). As an AI Compiler Engineer, you will design and optimize the AI compiler stack and tools that enable efficient execution of state-of-the-art open-source and proprietary AI models, such as LLMs and transformer models, on AMD NPUs for on-device AI inference use cases. You will work on transforming high-level AI models into efficient, low-level code that can run on the NPU. Your work will directly impact the performance, efficiency, and scalability of our AI solutions.
Job Responsibilities:
Graph transformation
Constant folding
Operator fusion: Identify and implement performance optimization opportunities by reducing memory traffic through operator fusion at different levels of the memory hierarchy (e.g., within an attention block)
Common subexpression elimination
Problem partitioning and dataflow orchestration: Design algorithms to optimally map a given AI operation to the NPU, which comprises an interconnected array of AI engines
Design and implement algorithms to orchestrate dataflow through a multi-level memory hierarchy
Kernel Design and Development: Design and implement highly optimized C++/intrinsic-based kernels for AI-related operators
Develop vectorized code that leverages SIMD (Single Instruction, Multiple Data) and VLIW (Very Long Instruction Word) for optimal performance
Evaluate trade-offs among performance, program memory, and accuracy
Testing and Validation: Develop CPU models for the ML operators in C++/Python to validate accuracy
Write unit tests and integration tests to ensure correctness and reliability
Performance Profiling and Tuning: Profile and analyze the performance of model layers
Identify and resolve performance and accuracy bottlenecks
Documentation and Collaboration: Communicate day-to-day technical work effectively and document design specs
Follow good coding practices, including use of a version control system
Collaborate with cross-functional teams spanning AI research, core architecture, and software engineering
Requirements:
Excellent C/C++ and Python coding skills
Good understanding of SIMD and VLIW processor architectures
Experience with vectorized programming (SIMD)
Thorough understanding of fixed-point and floating-point arithmetic
Good understanding of various operators in state-of-the-art AI models
Knowledge of low-level hardware details (cache hierarchy, DMA programming)
Excellent problem-solving skills, especially in debugging, and a passion for on-device AI
Prior experience in AI compiler design is preferred
BS/Master's/PhD degree in Computer Science, Electrical Engineering, or a related field