GPU Kernel Development Engineer Job at AMD (Shanghai)

AI Software Product Engineer (GPU Kernel)

AI Product Applications Engineer (Solution Architect) – China position is in the...

Location

China, Shanghai

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
BS required
MS preferred, with 6+ years of relevant industry experience

Job Responsibility

Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
Provide feedback and requirements for AI software across cloud, client, and edge deployments

Founding GPU Kernel Engineer

We're looking for a Founding GPU Kernel Engineer who lives right at the boundary...

Location

United States, San Francisco

Salary:

285000.00 - 315000.00 USD / Year

YC Work at a Startup

Expiration Date

Until further notice

Requirements

Deep expertise in GPU architecture
Proven track record of hand-writing kernels that match or beat vendor libraries (cuBLAS, cuDNN, CUTLASS)
Strong skills with low-level profiling tools: Nsight Compute, Nsight Systems, rocprof, or equivalents
Experience reading and reasoning about PTX/SASS or GPU assembly
Solid systems programming in C++ and CUDA (or ROCm/HIP)
Good understanding of how high-level ML operations map to hardware execution
Experience with distributed training systems: collective ops like all-reduce and all-gather, NCCL/RCCL, multi-node communication patterns

Job Responsibility

Write and hand-optimize GPU kernels for ML workloads (matmuls, attention, normalization, etc.) to set the performance ceilings
Profile at the microarchitectural level: look into SM utilization, warp stalls, memory bank conflicts, register pressure, instruction throughput
Debug performance issues by digging deep into things like clock speeds, thermal throttling, driver behavior, hardware errata
Turn your hand-optimization insights into automated compiler passes (working closely with our compiler team)
Develop performance models that predict how kernels will behave across different GPU architectures
Build tools and methods for systematic kernel optimization
Work with NVIDIA, AMD, and emerging AI accelerators, understanding what is common across them and what is vendor-specific

What we offer

bonus
equity
benefits
relocation assistance

Fulltime

GPU Kernel Performance Engineer

AMD is looking for an influential software engineer who is passionate about impr...

Location

China, Beijing

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Strong expertise in GPU, NPU, and FPGA architectures, with a deep understanding of accelerator micro‑architecture and computation pipelines
Solid knowledge of AI inference, including operator/kernel development, AI compilers, and inference frameworks such as PyTorch and ONNX Runtime
Extensive experience in GPU kernel development, with strong proficiency in CUDA and/or HIP programming models
Strong object‑oriented programming background; proficiency in C/C++ is highly preferred
Proven ability to write high‑quality, efficient, and maintainable code, with strong attention to detail and robustness
Excellent communication skills and strong analytical/problem‑solving capabilities
Doctoral or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Job Responsibility

Design and deliver high‑performance computing solutions, providing competitive architectures and implementations for customers
Develop high‑performance operators across GPU/NPU platforms, including GEMM, MHA, and CONV
Build and optimize inference frameworks and inference compilers
Conduct performance evaluation and benchmarking of models and operators
Track and study cutting‑edge research papers, reproduce key methodologies, and integrate them into production solutions
Document technical work, summarize team achievements, and contribute to patents and publications
Build and maintain strong technical relationships with internal teams, industry peers, and ecosystem partners

Sr. Software Development Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great pro...

Location

Serbia, Belgrade

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

5+ years of professional software development experience
Excellent C/C++ programming and software design skills, including debugging, performance analysis and test design
Experience developing device drivers or other kernel-mode components in a Linux environment (Windows driver experience is a plus)
Familiarity with commonly used Linux development and debugging tools (gdb, perf, ftrace, systemtap, etc.)
Proven experience leading or owning complex software components or projects from conception to delivery
Practical experience in one or more of: GPU virtualization or cloud computing, HPC or AI/ML workloads, GPU architectures (experience with AMD GPU technologies is a plus)
Strong expertise in performance tuning and optimization of GPU or system-level software
Experience with containerization and orchestration technologies (Docker, Kubernetes, etc.) and their integration with GPU resources
Strong communication skills, with the ability to explain complex technical topics clearly to different audiences
The candidate must have an undergraduate degree in a related field (Computer Science, Computer or Software Engineering)

Job Responsibility

Design, implement and maintain kernel-mode and system-level components for AMD’s GPU virtualization stack on Linux and/or Windows
Integrate AMD’s GPU software stack with multiple hypervisors (KVM, Hyper‑V, VMware and others)
Debug complex issues across layers (driver, firmware, hypervisor, OS, containers, cloud stack)
Collaborate with internal component teams and external partners to deliver robust, scalable GPU solutions
Use and help refine AI-assisted development and analysis workflows within the team (e.g., for code exploration, test generation, log analysis)

What we offer

Benefits offered are described in AMD benefits at a glance

Fulltime

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...

Location

China, Shanghai

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
Experience in GPU kernel development and optimization for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
Experience leveraging libraries and tools such as Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
Experience integrating deep learning optimizations into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
Skilled in Python and C++, with experience in debugging, performance tuning, and test design
Solid experience in running large-scale workloads on heterogeneous compute clusters

Job Responsibility

Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions

What we offer

Benefits offered are described in AMD benefits at a glance

Senior Software Development Engineer

We are seeking an experienced and highly technical SMTS Software Development Eng...

Location

United Kingdom

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or related technical field
8+ years of software engineering experience in systems software, runtime libraries, GPU programming, or compiler/runtime interfaces
Strong proficiency in modern C++ (C++14/C++17 or newer), templates, memory models, and low‑level systems programming
Deep understanding of at least one GPU computing model (HIP, CUDA, SYCL, OpenCL, OpenMP offload)
Hands‑on experience with runtime systems, driver interfaces, or high‑performance compute libraries
Strong debugging skills using tools such as gdb, sanitizers, profilers, and GPU debugging tools
Solid understanding of parallel programming concepts: memory hierarchy, synchronization, concurrency, and thread scheduling

Job Responsibility

Architect, implement, and optimize features in the HIP runtime, including memory management, kernel dispatch, device abstraction, multi‑GPU coordination, and synchronization primitives
Contribute to the evolution of the HIP programming model and interoperability with ROCr, HSA runtime, and compiler toolchains
Ensure functional correctness, performance, and scalability of runtime APIs across different GPU generations
Conduct root‑cause analysis and systems‑level debugging across the runtime, driver, compiler, and hardware layers
Profile GPU applications and internal runtime components to identify bottlenecks and design performance improvements
Optimize HIP runtime behavior for large-scale AI, HPC, and cloud workloads
Work closely with compiler teams (LLVM/Clang), driver teams, GPU architecture, and systems engineers to deliver end‑to‑end GPU software solutions
Contribute to API specifications and collaborate with upstream open-source communities where appropriate
Define and drive technical strategy for correctness, reliability, and conformance of the HIP runtime
Support enhancements in automated testing, CI, and stress/failure scenarios in the HIP test suite

AI Inference/GPU Kernel Engineer

AMD is looking for a specialized software engineer who is passionate about impro...

Location

China, Beijing

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Strong object-oriented programming background, C/C++ preferred
Ability to write high quality code with a keen attention to detail
Experience with modern concurrent programming and threading APIs
Experience with Windows, Linux and/or Android operating system development
Experience with software development processes and tools such as debuggers, source code control systems (e.g., Git/GitHub), and profilers is a plus
Effective communication and problem-solving skills
Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Job Responsibility

Work with AMD’s architecture specialists to improve future products
Apply a data-minded approach to target optimization efforts
Stay informed of software and hardware trends and innovations, especially pertaining to algorithms and architecture
Design and develop new groundbreaking AMD technologies
Participate in new ASIC and hardware bring-ups
Debug and fix existing issues, and research alternative, more efficient ways to accomplish the same work
Develop technical relationships with peers and partners

Senior ML Accelerator Engineer - GPU

About the Mission: GM’s vision of Zero Crashes, Zero Emissions, and Zero Congest...

Location

United States, Austin

Salary:

128700.00 - 261300.00 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

3+ years of relevant industry experience, or equivalent experience
BS, MS or PhD in CS, or related technical field
Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture
Hands-on experience benchmarking, profiling, debugging, and optimizing accelerator libraries and kernels to extract optimal performance using the Nsight suite of tools or similar
Strong background in software architecture, library design, and design patterns
Strong C++ programming skills and comfort working in large codebases
Solid background in system performance, high performance computing and/or architecture-aware optimizations
Strong communication skills and the ability to work collaboratively within a team
Excellent analytical and problem-solving skills

Job Responsibility

Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads
Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack
Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans
Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production
Maintain high technology standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering through code review
Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs

What we offer

medical
dental
vision
Health Savings Account
Flexible Spending Accounts
retirement savings plan
sickness and accident benefits
life insurance
paid vacation & holidays
tuition assistance programs

Fulltime


