GPU Kernel Development Engineer

China, Shanghai · Employment contract · Job Posted June 10, 2026

Job Description

As a core member of the team, you will play a pivotal role in optimizing and developing deep learning frameworks for AMD GPUs. Your strong experience will be critical in enhancing GPU kernels, deep learning models, and training/inference performance across multi-GPU and multi-node systems. You will engage with both internal GPU library teams and open-source maintainers to ensure seamless integration of optimizations, utilizing cutting-edge compiler technologies and advanced engineering principles to drive continuous improvement.

Job Responsibility

  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
  • Mentor and Guide: Provide mentorship to junior team members, fostering growth and collaboration through code reviews, knowledge sharing, and technical guidance

Requirements

  • Master's and/or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Strong technical and analytical expertise in C++ development within Linux environments
  • Expert skills in Python and C++
  • Strong experience in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA)
  • Sound understanding of compiler theory and tools like LLVM and ROCm

Nice to have

  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in running large-scale workloads on heterogeneous compute clusters
  • Mentoring junior team members

What we offer

Benefits offered are described in AMD benefits at a glance.

Similar Jobs for GPU Kernel Development Engineer

8 matching positions

AI Software Product Engineer (GPU Kernel)

AI Product Applications Engineer (Solution Architect) – China position is in the...
Location: China, Shanghai
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
  • Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
  • Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
  • Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
  • BS required
  • MS preferred, with 6+ years of relevant industry experience
Job Responsibility
  • Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
  • Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
  • Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
  • Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
  • Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
  • Provide feedback and requirements for AI software across cloud, client, and edge deployments

Founding GPU Kernel Engineer

We're looking for a Founding GPU Kernel Engineer who lives right at the boundary...
Location: United States, San Francisco
Salary: 285000.00 - 315000.00 USD / Year
YC Work at a Startup
Expiration Date: Until further notice
Requirements
  • Deep expertise in GPU architecture
  • Proven track record of hand-writing kernels that match or beat vendor libraries (cuBLAS, cuDNN, CUTLASS)
  • Strong skills with low-level profiling tools: Nsight Compute, Nsight Systems, rocprof, or equivalents
  • Experience reading and reasoning about PTX/SASS or GPU assembly
  • Solid systems programming in C++ and CUDA (or ROCm/HIP)
  • Good understanding of how high-level ML operations map to hardware execution
  • Experience with distributed training systems: collective ops like all-reduce and all-gather, NCCL/RCCL, multi-node communication patterns
Job Responsibility
  • Write and hand-optimize GPU kernels for ML workloads (matmuls, attention, normalization, etc.) to set the performance ceilings
  • Profile at the microarchitectural level: look into SM utilization, warp stalls, memory bank conflicts, register pressure, instruction throughput
  • Debug performance issues by digging deep into things like clock speeds, thermal throttling, driver behavior, hardware errata
  • Turn your hand-optimization insights into automated compiler passes (working closely with our compiler team)
  • Develop performance models that predict how kernels will behave across different GPU architectures
  • Build tools and methods for systematic kernel optimization
  • Work with NVIDIA, AMD, and emerging AI accelerators - understand the common parts and what's vendor-specific
What we offer
  • bonus
  • equity
  • benefits
  • relocation assistance
  • Full-time

GPU Kernel Performance Engineer

AMD is looking for an influential software engineer who is passionate about impr...
Location: China, Beijing
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Strong expertise in GPU, NPU, and FPGA architectures, with a deep understanding of accelerator micro‑architecture and computation pipelines
  • Solid knowledge of AI inference, including operator/kernel development, AI compilers, and inference frameworks such as PyTorch and ONNX Runtime
  • Extensive experience in GPU kernel development, with strong proficiency in CUDA and/or HIP programming models
  • Strong object‑oriented programming background; proficiency in C/C++ is highly preferred
  • Proven ability to write high‑quality, efficient, and maintainable code, with strong attention to detail and robustness
  • Excellent communication skills and strong analytical/problem‑solving capabilities
  • Doctoral or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Job Responsibility
  • Design and deliver high‑performance computing solutions, providing competitive architectures and implementations for customers
  • Develop high‑performance operators across GPU/NPU platforms, including GEMM, MHA, and CONV
  • Build and optimize inference frameworks and inference compilers
  • Conduct performance evaluation and benchmarking of models and operators
  • Track and study cutting‑edge research papers, reproduce key methodologies, and integrate them into production solutions
  • Document technical work, summarize team achievements, and contribute to patents and publications
  • Build and maintain strong technical relationships with internal teams, industry peers, and ecosystem partners

Sr. Software Development Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great pro...
Location: Serbia, Belgrade
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • 5+ years of professional software development experience
  • Excellent C/C++ programming and software design skills, including debugging, performance analysis and test design
  • Experience developing device drivers or other kernel-mode components in a Linux environment (Windows driver experience is a plus)
  • Familiarity with commonly used Linux development and debugging tools (gdb, perf, ftrace, systemtap, etc.)
  • Proven experience leading or owning complex software components or projects from conception to delivery
  • Practical experience in one or more of: GPU virtualization or cloud computing, HPC or AI/ML workloads, GPU architectures (experience with AMD GPU technologies is a plus)
  • Strong expertise in performance tuning and optimization of GPU or system-level software
  • Experience with containerization and orchestration technologies (Docker, Kubernetes, etc.) and their integration with GPU resources
  • Strong communication skills, with the ability to explain complex technical topics clearly to different audiences
  • The candidate must have an undergraduate degree in a related field (Computer Science, Computer or Software Engineering)
Job Responsibility
  • Design, implement and maintain kernel-mode and system-level components for AMD’s GPU virtualization stack on Linux and/or Windows
  • Integrate AMD’s GPU software stack with multiple hypervisors (KVM, Hyper‑V, VMware and others)
  • Debug complex issues across layers (driver, firmware, hypervisor, OS, containers, cloud stack)
  • Collaborate with internal component teams and external partners to deliver robust, scalable GPU solutions
  • Use and help refine AI-assisted development and analysis workflows within the team (e.g., for code exploration, test generation, log analysis)
What we offer
  • Benefits offered are described at AMD benefits at a glance
  • Full-time

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location: China, Shanghai
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
  • Experience in GPU Kernel Development & Optimization for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in Deep Learning Integration into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Skilled in Python and C++, with experience in debugging, performance tuning, and test design
  • Solid experience in running large-scale workloads on heterogeneous compute clusters
Job Responsibility
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer
  • Benefits offered are described in AMD benefits at a glance

Senior Software Development Engineer

We are seeking an experienced and highly technical SMTS Software Development Eng...
Location: United Kingdom
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or related technical field
  • 8+ years of software engineering experience in systems software, runtime libraries, GPU programming, or compiler/runtime interfaces
  • Strong proficiency in modern C++ (C++14/C++17 or newer), templates, memory models, and low‑level systems programming
  • Deep understanding of at least one GPU computing model (HIP, CUDA, SYCL, OpenCL, OpenMP offload)
  • Hands‑on experience with runtime systems, driver interfaces, or high‑performance compute libraries
  • Strong debugging skills using tools such as gdb, sanitizers, profilers, and GPU debugging tools
  • Solid understanding of parallel programming concepts—memory hierarchy, synchronization, concurrency, thread scheduling
Job Responsibility
  • Architect, implement, and optimize features in the HIP runtime, including memory management, kernel dispatch, device abstraction, multi‑GPU coordination, and synchronization primitives
  • Contribute to the evolution of the HIP programming model and interoperability with ROCr, HSA runtime, and compiler toolchains
  • Ensure functional correctness, performance, and scalability of runtime APIs across different GPU generations
  • Conduct root‑cause analysis and systems‑level debugging across the runtime, driver, compiler, and hardware layers
  • Profile GPU applications and internal runtime components to identify bottlenecks and design performance improvements
  • Optimize HIP runtime behavior for large-scale AI, HPC, and cloud workloads
  • Work closely with compiler teams (LLVM/Clang), driver teams, GPU architecture, and systems engineers to deliver end‑to‑end GPU software solutions
  • Contribute to API specifications and collaborate with upstream open-source communities where appropriate
  • Define and drive technical strategy for correctness, reliability, and conformance of the HIP runtime
  • Support enhancements in automated testing, CI, and stress/failure scenarios in the HIP test suite

AI Inference/GPU Kernel Engineer

AMD is looking for a specialized software engineer who is passionate about impro...
Location: China, Beijing
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Strong object-oriented programming background, C/C++ preferred
  • Ability to write high quality code with a keen attention to detail
  • Experience with modern concurrent programming and threading APIs
  • Experience with Windows, Linux and/or Android operating system development
  • Experience with software development processes and tools such as debuggers, source code control systems (GitHub) and profilers is a plus
  • Effective communication and problem-solving skills
  • Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Job Responsibility
  • Work with AMD’s architecture specialists to improve future products
  • Apply a data-minded approach to target optimization efforts
  • Stay informed of software and hardware trends and innovations, especially pertaining to algorithms and architecture
  • Design and develop new groundbreaking AMD technologies
  • Participate in new ASIC and hardware bring-ups
  • Debug and fix existing issues, and research alternative, more efficient ways to accomplish the same work
  • Develop technical relationships with peers and partners

Senior ML Accelerator Engineer - GPU

About the Mission: GM’s vision of Zero Crashes, Zero Emissions, and Zero Congest...
Location: United States, Austin
Salary: 128700.00 - 261300.00 USD / Year
General Motors
Expiration Date: Until further notice
Requirements
  • 3+ years of relevant industry experience or equivalent experience
  • BS, MS or PhD in CS, or related technical field
  • Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture
  • Hands-on experience benchmarking, profiling, debugging and optimizing accelerator libraries and kernels to extract optimal performance using the NSight suite of tools or similar
  • Strong background in software architecture, library design, and design patterns
  • Strong C++ programming skills and comfort working in large codebases
  • Solid background in system performance, high performance computing and/or architecture-aware optimizations
  • Strong communication skills and the ability to work collaboratively within a team
  • Excellent analytical and problem-solving skills
Job Responsibility
  • Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads
  • Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack
  • Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans
  • Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production
  • Maintain high technology standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering through code review
  • Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • Full-time