GPU Kernel Performance Engineer

AMD

Location:
China, Beijing

Contract Type:
Not provided

Salary:

Not provided

Job Description:

AMD is looking for an influential software engineer who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology. You will deploy models on AMD Ryzen AI series devices to deliver high-performance, highly reliable deployment solutions. You will also engage in high-performance operator design, including GPU and NPU operators, and design and develop inference frameworks and inference compilers. Strong cross-team collaboration experience is expected.

Job Responsibility:

  • Design and deliver high‑performance computing solutions, providing competitive architectures and implementations for customers
  • Develop high‑performance operators across GPU/NPU platforms, including GEMM, MHA, and CONV
  • Build and optimize inference frameworks and inference compilers
  • Conduct performance evaluation and benchmarking of models and operators
  • Track and study cutting‑edge research papers, reproduce key methodologies, and integrate them into production solutions
  • Document technical work, summarize team achievements, and contribute to patents and publications
  • Build and maintain strong technical relationships with internal teams, industry peers, and ecosystem partners

Requirements:

  • Strong expertise in GPU, NPU, and FPGA architectures, with a deep understanding of accelerator micro‑architecture and computation pipelines
  • Solid knowledge of AI inference, including operator/kernel development, AI compilers, and inference frameworks such as PyTorch and ONNX Runtime
  • Extensive experience in GPU kernel development, with strong proficiency in CUDA and/or HIP programming models
  • Strong object‑oriented programming background; proficiency in C/C++ is highly preferred
  • Proven ability to write high‑quality, efficient, and maintainable code, with strong attention to detail and robustness
  • Excellent communication skills and strong analytical/problem‑solving capabilities
  • Doctoral or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Additional Information:

Job Posted:
March 19, 2026

Similar Jobs for GPU Kernel Performance Engineer

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility:
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
  • Fulltime

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements:
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices are essential
  • GPU Kernel Development & Optimization: Experienced in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Deep Learning Integration: Experienced in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Software Engineering: Skilled in Python and C++
  • Experience in debugging, performance tuning, and test design
Job Responsibility:
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer:
  • AMD benefits at a glance

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements:
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
  • Experience in GPU Kernel Development & Optimization for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in Deep Learning Integration into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Skilled in Python and C++, with experience in debugging, performance tuning, and test design
  • Solid experience in running large-scale workloads on heterogeneous compute clusters
Job Responsibility:
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer:
  • Benefits are described in: AMD benefits at a glance

Sr. Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements:
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Ability to define goals, manage development efforts, and deliver high-quality solutions
  • Strong problem-solving skills
  • Proactive approach
  • Keen understanding of software engineering best practices
  • Experience in GPU kernel development & optimization for AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch)
  • Skilled in Python and C++
Job Responsibility:
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions

Member of Technical Staff - GPU Performance Engineer

Our models and workflows require performance work that generic frameworks don’t ...
Location:
United States, San Francisco; Boston
Salary:
Not provided
Liquid AI
Expiration Date
Until further notice
Requirements:
  • Authored custom CUDA kernels (not only calling cuDNN/cuBLAS)
  • Strong understanding of GPU architecture and performance: memory hierarchy, warps, shared memory/register pressure, bandwidth vs compute limits
  • Proficiency with low-level profiling (Nsight Systems/Compute) and performance methodology
  • Strong C/C++ skills
Job Responsibility:
  • Write high-performance GPU kernels for our novel model architectures
  • Integrate kernels into PyTorch pipelines (custom ops, extensions, dispatch, benchmarking)
  • Profile and optimize training and inference workflows to eliminate bottlenecks
  • Build correctness tests and numerics checks
  • Build/maintain performance benchmarks and guardrails to prevent regressions
  • Collaborate closely with researchers to turn promising ideas into shipped speedups
What we offer:
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
  • Fulltime

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements:
  • Master’s or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related fields
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
  • GPU Kernel Development & Optimization: Deep experience in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Deep Learning Integration: Strong experience in integrating optimized GPU performance into machine learning and LLM frameworks (e.g., vLLM, SGLang, TensorFlow, PyTorch)
  • End-to-end solution optimization: Understanding of the latest LLM and multimodal market trends, with solid hands-on E2E performance tuning experience in distributed inference (e.g., P/D disaggregation and Large-EP) and RL
  • Software Engineering: Skilled in Python and C++, with experience in debugging, performance tuning, and test design
  • High-Performance Computing: Expert-level experience in running large-scale workloads on heterogeneous computing clusters
Job Responsibility:
  • End-to-end optimization: Build and optimize end-to-end distributed inference (e.g., P/D disaggregation and Large-EP) and RL solutions on mainstream frameworks like vLLM and SGLang
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer:
  • AMD benefits at a glance

GPU Performance Attainment Engineer

As a senior member of the pre-silicon performance attainment team, you will be a...
Location:
India, Hyderabad
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements:
  • Several years of experience in GPU pre-silicon performance analysis and debug
  • Proficiency with performance modeling and simulation tools
  • Strong understanding of GPGPU programming APIs and Machine Learning workloads
  • Expertise in C/C++ and scripting (Python, Perl, shell, etc.)
  • Experience with hardware description languages such as Verilog is a plus
  • Familiarity with the software stack is a plus, preferably related to GPUs—such as applications, drivers, compilers, and firmware
  • Bachelor's or higher degree in Computer Science, Electrical Engineering, or a closely related field
Job Responsibility:
  • Debug performance issues and analyze data from the full-chip Emulation Platform, RTL Simulator, and Architecture and Roofline Models
  • Analyze model projection results and identify algorithm issues to find novel solutions for improving the accuracy of projection for different families of products, and over multiple generations
  • Get performance projections for kernels using an analytical model
  • Identify technical problems, break them down, summarize multiple possible solutions, and help the team to make progress
  • Automate processes related to performance infrastructure and data collection tasks, to enhance productivity and refine processes for improved efficiency
  • Engage with the workloads team to acquire and align on required workloads, run the selected workload traces on the performance simulator, and analyze the performance results and metrics to root-cause any anomalies
  • Collaborate with simulator team to bridge gaps between the performance numbers and the performance targets
  • Influence design trade-offs and optimizations by working closely with compiler, driver, library, and hardware engineers to achieve the highest performance for selected workloads
  • Innovate new algorithmic improvements that exploit the strengths of the hardware architecture to deliver the best possible machine learning performance

Founding GPU Kernel Engineer

We're looking for a Founding GPU Kernel Engineer who lives right at the boundary...
Location:
United States, San Francisco
Salary:
285000.00 - 315000.00 USD / Year
YC Work at a Startup
Expiration Date
Until further notice
Requirements:
  • Deep expertise in GPU architecture
  • Proven track record of hand-writing kernels that match or beat vendor libraries (cuBLAS, cuDNN, CUTLASS)
  • Strong skills with low-level profiling tools: Nsight Compute, Nsight Systems, rocprof, or equivalents
  • Experience reading and reasoning about PTX/SASS or GPU assembly
  • Solid systems programming in C++ and CUDA (or ROCm/HIP)
  • Good understanding of how high-level ML operations map to hardware execution
  • Experience with distributed training systems: collective ops like all-reduce and all-gather, NCCL/RCCL, multi-node communication patterns
Job Responsibility:
  • Write and hand-optimize GPU kernels for ML workloads (matmuls, attention, normalization, etc.) to set the performance ceilings
  • Profile at the microarchitectural level: look into SM utilization, warp stalls, memory bank conflicts, register pressure, instruction throughput
  • Debug performance issues by digging deep into things like clock speeds, thermal throttling, driver behavior, hardware errata
  • Turn your hand-optimization insights into automated compiler passes (working closely with our compiler team)
  • Develop performance models that predict how kernels will behave across different GPU architectures
  • Build tools and methods for systematic kernel optimization
  • Work with NVIDIA, AMD, and emerging AI accelerators, understanding what they have in common and what is vendor-specific
What we offer:
  • bonus
  • equity
  • benefits
  • relocation assistance
  • Fulltime