AI Software Product Engineer (GPU Kernel)

China, Shanghai · Job Posted March 21, 2026

Job Description

The AI Product Applications Engineer (Solution Architect) – China position is in the AMD AI group, located in China. Success in this role requires deep knowledge of Data Center, Client, and Endpoint AI workloads such as LLM, Generative AI, Recommendation, and/or transformer … AI across cloud, client, edge … The candidate needs hands-on experience with various AI models, end-to-end pipelines, and industry frameworks (PyTorch, vLLM, SGLang, llm-d, Triton), SDKs, and solutions.

Job Responsibility

  • Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
  • Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
  • Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
  • Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
  • Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
  • Provide feedback and requirements for AI software across cloud, client, and edge deployments

Requirements

  • Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
  • Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
  • Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
  • Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
  • BS required
  • MS preferred, with 6+ years of relevant industry experience

Similar Jobs for AI Software Product Engineer (GPU Kernel)

8 matching positions

AI Software Product Engineer (GPU)

AI Product Applications Engineer (GPU AI SW Solution Architect) – China position...
Location: China, Beijing
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Deep knowledge of Data Center, Client, Endpoint AI workloads such as LLM, Generative AI, Recommendation, and/or transformer
  • Hands-on experience with various AI models, end-to-end pipelines, and industry frameworks (PyTorch, vLLM, SGLang, llm-d, Triton), SDKs, and solutions
  • Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
  • Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
  • Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
  • Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
  • BS required
  • MS preferred, with 6+ years of relevant industry experience
Job Responsibility
  • Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
  • Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
  • Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
  • Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
  • Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
  • Provide feedback and requirements for AI software across cloud, client, and edge deployments

Principal Ai Software Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING At AMD, our mission is to build great prod...
Location: United States, San Jose
Salary: 240000.00 - 360000.00 USD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Knowledge in GPU architectures, basic knowledge of CPU architecture
  • Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
  • Understanding of GPU programming such as ROCm, CUDA, OpenCL, etc
  • Experience in hardware/software co-design, building high-performance products across the full product lifecycle
  • Experience with operating systems (OS) and device driver development is a plus
  • Undergrad degree required. Bachelor of Science, Masters, or PhD degree with emphasis in Electrical Engineering, Computer architecture, or Computer Science with relevant experience preferred
Job Responsibility
  • Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math libraries, kernel and framework teams to influence future silicon features based on evolving AI workload trends
  • Strong Execution: Deliver innovations and roadmap for AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers
  • Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure out-of-the-box performance excellence on AMD hardware
  • Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting
  • Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations
  • Community & Open Source: Mentor and inspire other engineers and contribute to ROCm open source
What we offer
  • AMD benefits at a glance
  • Fulltime

Principal AI Software Engineer

AMD AI Group is seeking a highly influential technical leader for OneROCm — driv...
Location: United States, San Jose
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Knowledge in GPU architectures, basic knowledge of CPU architecture
  • Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
  • Understanding of GPU programming such as ROCm, CUDA, OpenCL, etc
  • Experience in hardware/software co-design, building high-performance products across the full product lifecycle
  • Experience with operating systems (OS) and device driver development is a plus
  • Undergrad degree required. Bachelor of Science, Masters, or PhD degree with emphasis in Electrical Engineering, Computer architecture, or Computer Science with relevant experience preferred
Job Responsibility
  • Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math libraries, kernel and framework teams to influence future silicon features based on evolving AI workload trends
  • Strong Execution: Deliver innovations and roadmap for AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers
  • Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure out-of-the-box performance excellence on AMD hardware
  • Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting
  • Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations
  • Community & Open Source: Mentor and inspire other engineers and contribute to ROCm open source
What we offer
  • Benefits offered are described: AMD benefits at a glance
  • Fulltime

Senior Software Engineer- AI

Are you looking for an opportunity to work with the latest Azure offerings and p...
Location: India, Bangalore
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in Software Development
  • Strong programming expertise in one or more languages such as Python, Go, Java, or C#, with experience designing production-grade services and APIs
  • Experience building AI-powered applications, including integrating LLMs, implementing agent or Copilot workflows, and orchestrating multi-step AI interactions
  • Hands-on experience with LLM application frameworks and orchestration tools such as Semantic Kernel, LangChain, or similar agent frameworks
  • Familiarity with retrieval-augmented generation (RAG) architectures, vector databases, embeddings, and semantic search systems
  • Experience evaluating and improving model performance through prompt design, evaluation frameworks, fine-tuning, or feedback loops
  • Solid understanding of distributed systems concepts including scalability, reliability, observability, caching, and asynchronous processing
  • Experience deploying and operating AI workloads in cloud environments (preferably Azure), including containerized services and GPU-enabled infrastructure
  • Understanding of Responsible AI practices, including model governance, safety, privacy, and evaluation of AI behaviour in production systems
  • Ability to work across product, research, and engineering teams to translate product scenarios into scalable AI system architectures
Job Responsibility
  • Design, build, and operate scalable AI systems that power intelligent product experiences, including Copilot and agent-driven workflows
  • Architect and implement backend services that support multi-step AI interactions, including orchestration pipelines, context management, memory/state persistence, and tool execution
  • Integrate large language models (LLMs), APIs, and internal services to enable context-aware, human-in-the-loop experiences across customer scenarios
  • Build and maintain data and inference pipelines that support model training, fine-tuning, evaluation, and real-time inference across diverse data sources
  • Evaluate, benchmark, and tune AI/ML models (LLMs and traditional models) to meet product requirements for accuracy, latency, reliability, and safety
  • Implement robust retrieval, grounding, and knowledge integration mechanisms (e.g., RAG systems, semantic indexing, vector search) to power intelligent applications
  • Collaborate with product managers, software engineers, and researchers to translate product vision into production-ready AI capabilities and measurable outcomes
  • Ensure reliability, observability, and governance of AI systems, including monitoring model performance, data quality, and responsible AI practices
  • Build reusable platforms, APIs, and tools that enable teams to rapidly develop AI-powered features and self-service intelligent applications
  • Fulltime

AI Product Performance Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great pro...
Location: China, Shenzhen
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Deep knowledge of Data Center AI workloads such as LLM, Generative AI, Recommendation, NLP, Video Analytics, and/or transformer
  • Hands-on experience with various AI models, end-to-end pipelines, industry frameworks, SDKs, and solutions
  • GPU Architecture Mastery
  • Kernel Programming Expertise: Strong proficiency in C++ and parallel computing, with extensive hands-on experience in NVIDIA CUDA or AMD HIP kernel programming
  • Performance Engineering: Demonstrated ability to debug and profile complex GPU workloads
  • Systems Knowledge: Familiarity with asynchronous execution, stream management, and host-device memory transfers
  • Python DSLs & Triton: Experience implementing kernels using OpenAI Triton or other Python-based DSLs
  • Inference Engine Experience: Hands-on experience integrating custom kernels into large-scale inference frameworks such as vLLM, SGLang, or TensorRT-LLM
  • Deep Learning Frameworks: Familiarity with writing custom extensions or operators for PyTorch (C++/CUDA extensions)
  • Hardware Agnosticism: Experience porting kernels between NVIDIA and AMD architectures or working with cross-platform HPC libraries
Job Responsibility
  • High-Performance Kernel Development: Design, implement, and optimize high-performance GPU kernels for AI/ML workloads to maximize hardware utilization
  • Performance Optimization: Analyze and optimize kernel execution for latency and throughput, addressing bottlenecks in memory bandwidth, instruction latency, and thread divergence
  • Workload Analysis: Evaluate the end-to-end performance impact of individual kernels on full-stack AI models, ensuring that micro-optimizations translate to application-level speedups
  • Profiling & Tuning: Use advanced GPU profiling tools (e.g., ROCm Profiler, PyTorch Profiler) to identify performance cliffs, pipeline stalls, and memory hierarchy inefficiencies
  • Architecture Adaptation: Tailor implementation strategies to leverage specific features of modern GPU architectures (e.g., Matrix Cores, HBM characteristics)
  • Framework Integration: Collaborate with software stack teams to expose optimized kernels within high-level frameworks and inference engines
What we offer
  • AMD benefits at a glance
  • Fulltime

AI Inference/GPU Kernel Engineer

AMD is looking for a specialized software engineer who is passionate about impro...
Location: China, Beijing
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Strong object-oriented programming background, C/C++ preferred
  • Ability to write high quality code with a keen attention to detail
  • Experience with modern concurrent programming and threading APIs
  • Experience with Windows, Linux and/or Android operating system development
  • Experience with software development processes and tools such as debuggers, source code control systems (GitHub) and profilers is a plus
  • Effective communication and problem-solving skills
  • Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Job Responsibility
  • Work with AMD’s architecture specialists to improve future products
  • Apply a data-minded approach to target optimization efforts
  • Stay informed of software and hardware trends and innovations, especially pertaining to algorithms and architecture
  • Design and develop new groundbreaking AMD technologies
  • Participate in new ASIC and hardware bring-ups
  • Debug and fix existing issues, and research alternative, more efficient ways to accomplish the same work
  • Develop technical relationships with peers and partners

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...
Location: Canada, Markham
Salary: 106400.00 - 159600.00 CAD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
  • Bachelor’s or Master’s degree or Ph.D in Computer/Software Engineering, Computer Science, or related technical discipline
Job Responsibility
  • Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
  • Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
  • Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
  • Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
  • Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
  • Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
  • Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
  • Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
  • Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
  • Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.
  • Fulltime

Fellow, AI Software Architecture

AMD AI Group is seeking a highly influential technical leader for the role of AM...
Location: United States, San Jose
Salary: 268000.00 - 402000.00 USD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Knowledge in GPU architectures, basic knowledge of CPU architecture
  • Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
  • Understanding of GPU programming such as ROCm, CUDA, OpenCL, etc
  • Experience in hardware/software co-design, building high-performance products across the full product lifecycle
  • Experience with operating systems (OS) and device driver development is a plus
  • Undergrad degree required. Bachelor of Science, Masters, or PhD degree with emphasis in Electrical Engineering, Computer architecture, or Computer Science with relevant experience preferred
Job Responsibility
  • Strategic Leadership: Set the technical vision and roadmap for AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers
  • Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math libraries, kernel and framework teams to influence future silicon features based on evolving AI workload trends
  • Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure 'out-of-the-box' performance excellence on AMD hardware
  • Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting
  • Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations
  • Community & Mentorship: Act as a technical ambassador in industry forums and open-source communities. Mentor and inspire the next generation of AMD's technical leaders and engineers.
  • Fulltime