We are seeking a Senior GPU Engineer to join our AI Infrastructure team. In this role, you will architect and optimize the core inference engine that powers our large-scale AI models. You will be responsible for pushing the boundaries of hardware performance, reducing latency, and maximizing throughput for Generative AI and Deep Learning workloads. You will work at the intersection of Deep Learning algorithms and low-level hardware, designing custom operators and building a highly efficient training/inference execution engine from the ground up.
Job Responsibilities:
Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries
Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization)
Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads
Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching
Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism)
Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy)
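To make the KV Cache optimization work above concrete: during autoregressive inference, the cache stores one K and one V tensor per layer per token, so its footprint grows linearly with sequence length and batch size. A back-of-envelope sizing helper in Python (the model shape below is an illustrative assumption, roughly a 7B-class dense model, not a spec from this role):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Total KV cache size: one K and one V tensor per layer, per token.

    dtype_bytes=2 corresponds to FP16/BF16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Illustrative 7B-class shape (assumption): 32 layers, 32 KV heads, head_dim 128
per_request = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1)
print(per_request / 2**30, "GiB")  # → 2.0 GiB at FP16 for a single 4k-context request
```

Numbers like this are why techniques such as paging and quantizing the KV cache matter: a handful of concurrent long-context requests can exhaust GPU memory before compute becomes the bottleneck.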
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
4+ years of experience in systems programming, HPC, or GPU software development, including substantial hands-on CUDA/C++ kernel development
Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper)
Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution
Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy
Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel)
Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads
Mastery of NVIDIA Nsight Systems/Compute
Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput
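The Roofline reasoning called for above reduces to one formula: attainable throughput = min(peak compute, arithmetic intensity × memory bandwidth). A minimal Python sketch; the peak-compute and bandwidth figures below are illustrative assumptions, not vendor specifications:

```python
def roofline_attainable_tflops(peak_tflops, bandwidth_tb_s, flops, bytes_moved):
    """Attainable TFLOP/s = min(peak compute, arithmetic intensity * bandwidth)."""
    ai = flops / bytes_moved  # arithmetic intensity in FLOPs per byte
    return min(peak_tflops, ai * bandwidth_tb_s)

# Illustrative figures for a modern datacenter GPU (assumptions):
PEAK = 989.0  # dense FP16 TFLOP/s
BW = 3.35     # HBM bandwidth in TB/s

# FP16 element-wise add: 1 FLOP per 6 bytes (two 2-byte loads, one 2-byte store)
elementwise = roofline_attainable_tflops(PEAK, BW, flops=1, bytes_moved=6)
# → memory-bound: far below peak, limited by bandwidth

# Large GEMM tile: thousands of FLOPs per byte moved
gemm = roofline_attainable_tflops(PEAK, BW, flops=4096, bytes_moved=2)
# → compute-bound: capped at peak TFLOP/s
```

The same arithmetic tells you whether a profiled kernel is worth more tuning: if its measured throughput already sits on the roofline for its arithmetic intensity, the win has to come from restructuring data movement, not from the kernel itself.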
Nice to have:
Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
5+ years of experience in systems programming, HPC, or GPU software development, including at least 5 years of hands-on CUDA/C++ kernel development
Working knowledge of state-of-the-art inference/training stacks: sglang, vLLM, TensorRT-LLM, DeepSpeed, or Megatron-LM
Deep understanding of optimization patterns: PagedAttention, RadixAttention (Prefix Caching), continuous batching, and speculative decoding
Practical experience with CUTLASS, CuTe, or OpenAI Triton
Expertise in high-performance linear algebra (GEMM) optimization, including tiling strategies, data layouts, and mixed-precision accumulation
Proficiency in multi-GPU/multi-node scaling using NCCL and parallelism strategies (Tensor, Pipeline, and Sequence parallelism)
An AI-native mindset: Expert at using vibe coding tools to bypass boilerplate and accelerate the development lifecycle
The technical intuition to architect systems rapidly, moving from 'vibe' to 'highly-optimized production code' with extreme velocity
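As an illustration of the tiling strategies mentioned under GEMM optimization, here is a deliberately naive pure-Python sketch: it walks the output matrix in blocks so each input tile is reused across a block of output elements, which is the same locality idea real CUDA kernels exploit by staging tiles in shared memory and registers (pedagogical sketch only, not production code):

```python
def tiled_matmul(A, B, tile=2):
    """C = A @ B computed block-by-block.

    The i0/j0/k0 loops walk tiles; the inner loops work within one tile,
    so a loaded tile of A and B is reused for a whole block of C.
    """
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C

C = tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C)  # → [[19.0, 22.0], [43.0, 50.0]]
```

In a real kernel the tile sizes are chosen against shared-memory capacity, register pressure, and Tensor Core fragment shapes rather than fixed at 2.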