Senior GPU Engineer

China, Beijing · Job Posted February 17, 2026

Job Description

We are seeking an expert Senior GPU Engineer to join our AI Infrastructure team. In this role, you will architect and optimize the core inference engine that powers our large-scale AI models. You will be responsible for pushing the boundaries of hardware performance, reducing latency, and maximizing throughput for Generative AI and Deep Learning workloads. You will work at the intersection of Deep Learning algorithms and low-level hardware, designing custom operators and building a highly efficient training/inference execution engine from the ground up.

Job Responsibility

  • Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries
  • Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization)
  • Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads
  • Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching
  • Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism)
  • Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy)

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years of experience in systems programming, HPC, or GPU software development, featuring at least 5 years of hands-on CUDA/C++ kernel development
  • Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper)
  • Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution
  • Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy
  • Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel)
  • Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads
  • Mastery of NVIDIA Nsight Systems/Compute
  • Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput

Nice to have

  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of experience in systems programming, HPC, or GPU software development, featuring at least 5 years of hands-on CUDA/C++ kernel development
  • Working knowledge of state-of-the-art inference/training stacks: sglang, vLLM, TensorRT-LLM, DeepSpeed, or Megatron-LM
  • Deep understanding of optimization patterns: PagedAttention, RadixAttention (Prefix Caching), continuous batching, and speculative decoding
  • Practical experience with CUTLASS, CuTe, or OpenAI Triton
  • Expertise in high-performance linear algebra (GEMM) optimization, including tiling strategies, data layouts, and mixed-precision accumulation
  • Proficiency in multi-GPU/multi-node scaling using NCCL and parallelism strategies (Tensor, Pipeline, and Sequence parallelism)
  • An AI-native mindset: Expert at using vibe coding tools to bypass boilerplate and accelerate the development lifecycle
  • The technical intuition to architect systems rapidly, moving from 'vibe' to 'highly-optimized production code' with extreme velocity

Similar Jobs for Senior GPU Engineer

8 matching positions

Senior GPU Compute Engineer (Rust)

A high-performance computing company building next-generation GPU orchestration ...
Location: Poland
Salary: 100000.00 EUR / Year
Signify Technology
Expiration Date: Until further notice
Requirements
  • Strong experience with GPU programming (Vulkan, CUDA, OpenCL, or similar)
  • Solid understanding of GPU architecture, memory, and performance optimisation
  • Systems programming background in Rust, C, or C++
  • Experience with low-level programming concepts including DMA, PCIe, and memory management
  • Familiarity with compute shaders, SPIR-V, and GPU debugging/profiling tools
  • Exposure to embedded systems, device drivers, or hardware integration is beneficial
  • Understanding of high-performance networking or RDMA is a plus
Job Responsibility
  • Integrate advanced GPU hardware into a distributed orchestration platform
  • Develop and optimise Vulkan compute pipelines and GPU kernels
  • Build low-level systems software in Rust/C++ for GPU control and monitoring
  • Improve GPU scheduling, memory management, and resource utilisation
  • Optimise high-speed GPU-to-GPU communication and RDMA networking
  • Work closely with hardware and SDK teams to troubleshoot performance and integration issues
What we offer
  • Work on cutting-edge GPU and compute infrastructure
  • Opportunity to influence architecture and SDK development
  • Collaborative engineering environment with direct hardware exposure
  • Remote working
  • Competitive compensation package
  • Fulltime

Senior GPU Software Performance Engineer — Post‑Training

Drive the performance of post‑training workloads on AMD Instinct™ GPUs. You’ll w...
Location: United States, San Jose
Salary: 204000.00 - 306000.00 USD / Year
AMD
Expiration Date: Until further notice
Requirements
  • Proven GPU performance engineering for deep learning (ROCm/HIP, Triton, or similar)
  • Hands-on with SFT, LoRA, and RL-based training at scale
  • Strong PyTorch experience (torch.distributed, FSDP/ZeRO or equivalent)
  • Proficient in Python and C++; comfortable reading/writing kernels when needed
  • Experience with distributed systems and collective communication libraries
  • Track record of turning profiles into fixes, upstreaming changes, and documenting results
Job Responsibility
  • Lead performance for finetuning and RL training solutions on AMD GPUs
  • Improve throughput, memory efficiency, and stability across data, model, and optimizer steps
  • Optimize multi-GPU/multi-node training and communication patterns
  • Contribute efficient kernels/ops and targeted graph-level optimizations
  • Profile, diagnose, and resolve bottlenecks using standard tooling; prevent regressions in CI
  • Ship reproducible pipelines and documentation adopted by internal teams and external developers
  • Collaborate with framework, compiler, and model teams to land durable improvements
  • Fulltime

Senior ML Accelerator Engineer - GPU

About the Mission: GM’s vision of Zero Crashes, Zero Emissions, and Zero Congest...
Location: United States, Austin
Salary: 128700.00 - 261300.00 USD / Year
General Motors
Expiration Date: Until further notice
Requirements
  • Minimum 3+ years of relevant industry experience or equivalent experience
  • BS, MS or PhD in CS, or related technical field
  • Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture
  • Hands-on experience benchmarking, profiling, debugging and optimizing accelerator libraries and kernels to extract optimal performance using the NSight suite of tools or similar
  • Strong background in software architecture, library design, and design patterns
  • Strong C++ programming skills with the ability to feel comfortable in large codebases
  • Solid background in system performance, high performance computing and/or architecture-aware optimizations
  • Strong communication skills and the ability to work collaboratively within a team
  • Excellent analytical and problem-solving skills
Job Responsibility
  • Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads
  • Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack
  • Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans
  • Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production
  • Maintain high technology standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering through code review
  • Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • Fulltime

Senior AI Models GPU Deployment Software Engineer

Join AMD and help bring cutting-edge AI models to life on AMD GPUs! We’re lookin...
Location: India, Bangalore
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Basic understanding of GPU computing (HIP, CUDA, or OpenCL is a plus)
  • Interest in computer architecture and how hardware works
  • Familiarity with AI concepts (Natural Language Processing, Vision, Audio, Recommendations)
  • Programming skills in C++, Python, or similar languages
  • Ability to debug and test your code
  • Bachelor’s degree in Computer Science, Computer Engineering, or a related field
Job Responsibility
  • Help run and improve AI models (like Chatbots, Vision, and MultiModal systems) on AMD GPUs
  • Work with popular AI tools like PyTorch and TensorFlow to make them faster on AMD GPUs
  • Collaborate with open-source communities to share improvements
  • Apply good coding practices to build reliable and efficient software

Software Engineer II and Senior Software Engineer - Performance

The Artificial Intelligence Performance team at Microsoft develops AI software t...
Location: United States, Mountain View
Salary: 100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
  • Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
  • Optimize and monitor performance of LLMs and build SW tooling to enable insights into performance opportunities ranging from the model level to the systems and silicon level to improve customer experience and reduce the footprint of the computing fleet
  • Enable fast time to market of LLMs/models and their deployments at scale by building SW tools that afford velocity in porting models on new Nvidia and AMD GPUs
  • Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools
  • Speed up and reduce the complexity of key components/pipelines to improve the performance and/or efficiency of our systems
  • Communicate and collaborate with our partners both internal and external
  • Embody Microsoft's Culture and Values
  • Fulltime

Post Silicon Power And Performance Validation Senior Engineer

At AMD, our mission is to build great products that accelerate next-generation c...
Location: Malaysia, Penang
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Experience validating power or performance features of CPUs, GPUs, or APUs
  • Familiarity with graphics, core, and gaming-focused workloads
  • Familiarity with the use of data acquisition and thermal equipment
  • Experience analyzing and presenting data using Power BI, Sigma, or similar
  • Experience working in a fast-paced, matrixed, and multi-site technical organization
  • Bachelor’s or Master’s in Computer Engineering, Electrical Engineering, Computer Science or equivalent experience
Job Responsibility
  • Develop test plans in collaboration with architects, firmware, and IP-specific engineering teams to ensure products meet or exceed performance targets
  • Execute power and performance studies to support business development
  • Validate and tune product features post-silicon
  • Set up and debug systems under test
  • Fulltime

Senior Engineer - Artificial Intelligence

We’re looking for a seasoned Senior AI Engineer to join our growing AI team. In ...
Location: Canada, Toronto
Salary: 126090.00 - 140100.00 CAD / Year
Tucows
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or related field
  • 5+ years of software engineering experience, with recent focus on AI/LLM systems
  • Advanced proficiency in Python and Golang
  • Strong knowledge of software design patterns (SOLID, DRY, CQRS, Saga, event-driven)
  • Deep understanding of the Software Development Life Cycle (SDLC)
  • Proven experience building distributed, highly available systems at scale
  • Strong system design expertise: APIs, async processing, backpressure, fault tolerance
  • Experience with event-driven systems (Kafka, RabbitMQ)
  • Strong engineering practices: TDD, CI/CD, code reviews, and technical debt management
  • Experience writing and communicating Architecture Decision Records (ADRs)
Job Responsibility
  • Lead the architecture and development of AI-driven features using Python and Golang
  • Own end-to-end delivery of LLM-based systems — from prototype to production — with a focus on scalability, reliability, and cost efficiency
  • Integrate and fine-tune open-source models (e.g., LLaMA, Mistral, Mixtral) and drive model selection and serving strategies
  • Research and champion emerging AI technologies aligned with product vision
  • Define and uphold architectural best practices through design and code reviews
  • Mentor junior and intermediate engineers, providing technical leadership on complex problems
  • Translate AI capabilities and constraints into clear business context for non-technical stakeholders
  • Shape responsible AI practices, including safety, privacy, and governance
  • Stay current with the open-source AI ecosystem and bring forward relevant innovations
What we offer
  • Generous benefits
  • Fair compensation
  • Remote-first work for majority of roles
  • Reasonable accommodation for individuals with disabilities
  • Fulltime

Senior Engineer / Tech Lead 3D Vision for Aerial Robotics

Fiducial is a young but fast-growing deep-tech start-up with big ambitions at th...
Location: Netherlands, Delft
Salary: Not provided
Fiducial
Expiration Date: Until further notice
Requirements
  • Good understanding of, and experience with, modern C++
  • Real-time software development experience such as game, simulation, robotics or embedded development
  • Second-nature familiarity with 3D reconstruction concepts: projective geometry, bundle adjustment, PnP, triangulation, feature description, image matching
  • Experience with building photogrammetry, augmented reality, robotics or visual (inertial) odometry (VIO) solutions
  • Experience with 3D Gaussian Splatting or derivatives
  • Deep intuition for transform trees and pose representations used in game engines, computer graphics, and robotics
  • Hands-on experience calibrating camera intrinsics/extrinsics
  • Experience ensuring timestamp integrity and tight time alignment across sensors
  • Experience with GPU programming (CUDA, OpenCL) and graphics pipelines (Vulkan, OpenGL)
  • Experience with AI coding agents and prompt engineering
Job Responsibility
  • Wide ownership over some of the core functionalities of Fiducial Scout: our software for full on-board situational awareness for UAVs
  • Responsible for the short-to-medium-term development roadmap
What we offer
  • 25 vacation days per year
  • Reimbursed travel expenses and company laptop
  • (Really) flexible working hours and option to work from home 2 days per week when in the Netherlands
  • Working in an international, world-class team of engineers and entrepreneurs
  • Large freedom in how you work and implement solutions
  • The opportunity to take technical risks, to implement stuff the ‘right’ way and to iterate quickly with tight feedback loops
  • Responsibility over the things you implement
  • Traveling abroad to test the solutions you built in relevant environments
  • Fulltime