Engineering Manager, GPU Kernel

Wayve

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:
Not provided

Job Description:

As the Engineering Manager for the GPU Kernel team, you’ll lead the team responsible for writing the custom kernels and libraries that enable our transformer-based driving models to run efficiently on embedded GPUs and accelerators. This team works closely with ML engineers, software engineers and researchers to deploy end-to-end AI for autonomous vehicles at scale. This is an exciting opportunity to lead several high-impact, early-stage projects at Wayve, with the ultimate goal of enabling product deployments onto millions of customer vehicles around the world.

Job Responsibility:

  • Lead a multi-disciplinary team of ML GPU kernel engineers to enable efficient ML deployments across millions of customer vehicles
  • Set key foundational strategy on deployment frameworks, compilers, toolchains and SoCs
  • Set clear objectives and priorities, and allocate resources efficiently
  • Have opportunities to develop new skills, especially within end-to-end ML and inference optimisation

Requirements:

  • Proven experience as an Engineering Manager delivering complex engineering projects
  • Experience developing GPU kernels and/or ML compilers (e.g. CUDA, OpenCL, TensorRT, MLIR, TVM)
  • Experience optimising systems to meet strict utilisation and latency requirements
  • Excellent interpersonal and communication skills

Nice to have:

  • Experience with C++ and ML frameworks such as PyTorch
  • Experience with ML deployment pipelines
  • Experience with embedded SoCs used in automotive environments (e.g. NVIDIA, Qualcomm, Renesas)

Additional Information:

Job Posted:
January 01, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Engineering Manager, GPU Kernel

Sr. Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Ability to define goals, manage development efforts, and deliver high-quality solutions
  • Strong problem-solving skills
  • Proactive approach
  • Keen understanding of software engineering best practices
  • Experience in GPU kernel development & optimization for AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch)
  • Skilled in Python and C++
Job Responsibility:
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions

Senior ML Accelerator Engineer - GPU

About the Mission: GM’s vision of Zero Crashes, Zero Emissions, and Zero Congest...
Location:
United States, Austin
Salary:
128,700 - 261,300 USD / Year
General Motors
Expiration Date:
Until further notice
Requirements:
  • 3+ years of relevant industry experience or equivalent experience
  • BS, MS or PhD in CS, or related technical field
  • Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture
  • Hands-on experience benchmarking, profiling, debugging and optimizing accelerator libraries and kernels to extract optimal performance using the Nsight suite of tools or similar
  • Strong background in software architecture, library design, and design patterns
  • Strong C++ programming skills and comfort working in large codebases
  • Solid background in system performance, high performance computing and/or architecture-aware optimizations
  • Strong communication skills and the ability to work collaboratively within a team
  • Excellent analytical and problem-solving skills
Job Responsibility:
  • Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads
  • Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack
  • Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans
  • Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production
  • Maintain high technology standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering through code review
  • Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs
What we offer:
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
Employment Type:
Full-time

Principal Open Source AI/ML Solutions Engineer

The Senior Member in the GPU domain is a technical role responsible for owning t...
Location:
India, Bangalore
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Strong C++ and Python programming skills
  • Performance analysis skills for both CPU and GPU
  • Good knowledge of AI/ML Frameworks and Architecture
  • Basic GPU kernel programming knowledge
  • Experience with software engineering methodologies such as Agile, Scrum, Kanban
  • Experience in all phases of software development, from requirements gathering, analysis, design, development, and testing to final release
  • Experience developing software in an end customer product delivery environment
  • Experience with open-source software development including collaboration with community maintainers and submitting contributions
  • Excellent analytical and problem-solving skills
  • Strong communication skills to effectively convey complex technical concepts to both technical and non-technical stakeholders
Job Responsibility:
  • Architectural Design: Own architectural design and development of GPU software components, ensuring alignment with industry standards and best practices
  • Technical Leadership: Act as one of the subject matter experts in GPU technologies, providing guidance and mentorship to junior engineers in the team on complex technical challenges
  • Software Development: Design, write, and deliver high-quality open software solutions that enhance GPU performance and capabilities. This includes developing drivers, APIs, and other critical software components
  • Research and Innovation: Conduct research to explore new technologies and methodologies that can improve GPU performance and efficiency. Propose innovative solutions to meet evolving market demands
  • Collaboration: Work collaboratively with cross-functional teams, including hardware engineers, system architects, and product managers, to ensure successful integration of GPU technologies into broader systems
  • Documentation and Standards: Develop comprehensive technical documentation and establish coding standards to ensure maintainability and scalability of software products

Engineering Manager, Kernel Reliability

We're looking for a deeply technical, hands-on engineering leader for our on-fie...
Location:
United States (Sunnyvale); Canada (Toronto)
Salary:
Not provided
Cerebras Systems
Expiration Date:
Until further notice
Requirements:
  • 6+ years in software engineering
  • 3+ years leading teams in SW/HW reliability, debug, diagnostic, failure analysis or related fields
  • Expertise in parallel and distributed programming (message passing, multicore, GPU, embedded, etc.)
  • Expertise in debug and diagnostic tool development or expert usage (debuggers, core dump handling, code sanitizers, etc.)
  • Experience debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.)
  • Deep understanding of computer architectures (instruction pipelining, multithreading, networking, etc.)
  • Strong background in monitoring and reliability engineering (incident response, post-mortem analysis, etc.)
  • Demonstrated ability to recruit and retain high-performing teams, mentor engineers, and partner cross-functionally to deliver customer-facing products.
Job Responsibility:
  • Provide hands-on technical leadership, owning the technical vision and roadmap for the kernel-centric reliability of our internal and customer-facing systems
  • Assist System and Cluster Operations teams in reducing system and service downtime after failure by providing tooling and manual intervention for failure analysis and diagnostics
  • Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis
  • Collaborate with SW teams to enhance the software stack, including kernels, for better on-field debugging and failure analysis
  • Work with the ASIC and HW architecture teams to co-design next-generation architectures with reliability and ease of debugging in mind
  • Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution.
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Simple, non-corporate work culture that respects individual beliefs.

Senior Machine Learning Engineer

As a Machine Learning Engineer at Dedrone, you’ll play a pivotal role in advanci...
Location:
United States, Sterling
Salary:
Not provided
Axon
Expiration Date:
Until further notice
Requirements:
  • 5+ years of professional experience in modern C++ (C++14/17 or later), with strong object-oriented and generic programming skills
  • Deep understanding of multithreading and concurrency (threads, thread pools, locks, lock-free structures, atomics, futures, async patterns) and experience building robust, concurrent systems
  • Hands-on experience with parallel processing frameworks or patterns (SIMD, task-based parallelism, GPU offload, or similar) for real-time or high-throughput applications
  • Strong command of data structures and algorithms, and the ability to choose and implement the right structures for performance-critical, memory-constrained environments
  • Proven experience with memory management and performance optimization in C++ (stack vs heap, custom allocators, cache-aware design, avoiding fragmentation, RAII, move semantics)
  • Practical experience with CUDA (or similar GPU programming frameworks): writing kernels, managing GPU memory, optimizing for occupancy and bandwidth, and integrating with C++ codebases
  • Familiarity with Linux-based development (build systems like CMake, unit testing frameworks, containerization and/or cross-compilation for edge devices)
  • Strong debugging and profiling skills across CPU and GPU, and a methodical approach to benchmarking and regression testing
  • Excellent collaboration and communication skills, with a track record of working closely with research or ML teams to move algorithms from prototype to production
Job Responsibility:
  • Design and implement high-performance C++ software that runs computer vision and tracking algorithms in real time on edge devices
  • Work closely with computer vision / self-supervised learning engineers to integrate their models into production pipelines, including pre/post-processing, I/O, and system orchestration
  • Build and optimize multithreaded and parallel processing pipelines for ingesting, synchronizing, and processing data from a networked system of cameras
  • Implement and tune CUDA kernels and GPU-accelerated components to maximize throughput and minimize latency for inference, tracking, and search
  • Design robust data structures and memory management strategies for handling large volumes of video, sensor, and metadata streams under tight compute and power constraints
  • Profile and optimize code using tools such as perf, valgrind, nvprof / Nsight, and similar to identify bottlenecks and improve CPU/GPU utilization
  • Collaborate with simulation and CV teams to deploy and evaluate algorithms in realistic test scenarios, including fault handling and performance monitoring
  • Develop clean, well-tested, and well-documented C++ libraries and services that can be reused across products and future airspace applications
  • Contribute to system-level architecture decisions, including inter-process communication, scheduling, resource allocation, and deployment strategies on edge platforms
What we offer:
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Employment Type:
Full-time

Senior Software Development Engineer

We are seeking an experienced and highly technical SMTS Software Development Eng...
Location:
United Kingdom
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or related technical field
  • 8+ years of software engineering experience in systems software, runtime libraries, GPU programming, or compiler/runtime interfaces
  • Strong proficiency in modern C++ (C++14/C++17 or newer), templates, memory models, and low‑level systems programming
  • Deep understanding of at least one GPU computing model (HIP, CUDA, SYCL, OpenCL, OpenMP offload)
  • Hands‑on experience with runtime systems, driver interfaces, or high‑performance compute libraries
  • Strong debugging skills using tools such as gdb, sanitizers, profilers, and GPU debugging tools
  • Solid understanding of parallel programming concepts—memory hierarchy, synchronization, concurrency, thread scheduling
Job Responsibility:
  • Architect, implement, and optimize features in the HIP runtime, including memory management, kernel dispatch, device abstraction, multi‑GPU coordination, and synchronization primitives
  • Contribute to the evolution of the HIP programming model and interoperability with ROCr, HSA runtime, and compiler toolchains
  • Ensure functional correctness, performance, and scalability of runtime APIs across different GPU generations
  • Conduct root‑cause analysis and systems‑level debugging across the runtime, driver, compiler, and hardware layers
  • Profile GPU applications and internal runtime components to identify bottlenecks and design performance improvements
  • Optimize HIP runtime behavior for large-scale AI, HPC, and cloud workloads
  • Work closely with compiler teams (LLVM/Clang), driver teams, GPU architecture, and systems engineers to deliver end‑to‑end GPU software solutions
  • Contribute to API specifications and collaborate with upstream open-source communities where appropriate
  • Define and drive technical strategy for correctness, reliability, and conformance of the HIP runtime
  • Support enhancements in automated testing, CI, and stress/failure scenarios in the HIP test suite

ROCm Core SW Project Manager

We are seeking an experienced Project Manager to manage ROCm development project...
Location:
Canada, Markham
Salary:
139,200 - 208,800 CAD / Year
AMD
Expiration Date:
Until further notice
Requirements:
  • 5+ years of program or project management experience in software development
  • At least 3 years focused on systems software, GPU computing, or HPC/AI infrastructure
  • Demonstrated experience managing complex, multi-team technical programs involving pre-silicon validation or hardware/software co-design
  • Strong foundational knowledge of machine learning frameworks, model architectures, and performance optimization techniques
  • Deep understanding of software development lifecycle (SDLC), agile methodologies, and modern CI/CD practices
  • Excellent stakeholder management, communication, and influencing skills across engineering and executive levels
  • Bachelor’s degree in Computer Science, Electrical Engineering, or related technical field
Job Responsibility:
  • Manage ROCm development projects for AMD's next-generation GPUs
  • Drive internal SW execution including GPU performance optimization, pre-silicon performance feature development, and GPU kernel development
  • Coordinate across software, hardware, and validation teams to deliver a high-performance, reliable, and scalable ROCm software stack
  • Work with the ROCm SW team to drive pre-silicon software development and performance validation activities using SW/HW emulation platforms
  • Orchestrate hardware-software co-development efforts for new GPU ML features
  • Establish and track KPIs for new GPU feature quality, performance, and time-to-market
  • Proactively identify and mitigate project risks
Employment Type:
Full-time

Senior Software Engineer

As a Senior Software Engineer, you will lead the design, development, and valida...
Location:
United States, Multiple Locations
Salary:
119,800 - 234,700 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 2+ years experience in Kernel bring-up and platform enablement
  • 1+ years experience in GPU driver development and integration
  • 2+ years experience in C / C++ kernel-space programming, Git-based source management and release branching, RPM packaging, spec file authoring, and build automation
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
Job Responsibility:
  • Lead kernel integration and validation for new silicon platforms, from early board bring‑up through full feature enablement
  • Architect and maintain the Maintenance OS (MOS) kernel, ensuring long‑term stability, security, and compatibility across multiple hardware generations
  • Own the end-to-end lifecycle of GPU drivers (NVIDIA, amdgpu, ROCm), including integration of out-of-tree (OOT) kernel drivers; DKMS packaging, build, and version tracking; and compatibility validation against kernel and firmware baselines
  • Define and manage build and release pipelines for kernel RPMs, driver SRPMs, and signing workflows
  • Collaborate with hardware, platform, and firmware teams to validate kernel features tied to new silicon capabilities (PCIe, CXL, IOMMU, NUMA, etc.)
  • Own spec files, RPM packaging, and associated CI/CD automation for kernel and driver deliverables
  • Conduct deep‑dive debugging across the full stack — from kernel to device firmware — to resolve performance, stability, or bring‑up issues
  • Drive engagement with upstream Linux communities to upstream or align kernel changes where feasible
Employment Type:
Full-time