Engineering Manager, GPU Kernel

Wayve

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:
Not provided

Job Description:

As the Engineering Manager for the GPU Kernel team, you’ll lead the team responsible for writing the custom kernels and libraries that enable our transformer-based driving models to run efficiently on embedded GPUs and accelerators. This team works closely with ML engineers, software engineers and researchers to deploy end-to-end AI for autonomous vehicles at scale. This is an exciting opportunity to lead several high-impact, early-stage projects at Wayve, with the ultimate goal of enabling product deployments onto millions of customer vehicles around the world.

Job Responsibility:

  • Lead a multi-disciplinary team of ML GPU kernel engineers to enable efficient ML deployments across millions of customer vehicles
  • Set key foundational strategy on deployment frameworks, compilers, toolchains and SoCs
  • Set clear objectives and priorities, and allocate resources efficiently
  • Have opportunities to develop new skills, especially within end-to-end ML and inference optimisation

Requirements:

  • Proven experience as an Engineering Manager delivering complex engineering projects
  • Experience developing GPU kernels and/or ML compilers (e.g. CUDA, OpenCL, TensorRT, MLIR, TVM)
  • Experience optimising systems to meet strict utilisation and latency requirements
  • Excellent interpersonal and communication skills

Nice to have:

  • Experience with C++ and ML frameworks such as PyTorch
  • Experience with ML deployment pipelines
  • Experience with embedded SoCs used in automotive environments, e.g. Nvidia, Qualcomm, Renesas

Additional Information:

Job Posted:
January 01, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Engineering Manager, GPU Kernel

Engineering Manager, Kernel Reliability

We're looking for a deeply technical, hands-on engineering leader for our on-fie...
Location:
United States; Canada, Sunnyvale; Toronto
Salary:
Not provided
Cerebras Systems
Expiration Date:
Until further notice
Requirements:
  • 6+ years in software engineering
  • 3+ years leading teams in SW/HW reliability, debug, diagnostic, failure analysis or related fields
  • Expertise in parallel and distributed programming (message passing, multicore, GPU, embedded, etc.)
  • Expertise in debug and diagnostic tool development or expert usage (debuggers, core dump handling, code sanitizers, etc.)
  • Experience debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.)
  • Deep understanding of computer architectures (instruction pipelining, multithreading, networking, etc.)
  • Strong background in monitoring and reliability engineering (incident response, post-mortem analysis, etc.)
  • Demonstrated ability to recruit and retain high-performing teams, mentor engineers, and partner cross-functionally to deliver customer-facing products.
Job Responsibility:
  • Provide hands-on technical leadership, owning the technical vision and roadmap for the kernel-centric reliability of our internal and customer-facing systems
  • Assist System and Cluster Operations teams in reducing system and service downtime after failures by providing tooling and manual intervention for failure analysis and diagnostics
  • Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis
  • Collaborate with SW teams to improve the software stack, including Kernels, to improve on-field debugging and failure analysis
  • Work with the ASIC and HW architecture teams to codesign the next generation architectures with reliability and ease of debug in mind
  • Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution.
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Simple, non-corporate work culture that respects individual beliefs.

Senior Machine Learning Engineer

As a Machine Learning Engineer at Dedrone, you’ll play a pivotal role in advanci...
Location:
United States, Sterling
Salary:
Not provided
Axon
Expiration Date:
Until further notice
Requirements:
  • 5+ years of professional experience in modern C++ (C++14/17 or later), with strong object-oriented and generic programming skills
  • Deep understanding of multithreading and concurrency (threads, thread pools, locks, lock-free structures, atomics, futures, async patterns) and experience building robust, concurrent systems
  • Hands-on experience with parallel processing frameworks or patterns (SIMD, task-based parallelism, GPU offload, or similar) for real-time or high-throughput applications
  • Strong command of data structures and algorithms, and the ability to choose and implement the right structures for performance-critical, memory-constrained environments
  • Proven experience with memory management and performance optimization in C++ (stack vs heap, custom allocators, cache-aware design, avoiding fragmentation, RAII, move semantics)
  • Practical experience with CUDA (or similar GPU programming frameworks): writing kernels, managing GPU memory, optimizing for occupancy and bandwidth, and integrating with C++ codebases
  • Familiarity with Linux-based development (build systems like CMake, unit testing frameworks, containerization and/or cross-compilation for edge devices)
  • Strong debugging and profiling skills across CPU and GPU, and a methodical approach to benchmarking and regression testing
  • Excellent collaboration and communication skills, with a track record of working closely with research or ML teams to move algorithms from prototype to production
Job Responsibility:
  • Design and implement high-performance C++ software that runs computer vision and tracking algorithms in real time on edge devices
  • Work closely with computer vision / self-supervised learning engineers to integrate their models into production pipelines, including pre/post-processing, I/O, and system orchestration
  • Build and optimize multithreaded and parallel processing pipelines for ingesting, synchronizing, and processing data from a networked system of cameras
  • Implement and tune CUDA kernels and GPU-accelerated components to maximize throughput and minimize latency for inference, tracking, and search
  • Design robust data structures and memory management strategies for handling large volumes of video, sensor, and metadata streams under tight compute and power constraints
  • Profile and optimize code using tools such as perf, valgrind, nvprof / Nsight, and similar to identify bottlenecks and improve CPU/GPU utilization
  • Collaborate with simulation and CV teams to deploy and evaluate algorithms in realistic test scenarios, including fault handling and performance monitoring
  • Develop clean, well-tested, and well-documented C++ libraries and services that can be reused across products and future airspace applications
  • Contribute to system-level architecture decisions, including inter-process communication, scheduling, resource allocation, and deployment strategies on edge platforms
What we offer:
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Employment Type:
Full-time

Senior Software Engineer

As a Senior Software Engineer, you will lead the design, development, and valida...
Location:
United States, Multiple Locations
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 2+ years experience in Kernel bring-up and platform enablement
  • 1+ years experience in GPU driver development and integration
  • 2+ years experience in C / C++ kernel-space programming, Git-based source management and release branching, RPM packaging, spec file authoring, and build automation
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Job Responsibility:
  • Lead kernel integration and validation for new silicon platforms, from early board bring‑up through full feature enablement
  • Architect and maintain the Maintenance OS (MOS) kernel, ensuring long‑term stability, security, and compatibility across multiple hardware generations
  • Own the end-to-end lifecycle of GPU drivers (NVIDIA, amdgpu, ROCm), including integration of out-of-tree (OOT) kernel drivers; DKMS packaging, build, and version tracking; and compatibility validation against kernel and firmware baselines
  • Define and manage build and release pipelines for kernel RPMs, driver SRPMs, and signing workflows
  • Collaborate with hardware, platform, and firmware teams to validate kernel features tied to new silicon capabilities (PCIe, CXL, IOMMU, NUMA, etc.)
  • Own spec files, RPM packaging, and associated CI/CD automation for kernel and driver deliverables
  • Conduct deep‑dive debugging across the full stack — from kernel to device firmware — to resolve performance, stability, or bring‑up issues
  • Drive engagement with upstream Linux communities to upstream or align kernel changes where feasible
Employment Type:
Full-time

Senior GPU Engineer

We are seeking an expert Senior GPU Engineer to join our AI Infrastructure team....
Location:
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years of experience in systems programming, HPC, or GPU software development, featuring at least 5 years of hands-on CUDA/C++ kernel development
  • Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper)
  • Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution
  • Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy
  • Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel)
  • Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads
  • Mastery of NVIDIA Nsight Systems/Compute
  • Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput
Job Responsibility:
  • Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries
  • Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization)
  • Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads
  • Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching
  • Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism)
  • Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy)
Employment Type:
Full-time

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location:
United States, San Francisco
Salary:
300000.00 - 385000.00 USD / Year
Perplexity
Expiration Date:
Until further notice
Requirements:
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
Job Responsibility:
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier by building inference systems that support sparse attention, disaggregated pre-fill/decoding serving, etc.
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
Employment Type:
Full-time

Senior AI/ML Validation Engineer

We are seeking an experienced and versatile professional with expertise in valid...
Location:
India, Hyderabad
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • 8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
  • Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
  • Proven expertise testing drivers and firmware with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models
  • Failure modes: error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles
  • Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems, including the ability to read kernel logs and symbols, trace with ftrace/perf/ETW, perform cross-layer debugging, build custom kernels/modules, and analyze crash dumps (kdump, WinDbg)
  • Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation
  • C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks
  • Bash/PowerShell for environment setup, CI scripting, and reproducibility
Job Responsibility:
  • Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
  • Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
  • Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
  • Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
  • Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
  • Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
  • Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
  • Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
What we offer:
  • AMD benefits at a glance

Senior Product Manager

We are hiring a foundational Product Manager to work directly with the CTO to de...
Location:
Israel, Ramat Gan
Salary:
Not provided
SQream
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience as a Product Manager or Solutions Architect in infrastructure, HPC, data systems, GPU/AI pipelines, or distributed systems
  • Strong outbound / customer-facing skills: presenting to CTOs, architects, OEM teams, GSIs, and technical buyers
  • Ability to operate at kernel-level conceptual depth and translate physics into product strategy
  • Exceptional communication skills - written and verbal - with the ability to simplify complex GPU and dataflow concepts
  • Demonstrated ability to drive roadmap execution with engineering while also leading external discovery and evangelism
  • Comfort owning both internal product discipline and external technical influence
Job Responsibility:
  • Product Ownership (Internal): Work directly with the R&D to shape the GPU-native roadmap for ingestion, vectorization, transformation, curation, and continuous production flow
  • Define precise specifications, APIs, pipeline behavior, and physics-aligned constraints
  • Ensure product features adhere to SCAILIUM’s rigid boundaries: No orchestration. No system of record. No serving. No dashboards
  • Enforce documentation rigor. Documentation is code
  • Technical Outbound Leadership (External): Serve as a public-facing authority on GPU starvation, impedance incompatibility, and the AI Production Layer
  • Lead technical sessions with Partners, OEMs (Dell, Supermicro, HPE), GSIs (Accenture, Deloitte), and strategic enterprise customers
  • Conduct in-depth customer pipeline analyses to identify physical constraints and translate them into SCAILIUM features or patterns
  • Present SCAILIUM’s architecture in a clear, authoritative, physics-grounded manner
  • Support sales, partnerships, and field engineering by communicating the “why” behind every product decision
  • Build artifacts that shape the category: reference architectures, workload blueprints, TCO models, and silicon saturation narratives

Autonomy Engineer - Deep Learning Infrastructure

Skydio is the leading US drone company and the world leader in autonomous flight...
Location:
United States, San Mateo
Salary:
170000.00 - 236500.00 USD / Year
Skydio
Expiration Date:
Until further notice
Requirements:
  • Demonstrated hands-on experience with MLOps, ML inference optimization and edge deployment
  • Strong knowledge of DL fundamentals, techniques and state-of-the-art DL models/architectures
  • Strong fundamentals in CV, image processing and video processing
  • Demonstrated hands-on experience building and managing ML pipelines for solving vision or vision language tasks including data preparation, model training, model deployment and monitoring
  • Experience and understanding of security and compliance requirements in ML infrastructure
  • Experience with ML frameworks and libraries
  • Demonstrated ability to take a concept and systematically drive it through the software lifecycle: architecture, development, testing, deployment, and monitoring
  • Comfortable navigating and delivering within a complex codebase
  • Strong communication skills and the ability to collaborate effectively at all levels of technical depth
Job Responsibility:
  • Develop solutions for high-performance deep learning inference for CV workloads that can deliver high throughput and low latency on different hardware platforms
  • Profile CV and Vision Language Models (VLMs) to analyze performance, identify bottlenecks and optimization opportunities and improve power efficiency of deep learning inference workloads
  • Design and implement end-to-end MLOps workflows for model deployment, monitoring and re-training
  • Utilize advanced Machine Learning knowledge to leverage training or runtime frameworks or model efficiency tools to improve system performance
  • Create new methods for improving training efficiency
  • Implement GPU kernels for custom architectures and optimized inference
  • Design and implement SDKs that allow customers/external developers to create autonomous workflows using ML
  • Leverage your expertise and best practices to uphold and improve Skydio’s engineering standards
What we offer:
  • Equity in the form of stock options
  • Comprehensive benefits packages
  • Relocation assistance may also be provided for eligible roles
  • Paid vacation time
  • Sick leave
  • Holiday pay
  • 401K savings plan
Employment Type:
Full-time