Engineering Manager, GPU Kernel

United Kingdom, London · Job Posted January 01, 2026

Job Description

As the Engineering Manager for the GPU Kernel team, you’ll lead the team responsible for writing custom kernels and libraries that enable our transformer-based driving models to run efficiently on embedded GPUs and accelerators. The team works closely with ML engineers, software engineers and researchers to deploy end-to-end AI for autonomous vehicles at scale. This is an exciting opportunity to lead several high-impact, early-stage projects at Wayve, with the ultimate goal of enabling product deployments on millions of customer vehicles around the world.

Job Responsibilities

  • Lead a multi-disciplinary team of ML GPU kernel engineers to enable efficient ML deployments across millions of customer vehicles
  • Set key foundational strategy on deployment frameworks, compilers, toolchains and SoCs
  • Set clear objectives and priorities, and allocate resources efficiently
  • Have opportunities to develop new skills, especially within end-to-end ML and inference optimisation

Requirements

  • Proven experience as an Engineering Manager delivering complex engineering projects
  • Experience developing GPU kernels and/or ML compilers (e.g. CUDA, OpenCL, TensorRT, MLIR, TVM)
  • Experience optimising systems to meet strict utilisation and latency requirements
  • Excellent interpersonal and communication skills

Nice to have

  • Experience with C++ and ML frameworks such as PyTorch
  • Experience with ML deployment pipelines
  • Experience with embedded SoCs used in automotive environments (e.g. NVIDIA, Qualcomm, Renesas)

Similar Jobs for Engineering Manager, GPU Kernel

8 matching positions

Engineering Manager, Kernel Reliability

We're looking for a deeply technical, hands-on engineering leader for our on-fie...
Location: Sunnyvale, United States; Toronto, Canada
Salary: Not provided
Cerebras Systems
Expiration Date: Until further notice
Requirements
  • 6+ years in software engineering
  • 3+ years leading teams in SW/HW reliability, debug, diagnostic, failure analysis or related fields
  • Expertise in parallel and distributed programming (message passing, multicore, GPU, embedded, etc.)
  • Expertise in debug and diagnostic tool development or expert usage (debuggers, core dump handling, code sanitizers, etc.)
  • Experience debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.)
  • Deep understanding of computer architectures (instruction pipelining, multithreading, networking, etc.)
  • Strong background in monitoring and reliability engineering (incident response, post-mortem analysis, etc.)
  • Demonstrated ability to recruit and retain high-performing teams, mentor engineers, and partner cross-functionally to deliver customer-facing products.
Job Responsibilities
  • Provide hands-on technical leadership, owning the technical vision and roadmap for the kernel-centric reliability of our internal and customer-facing systems
  • Assist System and Cluster Operations teams in reducing system and service downtime after failures by providing tooling and manual intervention for failure analysis and diagnostics
  • Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis
  • Collaborate with SW teams to enhance the software stack, including kernels, for better on-field debugging and failure analysis
  • Work with the ASIC and HW architecture teams to co-design next-generation architectures with reliability and ease of debug in mind
  • Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution.
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Simple, non-corporate work culture that respects individual beliefs.

Senior Engineering Manager, Data Plane Systems

Crusoe is seeking a Senior Engineering Manager, Data Plane Systems to lead the t...
Location: San Francisco, United States
Salary: 237,600 - 288,000 USD / year
Crusoe
Expiration Date: Until further notice
Requirements
  • 10+ years in high-performance networking or systems engineering, with 5-7+ years specifically managing senior/staff-level talent
  • Deep knowledge of Linux networking internals, kernel architecture, and experience with XDP/eBPF, AF_XDP, and DPDK
  • Hands-on experience with DPU integration and migrating networking functions to hardware accelerators
  • A strong understanding of low-latency networking, performance tuning, and benchmarking
  • The ability to resolve complex technical challenges in a fast-moving, execution-heavy environment
Job Responsibilities
  • Define the roadmap for SDN data plane systems and lead the integration of DPUs (such as NVIDIA BlueField) and hardware accelerators
  • Oversee the development of Linux kernel networking components, XDP/eBPF data paths, and DPDK-based fast paths while driving the migration of networking functions to hardware offload architectures
  • Lead performance benchmarking, regression prevention, and incident response, ensuring operational excellence within 3-6 month execution cycles
  • Mentor and grow a team of senior and staff-level systems engineers, setting technical standards and fostering a high-performance culture of accountability
  • Partner closely with control-plane teams (OVN/OVS) to optimize throughput and latency for multi-tenant GPU clusters
What we offer
  • Competitive compensation
  • Restricted Stock Units
  • Paid time off & paid holidays
  • Comprehensive health, dental & vision insurance
  • Employer contributions to HSA
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Professional development & tuition reimbursement
  • Mental health & wellness support
  • Commuter benefits (parking & transit)
Employment type: Full-time

SLT Product Development Engineering Manager

We are a results-driven Central Engineering team delivering System Level Test (S...
Location: Singapore
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Electrical and Electronic Engineering/Computer Engineering
  • Master's degree in Electrical and Electronic Engineering/Computer Engineering
  • Experience with CPU, GPU, system architectures
  • Strong people, communication & stakeholder management skills
  • Test and characterization, programming, and automation/AI-assisted skills and experience preferred
  • Hands-on experience in a team management role
  • Technical and specialized knowledge in Test, Characterization and Platform Engineering areas such as OS kernel, driver, BIOS firmware development, diagnostics or system debug, with semiconductor industry experience in product development for computing/graphics SoCs
  • Demonstrated object-oriented programming experience in scripting and/or programming languages such as Python and Java
Job Responsibilities
  • Lead efforts in developing System Level Test (SLT) solutions within the stipulated cost, quality, yield and organizational strategic constraints
  • Collaborate with Design, Validation, Platform Engineering, Diagnostic and Tools teams to determine SLT coverage, content, characterization requirements and root cause resolution for SLT device failures
  • Lead efforts in the New Product Introduction phase to validate new product features, test conditions, test methodologies and test content
  • Drive quality & margin improvement in the Sustaining phase
  • Lead bounding box performance characterization efforts and innovation
  • Drive product health and cost-target attainment
  • Drive yield and quality improvement activities through data analysis, debug, root-cause and implementation into production
  • Drive unit-cost reductions through Test-Time reduction and content optimization methodologies
  • Lead efforts in defining and developing SLT equipment, software infrastructure and test content within required product timelines and quality goals
  • Manage resources for agility, focus and execution efficiency to deliver outcomes/results aligned to organization goals
Employment type: Full-time

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location: San Francisco, United States
Salary: 300,000 - 385,000 USD / year
Perplexity
Expiration Date: Until further notice
Requirements
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
Job Responsibilities
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, etc.
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
Employment type: Full-time

Senior Software Systems Designer

We are looking for a highly skilled Senior System Software Designer to design an...
Location: Bangalore, India
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Hands-on experience with performance profiling tools (e.g., AMD uProf, perf, VTune, rocProfiler)
  • Strong understanding of microarchitecture concepts (pipelines, caches, branch prediction, memory hierarchy)
  • Experience working with hardware performance counters (PMC), IBS, or similar sampling techniques
  • Familiarity with OS internals (Linux kernel, schedulers, memory management, tracing frameworks)
  • Experience with distributed/HPC workloads (MPI, OpenMP, large-scale systems)
  • Exposure to trace analysis, call stacks, and sampling-based profiling models
  • Knowledge of container environments and system-level debugging is a plus
  • Experience contributing to cross-platform tools and frameworks
  • Bachelor's or Master's degree in Electrical or Computer Engineering
Job Responsibilities
  • Design and develop system-level profiling tools spanning CPU, memory, IO, and power analysis
  • Build and optimize data collection frameworks leveraging hardware counters (PMC), IBS, and OS tracing
  • Develop low-overhead profiling infrastructure for large-scale and long-running workloads
  • Enhance performance analysis pipelines including data processing, correlation, and visualization
  • Enable cross-platform profiling support across Linux, Windows, and emerging OS ecosystems (e.g., FreeBSD)
  • Work on advanced analysis techniques such as top-down microarchitecture analysis, pipeline utilization, and bottleneck detection
  • Contribute to CLI and GUI-based tools for performance debugging and visualization
  • Integrate support for runtime and framework-level tracing (OpenMP, MPI, Java, Python, etc.)
  • Collaborate with CPU, GPU, kernel, and compiler teams to enable new hardware features in profiling tools
  • Drive automation and intelligent analysis, including AI/ML-assisted performance insights
Employment type: Full-time

Baseband AI Technology Co-op

Baseband Technology under Ericsson AI and technology foresight department leads ...
Location: Ottawa, Canada
Salary: Not provided
Ericsson
Expiration Date: Until further notice
Requirements
  • Competence and experience in C/C++ and Python
  • A quick, self-driven learner with an open mind is a must
  • MSc degree in Electrical engineering or Computer Science or equivalent
  • Excellent written and verbal communication, interpersonal, time management, and multitasking skills
  • Willingness to work in international, agile, multi-site teams, both collaboratively and autonomously
  • An enthusiastic demeanor, eager to continue growing and learning
  • Innovating, adapting, and responding to change
  • A strong can-do attitude
  • Any relevant experience with wireless mobile networks, and an understanding of 3GPP NR and O-RAN standard protocols
  • Experience with ML frameworks (TensorFlow, PyTorch, JAX) and cloud platforms, especially AWS (SageMaker, Lambda, Step Functions, S3, etc.)
Job Responsibilities
  • Actively learn and practice agentic AI workflows in daily operations
  • Actively contribute to and support cloud infrastructure CI/CD pipelines, both on AWS and on-prem
  • As your interests develop, you are also welcome to join our research team in scouting new algorithms, with a particular focus on neural networks and other deep learning algorithms, to enhance the implementation of NR/6G baseband functions running on Ericsson purpose-built RAN compute as well as COTS (commercial off-the-shelf) general compute platforms, i.e. x86/ARM with various accelerators such as GPUs, including GPU programming

Sr Staff Engineer - Core Infrastructure

We are seeking a Senior Staff Engineer (L6) to lead the technical strategy and e...
Location: New York, Seattle, San Francisco and Sunnyvale, United States
Salary: 267,000 - 297,000 USD / year
Uber
Expiration Date: Until further notice
Requirements
  • 12+ years of software engineering experience, with a focus on massive-scale distributed systems or infrastructure
  • Proven Track Record at Scale: Experience managing infrastructure that supports millions of concurrent users or petabyte-scale data processing
  • Deep Systems Expertise: Mastery of Kubernetes internals, container runtimes, and the Linux kernel, with the ability to debug impossible performance bottlenecks
  • Cloud-Native Fluency: Deep experience with cloud-native networking (Envoy, CNI, Service Mesh) and multi-cloud (AWS/GCP) architecture
  • Coding Proficiency: Expert-level proficiency in Go, Java, or C++
  • Leadership: Demonstrated ability to lead 40+ person technical initiatives and influence VPs and GMs on infrastructure investment
Job Responsibilities
  • Architect Strategic Efficiency: Own the technical vision to drive fleet-wide CPU utilization and unit-cost optimization through ARM adoption (targeting XM+ cores) and silicon diversity
  • Scale AI & ML Infrastructure: Define the architecture for shared GPU pools and high-performance clusters to support 300x larger ranking models and Autonomous Vehicle data ingestion
  • Modernize the Data Plane: Drive the convergence of Uber’s networking stack toward industry standards (Kubernetes, Envoy, CNI) while enhancing SkyEdge for active-active multi-cloud resilience
  • Enforce Foundations & Reliability: Lead the 100% Done-Done initiative, ensuring every service follows standardized safe-deployment (Starship) and reaches 100% zero-trust authorization
  • Agentic Augmentation: Integrate AI-driven Minions and AIOps into the infrastructure to automate 80% of alerts and unlock thousands of years of developer productivity
  • Cross-Org Influence: Partner with Delivery, Rides, and AV teams to ensure the infrastructure isn't just a container, but a competitive advantage that accelerates their time-to-market
  • Mentor Staff+ Engineers: Act as a force multiplier by coaching the next generation of technical leaders and influencing company-wide engineering standards
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible to participate in a 401(k) plan
  • Various benefits
Employment type: Full-time

Virtual Software Modeling Engineer

Bring leading-edge SoCs to life by building and evolving the infrastructure that...
Location: Cambridge, United Kingdom
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • High-performance systems and application development in C/C++ on Windows and/or Linux
  • Hardware system architecture and subsystem interface protocols
  • x86, ARM, or GPU architecture, drivers, and applications
  • Linux and/or Windows kernel debugging
  • Functional modeling, architecture simulation, or hypervisor development
  • Experience with tools such as QEMU, VirtualBox, or SIMICS
  • Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field preferred
Job Responsibilities
  • Evolve the simulator’s core infrastructure, with a focus on scalability, maintainability, and developer productivity
  • Maintain and improve dependency management and build systems to increase reliability, reproducibility, and performance
  • Develop and enhance tooling for packaging, deployment, and consumption across multiple environments
  • Modernize the simulator codebase using current C++ standards and best practices to improve readability, structure, and long-term sustainability
  • Design and implement infrastructure to support simulation as a cloud-hosted service
  • Build infrastructure for distributed, multi-host simulation, including coordination, synchronization, and performance optimization
  • Create tools and frameworks to debug multi-threaded simulation execution effectively
  • Define processes and infrastructure to simplify integration, validation, and long-term maintenance of third-party and external models
  • Collaborate with model developers to ensure infrastructure evolves with modeling needs without tightly coupling to specific implementations
  • Improve simulator stability, observability, and debuggability through enhanced logging, diagnostics, and tooling
Employment type: Full-time