
Senior ML Accelerator Engineer - GPU


General Motors

Location:
United States, Austin

Contract Type:
Not provided

Salary:

128,700.00 - 261,300.00 USD / Year

Job Description:

About the Mission:

GM's vision of Zero Crashes, Zero Emissions, and Zero Congestion guides everything we do in autonomous and assisted driving. The AV organization is building advanced automated driving technologies, including Level 4–capable fully self-driving systems, to move us toward safer, more sustainable, and more accessible mobility. For the AI Kernels & Compilers team, that mission shows up in the details: turning cutting-edge perception, prediction, and planning research into production-grade software that can run efficiently and reliably on real vehicles at scale. We pioneer new approaches to model export, kernel development, and performance engineering so that every cycle on our accelerators translates into better situational awareness, faster reaction times, and more robust behavior on the road. If you want your compiler and kernel work to directly influence how automated vehicles understand and react to the world, while operating at the safety, reliability, and scale of a company like GM, this is where that impact becomes real.

About the Team:

The AI Kernels team builds high-performance GPU kernels and custom libraries that sit at the heart of our on-vehicle ML inference for ADAS and autonomous driving. We own making core AI workloads faster, more reliable, and easier to maintain and deploy on real cars, under real-world constraints. That means:

  • Designing and implementing custom operators when vendor libraries hit their limits
  • Integrating those kernels deep into our ML runtime stack
  • Debugging and tuning GPU performance across the AV software stack, often on hardware-in-the-loop

We partner closely with AI Solutions, AI Compilers, AI Architecture, and AI Tooling to ensure models deploy efficiently to the car while consistently meeting strict latency, throughput, and reliability targets. If you enjoy pushing GPUs to their limits and seeing your work directly impact how autonomous vehicles perceive and act in the world, this is the team for you.

Job Responsibility:

  • Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads
  • Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack
  • Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans
  • Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production
  • Maintain high technology standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering through code review
  • Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs

Requirements:

  • 3+ years of relevant industry experience, or equivalent experience
  • BS, MS, or PhD in CS or a related technical field
  • Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture
  • Hands-on experience benchmarking, profiling, debugging, and optimizing accelerator libraries and kernels to extract optimal performance, using the Nsight suite of tools or similar
  • Strong background in software architecture, library design, and design patterns
  • Strong C++ programming skills and comfort working in large codebases
  • Solid background in system performance, high performance computing and/or architecture-aware optimizations
  • Strong communication skills and the ability to work collaboratively within a team
  • Excellent analytical and problem-solving skills

Nice to have:

  • Experience with tensor core programming, CUTLASS and/or CuTe
  • Experience with ML model architectures, in particular transformer-based
  • Experience with low latency or real time systems
  • Experience with lower levels of an accelerator software stack (e.g., drivers, runtimes, and compilers)

What we offer:
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • employee assistance program
  • GM vehicle discounts

Additional Information:

Job Posted:
April 05, 2026

Employment Type:
Fulltime

Work Type:
Hybrid work

Similar Jobs for Senior ML Accelerator Engineer - GPU

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location:
United States, Santa Clara
Salary:
210,400.00 - 315,600.00 USD / Year
AMD
Expiration Date:
Until further notice
Requirements:
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Candidates with solid, hands-on expertise in at least one of three domains (compute, network, storage) are preferred
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers across diverse technical disciplines in engagements such as proofs of concept, competitive evaluations, and early field trials
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, hands on experience in RoCEv2 Design, VXLAN-EVPN, BGP, and Lossless Fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on Network deployment expertise and proven track record of delivering large projects on time. Cisco, Juniper or Arista experience is preferred
  • Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
  • Excellent communication level from engineer to mid-management to C-level of audience
Job Responsibility:
  • Collaborate with strategic customers on scalable designs spanning compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location:
United States
Salary:
210,000.00 - 309,000.00 USD / Year
Assembly
Expiration Date:
Until further notice
Requirements:
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility:
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer:
  • competitive equity grants
  • 100% employer-paid benefits
  • flexibility of being fully remote

Senior Software Development Engineer in Test (SDET) - AI Cluster Networking and Security

In AI infrastructure organization, simplifying large hardware deployments with p...
Location:
India, Bengaluru
Salary:
Not provided
Cerebras Systems
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or master's degree in computer science, electrical engineering, AI, data science, or a related field
  • 10+ years of experience testing in areas such as enterprise software, distributed systems, or datacenter hardware and software
  • Experience working in large enterprise or cloud networking infrastructure, high speed switches, routers, firewalls
  • Experience in qualifying networking vendor platforms like Juniper, Arista or Cisco and network test equipment like Ixia/Spirent
  • Experience in Datacenter technology like BGP, ECN, PFC
  • Experience testing networking security, compliance and firewalls
  • Strong coding skills in one of the programming languages like python, golang or C/C++
  • Strong debugging skills to debug issues in large distributed systems, hardware, and software. Experience with debugging tools like gdb, strace, networking monitors
  • Strong understanding of operating systems internals like memory management, file system working, security basics and performance
  • Strong understanding of datacenter layout, device performance characteristics like PCIe, networking and storage
Job Responsibility:
  • Innovate and execute tests on cutting edge AI infrastructure
  • Define optimized test strategies and methodologies
  • Be a quick learner, adapt to new technologies
  • Build a strong understanding of how to break these large distributed systems challenge into smaller components that can be unit tested
  • Automation-first approach: aim for 100% automated tests covering all cluster features across high availability, failure scenarios, performance, stress, and security
  • Champion cluster security, reliability for uptime of 99.9999% and ease of use with observability
  • Test all components of the AI cluster, including cluster software (Kubernetes, Prometheus, Grafana) and cluster hardware components such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights through the MemoryX interconnect
  • Qualify cluster networking solutions which consists of high-speed switches, routers and optics from various vendors
  • Qualify cluster security features including OS security, network security, cloud compliance user access and security certifications
What we offer
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Senior ML Compiler Engineer

About the Mission: GM’s vision of Zero Crashes, Zero Emissions, and Zero Congest...
Location:
United States, Austin
Salary:
128,700.00 - 261,300.00 USD / Year
General Motors
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in the field of compilers
  • Experience with ML frameworks (e.g., PyTorch, TensorFlow, JAX) and software stack (e.g., ONNX, MLIR, XLA, TVM, TensorRT, etc)
  • Expertise in writing production quality Python/C++ code
  • Expertise in the software development life-cycle - coding, debugging, optimization, testing, integration
  • BS, or higher degree, in CS/CE/EE, or equivalent
Job Responsibility:
  • Build and evolve the model compilation toolchain used to deploy large‑scale perception, prediction, and planning models to the AV
  • Architect new compiler passes and analysis that improve build times, memory footprint, and runtime latency while preserving—or intentionally trading off—fidelity under strict safety and reliability constraints
  • Collaborate closely with kernels, runtime, and hardware teams to co‑design interfaces, shape accelerator capabilities, and ensure the compiler exposes the right abstractions to unlock peak performance on each platform
  • Set standards and best practices for model export, validation, and debugging so that AV teams can iterate quickly with clear, reproducible performance and accuracy characteristics
What we offer:
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs

Senior AI/ML Validation Engineer

We are seeking an experienced and versatile professional with expertise in valid...
Location:
India, Hyderabad
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • 8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
  • Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
  • Proven expertise testing drivers and firmware with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models
  • Failure modes: error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles
  • Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems, with the ability to: read kernel logs and symbols, trace with ftrace/perf/ETW, and perform cross-layer debugging; build custom kernels/modules and analyze crash dumps (kdump, WinDbg)
  • Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation
  • C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks
  • Bash/PowerShell for environment setup, CI scripting, and reproducibility
Job Responsibility:
  • Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
  • Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
  • Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
  • Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
  • Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
  • Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
  • Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
  • Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
What we offer:
  • AMD benefits at a glance

Senior Framework Engineer — Diffusion Inference

As a Framework Engineer for Diffusion Model Inference, you will design, build, a...
Location:
Finland, Helsinki
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Strong Python and/or C++ engineering skills (debugging, profiling, testing, navigating complex codebases, clean abstractions)
  • Experience with ML frameworks—PyTorch strongly preferred, JAX/TF welcome—and familiarity with diffusion model execution
  • Proven ability to work in GPU-accelerated environments with intuition for performance, memory/compute tradeoffs, and profiling
  • Comfort with containers (Docker) and modern dev workflows (git, CI, build systems)
  • Strong cross-functional collaboration and clear technical communication skills
  • BSc, MSc, PhD, or equivalent experience in Computer Science, Electrical Engineering, or a related field
Job Responsibility:
  • Develop and maintain a diffusion inference framework for image/video generation with clean APIs and strong compatibility with widely used diffusion ecosystems
  • Own scalable parallel inference features for DiT workloads—single-node and multi-node
  • Integrate optimized operator backends (attention, GEMM, quantized paths) by bridging Python/C++ layers and ensuring correctness and high performance
  • Ship production-grade packaging & releases including containers, versioned artifacts, dependency hygiene, and pip-installable distributions
  • Build continuous testing & benchmarking infrastructure
  • Collaborate across the GPU software stack and translate framework needs into actionable upstream improvements
  • Support strategic customers by mapping real-world inference constraints into framework features, reference configurations, and reproducible deployment recipes
  • Communicate clearly around technical tradeoffs, performance bottlenecks, and roadmap decisions

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location:
United States, Mountain View
Salary:
139,900.00 - 274,800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Strong background in one or more of the following areas: AI accelerator or GPU architectures
  • Distributed systems and large-scale AI training/inference
  • High-performance computing (HPC) and collective communications
  • ML systems, runtimes, or compilers
  • Performance modeling, benchmarking, and systems analysis
  • Hardware–software co-design for AI workloads
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility:
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.

Principal Software Engineer

Online Advertising is one of the fastest‑growing businesses on the Internet. Mic...
Location:
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 6+ years' experience building high‑performance, large‑scale distributed systems or ML infrastructure
  • Experience building and optimizing performance‑critical production systems
  • Experience working in Ads, Search, Recommendation systems, or other large‑scale online serving systems
Job Responsibility:
  • Design and build a unified inference platform for Ads, ensuring scalability, reliability, and efficiency
  • Optimize model inference via batching, quantization, scheduling, memory management, runtime optimization, and other performance improvements
  • Develop, optimize, and maintain performance‑critical components for high‑throughput, low‑latency production inference, including GPU‑accelerated paths when applicable
  • Collaborate with algorithm/model teams to co‑design serving‑aware model architectures and optimizations
  • Profile and improve end‑to‑end system performance: concurrency, memory footprint, throughput, and latency
  • Provide senior technical leadership across teams; elevate engineering best practices and influence long-term technical strategy