Senior ML Accelerator Engineer - GPU

General Motors

Location:
United States, Austin

Contract Type:
Not provided

Salary:

128700.00 - 261300.00 USD / Year

Job Description:

About the Mission:

GM’s vision of Zero Crashes, Zero Emissions, and Zero Congestion guides everything we do in autonomous and assisted driving. The AV organization is building advanced automated driving technologies, including Level 4-capable fully self-driving systems, to move us toward safer, more sustainable, and more accessible mobility. For the AI Kernels & Compilers team, that mission shows up in the details: turning cutting-edge perception, prediction, and planning research into production-grade software that runs efficiently and reliably on real vehicles at scale. We pioneer new approaches to model export, kernel development, and performance engineering so that every cycle on our accelerators translates into better situational awareness, faster reaction times, and more robust behavior on the road. If you want your compiler and kernel work to directly influence how automated vehicles understand and react to the world, while operating at the safety, reliability, and scale of a company like GM, this is where that impact becomes real.

About the Team:

The AI Kernels team builds high-performance GPU kernels and custom libraries that sit at the heart of our on-vehicle ML inference for ADAS and autonomous driving. We own making core AI workloads faster, more reliable, and easier to maintain and deploy on real cars, under real-world constraints. That means:

  • Designing and implementing custom operators when vendor libraries hit their limits
  • Integrating those kernels deep into our ML runtime stack
  • Debugging and tuning GPU performance across the AV software stack, often on hardware-in-the-loop

We partner closely with AI Solutions, AI Compilers, AI Architecture, and AI Tooling to ensure models deploy efficiently to the car while consistently meeting strict latency, throughput, and reliability targets. If you enjoy pushing GPUs to their limits and seeing your work directly impact how autonomous vehicles perceive and act in the world, this is the team for you.

Job Responsibility:

  • Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads
  • Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack
  • Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans
  • Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production
  • Maintain high technology standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering through code review
  • Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs

Requirements:

  • 3+ years of relevant industry experience or equivalent
  • BS, MS, or PhD in CS or a related technical field
  • Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture
  • Hands-on experience benchmarking, profiling, debugging, and optimizing accelerator libraries and kernels to extract optimal performance, using the Nsight suite of tools or similar
  • Strong background in software architecture, library design, and design patterns
  • Strong C++ programming skills and comfort working in large codebases
  • Solid background in system performance, high performance computing and/or architecture-aware optimizations
  • Strong communication skills and the ability to work collaboratively within a team
  • Excellent analytical and problem-solving skills

Nice to have:

  • Experience with tensor core programming, CUTLASS and/or CuTe
  • Experience with ML model architectures, in particular transformer-based
  • Experience with low latency or real time systems
  • Experience with lower levels of an accelerator software stack (e.g., drivers, runtimes, and compilers)

What we offer:

  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • employee assistance program
  • GM vehicle discounts

Additional Information:

Job Posted:
April 05, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior ML Accelerator Engineer - GPU

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location:
United States, Santa Clara
Salary:
210400.00 - 315600.00 USD / Year
AMD
Expiration Date:
Until further notice
Requirements:
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Solid, hands-on expertise in at least one of three domains (compute, network, storage) is preferred
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers with diverse technical disciplines in avenues such as Proof of Concept, Competitive evaluations, Early Field Trials etc
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, hands on experience in RoCEv2 Design, VXLAN-EVPN, BGP, and Lossless Fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on Network deployment expertise and proven track record of delivering large projects on time. Cisco, Juniper or Arista experience is preferred
  • Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
  • Excellent communication with audiences ranging from engineers to mid-level management to C-level executives
Job Responsibility:
  • Collaborate with strategic customers on scalable designs spanning compute, networking, and storage environments, and work with industry partners and internal teams to accelerate the deployment and adoption of AI/ML models
  • Engage in system-level triage and at-scale debugging of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain specific knowledge to other groups at AMD, share the lessons learnt to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location:
United States
Salary:
210000.00 - 309000.00 USD / Year
Assembly
Expiration Date:
Until further notice
Requirements:
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility:
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer:
  • competitive equity grants
  • 100% employer-paid benefits
  • flexibility of being fully remote
Employment Type:
Full-time

Senior Software Development Engineer in Test (SDET) - AI Cluster Networking and Security

In AI infrastructure organization, simplifying large hardware deployments with p...
Location:
India, Bengaluru
Salary:
Not provided
Cerebras Systems
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or master's degree in engineering (computer science, electrical, AI, data science, or a related field)
  • 10+ years of experience testing in areas such as enterprise software, distributed systems, or datacenter hardware and software
  • Experience working in large enterprise or cloud networking infrastructure, high speed switches, routers, firewalls
  • Experience in qualifying networking vendor platforms like Juniper, Arista or Cisco and network test equipment like Ixia/Spirent
  • Experience in Datacenter technology like BGP, ECN, PFC
  • Experience testing networking security, compliance and firewalls
  • Strong coding skills in a programming language such as Python, Go, or C/C++
  • Strong debugging skills to debug issues in large distributed systems, hardware, and software. Experience with debugging tools like gdb, strace, networking monitors
  • Strong understanding of operating systems internals like memory management, file system working, security basics and performance
  • Strong understanding of datacenter layout, device performance characteristics like PCIe, networking and storage
Job Responsibility:
  • Innovate and execute tests on cutting edge AI infrastructure
  • Define optimized test strategies and methodologies
  • Be a quick learner, adapt to new technologies
  • Build a strong understanding of how to break large distributed-systems challenges into smaller components that can be unit tested
  • Take an automation-first approach: aim for 100% automated tests covering all cluster features, including high availability, failure scenarios, performance, stress, and security
  • Champion cluster security, reliability (99.9999% uptime), and ease of use with observability
  • Test all components of the AI cluster, including but not limited to cluster software (Kubernetes, Prometheus, and Grafana) and cluster hardware such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights through the MemoryX interconnect
  • Qualify cluster networking solutions consisting of high-speed switches, routers, and optics from various vendors
  • Qualify cluster security features, including OS security, network security, cloud compliance, user access, and security certifications
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • A simple, non-corporate work culture that respects individual beliefs

Senior Software Engineer (Ads)

Online Advertising is one of the fastest‑growing businesses on the Internet. Mic...
Location:
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years' experience building high‑performance, large‑scale distributed systems or ML infrastructure
  • Expert‑level proficiency in C++, with strong understanding of data structures, algorithms, and system design
  • Experience building and optimizing performance‑critical production systems
  • Experience working in Ads, Search, Recommendation systems, or other large‑scale online serving systems
Job Responsibility:
  • Design and build a unified inference platform for Ads, ensuring scalability, reliability, and efficiency
  • Optimize model inference via batching, quantization, scheduling, memory management, runtime optimization, and other performance improvements
  • Develop, optimize, and maintain performance‑critical components for high‑throughput, low‑latency production inference, including GPU‑accelerated paths when applicable
  • Collaborate with algorithm/model teams to co‑design serving‑aware model architectures and optimizations
  • Profile and improve end‑to‑end system performance: concurrency, memory footprint, throughput, and latency
  • Provide senior technical leadership across teams; elevate engineering best practices and influence long-term technical strategy
Employment Type:
Full-time

Senior ML Compiler Engineer

About the Mission: GM’s vision of Zero Crashes, Zero Emissions, and Zero Congest...
Location:
United States, Austin
Salary:
128700.00 - 261300.00 USD / Year
General Motors
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in the field of compilers
  • Experience with ML frameworks (e.g., PyTorch, TensorFlow, JAX) and software stack (e.g., ONNX, MLIR, XLA, TVM, TensorRT, etc)
  • Expertise in writing production quality Python/C++ code
  • Expertise in the software development life-cycle - coding, debugging, optimization, testing, integration
  • BS, or higher degree, in CS/CE/EE, or equivalent
Job Responsibility:
  • Build and evolve the model compilation toolchain used to deploy large‑scale perception, prediction, and planning models to the AV
  • Architect new compiler passes and analyses that improve build times, memory footprint, and runtime latency while preserving, or intentionally trading off, fidelity under strict safety and reliability constraints
  • Collaborate closely with kernels, runtime, and hardware teams to co‑design interfaces, shape accelerator capabilities, and ensure the compiler exposes the right abstractions to unlock peak performance on each platform
  • Set standards and best practices for model export, validation, and debugging so that AV teams can iterate quickly with clear, reproducible performance and accuracy characteristics
What we offer:
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
Employment Type:
Full-time

Senior AI/ML Validation Engineer

We are seeking an experienced and versatile professional with expertise in valid...
Location:
India, Hyderabad
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • 8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
  • Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
  • Proven expertise testing drivers and firmware with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models
  • Failure modes: error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles
  • Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems; ability to read kernel logs and symbols, trace with ftrace/perf/ETW, perform cross-layer debugging, build custom kernels/modules, and analyze crash dumps (kdump, WinDbg)
  • Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation
  • C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks
  • Bash/PowerShell for environment setup, CI scripting, and reproducibility
Job Responsibility:
  • Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
  • Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
  • Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
  • Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
  • Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
  • Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
  • Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
  • Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
What we offer:
  • AMD benefits at a glance

Senior Framework Engineer — Diffusion Inference

As a Framework Engineer for Diffusion Model Inference, you will design, build, a...
Location:
Finland, Helsinki
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Strong Python and/or C++ engineering skills (debugging, profiling, testing, navigating complex codebases, clean abstractions)
  • Experience with ML frameworks—PyTorch strongly preferred, JAX/TF welcome—and familiarity with diffusion model execution
  • Proven ability to work in GPU-accelerated environments with intuition for performance, memory/compute tradeoffs, and profiling
  • Comfort with containers (Docker) and modern dev workflows (git, CI, build systems)
  • Strong cross-functional collaboration and clear technical communication skills
  • BSc, MSc, PhD, or equivalent experience in Computer Science, Electrical Engineering, or a related field
Job Responsibility:
  • Develop and maintain a diffusion inference framework for image/video generation with clean APIs and strong compatibility with widely used diffusion ecosystems
  • Own scalable parallel inference features for DiT workloads—single-node and multi-node
  • Integrate optimized operator backends (attention, GEMM, quantized paths) by bridging Python/C++ layers and ensuring correctness and high performance
  • Ship production-grade packaging & releases including containers, versioned artifacts, dependency hygiene, and pip-installable distributions
  • Build continuous testing & benchmarking infrastructure
  • Collaborate across the GPU software stack and translate framework needs into actionable upstream improvements
  • Support strategic customers by mapping real-world inference constraints into framework features, reference configurations, and reproducible deployment recipes
  • Communicate clearly around technical tradeoffs, performance bottlenecks, and roadmap decisions
Employment Type:
Full-time

Senior Principal Engineering Manager

Microsoft Research (MSR) is working to transform the future of artificial intell...
Location:
United States, Redmond
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of people management experience leading software engineering teams, including managing principal engineers
  • Experience building or operating infrastructure for large-scale distributed systems, cloud platforms, or artificial intelligence (AI)/machine learning (ML) workloads
  • Track record of driving execution on complex, multi-workstream infrastructure projects with clear milestones and accountability
  • Technical fluency in one or more of: large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), or High-Performance Compute (HPC) environments
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
  • Expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • A track record of strong cross-functional partnerships, including the ability to align on strategic direction, deliver joint accountabilities, and develop relationships with staff members with widely varied expertise
  • Experience scaling engineering teams through significant growth phases (hiring, onboarding, and integrating new engineers into a high-performing team)
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Job Responsibility:
  • Lead, mentor, and grow the engineering team that builds MSR’s AI research infrastructure
  • Recruit and develop exceptional engineering talent, building a diverse team - including hiring, onboarding, career development, and performance management
  • Drive execution across the team by setting clear goals, tracking milestones, managing dependencies, and ensuring accountability for delivering complex infrastructure projects on time and at high quality
  • Lead team culture and process changes, cultivating an AI-first mentality that accelerates our progress through agentic coding, automation, and skills development
  • Provide technical vision and judgment on the team's architecture, strategy, and roadmap — spanning supercomputer GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows — while empowering engineers to own deep technical details
  • Collaborate closely cross-discipline with engineers, program managers, and research and science teams to align priorities, resolve dependencies, and build better solutions together
  • Foster a team culture of operational excellence, continuous improvement, and high psychological safety where engineers are empowered to take ownership and innovate
Employment Type:
Full-time