
AI Software Product Engineer (GPU Kernel)


AMD


Location:
Shanghai, China


Contract Type:
Not provided


Salary:
Not provided

Job Description:

The AI Product Applications Engineer (Solution Architect) – China position is in the AMD AI group, located in China. Success in this role requires deep knowledge of Data Center, Client, and Endpoint AI workloads such as LLM, Generative AI, Recommendation, and/or transformer … AI across cloud, client, and edge… The candidate needs hands-on experience with various AI models, end-to-end pipelines, and industry frameworks (PyTorch, vLLM, SGLang, llm-d, Triton) / SDKs and solutions.

Job Responsibility:

  • Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
  • Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
  • Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
  • Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
  • Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
  • Provide feedback and requirements for AI software across cloud, client, and edge deployments

Requirements:

  • Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
  • Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
  • Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
  • Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
  • BS required
  • MS preferred, with 6+ years of relevant industry experience

Additional Information:

Job Posted:
March 21, 2026



Similar Jobs for AI Software Product Engineer (GPU Kernel)


Senior Software Engineer- AI

Are you looking for an opportunity to work with the latest Azure offerings and p...
Location:
Bangalore, India
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in Software Development
  • Strong programming expertise in one or more languages such as Python, Go, Java, or C#, with experience designing production-grade services and APIs
  • Experience building AI-powered applications, including integrating LLMs, implementing agent or Copilot workflows, and orchestrating multi-step AI interactions
  • Hands-on experience with LLM application frameworks and orchestration tools such as Semantic Kernel, LangChain, or similar agent frameworks
  • Familiarity with retrieval-augmented generation (RAG) architectures, vector databases, embeddings, and semantic search systems
  • Experience evaluating and improving model performance through prompt design, evaluation frameworks, fine-tuning, or feedback loops
  • Solid understanding of distributed systems concepts including scalability, reliability, observability, caching, and asynchronous processing
  • Experience deploying and operating AI workloads in cloud environments (preferably Azure), including containerized services and GPU-enabled infrastructure
  • Understanding of Responsible AI practices, including model governance, safety, privacy, and evaluation of AI behaviour in production systems
  • Ability to work across product, research, and engineering teams to translate product scenarios into scalable AI system architectures
Job Responsibility:
  • Design, build, and operate scalable AI systems that power intelligent product experiences, including Copilot and agent-driven workflows
  • Architect and implement backend services that support multi-step AI interactions, including orchestration pipelines, context management, memory/state persistence, and tool execution
  • Integrate large language models (LLMs), APIs, and internal services to enable context-aware, human-in-the-loop experiences across customer scenarios
  • Build and maintain data and inference pipelines that support model training, fine-tuning, evaluation, and real-time inference across diverse data sources
  • Evaluate, benchmark, and tune AI/ML models (LLMs and traditional models) to meet product requirements for accuracy, latency, reliability, and safety
  • Implement robust retrieval, grounding, and knowledge integration mechanisms (e.g., RAG systems, semantic indexing, vector search) to power intelligent applications
  • Collaborate with product managers, software engineers, and researchers to translate product vision into production-ready AI capabilities and measurable outcomes
  • Ensure reliability, observability, and governance of AI systems, including monitoring model performance, data quality, and responsible AI practices
  • Build reusable platforms, APIs, and tools that enable teams to rapidly develop AI-powered features and self-service intelligent applications
Contract Type: Fulltime

Principal Software Engineer

Are you looking for an opportunity to work with the latest Azure offerings and p...
Location:
Bangalore, India
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • 10–12+ years of experience in software engineering, with significant experience building scalable backend or distributed systems
  • Strong programming expertise in one or more languages such as Python, Go, Java, or C#, with experience designing production-grade services and APIs
  • Experience building AI-powered applications, including integrating LLMs, implementing agent or Copilot workflows, and orchestrating multi-step AI interactions
  • Hands-on experience with LLM application frameworks and orchestration tools such as Semantic Kernel, LangChain, or similar agent frameworks
  • Familiarity with retrieval-augmented generation (RAG) architectures, vector databases, embeddings, and semantic search systems
  • Experience evaluating and improving model performance through prompt design, evaluation frameworks, fine-tuning, or feedback loops
  • Solid understanding of distributed systems concepts including scalability, reliability, observability, caching, and asynchronous processing
  • Experience deploying and operating AI workloads in cloud environments (preferably Azure), including containerized services and GPU-enabled infrastructure
  • Understanding of Responsible AI practices, including model governance, safety, privacy, and evaluation of AI behaviour in production systems
  • Ability to work across product, research, and engineering teams to translate product scenarios into scalable AI system architectures
Job Responsibility:
  • Design, build, and operate scalable AI systems that power intelligent product experiences, including Copilot and agent-driven workflows
  • Architect and implement backend services that support multi-step AI interactions, including orchestration pipelines, context management, memory/state persistence, and tool execution
  • Integrate large language models (LLMs), APIs, and internal services to enable context-aware, human-in-the-loop experiences across customer scenarios
  • Build and maintain data and inference pipelines that support model training, fine-tuning, evaluation, and real-time inference across diverse data sources
  • Evaluate, benchmark, and tune AI/ML models (LLMs and traditional models) to meet product requirements for accuracy, latency, reliability, and safety
  • Implement robust retrieval, grounding, and knowledge integration mechanisms (e.g., RAG systems, semantic indexing, vector search) to power intelligent applications
  • Collaborate with product managers, software engineers, and researchers to translate product vision into production-ready AI capabilities and measurable outcomes
  • Ensure reliability, observability, and governance of AI systems, including monitoring model performance, data quality, and responsible AI practices
  • Build reusable platforms, APIs, and tools that enable teams to rapidly develop AI-powered features and self-service intelligent applications
Contract Type: Fulltime

Engineering Manager, Kernel Reliability

We're looking for a deeply technical, hands-on engineering leader for our on-fie...
Location:
Sunnyvale, United States; Toronto, Canada
Salary:
Not provided
Cerebras Systems
Expiration Date:
Until further notice
Requirements:
  • 6+ years in software engineering
  • 3+ years leading teams in SW/HW reliability, debug, diagnostic, failure analysis or related fields
  • Expertise in parallel and distributed programming (message passing, multicore, GPU, embedded, etc.)
  • Expertise in debug and diagnostic tool development or expert usage (debuggers, core dump handling, code sanitizers, etc.)
  • Experience debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.)
  • Deep understanding of computer architectures (instruction pipelining, multithreading, networking, etc.)
  • Strong background in monitoring and reliability engineering (incident response, post-mortem analysis, etc.)
  • Demonstrated ability to recruit and retain high-performing teams, mentor engineers, and partner cross-functionally to deliver customer-facing products.
Job Responsibility:
  • Provide hands-on technical leadership, owning the technical vision and roadmap for the kernel-centric reliability of our internal and customer-facing systems
  • Assist System and Cluster Operations teams in reducing system and service downtime after failures by providing tooling and manual intervention for failure analysis and diagnostics
  • Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis
  • Collaborate with SW teams to improve the software stack, including Kernels, to improve on-field debugging and failure analysis
  • Work with the ASIC and HW architecture teams to codesign the next generation architectures with reliability and ease of debug in mind
  • Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution.
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Simple, non-corporate work culture that respects individual beliefs.

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location:
Mountain View, United States
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Strong background in one or more of the following areas: AI accelerator or GPU architectures; distributed systems and large-scale AI training/inference; high-performance computing (HPC) and collective communications; ML systems, runtimes, or compilers; performance modeling, benchmarking, and systems analysis; hardware–software co-design for AI workloads
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility:
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
Contract Type: Fulltime

Senior Product Manager

We are hiring a foundational Product Manager to work directly with the CTO to de...
Location:
Ramat Gan, Israel
Salary:
Not provided
SQream
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience as a Product Manager or Solutions Architect in infrastructure, HPC, data systems, GPU/AI pipelines, or distributed systems
  • Strong outbound / customer-facing skills: presenting to CTOs, architects, OEM teams, GSIs, and technical buyers
  • Ability to operate at kernel-level conceptual depth and translate physics into product strategy
  • Exceptional communication skills - written and verbal - with the ability to simplify complex GPU and dataflow concepts
  • Demonstrated ability to drive roadmap execution with engineering while also leading external discovery and evangelism
  • Comfort owning both internal product discipline and external technical influence
Job Responsibility:
  • Product Ownership (Internal): Work directly with R&D to shape the GPU-native roadmap for ingestion, vectorization, transformation, curation, and continuous production flow
  • Define precise specifications, APIs, pipeline behavior, and physics-aligned constraints
  • Ensure product features adhere to SCAILIUM’s rigid boundaries: No orchestration. No system of record. No serving. No dashboards
  • Enforce documentation rigor. Documentation is code
  • Technical Outbound Leadership (External): Serve as a public-facing authority on GPU starvation, impedance incompatibility, and the AI Production Layer
  • Lead technical sessions with Partners, OEMs (Dell, Supermicro, HPE), GSIs (Accenture, Deloitte), and strategic enterprise customers
  • Conduct in-depth customer pipeline analyses to identify physical constraints and translate them into SCAILIUM features or patterns
  • Present SCAILIUM’s architecture in a clear, authoritative, physics-grounded manner
  • Support sales, partnerships, and field engineering by communicating the “why” behind every product decision
  • Build artifacts that shape the category: reference architectures, workload blueprints, TCO models, and silicon saturation narratives

Senior AI Hardware Architect

Join the Systems Planning and Architecture (SPARC) team within Microsoft’s Azure...
Location:
Mountain View, United States
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements for this role
  • Passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Lead performance analysis, profiling, and benchmarking across GPU and in-house AI accelerator architectures, applying rigorous data and statistical analysis to identify complex performance bottlenecks, root causes, and optimization opportunities across hardware, software, and system layers
  • Run and analyze end-to-end AI models on production-like serving infrastructure, performing deep dives into modern AI serving stacks (e.g., optimized LLM serving frameworks, schedulers, runtimes, and memory management systems) to understand performance behavior, scalability limits, and system-level trade-offs
  • Provide data-driven recommendations and architectural trade-offs to senior technical leadership, balancing performance, complexity, cost, quality, reliability, and development timelines to inform accelerator and system architecture decisions
  • Develop and implement technical solutions to complex performance, quality, and design challenges, including kernel-level optimization, architectural tuning, and system-level performance improvements across multiple products or feature areas
  • Correlate on-silicon measurements, software traces, and kernel execution behavior with architectural models and simulators, ensuring alignment between measured performance and architectural intent, and identifying gaps that drive future design enhancements
  • Design, build, and evolve data correlation, analysis, and visualization tools and workflows that scale performance insight, accelerate debugging, and improve clarity and communication of optimization opportunities across teams
  • Lead and contribute to design and performance documentation, including architecture reviews, performance reports, functional specifications, and customized analyses
  • Communicate progress, risks, and recommendations within and across teams, and help identify and mitigate significant project risks
Contract Type: Fulltime

Head of Inference Kernels

As a core member of the team, you will play a pivotal role in leading a high-per...
Location:
San Jose, United States
Salary:
200000.00 - 300000.00 USD / Year
Etched
Expiration Date:
Until further notice
Requirements:
  • Experience designing and optimizing GPU kernels for deep learning using CUDA and assembly (ASM)
  • Experience with low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Deep fluency with transformer inference architecture, optimization levers, and full-stack systems (e.g., vLLM, custom runtimes)
  • History of delivering tangible perf wins on GPU hardware or custom AI accelerators
  • Solid understanding of roofline models of compute throughput, memory bandwidth and interconnect performance
  • Experienced in running large-scale workloads on heterogeneous compute clusters, optimizing for efficiency and scalability of AI workloads
  • Scopes projects crisply, sets aggressive but realistic milestones, and drives technical decision-making across the team
  • Anticipates blockers and shifts resources proactively
Job Responsibility:
  • Architect Best-in-Class Inference Performance on Sohu: Deliver continuous batching throughput exceeding B200 by ≥10x on priority workloads
  • Develop Best-in-Performance Inference Mega Kernels: Develop complex, fused kernels that increase chip utilization and reduce inference latency, and validate these optimizations through benchmarking and regression-tested in production pipelines
  • Architect Model Mapping Strategies: Develop system-level optimizations using a mix of techniques such as tensor parallelism and expert parallelism for optimal performance
  • Hardware-Software Co-design of Inference-time Algorithmic Innovation: Develop and deploy production-ready inference-time algorithmic improvements (e.g., speculative decoding, prefill-decode disaggregation, KV cache offloading)
  • Build Scalable Team and Roadmap: Grow and retain a team of high-performing inference optimization engineers
  • Cross-Functional Performance Alignment: Ensure inference stack and performance goals are aligned with the software infrastructure teams, GTM and hardware teams for future generations of our hardware
What we offer:
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Significant equity package
Contract Type: Fulltime

Senior Software Engineer, On Device

Lead the productionalization of our IoT platform, with a primary emphasis on the...
Location:
United States
Salary:
140000.00 - 170000.00 USD / Year
Utilidata
Expiration Date:
Until further notice
Requirements:
  • 8+ years of professional experience including 5+ years of experience developing production software and systems, or a combination of educational and professional experience commensurate with this level of experience
  • Demonstrated ability to design and implement distributed systems utilizing microservices in a resource-constrained environment (edge devices with limited memory, CPUs, GPU capacity, etc.)
  • Extensive experience using Python, C/C++, Rust, and the Linux operating system
  • Experience with device layered security, e.g., encryption (PKI), disk partitioning, secure boot, OS kernel libraries, device drivers, OS processes/daemons
  • Experience with data compression and schema management for device time-series data
  • Experience implementing and maintaining CI/CD workflows (e.g., GitHub Actions or Jenkins)
  • Strong understanding of synchronous and asynchronous network communication, including REST APIs, gRPC, binary protocols, and distributed publish/subscribe messaging systems and protocols like MQTT and ZeroMQ
  • Strong written and oral communication skills, with a proven track record of working effectively both individually and as part of a team
  • Willingness to travel up to 10% of time
Job Responsibility:
  • Design, propose, plan, implement, and test resource-constrained, edge software in Python (and possibly lower-level languages, e.g., Rust) including the implementation of precision telemetry collection, real-time control interfaces, and robust system observability
  • Create and maintain CI/CD processes as necessary to support development and deployment with a focus on reproducibility, regression testing for embedded systems, and deployment in real-world, intermittently connected environments
  • Contribute to internal and external technical documentation
  • Collaborate with a cross-functional team of software, hardware, quality assurance (QA), and power systems engineers, data scientists, and leadership
  • Provide high-quality, in-depth code and architecture reviews, implement new features, and provide technical leadership while coordinating with project management, QA, and other internal teams
  • Continually advocate for and implement process improvement and automation
  • Foster a culture of open communication, innovation, and continual improvement
  • Mentor other engineers using paired programming, code review, and collaborative test scenario design
What we offer:
  • 10% annual bonus target
  • Stock options
  • Flexible paid time off
  • Health, dental, and vision coverage
  • Employer-match 401(k)
Contract Type: Fulltime