AI Model, Framework, and GPU Engineer

Germany, Munich · Job Posted March 20, 2026

Job Description

We are looking for an experienced Machine Learning Software Engineer who will be part of the AMD GPU Technology and Engineering Software Team developing our latest AI software technologies. You will engage with cross-functional teams to optimize various parts of the AI software stack and deliver AI solutions across AMD Radeon and Ryzen product families.

Job Responsibility

  • Develop and deliver innovative AI software solutions to AMD customers and users
  • Enable and optimize the software stack for standard frameworks such as ONNX and PyTorch, as well as popular new open-source AI software
  • Bring up new state-of-the-art (SOTA) AI models, then analyze and improve their performance
  • Participate in and drive end-to-end AI software development, from feature scoping, implementation, integration, and verification to customer enablement

Requirements

  • Strong technical and analytical skills in C/C++/Python AI development in Windows and Linux environments
  • Some knowledge of GPU programming and compilers
  • Capable problem solver
  • Technical leadership: able to define goals and scope and to drive the development effort
  • Good communication skills
  • Enthusiastic about AI technologies
  • Strongly motivated to provide customers with the best feature-rich, efficient solutions
  • Strong cross-platform software development experience and deep programming skills in C/C++ and Python
  • Excellent problem-solving and effective communication skills
  • Development experience with convolution (CONV), GEMM, and/or non-linear operators
  • Experience with common AI frameworks and inference stacks
  • Solid knowledge of AI and ML concepts and techniques
  • Understanding of how different compute, memory, and communication configurations affect AI acceleration performance
  • Bachelor’s, Master’s, or PhD degree in Computer Science, Electrical Engineering, or a related field

Nice to have

  • GPU acceleration experience with compiler and low-level GPU programming
  • Open-source software development experience

Similar Jobs for AI Model, Framework, and GPU Engineer

8 matching positions

AI Models MAD - Model Automation and Dashboarding Engineer

AMD is looking for a skilled and motivated software engineer to join the Model A...
Location: India, Hyderabad
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Undergraduate and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • Strong C/C++/Python programming and software design skills, including debugging, performance analysis, and test design
  • Experience in test automation, CI/CD, and Linux scripting
  • Knowledge of GPU computing (HIP, CUDA, OpenCL)
  • AI model experience or knowledge in Natural Language Processing, Vision, Audio, Recommendation systems
  • Knowledge of Docker, Kubernetes, or Ansible for testing and deploying AI models and services at scale
  • Experience with profiling tools, system monitoring, or regression tracking systems for deep learning models
  • Proficiency with version control (GitHub), testing strategies, code reviews, and collaborative software development
  • Strong written and verbal communication skills with a proactive approach to defining and driving development efforts
Job Responsibility
  • AI Model Enablement & Optimization: Enable and optimize key AI models (LLM, Vision, MultiModal, etc.) on AMD GPUs. Optimize AI frameworks like PyTorch, TensorFlow, etc., on AMD GPUs in upstream open-source repositories
  • Collaboration & Integration: Collaborate with internal GPU library teams and open-source framework maintainers to analyze, optimize, and integrate code changes upstream
  • Model Testing & Validation: Build and maintain automated functional and performance testing pipelines for AI models across ROCm-supported hardware using scalable tools
  • Benchmarking & Metrics: Develop tools and automation for continuous benchmarking and regression tracking across hardware generations and ROCm releases. Build and maintain real-time dashboards that report relevant performance, accuracy, and reliability metrics
  • Ecosystem & Open-Source Contributions: Support public-facing MAD GitHub repositories and Docker releases, enabling the community to run and validate models on ROCm. Contribute to the design of portable, easy-to-use Python interfaces that support multi-node profiling, distributed workloads, and containerized deployments

Senior AI Models MAD - Model Automation and Dashboarding Engineer

AMD is looking for a skilled and motivated software engineer to join the Model A...
Location: India, Hyderabad
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Undergraduate and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • Strong C/C++/Python programming and software design skills, including debugging, performance analysis, and test design
  • Experience in test automation, CI/CD, and Linux scripting
  • Knowledge of GPU computing (HIP, CUDA, OpenCL)
  • Knowledge of Docker, Kubernetes, or Ansible for testing and deploying AI models and services at scale
  • Proficiency with version control (GitHub), testing strategies, code reviews, and collaborative software development
  • Strong written and verbal communication skills with a proactive approach to defining and driving development efforts
Job Responsibility
  • Enable and optimize key AI models (LLM, Vision, MultiModal, etc.) on AMD GPUs
  • Optimize AI frameworks like PyTorch, TensorFlow, etc., on AMD GPUs in upstream open-source repositories
  • Collaborate with internal GPU library teams and open-source framework maintainers to analyze, optimize, and integrate code changes upstream
  • Build and maintain automated functional and performance testing pipelines for AI models across ROCm-supported hardware using scalable tools
  • Develop tools and automation for continuous benchmarking and regression tracking across hardware generations and ROCm releases
  • Build and maintain real-time dashboards that report relevant performance, accuracy, and reliability metrics
  • Support public-facing MAD GitHub repositories and Docker releases, enabling the community to run and validate models on ROCm
  • Contribute to the design of portable, easy-to-use Python interfaces that support multi-node profiling, distributed workloads, and containerized deployments

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...
Location: Canada, Markham
Salary: 106400.00 - 159600.00 CAD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
  • Bachelor’s or Master’s degree or Ph.D in Computer/Software Engineering, Computer Science, or related technical discipline
Job Responsibility
  • Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
  • Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
  • Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
  • Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
  • Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
  • Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
  • Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
  • Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
  • Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
  • Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.
  • Employment type: Full-time

Applied AI & GPU Software Engineer

AMD is seeking a Software Engineer to join the Software Ecosystem Enablement tea...
Location: Poland, Warsaw
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • 10+ years of professional software development experience
  • Solid programming fundamentals in C/C++
  • Experience developing or contributing to GPU-accelerated applications
  • Solid understanding of GPU programming fundamentals
  • Debugging experience with GPU kernels or performance-critical code
  • Familiarity with modern ML frameworks and inference systems
  • Experience with denoising, neural rendering, or ML simulation is an asset
  • Experience with content creation apps, CAD/CAE tools, or HPC pipelines is an asset
Job Responsibility
  • Investigate and prototype hybrid ML systems for graphics, simulation, and media-generation pipelines
  • Integrate existing ML models and inference pipelines into commercial software systems
  • Design efficient workload scheduling and distribution across heterogeneous resources
  • Profile workloads across GPU, NPU, and CPU to identify bottlenecks and optimize performance
  • Evaluate runtimes, execution providers, and deployment strategies for modern hardware architectures
  • Collaborate with domain experts and existing GPU engineering teams
What we offer
  • Benefits are described in "AMD benefits at a glance"
  • Employment type: Full-time

Senior Software Engineer, AI Platform and Enablement

We're building a next-generation AI-powered platform and web application for cre...
Location: United States, San Francisco
Salary: 180000.00 - 286000.00 USD / Year
Company: Descript
Expiration Date: Until further notice
Requirements
  • Experience in deploying and managing AI models in production
  • Experience with large-volume data pipeline tools such as Spark, Flume, and Dask
  • Familiarity with cloud platforms (AWS, Google Cloud, Azure) and container technologies (Docker, Kubernetes)
  • Knowledge of DevOps and MLOps best practices
  • Strong problem-solving abilities and excellent communication skills
Job Responsibility
  • Build, maintain, and standardize third-party model integrations, including consulting for other engineering teams with AI model integration needs
  • Design, implement, and maintain our AI infrastructure supporting our machine learning life cycle, including data ingestion pipelines, training developer experience and infrastructure, evaluation frameworks, and deployments / GPU infrastructure
  • Collaborate with Product Managers, Research Engineers, and AI Researchers to understand their infrastructure needs and ensure our AI systems are robust, scalable, and efficient
  • Optimize and scale our models and algorithms for efficient inference
  • Deploy, monitor, and manage AI models in production
What we offer
  • Generous healthcare package
  • 401k matching program
  • Catered lunches
  • Flexible vacation time
  • Employment type: Full-time

AI Software Product Engineer (GPU)

AI Product Applications Engineer (GPU AI SW Solution Architect) – China position...
Location: China, Beijing
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Deep knowledge of data center, client, and endpoint AI workloads such as LLMs, generative AI, recommendation, and/or transformers
  • Hands-on experience with various AI models, end-to-end pipelines, industry frameworks (PyTorch, vLLM, SGLang, llm-d, Triton), SDKs, and solutions
  • Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
  • Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
  • Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
  • Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
  • BS required
  • MS preferred, with 6+ years of relevant industry experience
Job Responsibility
  • Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
  • Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
  • Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
  • Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
  • Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
  • Provide feedback and requirements for AI software across cloud, client, and edge deployments

AI Software Product Engineer (GPU Kernel)

AI Product Applications Engineer (Solution Architect) – China position is in the...
Location: China, Shanghai
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
  • Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
  • Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
  • Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
  • BS required
  • MS preferred, with 6+ years of relevant industry experience
Job Responsibility
  • Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
  • Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
  • Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
  • Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
  • Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
  • Provide feedback and requirements for AI software across cloud, client, and edge deployments

Software Engineer II and Senior Software Engineer - Performance

The Artificial Intelligence Performance team at Microsoft develops AI software t...
Location: United States, Mountain View
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR equivalent experience
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
  • Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
  • Optimize and monitor performance of LLMs and build SW tooling to enable insights into performance opportunities ranging from the model level to the systems and silicon level to improve customer experience and reduce the footprint of the computing fleet
  • Enable fast time to market of LLMs/models and their deployments at scale by building SW tools that afford velocity in porting models on new Nvidia and AMD GPUs
  • Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools
  • Speed up and reduce the complexity of key components and pipelines to improve the performance and/or efficiency of our systems
  • Communicate and collaborate with our partners both internal and external
  • Embody Microsoft's Culture and Values
  • Employment type: Full-time