AI Model, Framework, and GPU Engineer Job at AMD (Munich)

AI Models MAD - Model Automation and Dashboarding Engineer

AMD is looking for a skilled and motivated software engineer to join the Model A...

Location

India, Hyderabad

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Undergraduate and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
Strong C/C++/Python programming and software design skills, including debugging, performance analysis, and test design
Experience in test automation, CI/CD, and Linux scripting
Knowledge of GPU computing (HIP, CUDA, OpenCL)
AI model experience or knowledge in Natural Language Processing, Vision, Audio, Recommendation systems
Knowledge of Docker, Kubernetes, or Ansible for testing and deploying AI models and services at scale
Experience with profiling tools, system monitoring, or regression tracking systems for deep learning models
Proficiency with version control (GitHub), testing strategies, code reviews, and collaborative software development
Strong written and verbal communication skills with a proactive approach to defining and driving development efforts

Job Responsibility

AI Model Enablement & Optimization: Enable and optimize key AI models (LLM, Vision, MultiModal, etc.) on AMD GPUs. Optimize AI frameworks such as PyTorch and TensorFlow on AMD GPUs in upstream open-source repositories
Collaboration & Integration: Collaborate with internal GPU library teams and open-source framework maintainers to analyze, optimize, and integrate code changes upstream
Model Testing & Validation: Build and maintain automated functional and performance testing pipelines for AI models across ROCm-supported hardware using scalable tools
Benchmarking & Metrics: Develop tools and automation for continuous benchmarking and regression tracking across hardware generations and ROCm releases. Build and maintain real-time dashboards that report relevant performance, accuracy, and reliability metrics
Ecosystem & Open-Source Contributions: Support public-facing MAD GitHub repositories and Docker releases, enabling the community to run and validate models on ROCm. Contribute to the design of portable, easy-to-use Python interfaces that support multi-node profiling, distributed workloads, and containerized deployments

Senior AI Models MAD - Model Automation and Dashboarding Engineer

AMD is looking for a skilled and motivated software engineer to join the Model A...

Location

India, Hyderabad

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Undergraduate and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
Strong C/C++/Python programming and software design skills, including debugging, performance analysis, and test design
Experience in test automation, CI/CD, and Linux scripting
Knowledge of GPU computing (HIP, CUDA, OpenCL)
Knowledge of Docker, Kubernetes, or Ansible for testing and deploying AI models and services at scale
Proficiency with version control (GitHub), testing strategies, code reviews, and collaborative software development
Strong written and verbal communication skills with a proactive approach to defining and driving development efforts

Job Responsibility

Enable and optimize key AI models (LLM, Vision, MultiModal, etc.) on AMD GPUs
Optimize AI frameworks such as PyTorch and TensorFlow on AMD GPUs in upstream open-source repositories
Collaborate with internal GPU library teams and open-source framework maintainers to analyze, optimize, and integrate code changes upstream
Build and maintain automated functional and performance testing pipelines for AI models across ROCm-supported hardware using scalable tools
Develop tools and automation for continuous benchmarking and regression tracking across hardware generations and ROCm releases
Build and maintain real-time dashboards that report relevant performance, accuracy, and reliability metrics
Support public-facing MAD GitHub repositories and Docker releases, enabling the community to run and validate models on ROCm
Contribute to the design of portable, easy-to-use Python interfaces that support multi-node profiling, distributed workloads, and containerized deployments

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...

Location

Canada, Markham

Salary:

106400.00 - 159600.00 CAD / Year

AMD

Expiration Date

Until further notice

Requirements

Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
Bachelor’s, Master’s, or Ph.D. degree in Computer/Software Engineering, Computer Science, or a related technical discipline

Job Responsibility

Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
Drive end-to-end inference enablement on new AMD GPU silicon: be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.

Fulltime

Applied AI & GPU Software Engineer

AMD is seeking a Software Engineer to join the Software Ecosystem Enablement tea...

Location

Poland, Warsaw

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

10+ years of professional software development experience
Solid programming fundamentals in C/C++
Experience developing or contributing to GPU-accelerated applications
Solid understanding of GPU programming fundamentals
Debugging experience with GPU kernels or performance-critical code
Familiarity with modern ML frameworks and inference systems
Experience with denoising, neural rendering, or ML simulation is an asset
Experience with content creation apps, CAD/CAE tools, or HPC pipelines is an asset

Job Responsibility

Investigate and prototype hybrid ML systems for graphics, simulation, and media-generation pipelines
Integrate existing ML models and inference pipelines into commercial software systems
Design efficient workload scheduling and distribution across heterogeneous resources
Profile workloads across GPU, NPU, and CPU to identify bottlenecks and optimize performance
Evaluate runtimes, execution providers, and deployment strategies for modern hardware architectures
Collaborate with domain experts and existing GPU engineering teams

What we offer

Benefits offered are described in AMD benefits at a glance

Fulltime

Senior Software Engineer, AI Platform and Enablement

We're building a next-generation AI-powered platform and web application for cre...

Location

United States, San Francisco

Salary:

180000.00 - 286000.00 USD / Year

Descript

Expiration Date

Until further notice

Requirements

Experience in deploying and managing AI models in production
Experience with tools for large-volume data pipelines, such as Spark, Flume, and Dask
Familiarity with cloud platforms (AWS, Google Cloud, Azure) and container technologies (Docker, Kubernetes)
Knowledge of DevOps and MLOps best practices
Strong problem-solving abilities and excellent communication skills

Job Responsibility

Build, maintain, and standardize third-party model integrations, including consulting for other engineering teams with AI model integration needs
Design, implement, and maintain our AI infrastructure supporting our machine learning life cycle, including data ingestion pipelines, training developer experience and infrastructure, evaluation frameworks, and deployments / GPU infrastructure
Collaborate with Product Managers, Research Engineers, and AI Researchers to understand their infrastructure needs and ensure our AI systems are robust, scalable, and efficient
Optimize and scale our models and algorithms for efficient inference
Deploy, monitor, and manage AI models in production

What we offer

Generous healthcare package
401k matching program
Catered lunches
Flexible vacation time

Fulltime

AI Software Product Engineer (GPU)

AI Product Applications Engineer (GPU AI SW Solution Architect) – China position...

Location

China, Beijing

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Deep knowledge of Data Center, Client, and Endpoint AI workloads such as LLMs, generative AI, recommendation, and/or transformer models
Hands-on experience with various AI models, end-to-end pipelines, industry frameworks (PyTorch, vLLM, SGLang, llm-d, Triton), SDKs, and solutions
Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
BS required
MS preferred, with 6+ years of relevant industry experience

Job Responsibility

Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
Provide feedback and requirements for AI software across cloud, client, and edge deployments

AI Software Product Engineer (GPU Kernel)

AI Product Applications Engineer (Solution Architect) – China position is in the...

Location

China, Shanghai

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Hands‑on experience with AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, TensorRT‑LLM, Megatron‑LM, and DeepSpeed
Proven experience in LLMs, Generative AI models, transformer architectures, and end‑to‑end AI pipelines
Familiarity with AMD MI‑series GPU architecture, GPU kernel programming, and the ROCm AI software stack is strongly preferred
Strong communication and presentation skills, with the ability to articulate architectural proposals and value propositions clearly
BS required
MS preferred, with 6+ years of relevant industry experience

Job Responsibility

Lead and contribute to AI open‑source software projects that support the developer community and the broader ecosystem
Drive developer enablement through technical content (blogs, tutorials, user guides) and AI Academy initiatives
Support the success of AI developers, communities, and customer PoCs through hands‑on technical contributions
Capture and prioritize developer and customer requirements to influence AMD’s AI software and solutions roadmap
Analyze competitive AI software and solutions to identify strengths/weaknesses and clearly communicate AMD’s value propositions
Provide feedback and requirements for AI software across cloud, client, and edge deployments

Software Engineer II and Senior Software Engineer - Performance

The Artificial Intelligence Performance team at Microsoft develops AI software t...

Location

United States, Mountain View

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
Optimize and monitor performance of LLMs and build SW tooling to enable insights into performance opportunities ranging from the model level to the systems and silicon level to improve customer experience and reduce the footprint of the computing fleet
Enable fast time to market for LLMs/models and their deployments at scale by building SW tools that speed up porting models to new Nvidia and AMD GPUs
Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools
Speed up and reduce the complexity of key components and pipelines to improve the performance and/or efficiency of our systems
Communicate and collaborate with our partners both internal and external
Embody Microsoft's Culture and Values

Fulltime
