AI/ML Performance Engineer - GPU Optimization Job at AMD (Helsinki)

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...

Location

United States, Santa Clara

Salary:

210400.00 - 315600.00 USD / Year

AMD

Expiration Date

Until further notice

Requirements

Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
Candidates with solid, hands-on expertise in at least one of three domains (compute, network, storage) are preferred
Experience in working with large customers such as Cloud Service Providers and global enterprise customers
Proven leadership in engaging customers across diverse technical disciplines through avenues such as proofs of concept, competitive evaluations, and early field trials
Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
Demonstrated leadership in network architecture, with hands-on experience in RoCEv2 design, VXLAN-EVPN, BGP, and lossless fabrics
Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
Extensive hands-on network deployment expertise and a proven track record of delivering large projects on time. Cisco, Juniper, or Arista experience is preferred
Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
Excellent communication skills with audiences ranging from engineers to mid-management to C-level executives

Job Responsibility

Collaborate with strategic customers on scalable designs spanning compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement
Engage with AMD product groups to drive resolution of application and customer issues
Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Senior System Performance Engineer

Role As a Senior System Performance Engineer on GM's AV System Performance Tea...

Location

United States, Austin; Mountain View

Salary:

128700.00 - 261300.00 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

3+ years of relevant industry experience
Hands-on programming experience with C++ and Python
Strong understanding of computer architecture and system-level software fundamentals
Proven experience with performance profiling, analysis, tuning, and optimization
Experience developing or optimizing high-performance software, ideally for heterogeneous compute environments (e.g., GPUs, DSPs, or accelerators)
Familiarity with industry benchmarks and workloads (e.g., MLPerf)
Strong communication skills with the ability to influence technical decisions within a team or product area
Ability to lead projects through ambiguity and deliver results end to end
BS or MS in Computer Science or a related technical field (or equivalent practical experience)

Job Responsibility

Collaborate with performance leads and partner engineering teams to align on performance requirements, development practices, and improvement opportunities
Lead performance-focused engineering initiatives with moderate ambiguity and cross-team collaboration
Contribute to the roadmap for performance tooling, frameworks, and methodologies that support efficient and scalable AV software development
Evaluate and prototype new tools, techniques, and technologies to improve runtime performance and developer workflows
Design, implement, and maintain tools and automated systems that support performance analysis, debugging, and continuous monitoring
Apply and help improve performance engineering standards, processes, and best practices at the team level
Analyze software behavior, identify performance bottlenecks, and collaborate with product teams to propose and implement optimizations
Mentor junior engineers on performance profiling, optimization strategies, and engineering best practices

What we offer

Medical
Dental
Vision
Health Savings Account
Flexible Spending Accounts
Retirement savings plan
Sickness and accident benefits
Life insurance
Paid vacation & holidays
Tuition assistance programs

Fulltime

AI Product Performance Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great pro...

Location

China, Shenzhen

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Deep knowledge of data center AI workloads such as LLM, generative AI, recommendation, NLP, video analytics, and/or transformers
Hands-on experience with various AI models, end-to-end pipelines, industry frameworks/SDKs, and solutions
GPU Architecture Mastery
Kernel Programming Expertise: Strong proficiency in C++ and parallel computing, with extensive hands-on experience in NVIDIA CUDA or AMD HIP kernel programming
Performance Engineering: Demonstrated ability to debug and profile complex GPU workloads
Systems Knowledge: Familiarity with asynchronous execution, stream management, and host-device memory transfers
Python DSLs & Triton: Experience implementing kernels using OpenAI Triton or other Python-based DSLs
Inference Engine Experience: Hands-on experience integrating custom kernels into large-scale inference frameworks such as vLLM, SGLang, or TensorRT-LLM
Deep Learning Frameworks: Familiarity with writing custom extensions or operators for PyTorch (C++/CUDA extensions)
Hardware Agnosticism: Experience porting kernels between NVIDIA and AMD architectures or working with cross-platform HPC libraries

Job Responsibility

High-Performance Kernel Development: Design, implement, and optimize high-performance GPU kernels for AI/ML workloads to maximize hardware utilization
Performance Optimization: Analyze and optimize kernel execution for latency and throughput, addressing bottlenecks in memory bandwidth, instruction latency, and thread divergence
Workload Analysis: Evaluate the end-to-end performance impact of individual kernels on full-stack AI models, ensuring that micro-optimizations translate to application-level speedups
Profiling & Tuning: Utilize advanced GPU profiling tools (e.g., ROCm Profiler, PyTorch Profiler) to identify performance cliffs, pipeline stalls, and memory hierarchy inefficiencies
Architecture Adaptation: Tailor implementation strategies to leverage specific features of modern GPU architectures (e.g., Matrix Cores, HBM characteristics)
Framework Integration: Collaborate with software stack teams to expose optimized kernels within high-level frameworks and inference engines

What we offer

AMD benefits at a glance

Fulltime

Principal Software Engineer

Microsoft Advertising is seeking a Principal Software Engineer to join our Ads E...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
Industry experience in advertising or search engine backend systems, such as large-scale ad ranking, real-time bidding (RTB), or relevance-serving infrastructure
Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming), feature-store integration, and multi-region deployment for low-latency, globally distributed services
Familiarity with LLM inference optimization: model sharding, tensor/KV-cache parallelism, paged attention, continuous batching, quantization (AWQ/FP8), and hybrid CPU–GPU orchestration
Demonstrated success operating large-scale systems with SLA-based capacity forecasting, autoscaling, and performance telemetry
Proven leadership in cross-functional architecture initiatives and technical mentorship

Job Responsibility

Design and lead the development of large-scale, distributed online serving systems—including GPU-accelerated and CPU-based ranking/inference pipelines—to process millions of ad requests per second with ultra-low latency, high throughput, and solid reliability
Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration across heterogeneous hardware (GPU, CPU, and memory tiers)
Profile and optimize performance across the full stack—from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling—identifying bottlenecks, tuning latency tails, and improving cost efficiency through advanced profiling and instrumentation
Own live-site reliability as a DRI: design telemetry, alerting, and fault-tolerance mechanisms; drive rapid diagnosis and mitigation of performance regressions or outages in globally distributed systems
Collaborate and mentor across teams—driving architecture reviews, enforcing engineering excellence, promoting system-level optimization practices, and mentoring others in deep debugging, profiling, and performance engineering

Fulltime

Director Software Development

At AMD, we are enabling the next generation of AI innovation by leveraging the p...

Location

China, Shanghai

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

10+ years in AI/ML software development
5+ years in leadership roles managing AI model enablement or optimization teams
Expertise in optimizing real-time AI models for deep learning applications (computer vision, NLP, etc.)
Proficiency with AI frameworks (TensorFlow, PyTorch, ONNX Runtime, JAX, Triton) and their optimization for GPU architectures
Strong background in optimizing software for AMD GPUs or similar high-performance platforms
Familiarity with ROCm is a plus
Proven experience with performance optimization, benchmarking, and scaling AI models on GPUs
Exceptional ability to collaborate cross-functionally and define long-term strategies for AI/ML innovation
Strong verbal and written communication skills, with experience presenting to senior leadership and working with customers and partners
Advanced degree (Master’s or PhD) in Computer Science, Electrical Engineering, AI/ML, or related field

Job Responsibility

Lead and develop teams responsible for AI inference model enablement and optimization
Direct efforts to optimize AI frameworks for seamless compatibility and performance on AMD GPUs (Instinct, Navi)
Oversee benchmarking, performance tuning, and optimization of AI inference models to improve latency, throughput, and efficiency on AMD hardware
Partner with hardware, software, and QA teams to ensure tight integration of AI frameworks with ROCm for maximum performance
Drive AI model optimization innovations, enhancing the speed, efficiency, and scalability of AI workloads
Lead the vision and strategy for optimizing AI inference on AMD GPUs
Collaborate with customers and open-source communities to ensure that AMD’s AI solutions meet industry needs, fostering contributions to MIGraphX, vLLM, and other AMD AI Framework Inference teams
Oversee automation frameworks to streamline model integration and performance testing, ensuring scalability across diverse AI workloads

Principal AI Software Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING At AMD, our mission is to build great prod...

Location

United States, San Jose

Salary:

240000.00 - 360000.00 USD / Year

AMD

Expiration Date

Until further notice

Requirements

Knowledge in GPU architectures, basic knowledge of CPU architecture
Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
Understanding of GPU programming such as ROCm, CUDA, OpenCL, etc
Experience in hardware/software co-design, building high-performance products across the full product lifecycle
Experience with operating systems (OS) and device driver development is a plus
Undergraduate degree required. Bachelor of Science, Master's, or PhD degree with emphasis in Electrical Engineering, Computer Architecture, or Computer Science with relevant experience preferred

Job Responsibility

Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math libraries, kernel and framework teams to influence future silicon features based on evolving AI workload trends
Strong Execution: Deliver innovations and roadmap for AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers
Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure out-of-the-box performance excellence on AMD hardware
Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting
Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations
Community & Open Source: Mentor and inspire other engineers and contribute to ROCm open source

What we offer

AMD benefits at a glance

Fulltime

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...

Location

Canada, Markham

Salary:

106400.00 - 159600.00 CAD / Year

AMD

Expiration Date

Until further notice

Requirements

Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
Bachelor’s or Master’s degree or Ph.D in Computer/Software Engineering, Computer Science, or related technical discipline

Job Responsibility

Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.

Fulltime

Principal AI Software Engineer

AMD AI Group is seeking a highly influential technical leader for OneROCm — driv...

Location

United States, San Jose

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Knowledge in GPU architectures, basic knowledge of CPU architecture
Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
Understanding of GPU programming such as ROCm, CUDA, OpenCL, etc
Experience in hardware/software co-design, building high-performance products across the full product lifecycle
Experience with operating systems (OS) and device driver development is a plus
Undergraduate degree required. Bachelor of Science, Master's, or PhD degree with emphasis in Electrical Engineering, Computer Architecture, or Computer Science with relevant experience preferred

Job Responsibility

Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math libraries, kernel and framework teams to influence future silicon features based on evolving AI workload trends
Strong Execution: Deliver innovations and roadmap for AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers
Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure out-of-the-box performance excellence on AMD hardware
Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting
Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations
Community & Open Source: Mentor and inspire other engineers and contribute to ROCm open source

What we offer

Benefits offered are described: AMD benefits at a glance

Fulltime
