
AI/ML Performance Engineer - GPU Optimization

Finland, Helsinki · Job Posted April 16, 2026

Job Description

As an AI Performance Engineer, you will focus on pushing machine learning workloads to peak hardware efficiency. The emphasis of this call is on analysis, profiling, debugging, and optimization at the application/workload level; however, a broad understanding of low-level GPU execution and kernel optimization is a major advantage.

Job Responsibility

  • Explore and benchmark ML models and workloads (including diffusion models, LLMs, and multimodal systems) to identify bottlenecks across compute, memory, and networking layers
  • Optimize performance for inference and training on AMD GPUs, including parallelization strategies, quantization techniques, serving orchestration, network communication and distributed execution
  • Perform deep profiling to uncover inefficiencies in ML frameworks, data pipelines, compiler tools, and key tensor operations such as GEMMs, Convs, and Attention, to name a few
  • Support AMD top-tier customers to improve model throughput, reduce latency, and optimize resource utilization across multi-GPU and cluster environments
  • Work closely with hardware, compiler, and software teams to drive improvements across the full ROCm stack
  • Communicate performance bottlenecks, solutions, and optimization strategies to stakeholders
  • Work with international teams located across Europe, the US, and Asia

Requirements

  • Experience with profiling, debugging, benchmarking, and optimization tools
  • Familiarity with ML frameworks (e.g., PyTorch, JAX, TF) and inference serving frameworks (e.g., vLLM, SGLang)
  • Strong C++ and/or Python skills, along with the basics: Unix, Git, terminal, debugging, testing, thinking
  • Experience with Docker, container orchestration (Kubernetes), and job schedulers (Slurm)
  • Ability to work independently and collaboratively in a multi-cultural team
  • Excellent communication skills in a fast-moving environment
  • BSc, MSc, PhD or equivalent experience in Computer Science, Electrical Engineering or a related field

Nice to have

  • Experience with AMD tooling (not mandatory if strong fundamentals)
  • GPU kernel development experience with HIP, CUDA, or OpenCL
  • Tile-programming experience (Triton, Pallas, Gluon, Cutlass, cuDSL...)
  • Experience in multi-GPU cluster environments (single- and multi-node)
  • Background in high-performance networking for AI infrastructure
  • Familiarity with compiler backends or code generation
  • Experience with KVCache optimization and memory hierarchy tuning


Similar Jobs for AI/ML Performance Engineer - GPU Optimization

8 matching positions

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location:
United States, Santa Clara
Salary:
210400.00 - 315600.00 USD / Year
AMD
Expiration Date
Until further notice
Requirements
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Solid, hands-on expertise in at least one of three domains is preferred: compute, network, or storage
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers with diverse technical disciplines in avenues such as Proof of Concept, Competitive evaluations, Early Field Trials etc
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, hands on experience in RoCEv2 Design, VXLAN-EVPN, BGP, and Lossless Fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on Network deployment expertise and proven track record of delivering large projects on time. Cisco, Juniper or Arista experience is preferred
  • Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
  • Excellent communication skills across audiences, from engineers to mid-level management to C-level executives
Job Responsibility
  • Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments, and work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain specific knowledge to other groups at AMD, share the lessons learnt to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Senior System Performance Engineer

Role: As a Senior System Performance Engineer on GM's AV System Performance Tea...
Location:
United States, Austin; Mountain View
Salary:
128700.00 - 261300.00 USD / Year
General Motors
Expiration Date
Until further notice
Requirements
  • 3+ years of relevant industry experience
  • Hands-on programming experience with C++ and Python
  • Strong understanding of computer architecture and system-level software fundamentals
  • Proven experience with performance profiling, analysis, tuning, and optimization
  • Experience developing or optimizing high-performance software, ideally for heterogeneous compute environments (e.g., GPUs, DSPs, or accelerators)
  • Familiarity with industry benchmarks and workloads (e.g., MLPerf)
  • Strong communication skills with the ability to influence technical decisions within a team or product area
  • Ability to lead projects through ambiguity and deliver results end to end
  • BS or MS in Computer Science or a related technical field (or equivalent practical experience)
Job Responsibility
  • Collaborate with performance leads and partner engineering teams to align on performance requirements, development practices, and improvement opportunities
  • Lead performance-focused engineering initiatives with moderate ambiguity and cross-team collaboration
  • Contribute to the roadmap for performance tooling, frameworks, and methodologies that support efficient and scalable AV software development
  • Evaluate and prototype new tools, techniques, and technologies to improve runtime performance and developer workflows
  • Design, implement, and maintain tools and automated systems that support performance analysis, debugging, and continuous monitoring
  • Apply and help improve performance engineering standards, processes, and best practices at the team level
  • Analyze software behavior, identify performance bottlenecks, and collaborate with product teams to propose and implement optimizations
  • Mentor junior engineers on performance profiling, optimization strategies, and engineering best practices
What we offer
  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation & holidays
  • Tuition assistance programs
  • Full-time

AI Product Performance Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great pro...
Location:
China, Shenzhen
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements
  • Deep knowledge of Data Center AI workloads such as LLM, Generative AI, Recommendation, NLP, Video Analytics, and/or transformers
  • Hands-on experience with various AI models, end-to-end pipelines, industry frameworks/SDKs, and solutions
  • GPU Architecture Mastery
  • Kernel Programming Expertise: Strong proficiency in C++ and parallel computing, with extensive hands-on experience in NVIDIA CUDA or AMD HIP kernel programming
  • Performance Engineering: Demonstrated ability to debug and profile complex GPU workloads
  • Systems Knowledge: Familiarity with asynchronous execution, stream management, and host-device memory transfers
  • Python DSLs & Triton: Experience implementing kernels using OpenAI Triton or other Python-based DSLs
  • Inference Engine Experience: Hands-on experience integrating custom kernels into large-scale inference frameworks such as vLLM, SGLang, or TensorRT-LLM
  • Deep Learning Frameworks: Familiarity with writing custom extensions or operators for PyTorch (C++/CUDA extensions)
  • Hardware Agnosticism: Experience porting kernels between NVIDIA and AMD architectures or working with cross-platform HPC libraries
Job Responsibility
  • High-Performance Kernel Development: Design, implement, and optimize high-performance GPU kernels for AI/ML workloads to maximize hardware utilization
  • Performance Optimization: Analyze and optimize kernel execution for latency and throughput, addressing bottlenecks in memory bandwidth, instruction latency, and thread divergence
  • Workload Analysis: Evaluate the end-to-end performance impact of individual kernels on full-stack AI models, ensuring that micro-optimizations translate to application-level speedups
  • Profiling & Tuning: Utilize advanced GPU profiling tools (e.g., ROCm Profiler, PyTorch Profiler) to identify performance cliffs, pipeline stalls, and memory hierarchy inefficiencies
  • Architecture Adaptation: Tailor implementation strategies to leverage specific features of modern GPU architectures (e.g., Matrix Cores, HBM characteristics)
  • Framework Integration: Collaborate with software stack teams to expose optimized kernels within high-level frameworks and inference engines
What we offer
  • AMD benefits at a glance
  • Full-time

Principal Software Engineer

Microsoft Advertising is seeking a Principal Software Engineer to join our Ads E...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Industry experience in advertising or search engine backend systems, such as large-scale ad ranking, real-time bidding (RTB), or relevance-serving infrastructure
  • Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming), feature-store integration, and multi-region deployment for low-latency, globally distributed services
  • Familiarity with LLM inference optimization—model sharding, tensor/kv-cache parallelism, paged attention, continuous batching, quantization (AWQ/FP8), and hybrid CPU–GPU orchestration
  • Demonstrated success operating large-scale systems with SLA-based capacity forecasting, autoscaling, and performance telemetry
  • Proven leadership in cross-functional architecture initiatives and technical mentorship
Job Responsibility
  • Design and lead the development of large-scale, distributed online serving systems—including GPU-accelerated and CPU-based ranking/inference pipelines—to process millions of ad requests per second with ultra-low latency, high throughput, and solid reliability
  • Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration across heterogeneous hardware (GPU, CPU, and memory tiers)
  • Profile and optimize performance across the full stack—from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling—identifying bottlenecks, tuning latency tails, and improving cost efficiency through advanced profiling and instrumentation
  • Own live-site reliability as a DRI: design telemetry, alerting, and fault-tolerance mechanisms; drive rapid diagnosis and mitigation of performance regressions or outages in globally distributed systems
  • Collaborate and mentor across teams—driving architecture reviews, enforcing engineering excellence, promoting system-level optimization practices, and mentoring others in deep debugging, profiling, and performance engineering
Employment type: Full-time

Director Software Development

At AMD, we are enabling the next generation of AI innovation by leveraging the p...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements
  • 10+ years in AI/ML software development
  • 5+ years in leadership roles managing AI model enablement or optimization teams
  • Expertise in optimizing real-time AI models for deep learning applications (computer vision, NLP, etc.)
  • Proficiency with AI frameworks (TensorFlow, PyTorch, ONNX Runtime, JAX, Triton) and their optimization for GPU architectures
  • Strong background in optimizing software for AMD GPUs or similar high-performance platforms
  • Familiarity with ROCm is a plus
  • Proven experience with performance optimization, benchmarking, and scaling AI models on GPUs
  • Exceptional ability to collaborate cross-functionally and define long-term strategies for AI/ML innovation
  • Strong verbal and written communication skills, with experience presenting to senior leadership and working with customers and partners
  • Advanced degree (Master’s or PhD) in Computer Science, Electrical Engineering, AI/ML, or related field
Job Responsibility
  • Lead and develop teams responsible for AI inference model enablement and optimization
  • Direct efforts to optimize AI frameworks for seamless compatibility and performance on AMD GPUs (Instinct, Navi)
  • Oversee benchmarking, performance tuning, and optimization of AI inference models to improve latency, throughput, and efficiency on AMD hardware
  • Partner with hardware, software, and QA teams to ensure tight integration of AI frameworks with ROCm for maximum performance
  • Drive AI model optimization innovations, enhancing the speed, efficiency, and scalability of AI workloads
  • Lead the vision and strategy for optimizing AI inference on AMD GPUs
  • Collaborate with customers and open-source communities to ensure that AMD’s AI solutions meet industry needs, fostering contributions to MIGraphX, vLLM, and other AMD AI Framework Inference teams
  • Oversee automation frameworks to streamline model integration and performance testing, ensuring scalability across diverse AI workloads

Principal AI Software Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING At AMD, our mission is to build great prod...
Location:
United States, San Jose
Salary:
240000.00 - 360000.00 USD / Year
AMD
Expiration Date
Until further notice
Requirements
  • Knowledge of GPU architectures and basic knowledge of CPU architecture
  • Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
  • Understanding of GPU programming such as ROCm, CUDA, OpenCL, etc
  • Experience in hardware/software co-design, building high-performance products across the full product lifecycle
  • Experience with operating systems (OS) and device driver development is a plus
  • Undergraduate degree required. Bachelor of Science, Master's, or PhD degree with emphasis in Electrical Engineering, Computer Architecture, or Computer Science with relevant experience preferred
Job Responsibility
  • Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math libraries, kernel and framework teams to influence future silicon features based on evolving AI workload trends
  • Strong Execution: Deliver innovations and roadmap for AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers
  • Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure out-of-the-box performance excellence on AMD hardware
  • Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting
  • Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations
  • Community & Open Source: Mentor and inspire other engineers and contribute to ROCm open source
What we offer
  • AMD benefits at a glance
  • Full-time

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...
Location:
Canada, Markham
Salary:
106400.00 - 159600.00 CAD / Year
AMD
Expiration Date
Until further notice
Requirements
  • Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
  • Bachelor’s degree, Master’s degree, or PhD in Computer/Software Engineering, Computer Science, or a related technical discipline.
Job Responsibility
  • Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
  • Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
  • Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
  • Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
  • Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
  • Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
  • Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
  • Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
  • Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
  • Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.
Employment type: Full-time

Principal AI Software Engineer

AMD AI Group is seeking a highly influential technical leader for OneROCm — driv...
Location:
United States, San Jose
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements
  • Knowledge of GPU architectures and basic knowledge of CPU architecture
  • Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
  • Understanding of GPU programming such as ROCm, CUDA, OpenCL, etc
  • Experience in hardware/software co-design, building high-performance products across the full product lifecycle
  • Experience with operating systems (OS) and device driver development is a plus
  • Undergraduate degree required. Bachelor of Science, Master's, or PhD degree with emphasis in Electrical Engineering, Computer Architecture, or Computer Science with relevant experience preferred
Job Responsibility
  • Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math libraries, kernel and framework teams to influence future silicon features based on evolving AI workload trends
  • Strong Execution: Deliver innovations and roadmap for AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers
  • Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure out-of-the-box performance excellence on AMD hardware
  • Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting
  • Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations
  • Community & Open Source: Mentor and inspire other engineers and contribute to ROCm open source
What we offer
  • AMD benefits at a glance
  • Full-time