
AI/ML Performance Engineer - GPU Optimization


AMD

Location:
Finland, Helsinki


Contract Type:
Not provided

Salary:
Not provided

Job Description:

As an AI Performance Engineer, you will focus on pushing machine learning workloads to peak hardware efficiency. The emphasis of this role is on analysis, profiling, debugging, and optimization at the application/workload level; however, a broad understanding of low-level GPU execution and kernel optimization is a major advantage.

Job Responsibility:

  • Explore and benchmark ML models and workloads (including diffusion models, LLMs, and multimodal systems) to identify bottlenecks across compute, memory, and networking layers
  • Optimize performance for inference and training on AMD GPUs, including parallelization strategies, quantization techniques, serving orchestration, network communication, and distributed execution
  • Perform deep profiling to uncover inefficiencies in ML frameworks, data pipelines, compiler tools, and key tensor operations such as GEMMs, convolutions, and attention, to name a few
  • Support AMD's top-tier customers to improve model throughput, reduce latency, and optimize resource utilization across multi-GPU and cluster environments
  • Work closely with hardware, compiler, and software teams to drive improvements across the full ROCm stack
  • Communicate performance bottlenecks, solutions, and optimization strategies to stakeholders
  • Work with international teams located across Europe, the US, and Asia
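A first step in the bottleneck analysis described above is a roofline-style estimate: compare a kernel's arithmetic intensity against the machine balance of the GPU. A minimal sketch, using hypothetical peak numbers rather than any specific GPU's datasheet:

```python
# Roofline-style check: is a GEMM compute-bound or memory-bound?
# Peak numbers are illustrative placeholders, not a specific GPU's specs.

def gemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for C[m,n] = A[m,k] @ B[k,n], assuming each
    operand is read once and the output is written once (ideal caching)."""
    flops = 2 * m * n * k  # one multiply plus one add per MAC
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved

PEAK_FLOPS = 1.3e15  # hypothetical fp16 peak, FLOP/s
PEAK_BW = 5.3e12     # hypothetical HBM bandwidth, bytes/s
machine_balance = PEAK_FLOPS / PEAK_BW  # FLOP/byte needed to saturate compute

ai = gemm_arithmetic_intensity(4096, 4096, 4096)
verdict = "compute-bound" if ai > machine_balance else "memory-bound"
print(f"intensity = {ai:.0f} FLOP/B, balance = {machine_balance:.0f} -> {verdict}")
```

A square fp16 GEMM at 4096 lands well above the balance point, while a skinny matrix-vector-like shape (m = 1) drops to roughly 1 FLOP/byte and is firmly bandwidth-limited, which is one reason decode-phase LLM inference stresses memory rather than compute.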

Requirements:

  • Experience with profiling, debugging, benchmarking, and optimization tools
  • Familiarity with ML frameworks (e.g., PyTorch, JAX, TF) and inference serving frameworks (e.g., vLLM, SGLang)
  • Strong C++ and/or Python skills, along with the basics: Unix, Git, the terminal, debugging, testing, and clear thinking
  • Experience with Docker, container orchestration (Kubernetes), and job schedulers (Slurm)
  • Ability to work independently and collaboratively in a multi-cultural team
  • Excellent communication skills in a fast-moving environment
  • BSc, MSc, PhD or equivalent experience in Computer Science, Electrical Engineering or a related field

Nice to have:

  • Experience with AMD tooling (not mandatory if you have strong fundamentals)
  • GPU kernel development experience with HIP, CUDA, or OpenCL
  • Tile-programming experience (Triton, Pallas, Gluon, Cutlass, cuDSL...)
  • Experience in multi-GPU cluster environments (single- and multi-node)
  • Background in high-performance networking for AI infrastructure
  • Familiarity with compiler backends or code generation
  • Experience with KVCache optimization and memory hierarchy tuning
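The KV-cache optimization mentioned above starts with simple arithmetic: cache size scales with layers, KV heads, head dimension, sequence length, and batch. A back-of-envelope sketch, using a generic 7B-class transformer shape as an assumption rather than any published model config:

```python
# Back-of-envelope KV-cache sizing. The model shape is a generic 7B-class
# transformer (an assumption for illustration, not a published config).

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Total bytes for keys plus values (the leading factor of 2) across layers."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

per_token = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=1, batch=1)
at_load = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8)
print(f"{per_token // 1024} KiB per token; {at_load / 2**30:.0f} GiB at batch 8, 4k context")
```

At fp16 this shape costs 512 KiB per token, so eight concurrent 4k-token sequences already consume 16 GiB; this is why grouped-query attention (fewer KV heads), quantized caches, and paged allocation matter so much for serving throughput.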

Additional Information:

Job Posted:
April 16, 2026

Work Type:
Hybrid work

Similar Jobs for AI/ML Performance Engineer - GPU Optimization

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location:
United States, Santa Clara
Salary:
210400.00 - 315600.00 USD / Year
AMD
Expiration Date:
Until further notice
Requirements:
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Prefer candidates with solid, hands-on expertise in at least one of three domains: compute, network, storage
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers with diverse technical disciplines in avenues such as Proof of Concept, Competitive evaluations, Early Field Trials etc
  • Direct experience in working with large customers; able to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, hands on experience in RoCEv2 Design, VXLAN-EVPN, BGP, and Lossless Fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on Network deployment expertise and proven track record of delivering large projects on time. Cisco, Juniper or Arista experience is preferred
  • Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
  • Excellent communication level from engineer to mid-management to C-level of audience
Job Responsibility:
  • Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debugging of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain specific knowledge to other groups at AMD, share the lessons learnt to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Principal Software Engineer

Microsoft Advertising is seeking a Principal Software Engineer to join our Ads E...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Industry experience in advertising or search engine backend systems, such as large-scale ad ranking, real-time bidding (RTB), or relevance-serving infrastructure
  • Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming), feature-store integration, and multi-region deployment for low-latency, globally distributed services
  • Familiarity with LLM inference optimization—model sharding, tensor/kv-cache parallelism, paged attention, continuous batching, quantization (AWQ/FP8), and hybrid CPU–GPU orchestration
  • Demonstrated success operating large-scale systems with SLA-based capacity forecasting, autoscaling, and performance telemetry
  • Proven leadership in cross-functional architecture initiatives and technical mentorship
Job Responsibility:
  • Design and lead the development of large-scale, distributed online serving systems—including GPU-accelerated and CPU-based ranking/inference pipelines—to process millions of ad requests per second with ultra-low latency, high throughput, and solid reliability
  • Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration across heterogeneous hardware (GPU, CPU, and memory tiers)
  • Profile and optimize performance across the full stack—from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling—identifying bottlenecks, tuning latency tails, and improving cost efficiency through advanced profiling and instrumentation
  • Own live-site reliability as a DRI: design telemetry, alerting, and fault-tolerance mechanisms; drive rapid diagnosis and mitigation of performance regressions or outages in globally distributed systems
  • Collaborate and mentor across teams—driving architecture reviews, enforcing engineering excellence, promoting system-level optimization practices, and mentoring others in deep debugging, profiling, and performance engineering
  • Fulltime

Performance Architect

In this position, you will develop AI Storage Solutions based advanced system ar...
Location:
United States, Milpitas
Salary:
136537.00 - 193442.00 USD / Year
Sandisk
Expiration Date:
April 28, 2026
Requirements:
  • Bachelors or Masters or PhD in Computer/Electrical Engineering with 5+ years of relevant experience in Performance Modeling, Simulation, and Analysis using SystemC
  • 5+ years of experience with SystemC modeling
  • Good understanding of computer/graphics architecture, ML, LLM
  • Experience with simulation using SystemC and TLM, behavioral modeling, and performance analysis
Job Responsibility:
  • Build SystemC performance models for AI Storage Solutions based products covering end-to-end from GPU/TPU/NPU/xPU, host interface, memory hierarchy, base-die controller, and AI Storage Solutions using various packaging technologies
  • Responsible for improving the AI/ML ASIC Architecture performance through hardware & software co-optimization, post-silicon performance analysis, and influencing the strategic product roadmap
  • Workload analysis and characterization of ASIC and competitive datacenter and AI solutions to identify opportunities for performance improvement in our products
  • Collaboration with Architecture team to resolve performance issues and optimize the performance and TCO of their AI Storage Solutions based datacenter technologies
  • Experience modeling one or more components of AI/ML accelerator ASICs such as AI Storage Solutions, PCIe/UCIe/CXL, NoC, DMA, Firmware Interactions, NAND, xPU, fabrics, etc
  • Performance modeling and optimization for multi-trillion parameter LLM training/inference including Dense, Mixture of Experts (MoE) with multiple modalities (text, vision, speech)
  • Model/optimize novel parallelization strategies across tensor, pipeline, context, expert and data parallel dimensions
  • Architect memory-efficient training systems utilizing techniques like structured pruning, quantization (MX formats), continuous batching/chunked prefill, speculative decoding
  • Incorporate and extend SOTA models such as GPT-4, Reasoning models like Deepseek-R1, and multi-modal architectures
  • Collaborate with internal and external stakeholders/ML researchers to disseminate results and iterate at rapid pace
What we offer:
  • Short-Term Incentive (STI) Plan
  • Long-Term Incentive (LTI) program (restricted stock units (RSUs) or cash equivalents)
  • RSU awards for eligible new hires
  • Paid vacation time
  • Paid sick leave
  • Medical/dental/vision insurance
  • Life, accident and disability insurance
  • Tax-advantaged flexible spending and health savings accounts
  • Employee assistance program
  • Other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity
  • Fulltime

Director Software Development

At AMD, we are enabling the next generation of AI innovation by leveraging the p...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • 10+ years in AI/ML software development
  • 5+ years in leadership roles managing AI model enablement or optimization teams
  • Expertise in optimizing real-time AI models for deep learning applications (computer vision, NLP, etc.)
  • Proficiency with AI frameworks (TensorFlow, PyTorch, ONNX Runtime, JAX, Triton) and their optimization for GPU architectures
  • Strong background in optimizing software for AMD GPUs or similar high-performance platforms
  • Familiarity with ROCm is a plus
  • Proven experience with performance optimization, benchmarking, and scaling AI models on GPUs
  • Exceptional ability to collaborate cross-functionally and define long-term strategies for AI/ML innovation
  • Strong verbal and written communication skills, with experience presenting to senior leadership and working with customers and partners
  • Advanced degree (Master’s or PhD) in Computer Science, Electrical Engineering, AI/ML, or related field
Job Responsibility:
  • Lead and develop teams responsible for AI inference model enablement and optimization
  • Direct efforts to optimize AI frameworks for seamless compatibility and performance on AMD GPUs (Instinct, Navi)
  • Oversee benchmarking, performance tuning, and optimization of AI inference models to improve latency, throughput, and efficiency on AMD hardware
  • Partner with hardware, software, and QA teams to ensure tight integration of AI frameworks with ROCm for maximum performance
  • Drive AI model optimization innovations, enhancing the speed, efficiency, and scalability of AI workloads
  • Lead the vision and strategy for optimizing AI inference on AMD GPUs
  • Collaborate with customers and open-source communities to ensure that AMD’s AI solutions meet industry needs, fostering contributions to MIGraphX, vLLM, and other AMD AI Framework Inference teams
  • Oversee automation frameworks to streamline model integration and performance testing, ensuring scalability across diverse AI workloads

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...
Location:
Canada, Markham
Salary:
106400.00 - 159600.00 CAD / Year
AMD
Expiration Date:
Until further notice
Requirements:
  • Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
  • Bachelor’s or Master’s degree or Ph.D in Computer/Software Engineering, Computer Science, or related technical discipline
Job Responsibility:
  • Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
  • Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
  • Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
  • Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
  • Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
  • Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
  • Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
  • Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
  • Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
  • Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.
  • Fulltime
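The RCCL bullet above concerns collective communication patterns such as all-reduce. As an illustration of the underlying algorithm, here is a simulated ring all-reduce in pure Python (a sketch of the pattern, not RCCL's actual implementation):

```python
# Simulated ring all-reduce: reduce-scatter, then all-gather. Ranks are
# simulated in-process; no GPUs or communication library involved.

def ring_all_reduce(data):
    """Element-wise sum across ranks; every rank ends with the full result.
    `data` is a list of per-rank buffers, each split into len(data) chunks
    (one scalar per chunk here, for clarity)."""
    n = len(data)
    buf = [list(row) for row in data]
    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n
    # to its ring neighbor, which accumulates it. Snapshot sends first so
    # all transfers within a step happen "simultaneously".
    for s in range(n - 1):
        sends = [(r, (r - s) % n, buf[r][(r - s) % n]) for r in range(n)]
        for r, c, v in sends:
            buf[(r + 1) % n][c] += v
    # After n-1 steps, rank r owns the fully reduced chunk (r + 1) mod n.
    # Phase 2: all-gather. Rotate the reduced chunks around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, buf[r][(r + 1 - s) % n]) for r in range(n)]
        for r, c, v in sends:
            buf[(r + 1) % n][c] = v
    return buf

print(ring_all_reduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Each of the n ranks transfers only 2(n-1)/n of the buffer in total, which is why ring algorithms are bandwidth-optimal for large tensors and remain the workhorse behind library all-reduce implementations.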

Senior Devops & AI Engineer

This role presents a unique opportunity to contribute to the future of impactful...
Location:
India, Hyderabad
Salary:
Not provided
Fission Labs
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or related field
  • 6+ years of experience in Infrastructure Mgmt. roles, with a focus on cloud platforms (Azure and AWS Preferred)
  • Hands-on experience with operations (DevSecOps) principles and best practices
  • Proficiency in scripting languages such as Python, PowerShell, or Bash
  • Excellent communication and collaboration skills
  • In-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration
  • Hands-on experience with a wide range of AWS and Azure services
  • Develop and maintain Infrastructure as Code (IAC) templates using tools such as Terraform or AWS CloudFormation
  • Experience setting up cloud infrastructure stack, databases, service endpoints, GPU as well as CPU resource scaling, optimization etc.
  • Should have worked with AIOps/MLOps
Job Responsibility:
  • Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration
  • Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness
  • Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services
  • Establish version control practices for IAC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes
What we offer:
  • Opportunity to work on impactful technical challenges with global reach
  • Vast opportunities for self-development, including online university access and knowledge sharing opportunities
  • Sponsored Tech Talks & Hackathons to foster innovation and learning
  • Generous benefits packages including health insurance, retirement benefits, flexible work hours, and more
  • Supportive work environment with forums to explore passions beyond work
  • Fulltime

Staff Software Engineer, GPU Infrastructure (HPC)

The internal infrastructure team is responsible for building world-class infrast...
Location:
Not provided
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Deep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environments
  • Kubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloads
  • Strong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for open-source contributions over reinventing solutions
  • Low-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads
  • Research collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challenges
  • Self-directed problem-solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast-paced environment
Job Responsibility:
  • Build and scale ML-optimized HPC infrastructure: Deploy and manage Kubernetes-based GPU/TPU superclusters across multiple clouds, ensuring high throughput and low-latency performance for AI workloads
  • Optimize for AI/ML training: Collaborate with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, leveraging technologies like RDMA, NCCL, and high-speed interconnects
  • Troubleshoot and resolve complex issues: Proactively identify and resolve infrastructure bottlenecks, performance degradation, and system failures to ensure minimal disruption to AI/ML workflows
  • Enable researchers with self-service tools: Design intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently
  • Drive innovation in ML infrastructure: Work closely with AI researchers to understand emerging needs (e.g., JAX, PyTorch, distributed training) and translate them into robust, scalable infrastructure solutions
  • Champion best practices: Advocate for observability, automation, and infrastructure-as-code (IaC) across the organization, ensuring systems are maintainable and resilient
  • Mentorship and collaboration: Share expertise through code reviews, documentation, and cross-team collaboration, fostering a culture of knowledge transfer and engineering excellence
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Product Application Engineer - Data Center Deployment

This highly technical role supports large-scale datacenter graphics hardware and...
Location:
United States, Santa Clara; Austin; Secaucus
Salary:
160960.00 - 241440.00 USD / Year
AMD
Expiration Date:
Until further notice
Requirements:
  • Datacenter customer support in virtualization-focused environments
  • Virtual environments (VMWare, Citrix, KVM, Microsoft, and others) and virtual machine configuration/management
  • Data storage, protection, deduplication, and storage-related network optimization especially with Weka, DDN, and VAST products
  • Working in or closely with a deployment services organization utilizing tools like Salesforce, JIRA and Confluence
  • Linux installation, configuration, debugging, and performance tuning
  • Debugging, root-cause analysis, and system-level problem solving
  • Site reliability engineering concepts and best practices
  • Server architecture, remote management, network topologies, and compute subsystem operations
  • Datacenter GPU software stacks such as ROCm™ or CUDA
  • High-performance networks for HPC and AI (RDMA/RoCE, InfiniBand)
Job Responsibility:
  • Design, optimize, and troubleshoot virtualization solutions for high-performance datacenter GPU, CPU, and related platforms
  • Support customers, partners, and internal teams on virtualization topics related to AI and Machine Learning workloads
  • Build and configure datacenter networking environments for customer testing, validation, and deployment
  • Qualify and assess new virtualization capabilities to ensure alignment with customer and product requirements
  • Provide mentorship and technical guidance to junior engineering staff
  • Partner with development teams to identify and resolve hardware/software issues from early bring-up through end-of-life
  • Document and escalate technical issues following established procedures
  • Collaborate with program managers to maintain schedules, track action items, and ensure deliverables are met
  • Provide clear project status updates to internal leadership and customer stakeholders
  • Build a deep understanding of customer goals to ensure impactful technical guidance and solution delivery
  • Fulltime