As a member of the Computing Product Line's Heterogeneous Memory Software Lab, you will research, design, and implement software components that enable tiered memory usage within end-to-end solutions. Your work will focus on advancing the AI ecosystem by designing and implementing libraries that leverage the capabilities of new Ascend AI hardware. You will help extend compiler infrastructure to support Triton on Ascend hardware, enabling high-performance kernel generation for AI workloads, and you will research new memory management techniques to improve performance and efficiency. Collaborating closely with research teams across the company, you will drive the innovation behind cutting-edge AI solutions.
Job Responsibilities:
Lead performance optimization of AI models on Ascend NPUs, including performance analysis, bottleneck identification, and optimization implementation for both training and inference workloads
Analyze performance bottlenecks of multimodal models and large language models (LLMs) on the Ascend platform, covering operators, kernels, memory access patterns, and scheduling
Design and implement optimization strategies to address identified bottlenecks
Develop and optimize critical operators/kernels, continuously improving execution efficiency, memory access patterns, parallelization strategies, and hardware resource utilization
Research and apply advanced techniques such as auto-tuning, operator fusion, graph optimization, and scheduling optimization in real-world production scenarios
Build and lead an NPU performance optimization team
Communicate findings to cross-functional teams and leadership, and contribute to the evolution of next-generation Ascend NPU architecture
Requirements:
Deep understanding of GPU or NPU architecture, including execution units, memory hierarchy, interconnects, and thread scheduling, as well as performance bottleneck analysis methodologies
Familiarity with mainstream deep learning frameworks such as PyTorch, TensorFlow, or JAX
Hands-on experience in deep learning operator/kernel development and performance tuning, with the ability to implement and optimize complex operators
Proficiency with performance analysis and profiling tools (e.g., Nsight Compute, nvprof, torch.profiler), and the ability to conduct quantitative analysis and performance modeling
Strong system design and software engineering skills, with the ability to balance performance, maintainability, and generality in complex systems
Master’s or Ph.D. degree in Computer Architecture, Compiler Design, High Performance Computing, or a related field
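As a minimal sketch of the kind of kernel-level profiling workflow mentioned in the requirements above, the following uses `torch.profiler` to break a forward pass down into per-operator timings (the model and tensor shapes here are illustrative assumptions, not part of the role description):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# A small illustrative model; in practice this would be an LLM or
# multimodal workload running on the target accelerator.
model = torch.nn.Linear(128, 128)
x = torch.randn(32, 128)

# Record CPU-side operator events and input shapes for the forward pass.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x)

# Aggregate events by operator and print the most expensive ones,
# a first step toward bottleneck identification.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The same tracing approach extends to accelerator activity (e.g., `ProfilerActivity.CUDA` on NVIDIA GPUs), after which hot operators can be examined with lower-level tools such as Nsight Compute.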