We are seeking an experienced AI Network Engineer to support and optimize high-performance infrastructure powering AI/ML workloads. This role focuses on designing and maintaining GPU-accelerated environments leveraging NVIDIA technologies, high-throughput networking, and low-latency architectures.
Job Responsibilities
Design, implement, and support high-performance networks for AI/ML workloads, including GPU clusters and distributed training environments
Deploy and optimize NVIDIA-based infrastructure (DGX systems, HGX platforms, or GPU clusters)
Configure and manage high-speed networking technologies such as InfiniBand, RoCE, and 100/200/400 Gb Ethernet
Optimize network performance for east-west traffic, low latency, and the high data throughput required for AI model training
Integrate the NVIDIA software stack (CUDA, NCCL, GPU Cloud, AI Enterprise) with networking and compute environments
Troubleshoot performance bottlenecks across network, storage, and GPU interconnects
Collaborate with AI/ML engineers to ensure infrastructure meets training and inference demands
Support automation and infrastructure-as-code initiatives for scalable AI environments
Requirements
5+ years of experience in network engineering or infrastructure engineering
Hands-on experience with high-performance networking (InfiniBand, RDMA, RoCE)
Experience supporting GPU-based or HPC environments
Strong knowledge of data center networking (L2/L3, BGP, EVPN, VXLAN)
Familiarity with Linux systems and performance tuning
Experience with NVIDIA ecosystems (DGX, CUDA, NCCL, or similar)
Ability to diagnose latency and throughput issues in low-latency, high-throughput networks
Nice to have
Experience with NVIDIA AI Enterprise or DGX SuperPOD environments
Knowledge of AI/ML workflows and distributed training frameworks (PyTorch, TensorFlow)
Familiarity with Kubernetes for AI workloads
Experience with storage solutions supporting AI workloads (parallel file systems, NVMe over Fabrics)
Exposure to cloud-based GPU environments (AWS, Azure, GCP)
What we offer
Medical, vision, dental, life, and disability insurance