Senior Manager, Performance AI/ML Network Deployment Engineering Jobs

1 Job Offer

Senior Manager, Performance AI/ML Network Deployment Engineering
Lead the deployment and optimization of advanced AI/ML data center infrastructures in Santa Clara. This senior role requires expertise in large-scale network architecture, performance tuning, and hands-on systems triage for global cloud providers. You will be the key technical interface, driving ...
Location
United States, Santa Clara
Salary
210,400.00 - 315,600.00 USD / Year
AMD
Expiration Date
Until further notice

About the Senior Manager, Performance AI/ML Network Deployment Engineering role

Pursue leadership jobs at the forefront of technological innovation as a Senior Manager in Performance AI/ML Network Deployment Engineering. This senior-level role sits at the critical intersection of artificial intelligence, high-performance computing, and large-scale network infrastructure. Professionals in this field are responsible for the end-to-end lifecycle of deploying and optimizing the complex, GPU-driven fabrics that power modern AI and machine learning workloads. They bridge the gap between strategic customer ambitions and internal engineering execution, ensuring that groundbreaking AI infrastructure transitions seamlessly from design to full-scale, production-ready deployment.

The core mission of this profession is to architect, deploy, and relentlessly optimize the performance of massive AI/ML clusters. This involves deep collaboration with customers to translate their computational needs into scalable, robust, and efficient network designs encompassing compute, storage, and interconnect technologies. A typical day involves leading technical engagements, driving system-level triage and debugging of intricate cross-stack issues spanning hardware, firmware, and software, and developing methodologies to accelerate the rollout of new platforms. These leaders are champions of reliability and performance, enhancing tools and processes to meet stringent uptime and throughput goals for mission-critical AI training and inference environments.

Common responsibilities for these high-impact jobs include providing technical leadership during proof-of-concept evaluations and early field trials, influencing product roadmaps based on hands-on deployment experience, and developing training for both internal teams and customer audiences. People in this role own the resolution of complex performance bottlenecks and debug at scale to ensure system stability. A significant part of the role is distilling lessons learned from deployments to drive continuous improvement across the organization and its technology stack.

The skills and requirements for this profession are extensive. Candidates must possess expert-level knowledge of network architecture for AI/ML, including deep hands-on experience with technologies such as RoCEv2, lossless fabrics, and overlay protocols. A solid foundation in at least one of the three core domains (compute, networking, or storage) is essential, with a strong preference for cross-disciplinary understanding. Proven leadership in engaging with large-scale cloud service providers or enterprise customers is a must, and it requires exceptional communication skills to interface with everyone from engineers to C-level executives. A disciplined approach to project management, with a track record of delivering large infrastructure projects on time, is critical. A bachelor’s or master’s degree in computer science or engineering, along with relevant industry certifications in networking or cloud technologies, is typically required. These are senior roles demanding substantial experience, strategic vision, and the ability to thrive in a globally dispersed, fast-paced team environment dedicated to shaping the future of AI infrastructure.
