Product Manager - AI Data Center Infrastructure

Hewlett Packard Enterprise

Location:
India, Bangalore

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a Product Line Manager (PLM) for AI Data Center Infrastructure to define and deliver next-generation data center networking platforms for large-scale GPU clusters. This role is ideal for a visionary, hands-on leader who understands how AI workloads stress networks at scale and can translate that insight into clear product requirements and roadmaps.

Job Responsibility:

  • AI Data Center & Fabric Architecture: Define product requirements for AI data center network architectures supporting thousands of GPUs
  • Develop requirements for low-latency Ethernet fabrics using Juniper QFX platforms and Apstra-based automation
  • Enable high-bandwidth GPU and NIC interconnects optimized for large-scale distributed training and inference workloads
  • GPU, NIC & Interconnect Strategy: Lead requirements definition for next-generation GPUs, NICs, and interconnect technologies, staying ahead of industry roadmaps
  • Drive alignment with NVIDIA and AMD ecosystems
  • Ensure interoperability across DAC, AEC, ACC, and optical transceivers between switches and NIC endpoints
  • Define scale-up paths using PCIe, NVLink, and NVSwitch, ensuring GPU-to-GPU symmetry, consistency, and bandwidth determinism
  • Switching, Routing & Telemetry: Specify and optimize L2/L3 architectures, including EVPN-VXLAN, Class-E IPv4, and AI-optimized buffer tuning
  • Leverage hardware telemetry, streaming sensors, and analytics for proactive performance assurance
  • Drive automation using Python, Ansible, Apstra, Terraform, and related tools to enforce configuration consistency and compliance
  • Performance Optimization & Troubleshooting: Analyze GPU job performance to identify network hotspots, congestion, packet loss, and microbursts
  • Tune ECN, RDMA/RoCEv2, PFC, and traffic-engineering policies for AI workloads
  • Optimize server-to-switch interactions, including BIOS and firmware alignment, NIC queue and link-training parameters, and cable selection and management (AEC/ACC/optics)
  • Cross-Functional & Ecosystem Collaboration: Partner closely with AI platform teams, GPU system architects, data center operations, and strategic vendors (NVIDIA, AMD, Juniper)
  • Lead and participate in root-cause analysis for link flaps and training failures, FEC and PCS errors, and thermal or power-related performance degradation
  • Drive lab validation, scale testing, and certification of new optics, NIC firmware, and switch software releases

Requirements:

  • 5–10+ years of experience in data center networking, AI infrastructure, or HPC environments
  • Strong hands-on experience with Juniper QFX platforms and Junos OS
  • Deep understanding of GPU architectures, including NVIDIA (H100/H200, GB200/GB300, NVLink/NVSwitch) and AMD (MI300/MI400, Pollara NICs, Infinity Fabric)
  • Proven expertise in scale-up GPU interconnects and scale-out Ethernet fabrics
  • Strong knowledge of RDMA/RoCEv2, ECN, PFC, and buffer management
  • Familiarity with distributed AI workloads and collective operations (NCCL, RCCL)
  • Hands-on troubleshooting experience with high-speed optics, AEC cables, link training, and NIC firmware
  • Proficiency in automation and scripting (Python, Ansible, Bash, Terraform)

Nice to have:

  • Certifications: JNCIE, CCIE, NCP-AII, NCA-AIIO, NCP-AIO, NCP-AIN
  • Experience with Apstra or other intent-based networking platforms
  • Knowledge of 1.6T optics, 200G PAM4 SerDes, and CPO/LPO architectures
  • Experience supporting liquid-cooled GPU clusters and rack-level power/network design
  • Understanding of data center operations, observability, and SLAs for AI training and inference clusters

What we offer:

  • Health & Wellbeing: comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Personal & Professional Development: programs tailored to help you reach your career goals
  • Unconditional Inclusion: unconditionally inclusive in the way we work and celebrate individual uniqueness

Additional Information:

Job Posted:
March 04, 2026

Work Type:
Hybrid work

Similar Jobs for Product Manager - AI Data Center Infrastructure

Product Manager, Specialized Compute

Designs, plans, develops and manages a product or portfolio of products througho...
Location:
United States, Spring
Salary:
106000.00 - 243000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree or equivalent in computer science, engineering or related field of study
  • Proven experience as a Product Manager in the server, data center, or related hardware industry
  • Strong analytical skills with a knack for translating complex market data into clear, actionable business insights
  • Deep understanding of the technical and market dynamics of at least one of the following: AI servers, Edge computing, or Telco infrastructure
  • Exceptional communication and presentation skills, with the ability to articulate a vision and build consensus with both technical and non-technical audiences
  • A proactive and adaptable mindset, with the ability to manage multiple investigations and projects simultaneously in a fast-paced environment
Job Responsibility:
  • Conduct in-depth market research and competitive analysis to identify emerging trends, customer pain points, and new product opportunities across AI, Edge, and Telco markets
  • Develop and present data-driven business cases for new product initiatives, including market sizing, financial projections, and strategic alignment with company goals
  • Work closely with engineering, sales, and customers to define and prioritize detailed product requirements and user stories, ensuring they are well-documented and understood by all stakeholders
  • Navigate the unique technical and business challenges of each market, adapting your approach to meet the diverse needs of AI, Edge, and Telco customers
  • Act as the primary liaison between technical teams and business units, championing the product vision and ensuring alignment across the organization
What we offer:
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Specific programs catered to helping you reach career goals
  • Inclusive work environment celebrating individual uniqueness
  • Fulltime

IS Data Center Operations Engineer

Bridging Information Technology (IT) and the Mechanical, Electrical, and Plumbin...
Location:
United States, New Albany
Salary:
91731.00 - 114948.00 USD / Year
Amgen
Expiration Date:
Until further notice
Requirements:
  • Master’s degree
  • Bachelor’s degree and 2 years of data center operations experience
  • Associate’s degree and 6 years of data center operations experience
  • High school diploma / GED and 8 years of data center operations experience
  • Hands-on experience with rack/stack, structured cabling, and IT hardware installation
  • Familiarity with Dell PowerEdge, Nutanix, NetApp, and Cisco platforms
  • Ability to interpret electrical and mechanical drawings (awareness-level competency)
  • Experience using monitoring, alerting, or automation systems (AI-enabled platforms preferred)
  • Solid understanding of IT operations concepts including hardware lifecycle management and disaster recovery
  • Ability to read and update documentation, diagrams, and cable records
Job Responsibility:
  • Serve as the liaison between IT teams and facilities staff, ensuring flawless communication
  • Interpret electrical one-line diagrams, distribution drawings, and cooling schematics to support incident response and planning
  • Install, rack, cable, and support enterprise IT systems including Dell PowerEdge, Nutanix, NetApp, and Cisco technologies
  • Support day-to-day moves, adds, and changes (MACs) in building IDF and VDER environments
  • Perform fiber and copper patch cabling in data centers, IDFs, and VDER closets
  • Trace and troubleshoot cabling issues to restore connectivity
  • Monitor infrastructure, proactively detect issues, and escalate them urgently to the appropriate teams
  • Apply AI-enabled monitoring and automation platforms to enhance data center operations
  • Maintain documentation of infrastructure layouts, procedures, and operational standards
  • Participate in capacity planning, disaster recovery drills, and continuous improvement initiatives
What we offer:
  • A comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts
  • A discretionary annual bonus program, or for field sales representatives, a sales-based incentive plan
  • Stock-based long-term incentives
  • Award-winning time-off plans
  • Flexible work models, including remote and hybrid work arrangements, where possible
  • Fulltime

Senior Product Manager for Data Center AIOps

Senior Product Manager for Data Center AIOps. HPE is seeking an experienced Seni...
Location:
United States, Sunnyvale
Salary:
136500.00 - 276500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science / Engineering, or equivalent practical experience in data center networking or cloud infrastructure
  • 6+ years of product management or closely related experience in data center networking, network operations, observability, or AIOps/AI-driven infrastructure products
  • Understanding of data center network architectures (leaf-spine and EVPN-VXLAN), intent-based networking, and operational workflows across Day 0 through Day 2+
  • Demonstrated ability to translate complex technical capabilities into clear customer value, business cases, and prioritized roadmaps in a fast-paced environment
  • Excellent communication and stakeholder management skills with a track record of driving alignment across engineering, sales, marketing, and support
  • Understanding of network operations, SRE, or managing large-scale data center or private cloud environments, including troubleshooting and incident response
  • Experience in building business cases, pricing, and go-to-market strategies for enterprise software or cloud services
Job Responsibility:
  • Influence the product vision and roadmap for Data Center AIOps across the HPE Networking portfolio, aligning with HPE’s AI-native networking strategy and broader data center portfolio
  • Define requirements and user journeys for capabilities such as cross-domain visibility, application awareness, anomaly detection, root cause analysis, impact analysis, and proactive assurance
  • Collaborate closely with engineering, data science, UX, and architecture teams to deliver cloud-hosted AIOps services integrated with Apstra’s intent-based networking and telemetry
  • Partner with GTM, sales, and marketing to craft positioning, packaging, and licensing for AIOps-driven features, including premium and value-added services for data center customers
  • Engage deeply with customers, partners, and field teams to gather feedback, validate hypotheses, and translate operational pain points into prioritized backlog items for AIOps
  • Define and track success metrics for AI and analytics features (e.g., MTTR reduction, automation adoption, model accuracy, recommendation utilization) and iterate based on real-world outcomes
  • Represent the product in customer briefings, business reviews, industry events, and analyst conversations as the subject matter expert for HPE’s Data Center AIOps strategy
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Manager, Data Center Network Deployment & Support

Meta is hiring a Data Center Network Deployment & Support Manager to lead the de...
Location:
United States, Aiken
Salary:
162000.00 - 227000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in network engineering in large-scale data center network deployment, including hands-on responsibility for planning, building, or migrating production networks
  • 3+ years of direct people management experience leading network engineers responsible for Data Center network deployment and operations
  • Bachelor's degree in Computer Engineering, Math, Physics, Networking, or a related technical field (or equivalent practical experience)
  • Hands-on experience with Data Center switching, routing, and dedicated network topologies (“AI Zones”) for backend GPU communication
  • Proven experience leading and executing network deployment and migration projects
  • A proven track record in partnering with cross-functional teams and directly managing large-scale projects
  • Technical expertise in data center, enterprise, or service provider network infrastructure
  • Communication, stakeholder management, and problem-solving skills
  • Knowledge of data center design and operational best practices
  • Experience with process improvement and systems development, leveraging automation to streamline repetitive workflows
Job Responsibility:
  • Manage the deployment, configuration, and ongoing support of large-scale network infrastructure across data centers and global regions
  • Act as the escalation point for technical issues, providing hands-on support and guidance to resolve complex problems swiftly
  • Develop and maintain relationships with partner teams, vendors, and managed service providers
  • Collaborate with engineering, data center, backbone, AI, and equipment vendors to integrate new technologies and drive innovation
  • Maintain project timelines for network turn-ups and ensure that milestones are met
  • Distribute the deployment workload among team members based on projections, OKRs, and metrics across skill sets and priorities
  • Ensure that standard operating procedures (MOPs, runbooks) are consistently followed
  • Contribute to the standardization and best practices of design, testing, and implementation of scalable network solutions
  • Lead by example and build team chemistry across regions to ensure alignment with organizational priorities
  • Provide mentorship, coaching, and career development to team members
What we offer:
  • bonus
  • equity
  • benefits

Solutions Architect – Campus, DCN Switching & Routing

We are looking for a seasoned TME/Networking Solutions Architect with deep exper...
Location:
China, Beijing
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Deep knowledge and hands-on experience in networking protocols: BGP, OSPF, EVPN, VXLAN, MCLAG, DRNI, ISSU, MACSec, DCI
  • Experience in Day 0 to Day 1 deployment of spine-leaf fabrics with any SDN controllers, micro segmentation, and service chaining
  • Working knowledge of automation and orchestration tools used in data center deployments
  • Familiarity with SDN controller architecture and integration with third-party services
  • Proven ability to engage with both technical and business stakeholders to design and defend high-impact networking solutions
  • Strong competitive knowledge of other vendor offerings — including campus solutions, 400G/800G switching platforms, and transceivers such as but not limited to QSFP-DD and OSFP
  • Excellent written and verbal communication skills, with the ability to create compelling documentation and technical collateral
Job Responsibility:
  • Serve as a trusted technical advisor for customers across AI data centers, enterprise campus networks, and service provider environments — identifying technical requirements, resolving pain points, and showcasing HPE’s end-to-end networking capabilities
  • Architect and support AI-ready Ethernet data center deployments using leaf-spine topologies, EVPN-VXLAN overlays, and RoCEv2 fabrics optimized for GPU-based workloads
  • Lead and participate in customer-facing workshops, whiteboard sessions, and technical deep dives across campus switching, data center fabrics, and edge routing solutions
  • Conduct Proof of Concepts (PoCs) and hands-on validations to assess performance, scale, Day-0 automation, telemetry, and orchestration tools in both data center and campus environments
  • Create and maintain design guidelines, infrastructure blueprints, and best practices for performance-optimized and scalable networking deployments across AI DC, enterprise, and routing use cases
  • Collaborate with pre-sales and go-to-market teams to drive solution adoption and ensure alignment with customer needs and competitive differentiators
  • Contribute to RFP/RFI responses, creating comprehensive solution documentation including Bill of Materials (BoM), redundancy and topology planning
  • Work closely with product management and engineering, providing real-world field feedback to enhance product roadmaps around automation, telemetry, security, and feature development
  • Represent HPE at industry events, AI summits, and technology forums, highlighting the value of HPE’s networking portfolio in comparison to competitors
  • Stay ahead of the curve by tracking emerging trends, analysing the competitive landscape, and influencing internal strategies for next-gen network innovation
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Principal Technical Program Manager

The CO+I AI Delivery team is focused on delivering various platform services to ...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND 6+ years experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
  • 3+ years of experience managing cross-functional and/or cross-team projects
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • Proven experience leading complex, cross‑team technical programs with significant infrastructure or platform components
  • Strong technical foundation in one or more of the following: Cloud infrastructure and distributed systems, Large‑scale datacentre delivery projects, Hardware‑software integrations (compute, networking, storage, power, cooling)
  • Demonstrated ability to manage execution in ambiguous, fast‑moving environments
  • Excellent written and verbal communication skills, with experience presenting to senior leadership
  • Experience delivering or scaling AI, HPC, or GPU‑based platforms in production environments
  • Familiarity with data center operations, hardware lifecycle management, or global deployment programs
Job Responsibility:
  • Program Ownership & Execution: Own end‑to‑end technical programs focused on accelerating AI deployment timelines
  • Drive execution across multiple parallel workstreams
  • Establish clear success metrics and mechanisms
  • Document all artifacts appropriately
  • Cross‑Functional Leadership: Partner deeply with hardware engineering, software engineering, infrastructure, networking, data center operations, and supply chain teams
  • Act as the central point of coordination
  • Influence decision‑making with data, technical insight, and strong executive communication
  • Technical Rigor: Develop deep working knowledge of AI deployment architectures
  • Identify technical risks early and drive mitigation strategies
  • Translate complex technical concepts into clear, actionable plans
  • Fulltime

Solutions Architect

TME/Solutions Architect – DCN Switching & Solution role at Hewlett Packard Enter...
Location:
China, Beijing
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Deep knowledge and hands-on experience in networking protocols: BGP, OSPF, EVPN, VXLAN, MCLAG, DRNI, ISSU, MACSec, DCI, MPLS and SDN based solutions
  • Experience in Day 0 to Day 1 deployment of spine-leaf fabrics with any SDN controllers, micro segmentation, and service chaining
  • Working knowledge of automation and orchestration tools used in data center deployments
  • Familiarity with SDN controller architecture and integration with third-party services
  • Proven ability to engage with both technical and business stakeholders to design and defend high-impact networking solutions
  • Strong competitive knowledge of other vendor offerings including 100G/400G/800G switching platforms, transceivers and cables
  • Excellent written and verbal communication skills in English
  • Good presentation and event management skills
Job Responsibility:
  • Serve as a trusted technical advisor for customers across AI data center, service provider, and enterprise environments
  • Architect and support AI-ready Ethernet data center deployments using leaf-spine topologies, EVPN-VXLAN overlays, and RoCEv2 fabrics optimized for GPU-based workloads
  • Lead and participate in customer-facing workshops, whiteboard sessions, and technical deep dives across campus switching, data center fabrics, and edge routing solutions
  • Conduct Proof of Concepts (PoCs) and hands-on validations to assess performance, scale, Day-0 automation, telemetry, and orchestration tools
  • Create and maintain design guidelines, infrastructure blueprints, and best practices for performance-optimized and scalable networking deployments
  • Collaborate with pre-sales and go-to-market teams to drive solution adoption and ensure alignment with customer needs
  • Contribute to RFP/RFI responses, creating comprehensive solution documentation including Bill of Materials (BoM), redundancy and topology planning
  • Work closely with product management and engineering, providing real-world field feedback to enhance product roadmaps and feature development
  • Represent HPE at industry events, AI summits, and technology forums
  • Stay ahead of the curve by tracking emerging trends, analysing the competitive landscape, and influencing internal strategies for next-gen network innovation
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime

Director, Data Center Strategy and Site Selection

As Together AI's first Data Center Strategy hire, you'll own how and where we sc...
Location:
United States, San Francisco
Salary:
230000.00 - 275000.00 USD / Year
Together AI
Expiration Date:
Until further notice
Requirements:
  • 8+ years in data center strategy, site selection, or infrastructure planning at a hyperscaler, AI infra company, or large colocation provider
  • Strong technical grasp of DC fundamentals: power architecture, cooling (including liquid), redundancy schemes, and rack density
  • Experience leading large, complex multi-party negotiations, structuring colo/power contracts, and managing RFP processes
  • Knowledge of standard data center, power, and real estate contractual and legal frameworks
  • Financial fluency: TCO modeling, lease vs. own analysis, and cost drivers across power and colocation
  • Clear communicator: able to translate technical trade-offs into business decisions for non-technical stakeholders
Job Responsibility:
  • Develop Together's global data center strategy. Define where we invest and lease data center space and who we partner with. Balance cost, risk, and speed across regions to support product and customer demand
  • Own site selection and vendor relationships: evaluate new locations, negotiate large colocation and power contracts, manage vendor performance on cost, delivery, and SLAs
  • Lead technical site diligence process: assess power, cooling, redundancy, grid, and expansion capacity. You'll partner with engineers and consultants but are expected to engage technically, not just be a “project manager”
  • Negotiate and interface with executive and senior level management
  • Drive high-impact commercial and strategic transactions by engaging with supplier and internal executive leadership
  • Support sovereign and large-scale customer deals with custom infrastructure scoping
  • Collaborate cross-functionally with Infra, Finance, Product, and GTM to tie site investments to demand and product roadmap
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
  • flexibility in terms of remote work
  • Fulltime