
Senior AI Network Architect

United States, Redmond 119800.00 - 234700.00 USD / Year · Job Posted March 19, 2026

Job Description

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 of Microsoft's online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally through our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate engineers to help achieve that mission.

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the Cloud Hardware Systems Engineering (CHSE) team is instrumental in defining and delivering operational measures of success for hardware manufacturing and in improving the planning process, quality, delivery, scale, and sustainability of Microsoft cloud hardware. We are looking for seasoned engineers with a passion for customer-focused solutions and the insight and industry knowledge to envision and implement future technical solutions that will manage and optimize the Cloud infrastructure.

We are looking for a Senior AI Network Architect to join the team.

Job Responsibility

  • Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms, with a focus on ultra-high bandwidth, low-latency backend networks
  • Drive system-level integration across compute, storage, and interconnect domains to support scalable AI training workloads
  • Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure that meets performance, reliability, and deployment goals
  • Influence platform decisions across rack, chassis, and pod-level implementations
  • Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers to co-develop differentiated solutions
  • Represent Microsoft in joint architecture forums and technical workshops
  • Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains
  • Frame decisions in terms of TCO, performance, scalability, and deployment risk
  • Lead design reviews and contribute to PRDs and system specifications
  • Shape the direction of hyperscale AI infrastructure by engaging with standards bodies (e.g., IEEE 802.3), influencing component roadmaps, and driving adoption of novel interconnect protocols and topologies

Requirements

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience
  • OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 3+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems
  • Proven expertise in system architecture across compute, networking, and accelerator domains
  • Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing
  • Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration
  • Familiarity with signal integrity modeling, link training, and physical layer optimization
  • Experience architecting backend networks for AI training and inference workloads, including Hamiltonian cycle traffic and collective operations (e.g., all-reduce, all-gather)
  • Hands-on design of high-radix switches (≥400Gbps per port), orthogonal chassis, and cabled backplanes
  • Knowledge of chip-to-chip and chip-to-module interfaces, including error correction and equalization techniques
  • Experience with custom NIC IPs and transport layers for secure, reliable packet delivery
  • Familiarity with AI model execution pipelines and their impact on pod-level network design and latency SLAs
  • Prior contributions to hyperscale deployments or cloud-scale AI infrastructure programs

Similar Jobs for Senior AI Network Architect

8 matching positions

Cloud Solution Architect and Senior Cloud Solution Architect - Data and AI

We are looking for Cloud Solution Architect (CSA) and Senior Cloud Solution Arch...
Location
United States , Multiple Locations
Salary
85100.00 - 169800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science, Information Technology, Engineering, Business, Liberal Arts, or related field AND 2+ years experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or consulting OR equivalent experience
  • Authorization to work in the United States that does not now or in the future require employer sponsorship
  • 1+ years of experience with AI/ML and/or Generative AI technology
  • 2+ years of customer-facing experience providing recommendations to, or collaborating with, mid-to-senior level executives to address and advance technology transformation initiatives, enablement, and outcomes, including Data and AI solutions
  • 2+ years of enterprise experience in ANY of the following: Microsoft Fabric, Azure Databricks, Microsoft Purview, Azure SQL, PostgreSQL, MySQL, and Cosmos DB
Job Responsibility
  • Seek to understand customers’ overall data estate Business and IT priorities and success measures to design Data & Analytics solutions that drive business value and drive positive Customer Satisfaction & become a trusted advisor
  • Ensure that solution exhibits high levels of performance, security, scalability, maintainability, repeatability, appropriate reusability, and reliability upon deployment and provide feedback and insights from customers/partners
  • Develop opportunities to drive Customer Success business results & help Customers get value from their Microsoft investments and identify resolutions to Customer blockers by leveraging SA subject matter expertise
  • Deliver according to MS best practices & using repeatable Intellectual Property (IP)
  • Apply technical knowledge to architect and design solutions that meet business and IT needs, create AI roadmaps, drive Proof of Concepts (POC) and Minimal Viable Product (MVP), and ensure long term technical viability of new deployments, infusing key AI technologies where appropriate
  • Be the Voice of Customer to share insights and best practices, connect with Engineering team to remove key blockers and drive product improvements
  • Maintain technical skills and knowledge, keep up to date with market trends and competitive insights
  • Collaborate and share with the AI technical community while educating customers on the Azure platform
  • Accelerate customer outcomes - Share expertise, contribute to IP creation & re-use to accelerate customer outcomes and obtain relevant accreditations and certifications
  • Fulltime

Senior Network Security Architect - VOIS

We are seeking an experienced network security professional to design, implement...
Location
India , Pune
Salary
Not provided
Vodafone
Expiration Date
Until further notice
Requirements
  • 10 to 12 years of relevant industry experience, including significant exposure in senior or lead roles
  • Proficient in firewall technologies such as Palo Alto Networks (Panorama), Fortinet (FortiManager), Juniper SRX, Check Point, or Cisco Firepower
  • Knowledgeable in designing and deploying proxy solutions (Open Proxy, Bluecoat, FortiGate), VPN concentrators (SSL and site-to-site IPsec), and load balancing solutions (F5 or cloud load balancers)
  • Strong foundation in TCP/IP networking, routing and switching protocols (BGP, OSPF), VPN technologies, and intrusion prevention systems
  • Experienced in cloud security implementations, particularly virtual firewalls within Microsoft Azure environments
  • Familiar with ITIL practices and disciplined change management processes
  • A confident communicator who can explain complex security concepts clearly to technical and non-technical stakeholders
Job Responsibility
  • Design and deploy enterprise-grade firewall architectures, including Next-Generation Firewalls (NGFW) and cloud-based security solutions in Azure
  • Govern, audit, and optimise large and complex firewall rule bases, VPN policies, and NAT configurations to balance security, performance, and resilience
  • Serve as the escalation point for complex security incidents and major network outages, leading troubleshooting and root cause analysis
  • Ensure firewall and perimeter security environments align with industry standards such as ISO 27001 and PCI-DSS, as well as client-specific security policies
  • Automate security operations and remediation activities using scripting and infrastructure-as-code approaches such as Python, Ansible, and Terraform
  • Contribute to the adoption of AIOps and agentic AI solutions within the network security landscape
  • Collaborate with internal teams and external vendors to deliver secure, scalable, and reliable network solutions
What we offer
  • Opportunities to work on large-scale, global network security environments within a leading telecommunications organisation
  • Exposure to modern security architectures, cloud technologies, and automation-driven operations
  • A collaborative environment that values technical expertise, continuous improvement, and knowledge sharing
  • The ability to influence security strategy and contribute to critical transformation initiatives across Vodafone
  • Fulltime

Senior AI Solution Architect, Specialist SA

Amazon Web Services (AWS) is leading the next phase of AI adoption and is seekin...
Location
Canada , Toronto
Salary
146000.00 - 211600.00 CAD / Year
myGwork - LGBTQ+ Business Community
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in design, implementation, operations, or consulting with distributed applications
  • 5+ years of experience managing technical, enterprise customer-facing resources, or equivalent experience
  • 5+ years of experience working with or evaluating AI systems
  • Experience implementing AI solutions that can include integration of LLMs/multi-modal FMs in large scale systems, fine-tuning LLMs, deployment and distributed inference of LLMs, RAG, FM evaluation, Vector DBs, Agentic workflows, prompt/context engineering, and MLOps
  • Hands-on experience with AWS ecosystems (including Bedrock, AgentCore, and SageMaker) to set up secure, private-network AI environments, and practical experience implementing Retrieval-Augmented Generation using embeddings, vector stores, and semantic search optimization
Job Responsibility
  • Build technical relationships with customers of all sizes and operate as their trusted advisor, ensuring they get the most out of the cloud at every stage of their journey while adopting GenAI/ML and Agentic technologies across their organisation
  • Manage the overall technical relationship between AWS and our customers, making recommendations on security, cost, performance, reliability and operational efficiency to accelerate their challenging GenAI/ML and Agentic projects
  • Be the voice of the customer, sharing their needs with regard to their usage of our services impacting the roadmap of AWS GenAI/ML and Agentic features
  • Link technology to tangible solutions, with the opportunity to define cloud-native GenAI/ML and Agentic architectural patterns for a variety of use cases
  • Participate in the creation and sharing of best practices, technical content and new reference architectures (e.g. white papers, code samples, blog posts) and evangelize and educate about running GenAI/ML and Agentic workloads on AWS technology (e.g. through workshops, user groups, meetups, public speaking, online videos or conferences)
  • Lead hands-on deep dives and technical workshops, contributing reusable code, reference architectures, and internal technical assets for the broader engineering organization
What we offer
  • health insurance (medical, dental, vision, prescription, basic life & AD&D insurance)
  • Registered Retirement Savings Plan (RRSP)
  • Deferred Profit Sharing Plan (DPSP)
  • paid time off
  • other resources to improve health and well-being
  • Fulltime

AWS Senior Architect with AI Ops Architecture

We are looking for a highly skilled Enterprise Cloud and AI Ops Architect to joi...
Location
United States , Denver
Salary
70.00 USD / Hour
Realign
Expiration Date
Until further notice
Requirements
  • Hands-on experience with AWS services
  • Working knowledge of AI/GenAI for autonomous networks
  • Proficiency in AWS AI (SageMaker, Bedrock, Comprehend)
  • Strong knowledge of open-source MLOps technologies (embeddings, fault detection models), LangGraph, vLLM
  • Hands-on experience with LangChain, LangFuse, Llama 3.2 LLM, and RAG architectures
  • Experience in Agentic AI implementation and MCP, A2A
  • 8+ years in enterprise cloud architecture, including 3+ years in AI Ops and GenAI solutions
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
Job Responsibility
  • Architect AI Ops Solutions: Design and implement AI-driven operational frameworks for predictive analytics, anomaly detection, and automated remediation
  • Cloud Architecture Leadership: Build enterprise-grade solutions leveraging AWS services
  • Open-Source AI Frameworks: Implement MLOps pipelines using embeddings, fault detection models, LangGraph, and vLLM
  • Advanced AI Development: Hands-on experience with LangChain, LangFuse, Llama 3.2 LLM, and RAG-based architectures
  • Agentic AI & MCP: Drive implementation of Agentic AI systems and Model Context Protocol (MCP), A2A for intelligent orchestration
  • Collaboration: Work closely with cross-functional teams including Cloud Engineering, Data Science, and DevOps to deliver end-to-end AI Ops solutions

Senior Product Architect – AI Data Center & SONiC Networking

Senior Product Architect – AI Data Center & SONiC Networking. This role has been...
Location
United States , San Jose
Salary
172000.00 - 349000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in data center networking, AI infrastructure, or high-performance systems
  • Deep expertise in SONiC architecture and internals, large-scale Ethernet fabrics, and high-speed SerDes (112G/224G PAM4) and their impact on system performance
  • Strong understanding of ASIC pipelines, buffering, ECMP behavior, and congestion mechanisms
  • Proven ability to diagnose cross-layer performance and reliability issues involving software, hardware, and physical-layer interactions
  • Hands-on experience with RDMA/RoCE, congestion control, and lossless Ethernet at scale
  • Experience with automation and tooling (Python, Ansible, Terraform) in large-scale environments
  • Industry certifications (e.g., CCIE, JNCIE, NVIDIA) or equivalent practical experience preferred
Job Responsibility
  • Architect ultra-low-latency, lossless Ethernet fabrics supporting tens of thousands of GPUs for AI training and inference
  • Own the end-to-end SONiC platform architecture and fabric strategy, spanning control plane, management plane, data-plane integration, and operations at scale
  • Define multi-generation fabric and platform strategy across switch ASICs, NICs, SerDes capabilities, cabling, and system constraints, aligned to power, performance, and deployment realities
  • Own link-level and physical-layer requirements as they impact SONiC performance, including high-speed PAM4 signaling (112G/224G), error handling, and hardware/software interaction
  • Align SONiC architectures with next-generation GPU, NIC, and switch platforms, ensuring optimal performance across hardware and software boundaries
  • Define SONiC capabilities for AI and HPC workloads, including lossless Ethernet and RoCE; congestion management, QoS, and ECN; and dynamic and flow-based load balancing
  • Drive scale, performance, and resiliency targets for SONiC-based fabrics, including fast convergence, hitless upgrades, and failure recovery
  • Define and enforce system-level validation criteria, including scale testing, fault injection, performance benchmarking, and upgrade scenarios
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Senior Solution Architect AI & HPC

AI is a high-growth market for HPE, and we believe we are uniquely suited to bri...
Location
Salary
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Engineering, Computer Science, or similar quantitative focus preferred
  • Ability to quickly prototype functionality into scripts for demos, integrations, troubleshooting, etc.
  • Expertise in cloud architectures, specifically with public cloud platforms such as AWS, Azure, or Google Cloud
  • Strong understanding of AI technologies, including machine learning, deep learning, and neural networks
  • Experience participating in solution configurations and the creation of PoCs to meet customer requirements
  • Solid knowledge of infrastructure components, including servers, storage, networking, and virtualization
  • Experience with high-performance computing (HPC) and GPU-accelerated systems is advantageous
  • Demonstrates expert technical skills in assigned area of specialization
  • Expert knowledge of the company offerings, strategic initiatives, current trends, competitor products and strategies within area of responsibility
  • Expert level written and verbal communication skills and mastery over English and local language
Job Responsibility
  • Collaborate with sales teams to understand customer requirements and develop tailored solutions for their AI infrastructure needs
  • Engage in pre-sales activities, including technical presentations, demonstrations, and proof-of-concepts
  • Act as a trusted advisor to customers, addressing their questions, concerns, and technical challenges effectively
  • Stay up-to-date with the latest advancements in AI technologies, cloud architectures, and infrastructure trends
  • Lead Proof-of-Concepts (PoC) for HPE customers expanding into Deep Learning or Machine Learning use cases
  • Architect reusable end-to-end AI solutions for HPE customers and prospects
  • Lead technical discussions with customers and partners to propose HPE and partner Integrated solutions
  • Identify solutions, define action plans, and help coordinate and deliver optimal solutions and enhancements
  • Recommend configurations and settings for different types of hardware and interconnect fabrics
  • Assist with any product or technical issue affecting an initial sale or a customer renewal
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Fulltime

Senior Principal AI Interconnect Architect

An AI Interconnect Architect defines and engineers high-speed networking and com...
Location
United States , Milpitas
Salary
194425.00 - 322092.00 USD / Year
Sandisk
Expiration Date
Until further notice
Requirements
  • Master's or Ph.D. in Electrical Engineering, Computer Engineering, or Computer Science
  • 10-15 years of experience developing interconnect technologies, including transport and link-level protocols, switching fabrics, QoS and reliable communication methods, and Software Defined Networking
  • Familiarity with various fabric topologies such as Fat tree, Leaf-Spine (Clos), Torus, Meshed and their applicability to various workload and system configurations
  • Familiarity with GPU/accelerator clusters and data center infrastructure
  • Deep, working knowledge of various interconnect technologies and protocols such as PCIe, CXL, NVLink, UALink, Ethernet, Ultra-Ethernet, and serial links
  • Ability to develop performance models
Job Responsibility
  • Develop architectures for chip-to-chip interconnects and switched fabrics tailored for AI/ML scale-out
  • Analyze trade-offs in bandwidth, latency, power, area, and reliability
  • Participate in industry standard bodies and contribute/influence/shape the direction of industry specifications
  • Work with SoC, package design, and software teams to ensure seamless integration
What we offer
  • paid vacation time
  • paid sick leave
  • medical/dental/vision insurance
  • life, accident and disability insurance
  • tax-advantaged flexible spending and health savings accounts
  • employee assistance program
  • other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity
  • tuition reimbursement
  • transit
  • the Applause Program
  • Fulltime

Senior Principal AI Infrastructure Architect

The Senior Principal AI Infrastructure Architect is a highly skilled and advance...
Location
Italy , Milano
Salary
Not provided
NTT DATA
Expiration Date
Until further notice
Requirements
  • Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits
  • Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories
  • Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively
  • Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications
  • Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work
  • Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes
  • Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture
  • Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field
  • Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads
  • Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
Job Responsibility
  • Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients
  • Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency
  • Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
  • Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints
  • Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence
  • Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions
  • Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies
  • Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments
  • Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance
  • Develop As-Is, Vision, FMO and To-Be AI platform architectures, identify gaps and develop transition roadmaps
  • Fulltime