Principal AI Network Architect

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:
139,900 - 274,800 USD / year

Job Description:

Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy? Join the Systems Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and the engine of Microsoft’s “Intelligent Cloud” mission. We are seeking a passionate Principal AI Network Architect to join the AI systems architecture team. The role covers network architecture evaluation, design, and optimization for next-generation AI systems, and your work will directly influence Azure product roadmaps.

Job Responsibility:

  • Leadership: Spearhead architecture definition and evaluation of AI accelerator platforms, with a focus on high-bandwidth, low-latency networks. Drive end-to-end optimization of the stack, from hardware to software kernels
  • Cross-functional collaboration: Partner with silicon and platform design teams to co-design infrastructure that meets performance, reliability, and deployment goals. Frame decisions in terms of TCO, performance, flexibility, and scalability
  • Prototyping: Work with a state-of-the-art networking lab to prototype new network architectures
  • Industry influence: Participate in industry consortia to shape standards and influence vendor roadmaps

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice to have:

  • Master’s or Doctoral degree in Electrical Engineering, Computer Engineering, or related fields and 10+ years of technical experience in the domain
  • Deep expertise with Ethernet networking, RDMA (RoCE, InfiniBand), congestion control, and Layer 2/3 switching
  • Experience architecting scale-out/backend networks for AI GPU clusters
  • Familiarity with scale-up networks such as NVLink and UALink
  • Experience with high-radix Ethernet switches
  • Familiarity with AI model execution pipelines, with the ability to analyze communication flows and their impact on model performance
  • Prior contributions to standards committees and experience with hyperscale network deployments would be an added benefit
  • Skilled in partnering with and influencing architects, hardware engineers, and software leads
  • Ability to manage through ambiguity, bringing clarity and results orientation to engage and energize collaborators and stakeholders
  • Collaboration skills, teamwork, and sense of presumed responsibility
  • Verbal and written communication skills, and ability to articulate and engage with both technical and non-technical stakeholders at all levels
  • Experience leading and driving complex projects with respect and integrity, including those with multiple workstreams spanning different business and technical disciplines
  • Intellectual curiosity and a passion for learning and deploying new technologies
  • Problem-solving skills, analytical capabilities, and attention to detail

Additional Information:

Job Posted:
March 13, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Principal AI Network Architect

Principal AI Network Architect

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the...
Location:
United States, Redmond
Salary:
139,900 - 274,800 USD / year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 7+ years technical engineering experience
  • OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years technical engineering experience
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 5+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems
  • Proven expertise in system architecture across compute, networking, and accelerator domains
  • Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing
  • Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration
  • Familiarity with signal integrity modeling, link training, and physical layer optimization
Job Responsibility:
  • Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms, with a focus on ultra-high bandwidth, low-latency backend networks
  • Drive system-level integration across compute, storage, and interconnect domains to support scalable AI training workloads
  • Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure that meets performance, reliability, and deployment goals
  • Influence platform decisions across rack, chassis, and pod-level implementations
  • Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers to co-develop differentiated solutions
  • Represent Microsoft in joint architecture forums and technical workshops
  • Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains
  • Frame decisions in terms of TCO, performance, scalability, and deployment risk
  • Lead design reviews and contribute to PRDs and system specifications
  • Shape the direction of hyperscale AI infrastructure by engaging with standards bodies (e.g., IEEE 802.3), influencing component roadmaps, and driving adoption of novel interconnect protocols and topologies
Employment Type:
Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251,000 - 314,500 USD / year
BlackLine
Expiration Date
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Full-time

Sr Principal Presales, Systems Engineer - Cloud & AI Networking

Sr Principal Presales, Systems Engineer - Cloud & AI Networking. This role has b...
Location:
United States
Salary:
194,500 - 456,500 USD / year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements:
  • Bachelor's degree in engineering or related field required, advanced degree preferred
  • 15+ years of technical experience in network infrastructure architecture, design, and solutions consulting, including designing large-scale AI and cloud data center networks in hyperscale environments
  • 15+ years of industry experience applying technical expertise with JunOS, EOS, IOS and hyperscale network architecture frameworks
  • JNCIE or equivalent certification
  • Deep hands-on expertise with hyperscale routing, MPLS traffic engineering, switching, SDN overlays (EVPN/VXLAN, MP-BGP), Network Fabrics and DCI solutions, network provisioning, automation and monitoring (Apstra, Paragon, NETCONF/REST APIs, YANG)
  • Familiarity with major networking silicon, hardware platforms and related software (e.g., Juniper, Arista, Cisco, SONiC, Cumulus etc.)
  • SME in virtualization technologies on x86 and in scaling them with technologies such as DPDK, SR-IOV, and SmartNICs
  • Proven success in complex pre-sales roles, with a strong ability to build relationships with senior technical and C-Level stakeholders
  • Experience with programming/scripting (Python, APIs, JSON etc.) to enable solution integration and automation
  • Strong communication, presentation, and interpersonal skills with the ability to influence diverse technical and business audiences
Job Responsibility:
  • Act as a primary technical architect early in complex sales cycles, working independently or with Sales Engineers to translate business needs into scalable technical solutions
  • Architect end-to-end networking infrastructure solutions with strong expertise in data center, AI networking, hyperscale environments, and WAN technologies
  • Lead solution design and customization, orchestrating input from specialists, systems engineers, and product teams to meet customer requirements
  • Deliver compelling demos, proof-of-concepts, and technical presentations that clearly articulate HPE Networking’s value in customer use cases
  • Engage with hyperscalers, large enterprises, service providers and technology partners to co-design solutions addressing emerging challenges across industries and workloads
  • Influence technical strategy throughout deal validation, solutioning, and initial execution, ensuring smooth handoff to delivery teams
  • Develop best practices, enablement collateral, and architecture playbooks to scale solutions internally and with partners
  • Mentor and provide technical leadership to junior technologists and sales engineering teams
  • Navigate and adapt through complex technology transitions, maintaining thought leadership in networking innovations and industry trends
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

Senior Principal Technical Program Manager - ML Platform

Salary:
231,300 - 301,975 USD / year
Atlassian
Expiration Date
Until further notice
Requirements:
  • 8+ years of experience on software teams as a Development Manager, Technical Product Manager, or TPM leading technical platform areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Ability to partner with Engineering in defining direction, strategy, and execution at the platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of the systems involved, and willingness to develop domain expertise in the areas they operate in: storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility:
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Principal Machine Learning Engineer

Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements:
  • Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin and Python)
  • Understanding of Machine Learning project lifecycle/tools along with prompt engineering
  • Experience in architecting and implementing high-performance RESTful microservices
  • Experience building and operating large-scale distributed systems using Amazon Web Services (S3, Kinesis, CloudFormation, EKS, AWS Security and Networking)
  • Experience with leveraging LLMs effectively and optimizing model usage on GPUs
  • Experience with Databricks or Apache Spark
  • Experience with Continuous Delivery and Continuous Integration
Job Responsibility:
  • Regularly tackle the largest and most complex problems in the team, from technical design to launch
  • Work closely with Product, Engineering and Design leads in Jira AI, and translate their requirements into solid engineering deliverables, delegating work to the teams
  • Deliver solutions that are used by other teams and products
  • Follow a Product Engineer mindset by building features that are data-driven and customer-centric, fostering that culture within the Jira AI group
  • Exceptional problem-solving ability using ML, AI, and core software engineering
  • Routinely tackle complex architecture challenges and define architectural standards
  • Actively contribute to the code delivery through leading code reviews & documentation, direct contribution and fixing complex bugs in high-risk surface areas
  • Expertise in data analysis, statistical methods, and logical reasoning to inform data-driven decision-making
  • Partner across engineering teams to take on company-wide initiatives spanning multiple projects
  • Mentor junior members on the team
What we offer:
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities

Principal Software Engineer

The HPC/AI (High-Performance Computing and Artificial Intelligence) team is on a...
Location:
United States, Multiple Locations
Salary:
163,000 - 296,400 USD / year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, NVLink)
  • In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems
  • Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning
  • Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure
  • Experience with telemetry and observability tools for network monitoring at scale
  • Background in building scalable and fault-tolerant systems in large, distributed environments
Job Responsibility:
  • Partner with appropriate stakeholders to determine user requirements for a set of scenarios
  • Lead identification of dependencies and the development of design documents for a product, application, service, or platform
  • Lead by example and mentor others to produce extensible and maintainable code used across products
  • Design, develop, and optimize networking solutions tailored for large-scale AI training infrastructure
  • Architect and implement high-performance, low-latency, and low-jitter communication frameworks for distributed systems
  • Benchmark, analyze, and enhance the scalability and reliability of networking systems to handle petabyte-scale data transfer
  • Debug and resolve complex networking issues in large-scale, high-performance environments
  • Drive identification of dependencies and the development of design documents for a product, application, service, or platform
  • Create, implement, optimize, debug, refactor, and reuse code to establish and improve performance and maintainability, effectiveness, and return on investment (ROI)
  • Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook: work on call to monitor the system/product/service for degradation, downtime, or interruptions, alert stakeholders about status, and initiate actions to restore the system/product/service for simple and complex problems when appropriate
Employment Type:
Full-time

Principal AI Architect

The Principal AI Architect designs, develops and implements advanced AI solution...
Location:
United States, Waukesha
Salary:
Not provided
Energy Systems
Expiration Date
Until further notice
Requirements:
  • Bachelor's degree in Computer Science or a related program
  • 8 or more years of experience in AI, machine learning, or data science, with at least 4 years in a senior or lead architect role
  • Proven track record of designing and deploying large-scale AI systems in production environments
  • Experience leading cross-functional teams in the delivery of complex AI projects
  • Hands-on experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and AI frameworks (e.g., TensorFlow, PyTorch, scikit-learn)
  • Deep curiosity and a desire to experiment
  • Expertise in machine learning algorithms, neural networks, genetic algorithms, decision trees, business dynamic models, agent-based models, advanced statistical techniques, and operations research
  • Strong proficiency in programming languages such as Python, R, or Java
  • Ability to design scalable, secure, and efficient AI architectures
  • Exceptional problem-solving and analytical skills
Job Responsibility:
  • Leads the design and development of AI architectures, including machine learning models, deep learning frameworks, and generative AI systems
  • Defines technical strategies and roadmaps for AI-driven projects ensuring alignment with business objectives
  • Collaborates with data scientists, data engineers, business functional and product teams to integrate AI solutions into production environments
  • Advises and oversees the evaluation and adoption of AI technologies, tools, and platforms
  • Serves as the technical leader and mentor to AI and engineering teams
  • Delivers scalable, secure, and optimized AI solutions
  • Leads analytics literacy across the organization and serves as a translator of deep technical concepts into simple business vernacular
  • Tracks industry trends and advancements in AI to help maintain a competitive edge
  • Communicates complex technical concepts to non-technical stakeholders effectively
Employment Type:
Full-time