Principal AI Network Architect

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:

139900.00 - 274800.00 USD / Year

Job Description:

Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy? Join the Systems Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. We are seeking a passionate Principal AI Network Architect to join the AI systems architecture team. The role includes network architecture evaluation, design, and optimization for next-generation AI systems. Your work will directly influence Azure product roadmaps.

Job Responsibility:

  • Leadership: Spearhead architecture definition and evaluation of AI accelerator platforms, with a focus on high-bandwidth, low-latency networks. Drive end-to-end optimization of the stack, including the hardware and the software kernels
  • Cross-functional collaboration: Partner with silicon and platform design teams to co-design infrastructure that meets performance, reliability, and deployment goals. Frame decisions in terms of TCO, performance, flexibility, and scalability
  • Prototyping: Work with a state-of-the-art networking lab to prototype new network architectures
  • Industry influence: Participate in industry consortiums to shape standards and influence vendor roadmaps

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Master’s or Doctoral degree in Electrical Engineering, Computer Engineering, or related fields and 10+ years of technical experience in the domain
  • Deep expertise in Ethernet networking, RDMA (RoCE, InfiniBand), congestion control, and layer 2/3 switching
  • Experience architecting scale-out/backend networks for AI GPU clusters
  • Familiarity with scale-up networks such as NVLink and UALink
  • Experience with high-radix Ethernet switches
  • Familiarity with AI model execution pipelines, including the ability to analyze communication flows and their impact on model performance
  • Prior contributions to standards committees and experience with hyperscale network deployments would be an added benefit
  • Skilled in partnering and influencing architects, hardware engineers, and software leads
  • Ability to manage through ambiguity, bringing clarity and results orientation to engage and energize collaborators and stakeholders
  • Collaboration skills, teamwork, and sense of presumed responsibility
  • Verbal and written communication skills, and ability to articulate and engage with both technical and non-technical stakeholders at all levels
  • Experience leading and driving complex projects with respect and integrity, including those with multiple workstreams spanning different business and technical disciplines
  • Intellectual curiosity and passion about learning and deploying new technologies
  • Problem-solving skills, analytical capabilities, and attention to detail

Additional Information:

Job Posted:
January 31, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Principal AI Network Architect

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Full-time

Senior Principal Technical Program Manager - ML Platform

Salary:
231300.00 - 301975.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • 8+ years of experience on software teams as a Development Manager, Technical Product Manager, or TPM leading technical platform areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Partner with Engineering in defining direction, strategy and execution at Platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of systems involved. Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer
  • health and wellbeing resources
  • paid volunteer days

Principal Machine Learning Engineer

Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin and Python)
  • Understanding of Machine Learning project lifecycle/tools along with prompt engineering
  • Experience in architecting and implementing high-performance RESTful microservices
  • Experience building and operating large-scale distributed systems using Amazon Web Services (S3, Kinesis, CloudFormation, EKS, AWS Security and Networking)
  • Experience with leveraging LLMs effectively and optimizing model usage on GPUs
  • Experience with Databricks or Apache Spark
  • Experience with Continuous Delivery and Continuous Integration
Job Responsibility
  • Regularly tackle the largest and most complex problems in the team, from technical design to launch
  • Work closely with Product, Engineering and Design leads in Jira AI, and translate their requirements into solid engineering deliverables, delegating work to the teams
  • Deliver solutions that are used by other teams and products
  • Follow a Product Engineer mindset by building features that are data-driven and customer-centric, fostering that culture within the Jira AI group
  • Exceptional problem solving ability using ML, AI and core software engineering
  • Routinely tackle complex architecture challenges and define architectural standards
  • Actively contribute to the code delivery through leading code reviews & documentation, direct contribution and fixing complex bugs in high-risk surface areas
  • Expertise in data analysis, statistical methods, and logical reasoning to inform data-driven decision-making
  • Partner across engineering teams to take on company-wide initiatives spanning multiple projects
  • Mentor junior members on the team
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities

Principal Network Architect

Architect to help define the future of high-performance networking for HPC and A...
Location:
United States, Ft. Collins
Salary:
142000.00 - 310500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
April 27, 2026
Requirements
  • Bachelor's or Master's degree in Electrical Engineering
  • Typically 10+ years experience
  • Deep understanding of network architecture and system-level design principles
  • Proven experience in evaluating architectural trade-offs and implementing optimization strategies
  • Strong ability to work effectively within cross-functional teams
  • Ability to effectively communicate product architectures, design proposals and negotiate options at business unit and executive levels
Job Responsibility
  • Define and document ASIC-level network architecture
  • Research and assess new networking technologies
  • Develop and document system-level network designs
  • Collaborate with network architects, ASIC designers, and software engineers to align architecture with system goals
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Full-time

Data & AI Architect Principal

The Data and AI Architect Principal is central to BT International's ability to ...
Location:
Hungary, Budapest
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Strategic Architecture Leadership – Proven ability to define data and AI platform vision with track record driving large-scale data transformation programs
  • Data Platform Architecture – Deep expertise in modern data platform design including data lakes, data warehouses, streaming architectures and data mesh patterns
  • AI/ML Architecture – Strong understanding of machine learning systems including feature engineering, model training, serving, monitoring and MLOps platforms
  • Agentic AI – Emerging expertise in agentic AI systems including LLM-based agents, tool use and reasoning frameworks for autonomous task execution
  • Technical Depth – Hands-on background with coding capability in data languages (Python, SQL, Spark), enabling credibility with engineering teams and participation in data work
  • Data Engineering Patterns – Strong understanding of ETL/ELT patterns, data quality frameworks, schema evolution and data lineage tracking
  • Cloud-Native Data – Experience with cloud data services and platforms across multi-vendor environments including data warehouses, data lakes and ML platforms
  • Platform Engineering Mindset – Ability to treat data and AI capabilities as platform products with clear service contracts and developer experience focus
Job Responsibility
  • Define and lead data and AI platform architecture strategy, establishing patterns that balance functional requirements with non-functional requirements including scalability, data quality, privacy and cost optimization
  • Work hand in hand with product engineering squads to establish data generation patterns, working directly with engineers to build data-informed products and AI capabilities
  • Drive architectural strategy for making data available across the organization through self-service data products, APIs and governed data access patterns
  • Lead the technical vision for AI capabilities distinguishing between machine learning for pattern recognition and agentic AI for autonomous task execution, establishing platform foundations for both
  • Champion data mesh principles where appropriate, enabling domain teams to own their data as products while maintaining consistency through federated governance
  • Establish MLOps practices and platforms that enable product teams to train, deploy and monitor machine learning models with the same velocity as application code
  • Collaborate with IT Systems and NaaS architects to ensure telemetry, network data and business systems generate high-quality data with proper lineage and governance
  • Work across multi-vendor environments including cloud data platforms, on-premise data systems and SaaS analytics tools to establish cohesive data architecture
  • Drive architectural governance for data and AI work ensuring solutions follow platform patterns for data quality, model governance and AI safety
  • Provide technical thought leadership on emerging data and AI technologies, evaluating applicability and translating possibilities into roadmaps aligned with BT International's platform strategy
What we offer
  • Cafeteria package - HUF 600,000/ year
  • Performance-based bonus
  • Comprehensive private health care package for all the employees, which can be extended to family members
  • Nursery support for mothers returning from maternity leave
  • Extended paternity leave: 10+10 fully paid days
  • Commuting allowance
  • Home office allowance
  • Employee discount opportunities
  • Highly affordable mobile packages for the family as well
  • New high-class offices both in Budapest and Debrecen
  • Full-time

Principal Software Engineer

Are you passionate about architecting distributed systems, building high-perform...
Location:
United States, Multiple Locations
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
  • Expertise in distributed consensus, partitioning, replication, and cloud-native networking
  • Proficiency in C, C++, Rust, Golang, or similar systems programming languages
  • Linux networking expertise: kernel networking stack, packet processing (DPDK/eBPF/XDP), NIC offloads, TCP/UDP performance tuning, and observability tools applied to high‑throughput, low‑latency data paths
  • Experience with DNS protocol, large-scale web applications, or cloud infrastructure is a plus
  • Experience applying AI/Machine Learning (ML) techniques for operational excellence, such as predictive analytics, automated incident detection, or self-healing infrastructure
  • 6+ years of experience designing and building distributed systems or networking data paths at scale
Job Responsibility
  • Architect and implement distributed systems and networking data paths for cloud-scale Networking services, focusing on reliability, performance, security, and operational excellence
  • Lead innovation in data plane engineering, including traffic routing, failover and self-healing mechanisms
  • Drive adoption of advanced distributed algorithms, networking protocols, and AI-driven solutions to optimize scalability and resilience
  • Mentor and guide engineers in best practices for distributed systems, networking, security, and cloud infrastructure, providing technical leadership through rigorous code and design reviews
  • Collaborate cross-functionally to deliver end-to-end solutions, from design through deployment and operations
  • Champion operational excellence by developing robust monitoring, observability, and automated recovery solutions, including AI-powered incident detection and predictive scaling
  • Embody our Culture and Values
  • Full-time

Business Systems Architect Principal

The Business Systems Architect Principal is central to BT International’s transf...
Location:
Hungary, Budapest
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Strategic Architecture Leadership – Proven ability to define and communicate architectural vision for complex business systems landscapes, with track record driving large-scale transformation programmes in regulated telco environments
  • Business Systems Domain Expertise – Deep understanding of BAU systems (sales, service, enterprise applications) and service operations platforms (service desk, AI Ops, lead-to-cash, billing, inventory) with knowledge of how these capabilities support business operations and cost optimisation
  • Systems Integration – Extensive experience linking together various IT systems, services and software across BAU applications, service operations platforms and NaaS capabilities to enable functional operation and support business processes
  • Vendor and Stakeholder Management – Strong ability to work with SaaS platforms and enterprise vendors as strategic partners, negotiate technical implementations and manage complex stakeholder relationships across business and operational teams
  • Modernisation Patterns – Expert knowledge of incremental modernisation approaches including service operations automation, AI-driven process optimisation and cloud-native patterns that balance operational continuity with transformation
  • Technical Depth – Hands-on background in business systems integration and service operations with coding capability, enabling credibility with engineering teams and active participation in technical spikes when needed
  • Operational Excellence – Understanding of comprehensive observability, service resilience and operational automation approaches including metrics, logging, distributed tracing and telemetry pipelines for business systems
  • NaaS Integration Understanding – Knowledge of how BAU systems integrate with network-as-a-service models, enabling sales and service operations to leverage aaS capabilities whilst maintaining operational stability
  • Leadership and Influence – Ability to lead blended IT teams (support, maintenance, BAU systems, service operations) through transformation, build consensus across organisational boundaries and develop technical leadership capability in operational functions
  • Extensive experience leading IT systems and BSS architecture in telecommunications or complex B2B environments, with demonstrated success modernizing legacy landscapes across multiple system domains
Job Responsibility
  • Define and lead the architectural strategy for business systems across BAU applications (sales, service, enterprise) and service operations (AI Ops, Service Desk, L2C including Pricing/Design/Quoting/SRM, billing, inventory), establishing target state architecture that optimises legacy systems whilst designing future-ready capabilities
  • Own the business systems portfolio optimisation roadmap, making strategic keep/modernise/retire decisions for BAU systems and service operations platforms based on technical fitness, cost-effectiveness and alignment with asset-light, NaaS-based operating model
  • Establish systems integration approaches that link together BAU applications, service operations platforms and NaaS capabilities, enabling functional operation across sales, service and enterprise systems whilst supporting business processes
  • Lead vendor strategy for business systems platforms including SaaS applications, enterprise systems and service operations tools, negotiating strategic partnerships that simplify IT landscape whilst aligning with aaS business needs and reducing vendor dependencies
  • Champion modern architecture patterns including service operations automation, AI-driven process optimisation, billing automation, inventory accuracy improvement and self-service capabilities that reduce manual propensity and improve cost-to-serve metrics
  • Design for operational excellence by establishing comprehensive observability across business systems including metrics, logging, distributed tracing and telemetry that enable proactive issue detection and support continuous improvement in service delivery
  • Collaborate with Data and AI architects to leverage data platforms and AI capabilities for business intelligence, service automation and customer insights, ensuring business systems generate valuable data and support AI-driven process improvements
  • Drive architectural governance through design reviews and architecture conformance processes, ensuring business systems initiatives align with enterprise standards, security requirements and support transformation to asset-light operating model
  • Build and mentor Business Systems architects who work with BAU operations and service delivery teams, establishing technical leadership capability and fostering architectural thinking across support, maintenance and enterprise systems functions
  • Work with engineering leadership to establish integration patterns that connect business systems with NaaS capabilities, enabling sales and service operations to leverage network-as-a-service whilst maintaining operational stability and cost-effectiveness
What we offer
  • Cafeteria package - HUF 600,000/ year
  • Performance-based bonus
  • Comprehensive private health care package for all the employees, which can be extended to family members
  • Nursery support for mothers returning from maternity leave
  • Extended paternity leave: 10+10 fully paid days
  • Commuting allowance
  • Home office allowance
  • Employee discount opportunities
  • Highly affordable mobile packages for the family as well
  • New high-class offices both in Budapest and Debrecen
  • Full-time

Senior Principal HPC/AI Architect

As a Senior Principal HPC/AI Architect at NTT DATA, you will lead the design of ...
Location:
Spain, Madrid
Salary:
Not provided
NTT DATA
Expiration Date
Until further notice
Requirements
  • 8+ years in HPC/AI infrastructure design
  • 5+ years working with GPU-accelerated systems
  • Proven experience with large-scale GPU deployments (1000+ GPUs)
  • Successful track record in technical bid support and customer engagement
  • Technical Competencies: GPU Architectures: NVIDIA (H100, H200, B100, B200), AMD (MI300X), Intel (Gaudi2/3)
  • Interconnects: InfiniBand (HDR/NDR/XDR), NVLink, RoCE, Infinity Fabric
  • Storage Systems: Lustre, GPFS, BeeGFS, NVMe-oF, S3-compatible object storage
  • Container Platforms: Kubernetes, Docker, Singularity/Apptainer
  • Performance Tools: NVIDIA Nsight, ROCm, Intel VTune
Job Responsibility
  • AI Factory Architecture & Design (35%): Design GPU cluster architectures for AI and HPC workloads
  • Define node configurations for diverse workload types
  • Specify and validate performance metrics
  • Architect multi-tier interconnect networks
  • Develop topology designs and calculate bandwidth/latency targets
  • Model performance for customer workloads
  • Pre-Sales Technical Leadership (30%): Lead technical discussions with customer architects
  • Conduct workload sizing and architectural presentations
  • Develop technical content for proposals
  • Analyze competitor solutions
What we offer
  • Opportunity to work on cutting-edge AI infrastructure projects
  • Collaborative and innovative work environment
  • Access to advanced lab infrastructure and vendor technologies
  • Career development through technical leadership and innovation
  • Full-time