
Senior Principal AI Infrastructure Architect

Italy, Milano · Job Posted May 29, 2026

Job Description

The Senior Principal AI Infrastructure Architect is a highly skilled and advanced subject matter expert, responsible for leading the design of complex AI platform and managed-service solutions and driving the strategic vision and direction for the company's largest enterprise clients. The role sits at the centre of NTT DATA's AI Factories practice and is focused on the hardware foundations — GPU and accelerator compute, host CPU platforms, high-performance storage and AI fabric — that underpin enterprise-scale training, fine-tuning and inference workloads.

Job Responsibility

  • Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients
  • Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency
  • Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
  • Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints
  • Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence
  • Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions
  • Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies
  • Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments
  • Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance
  • Develop As-Is, Vision, FMO and To-Be AI platform architectures, identify gaps and develop transition roadmaps
  • Synthesise current and future trends in AI silicon, memory hierarchies (HBM3e, CXL), interconnects and AI software stacks with client strategic imperatives to create compelling, evidence-based solutions
  • Contribute to NTT DATA's AI Factories knowledge base by sharing reference architectures, sizing tools and lessons learned with internal teams and clients

Requirements

  • Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits
  • Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories
  • Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively
  • Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications
  • Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work
  • Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes
  • Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture
  • Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field
  • Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads
  • Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
  • Solid command of AI networking — InfiniBand NDR/XDR, RoCEv2, NVIDIA Spectrum-X, Ultra Ethernet, NVLink/NVSwitch fabrics, congestion control and fabric design for rail-optimised and fat-tree topologies
  • Working knowledge of the AI software and orchestration stack: CUDA, cuDNN, NCCL, ROCm, Triton Inference Server, NIM, vLLM, TensorRT-LLM, Slurm, Kubernetes (with GPU Operator), Kubeflow, Run:ai, MLflow and NVIDIA AI Enterprise
  • Familiarity with datacenter facilities engineering for AI workloads: high-density power, liquid cooling (DLC, rear-door, immersion), PUE/WUE optimisation and the practical constraints of retrofitting existing colo space for accelerated compute
  • Excellent written and oral communication skills, with the ability to translate complex technical concepts for technical and non-technical executive audiences
  • Strong systems-thinking and strategic-thinking skills, with the ability to distil the key elements of a system into a simple abstraction that enables good decisions
  • Strong business and financial skills, with a demonstrable ability to perform cost-benefit analyses, build CAPEX vs OPEX comparisons and manage budgets
  • Knowledge of cloud, hybrid and sovereign AI deployment patterns, plus architectural governance for Agile, DevSecOps and MLOps
  • Significant knowledge of core Managed Service portfolio artefacts, techniques, demos, tools and deliverables, applied to AI platform operations

Nice to have

  • Master's or PhD advantageous
  • Vendor and technology certifications in AI infrastructure highly desirable — for example NVIDIA-Certified Associate / Professional (AI Infrastructure, AI Operations), Dell Technologies AI Factory, Cisco / Nutanix / HPE accelerated compute, Red Hat OpenShift AI, Run:ai — plus relevant storage and networking certifications
  • Scaled Agile certification advantageous


Similar Jobs for Senior Principal AI Infrastructure Architect

8 matching positions

Principal Infrastructure Architect

Planet DDS is a leading provider of a platform of cloud-based solutions that emp...
Location: United Kingdom, Glasgow
Salary: Not provided
Company: Planet DDS
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in computer science or related field, or equivalent experience
  • 7+ years of hands-on experience with Azure (required), AWS and GCP (preferred)
  • Minimum of 1 year’s experience with AI coding tools
  • Experience with infrastructure as code tools and cloud security best practices
  • Experience in operating large scale critical systems
  • Deep experience with FinOps and cloud cost optimization
  • Strong written and verbal communication skills
  • Proven experience driving best practices across the engineering organization
Job Responsibility
  • Drive adoption of AI-based DevOps tools to automate processes, optimize infrastructure, and enhance team productivity
  • Promote best practices for DevOps, SRE, Security, and Deployment platforms
  • Participate in incident retrospectives and drive operational excellence across the engineering teams
  • Participate in incident management and establish top-notch incident communication practices
  • Partner with engineering leadership to build a long-term roadmap for the organization
  • Partner with the platform teams to drive architecture review of shared components
  • Leverage expertise in FinOps and cloud cost control to drive efficient utilization of cloud resources in the engineering organization
  • Partner with senior engineers across the company to drive best-in-class engineering practices organization-wide

Senior Principal Machine Learning Engineer

You’ll form a new team of passionate engineers dedicated to building and scaling...
Location: United States
Salary: 222,300 - 348,975 USD / Year
Company: Atlassian
Expiration Date: Until further notice
Requirements
  • Bachelor’s, Master’s, or PhD in Computer Science, Statistics, Mathematics, or a related field, or equivalent practical experience
  • 12+ years of industry experience in machine learning, data science, or AI, with a proven track record of delivering production-grade ML systems
  • Deep expertise in Python, Go, or Java, with the ability to write performant, production-quality code
  • Familiarity with SQL, Spark, and cloud data environments (e.g., AWS, GCP, Databricks)
  • Experience building and scaling ML models for business-critical applications, ideally in security, privacy, anti-abuse, or compliance domains
  • Strong communication skills, able to explain complex ML concepts to diverse audiences and influence stakeholders
  • Demonstrated ability to solve ambiguous, complex problems and drive projects from ideation to production
  • Agile development mindset, with a focus on iterative improvement and business impact
Job Responsibility
  • Lead AI/ML Strategy for Trust: Drive the development and implementation of advanced machine learning algorithms and AI systems for Trust, Security, Product Abuse, and Compliance use cases (e.g., threat detection, vulnerability management, privacy automation, AI safety)
  • Architect and Scale ML Platforms: Design and build scalable, secure, and reliable ML infrastructure and pipelines, ensuring compliance with privacy and regulatory requirements
  • AI Safety and Responsible AI: Develop and champion AI safety practices, including output moderation, explainability, and alignment with evolving regulatory frameworks
  • Cross-Functional Collaboration: Partner with product, engineering, security, privacy, and analytics teams to deliver transformative AI/ML solutions that enhance Atlassian’s trust posture
  • Mentorship and Leadership: Mentor and guide ML engineers and data scientists, fostering a culture of technical excellence, innovation, and continuous improvement
  • Innovation and Research: Stay at the forefront of AI/ML research, evaluating and applying the latest techniques (e.g., LLMs, anomaly detection, privacy-preserving ML) to real-world Trust challenges
  • Platform Enablement: Build reusable ML services and APIs that empower other teams to integrate AI/ML into their products and workflows
  • Operational Excellence: Ensure high availability, reliability, and security of all ML-powered Trust platforms and services
What we offer
  • Health and wellbeing resources
  • Paid volunteer days
  • Benefits, bonuses, commissions, and equity
  • Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location: United States, Pleasanton, California
Salary: 251,000 - 314,500 USD / Year
Company: BlackLine
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years leading MLOps or AIOps platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with runtime management for LangChain, LangGraph, ADK or similar agentic systems
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage the MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • Short-term and long-term incentive programs
  • Robust offering of benefit and wellness plans
  • Full-time

Senior Principal Software Engineer, Infrastructure

At Docker, we make app development easier so developers can focus on what matter...
Location: United States, Seattle
Salary: 251,000 - 352,000 USD / Year
Company: Docker
Expiration Date: Until further notice
Requirements
  • 12+ years of software engineering experience with demonstrated expertise across multiple platform domains (identity, billing, data, infrastructure)
  • Proven track record architecting and delivering large-scale distributed systems serving millions of users and thousands of enterprise customers
  • Deep expertise in at least two of: identity/access management systems, billing/monetization platforms, data platforms, or cloud infrastructure
  • Broad working knowledge across all platform domains with ability to make sound architectural decisions spanning multiple areas
  • Expert-level understanding of API design, service architecture, and system integration patterns at scale
  • Experience with cloud platforms (AWS, GCP, or Azure) and modern infrastructure patterns (Kubernetes, service mesh, infrastructure-as-code)
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • Track record of establishing strategic technical plans that directly enabled business outcomes (revenue growth, cost reduction, market expansion)
  • Experience translating business strategy into technical architecture and roadmaps
  • Demonstrated ability to identify and prioritize investments that provide maximum platform leverage
Job Responsibility
  • Define and own the multi-year technical vision for Docker's foundational platform, encompassing accounts, billing, data, enterprise governance, and infrastructure
  • Establish strategic plans and objectives for major platform initiatives, making architectural decisions that ensure effective achievement of Docker's business objectives
  • Contribute to and drive the strategic vision in collaboration with the VP of Engineering, translating organizational strategy into technical roadmaps that span multiple teams and years
  • Identify and prioritize platform investments that provide maximum leverage—capabilities built once that enable rapid iteration across all Docker products
  • Develop architectural principles and standards that guide technical decisions across the Bridge organization and influence product engineering teams
  • Anticipate future business needs and ensure platform architecture provides the flexibility to support Docker's evolving commercial models
  • Lead large cross-company programs that require coordination across Desktop, Hub, AI, Security, Cloud, and Platform teams
  • Architect the unified platform interfaces ("Control Planes") that enable product teams to answer canonical questions like "Can this user access this feature?" or "How much has this organization consumed?" without understanding underlying complexity
  • Drive convergence of fragmented systems across Docker—replacing product-specific implementations with shared platform capabilities for authentication, authorization, billing, and observability
  • Establish technical contracts between platform and product teams that enable independent velocity while ensuring consistency and reliability
What we offer
  • Freedom & flexibility: fit your work around your life
  • Designated quarterly Whaleness Days plus an end-of-year Whaleness break
  • Home office setup: we want you comfortable while you work
  • 16 weeks of paid parental leave
  • Technology stipend equivalent to $100 net/month
  • PTO plan that encourages you to take time to do the things you enjoy
  • Training stipend for conferences, courses and classes
  • Equity
  • Full-time

Senior Principal Architect, Senior Director

Location: United States, California
Salary: 217,300 - 325,900 USD / Year
Company: Teradata
Expiration Date: Until further notice
Requirements
  • 20+ years of software or systems engineering experience, with 8+ years in a principal, distinguished, or fellow-level architectural role at a software product company or hyperscaler
  • Recognized technical authority in at least two of the following domains: cloud-native data platforms, distributed query processing, AI/ML infrastructure, elastic compute architecture, or enterprise integration
  • Demonstrated ability to set and communicate a multi-year technical vision that influences both internal engineering decisions and external market positioning
  • Proven track record of deploying AI tools, LLM-assisted workflows, and automated engineering practices at an organizational scale—with measurable productivity outcomes
  • Outstanding communication skills with experience presenting to Boards of Directors, institutional investors, and enterprise C-suite executives
Job Responsibility
  • Define and steward Teradata's enterprise-wide technical architecture vision, ensuring coherent design across compute, storage, query processing, AI/ML pipelines, and cloud integration layers
  • Lead architectural strategy for Teradata's highest-complexity, highest-risk platform initiatives—including Teradata Factory data lakehouse, Agentic Platform AI workloads, and MCP Integration framework
  • Serve as Teradata's primary external technical voice in industry forums, standards bodies (e.g., cloud data alliances, open-source governance boards), and with strategic enterprise customers and partners
  • Chair the Enterprise Architecture Review Board, establishing architectural principles, evaluating major design proposals, and resolving cross-team technical conflicts with speed and clarity
  • Develop and operationalize a comprehensive AI productivity framework for the Core Platform engineering org—defining tooling standards, measurement baselines, and a roadmap for AI-assisted design, testing, and documentation
  • Mentor a portfolio of Principal and Senior Staff Architects and serve as a career sponsor and technical role model for Teradata's most senior engineering talent
  • Produce landmark technical assets—platform blueprints, architecture whitepapers, technology evaluations—that are used directly in Board reporting, investor communications, and enterprise sales cycles
  • Partner with the CPO and Chief Data Officer to ensure architecture investments align with Teradata's five-year AI platform strategy and business outcomes
  • Apply foundational AI skills to explore and implement ways AI can enhance productivity, innovation, and impact across our workforce
What we offer
  • Healthcare
  • Life and disability insurance plans
  • 401(k) retirement savings plan
  • Time-off programs
  • Full-time

Principal AI Solutions Architect

You'll be the technical owner for a small portfolio of strategic accounts — typi...
Location: United States, San Francisco
Salary: 170,000 - 260,000 USD / Year
Company: Strategic Employment Partners
Expiration Date: Until further notice
Requirements
  • 5+ years in a customer-facing technical role — Solutions Architect, Customer Engineer, SRE, or Senior Support Engineer at an infra company. You've owned strategic accounts before.
  • Deep cloud-native chops: Kubernetes, service mesh (Istio, Cilium), API gateways and proxies (Envoy or similar). You've debugged these in production, not just deployed them.
  • 1+ years hands-on with AI/ML infrastructure — LLMs, agentic frameworks, model-serving platforms, inference gateways. You don't need to have trained a model, but you should understand how production AI traffic actually flows.
  • Scripting/programming comfort in Go, Python, or Bash. You'll write diagnostics, automation, and reference code.
  • The ability to talk to a platform engineer at 10am and a CTO at 2pm without changing who you are.
Job Responsibility
  • Architect their AI infrastructure layer. LLM gateways with auth, rate limiting, and observability. Agent-to-agent communication patterns. Securing inference traffic across multi-cloud environments. Most of our customers haven't done this before — you have, or you'll figure it out alongside the engineering team and write the playbook everyone else uses.
  • Run technical issue resolution end-to-end. When something escalates, you partner with Support and Engineering, drive root cause, and often dig in directly. We expect Principal-level architects to get their hands dirty when it accelerates the outcome — reading code, reproducing issues, writing reference implementations.
  • Drive deep adoption. You'll consult on performance tuning, deployment patterns, and operational best practices. You'll spot new use cases inside the account and bring them forward.
  • Influence the product. You sit closer to real production AI workloads than almost anyone in the company. Product Management and Engineering treat your feedback as a primary signal for the roadmap.
  • Partner with the account team (CSM, AE, SE) on risk, renewal, and expansion — but you're the technical voice in the room, not the commercial one.
What we offer
  • Laptop + WFH stipend + monthly phone/internet allowance
  • Premium-paid healthcare
  • Equity
  • Flexible hours
  • Full-time

Principal AI Architect

As the Principal AI Architect for Teradata AI Studio, you will define the techni...
Location: India, Bengaluru, Karnataka
Salary: Not provided
Company: Teradata
Expiration Date: Until further notice
Requirements
  • 10+ years of software engineering experience, including 3+ years in a senior architect or principal engineer role with platform-wide technical scope
  • Demonstrated expertise designing AI/ML platforms or developer tools: model serving infrastructure, feature stores, experiment tracking, MLOps pipelines, or AI agent development environments
  • Deep understanding of LLM integration patterns: RAG architectures, fine-tuning pipelines, evaluation frameworks, and agent tool-calling interfaces
  • Experience with enterprise data platforms (Teradata Vantage, Snowflake, Databricks, or equivalent) at sufficient depth to architect against their APIs, security models, and performance characteristics
Job Responsibility
  • Define the technical architecture of Teradata's end-to-end AI development environment
  • Set the architectural direction for how AI Studio integrates with Teradata Vantage's query engine, model registry, feature store, and agent harness
  • Establish the patterns for how enterprise customers build trustworthy AI workflows — from data preparation through model deployment to agent-driven automation
What we offer
  • People-first culture
  • Flexible work model
  • Focus on well-being
  • Commitment to fostering an inclusive environment
  • Full-time

Principal AI Architect

At Teradata, we believe that people thrive when empowered with better informatio...
Location: United States, California
Salary: 217,300 - 325,900 USD / Year
Company: Teradata
Expiration Date: Until further notice
Requirements
  • 10+ years of software engineering experience, including 3+ years in a senior architect or principal engineer role with platform-wide technical scope
  • Demonstrated expertise designing AI/ML platforms or developer tools: model serving infrastructure, feature stores, experiment tracking, MLOps pipelines, or AI agent development environments
  • Deep understanding of LLM integration patterns: RAG architectures, fine-tuning pipelines, evaluation frameworks, and agent tool-calling interfaces
  • Experience with enterprise data platforms (Teradata Vantage, Snowflake, Databricks, or equivalent) at sufficient depth to architect against their APIs, security models, and performance characteristics
  • Experience building developer-facing platforms — SDKs, APIs, or IDEs — that external developers adopt and extend
  • Familiarity with open-source AI development tools: MLflow, Weights & Biases, Hugging Face, LangChain, LangGraph, or comparable
  • Understanding of enterprise AI governance requirements: model lineage, data access controls, audit logging, and responsible AI guardrails
  • Experience with cloud-native architecture (AWS, Azure, GCP) and containerized ML workloads (Kubernetes, Docker)
  • Strong cross-functional influence: you can drive alignment across engineering, product, and customer-facing teams without formal authority
  • A portfolio of architectural decisions — RFCs, design docs, or open-source work — that demonstrates your approach
Job Responsibility
  • Define the technical architecture of Teradata's end-to-end AI development environment — the platform where data scientists, ML engineers, and AI developers build, test, deploy, and monitor AI and agentic applications on top of Vantage
  • Set the architectural direction for how AI Studio integrates with Teradata Vantage's query engine, model registry, feature store, and agent harness
  • Establish the patterns for how enterprise customers build trustworthy AI workflows — from data preparation through model deployment to agent-driven automation
  • Ensure that AI Studio is the most capable, governed, and scalable AI development environment in the market
  • Ship architectural decisions that other engineers can build on with confidence
  • Drive customer adoption of AI Studio at scale
What we offer
  • Healthcare
  • Life and disability insurance plans
  • 401(k) retirement savings plan
  • Time-off programs
  • Full-time