The Senior Principal AI Infrastructure Architect is a senior subject matter expert responsible for leading the design of complex AI platform and managed-service solutions and for setting the strategic vision and direction for the company's largest enterprise clients. The role sits at the centre of NTT DATA's AI Factories practice and focuses on the hardware foundations (GPU and accelerator compute, host CPU platforms, high-performance storage and AI fabric) that underpin enterprise-scale training, fine-tuning and inference workloads.
Job Responsibilities
Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients
Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency
Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints
Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence
Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions
Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies
Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments
Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance
Develop As-Is, Vision, Future Mode of Operation (FMO) and To-Be AI platform architectures, identify the gaps between them and build transition roadmaps
Synthesise current and future trends in AI silicon, memory hierarchies (HBM3e, CXL), interconnects and AI software stacks with client strategic imperatives to create compelling, evidence-based solutions
Contribute to NTT DATA's AI Factories knowledge base by sharing reference architectures, sizing tools and lessons learned with internal teams and clients
Requirements
Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits
Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories
Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively
Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications
Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work
Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes
Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture
Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field
Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads
Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
Solid command of AI networking — InfiniBand NDR/XDR, RoCEv2, NVIDIA Spectrum-X, Ultra Ethernet, NVLink/NVSwitch fabrics, congestion control and fabric design for rail-optimised and fat-tree topologies
Working knowledge of the AI software and orchestration stack: CUDA, cuDNN, NCCL, ROCm, Triton Inference Server, NIM, vLLM, TensorRT-LLM, Slurm, Kubernetes (with GPU Operator), Kubeflow, Run:ai, MLflow and NVIDIA AI Enterprise
Familiarity with datacenter facilities engineering for AI workloads: high-density power, liquid cooling (DLC, rear-door, immersion), PUE/WUE optimisation and the practical constraints of retrofitting existing colo space for accelerated compute
Excellent written and oral communication skills, with the ability to translate complex technical concepts for technical and non-technical executive audiences
Strong systems-thinking and strategic-thinking skills, with the ability to capture the key elements of a system in a simple abstraction that supports good decisions
Strong business and financial skills, with the demonstrable ability to perform cost-benefit analyses, build CAPEX vs OPEX comparisons and manage budgets
Knowledge of cloud, hybrid and sovereign AI deployment patterns, plus architectural governance for Agile, DevSecOps and MLOps
Significant knowledge of core Managed Service portfolio artefacts, techniques, demos, tools and deliverables, applied to AI platform operations
Nice to have
Master's or PhD advantageous
Vendor and technology certifications in AI infrastructure highly desirable — for example NVIDIA-Certified Associate / Professional (AI Infrastructure, AI Operations), Dell Technologies AI Factory, Cisco / Nutanix / HPE accelerated compute, Red Hat OpenShift AI, Run:ai — plus relevant storage and networking certifications