HPC Solution Architect Job at Dell (Austin)

Job Description

The Software Engineering team delivers next-generation software application enhancements and new products for a changing world. Working at the cutting edge, we design and develop software for platforms, peripherals, applications and diagnostics, all with the most advanced technologies, tools, software engineering methodologies and the collaboration of internal and external partners.

We are hiring a Senior HPC Solution Architect to design, deploy, and support large‑scale HPC and AI clusters for enterprise, research, and hyperscale customers. This is a hands‑on, customer‑facing individual contributor role that blends Linux systems engineering, cluster lifecycle automation, provisioning frameworks (Omnia/OpenCHAMI), Slurm/Kubernetes, and deep troubleshooting of production environments. It is ideal for strong technical engineers who enjoy solving complex customer problems, contributing to open source, and shaping modern HPC deployment practices.

Job Responsibilities

Lead customer architecture & design, translating HPC/AI workload requirements into scalable cluster architectures (compute, schedulers, storage, interconnects)
Deploy and operationalize clusters using Omnia or similar automation, including provisioning, scheduler bring‑up, telemetry, authentication, and repo management
Build and maintain provisioning workflows (OpenCHAMI‑based or equivalent) covering PXE/iPXE boot, cloud‑init, security, and identity/cert operations
Serve as Tier‑3 engineering escalation, troubleshooting complex provisioning, scheduling, GPU, networking, and performance issues; perform RCAs and drive permanent fixes
Contribute to open source and customer enablement through code contributions, documentation, workshops, runbooks, templates, and field readiness materials

Requirements

8+ years engineering large‑scale HPC and distributed infrastructure, with strong knowledge of cluster architecture, schedulers, and provisioning workflows
Deep experience with RHEL/Rocky/Ubuntu; hands‑on cluster deployments using open‑source toolchains, Omnia, and OpenCHAMI (composable provisioning, cloud‑init, microservices)
Production experience with Slurm and/or Kubernetes; proficient with Docker/Podman, OpenTelemetry pipelines, and telemetry instrumentation
Solid L2/L3 fundamentals, PXE/iPXE, DHCP/TFTP; experience with InfiniBand/RoCE/Omni‑Path fabrics and event streaming with Kafka
Strong skills in Ansible, Python, Bash; expertise with Prometheus and Grafana dashboards; proven communication skills for escalations and simplifying complex HPC concepts

What we offer

Comprehensive Healthcare Programs
Award Winning Financial Wellness Tools and Resources
Generous Leave of Absence for New Parents and Caregivers
Industry Leading Wellness Platform
Employee Assistance Program
