Member of Technical Staff - GPU Infrastructure

Prime Intellect

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infrastructure that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts. As our Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms customer requirements into production-ready systems capable of training the world's most advanced AI models.

Job Responsibility:

  • Partner with clients to understand workload requirements and design optimal GPU cluster architectures
  • Create technical proposals and capacity planning for clusters ranging from 100 to 10,000+ GPUs
  • Develop deployment strategies for LLM training, inference, and HPC workloads
  • Present architectural recommendations to technical and executive stakeholders
  • Deploy and configure orchestration systems including SLURM and Kubernetes for distributed workloads
  • Implement high-performance networking with InfiniBand, RoCE, and NVLink interconnects
  • Optimize GPU utilization, memory management, and inter-node communication
  • Configure parallel filesystems (Lustre, BeeGFS, GPFS) for optimal I/O performance
  • Tune system performance from kernel parameters to CUDA configurations
  • Serve as primary technical escalation point for customer infrastructure issues
  • Diagnose and resolve complex problems across the full stack - hardware, drivers, networking, and software
  • Implement monitoring, alerting, and automated remediation systems
  • Provide 24/7 on-call support for critical customer deployments
  • Create runbooks and documentation for customer operations teams
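To give a flavor of the capacity-planning work above, the sketch below estimates node and switch counts for a two-tier, non-blocking leaf/spine GPU fabric. The 8-GPUs-per-node, one-NIC-per-GPU, and 64-port-switch figures are illustrative assumptions, not details from this posting; real designs also account for rail optimization, oversubscription targets, and storage/management networks.

```python
import math

def plan_cluster(num_gpus: int, gpus_per_node: int = 8, switch_ports: int = 64):
    """Back-of-envelope sizing for a two-tier leaf/spine GPU fabric.

    Assumes one NIC per GPU and a non-blocking (1:1) fabric: half of
    each leaf switch's ports face nodes, half face spine switches.
    """
    nodes = math.ceil(num_gpus / gpus_per_node)
    nics = nodes * gpus_per_node               # one HCA per GPU
    downlinks_per_leaf = switch_ports // 2     # 1:1 oversubscription
    leaves = math.ceil(nics / downlinks_per_leaf)
    uplinks = leaves * (switch_ports - downlinks_per_leaf)
    spines = math.ceil(uplinks / switch_ports)
    return {"nodes": nodes, "leaf_switches": leaves, "spine_switches": spines}

print(plan_cluster(1024))
# → {'nodes': 128, 'leaf_switches': 32, 'spine_switches': 16}
```

The same function scales to the 10,000+ GPU range mentioned above; only the assumptions about node density and switch radix need to change per deployment.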

Requirements:

  • 3+ years hands-on experience with GPU clusters and HPC environments
  • Deep expertise with SLURM and Kubernetes in production GPU settings
  • Proven experience with InfiniBand configuration and troubleshooting
  • Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack
  • Experience with infrastructure automation tools (Ansible, Terraform)
  • Proficiency in Python, Bash, and systems programming
  • Track record of customer-facing technical leadership
  • NVIDIA driver installation and troubleshooting (CUDA, Fabric Manager, DCGM)
  • Container runtime configuration for GPUs (Docker, Containerd, Enroot)
  • Linux kernel tuning and performance optimization
  • Network topology design for AI workloads
  • Power and cooling requirements for high-density GPU deployments

Nice to have:

  • Experience with 1000+ GPU deployments
  • NVIDIA DGX, HGX, or SuperPOD certification
  • Distributed training frameworks (PyTorch FSDP, DeepSpeed, Megatron-LM)
  • ML framework optimization and profiling
  • Experience with AMD MI300 or Intel Gaudi accelerators
  • Contributions to open-source HPC/AI infrastructure projects

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid

Similar Jobs for Member of Technical Staff - GPU Infrastructure

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility:
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package

Member of Technical Staff, Capacity & Efficiency Infrastructure

Microsoft AI is looking for a Member of Technical Staff – Capacity & Efficiency ...
Location:
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor’s Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Deep understanding of the fundamentals of GPU architectures and DL/LLM architectures
  • Deep experience in profiling and analyzing performance in large-scale distributed computing systems
  • Deep experience in profiling and analyzing performance in ML models especially GenAI models
  • Experience with low-level GPU programming (CUDA, Triton, NCCL) and frameworks such as PyTorch or JAX
  • Experience in leading technical projects and supporting architectural decisions with data
  • Experience building infrastructure for large-scale machine learning or generative AI workloads
  • Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • Track record of contributing to high-performance computing or large-scale AI infrastructure projects
Job Responsibility:
  • Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters
  • Build and evolve telemetry systems to provide visibility into infrastructure & ML model performance, utilization, and cost related metrics
  • Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems
  • Drive architectural improvements across various ML services which deliver measurable efficiency improvements
  • Build and evolve tools to automatically provide insights and recommendations to improve fleet-wide efficiency
  • Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies
  • Partner with ML researchers and infrastructure engineers to understand their plans and future needs and develop plans to balance growth with efficiency
  • Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, MAIA, and beyond)
  • Embody our Culture and Values

Member of Technical Staff, Pre-Training Infrastructure

Microsoft AI is looking for a Member of Technical Staff, Pre-Training Infrastruc...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience in distributed computing and large-scale systems
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
  • Proven ability to profile, benchmark, and optimize performance-critical systems
  • Experience in leading technical projects and supporting architectural decisions with data
  • Experience building infrastructure for large-scale machine learning or generative AI workloads
  • Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • Track record of contributing to high-performance computing or large-scale AI infrastructure projects
Job Responsibility:
  • Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters
  • Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems
  • Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies
  • Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, AMD, and beyond)
  • Gather data and insights to develop the pretraining compute roadmap
  • Care deeply about conversational AI and its deployment
  • Actively contribute to the development of AI models powering our innovative products
  • Find solutions to overcome roadblocks and deliver your work to users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven product development cycle
  • Embody our Culture and Values

Member of Technical Staff - Distributed Training Engineer

Our Training Infrastructure team is building the distributed systems that power ...
Location:
United States, San Francisco; Boston
Salary:
Not provided
Liquid AI
Expiration Date
Until further notice
Requirements:
  • Hands-on experience building distributed training infrastructure (PyTorch Distributed DDP/FSDP, DeepSpeed ZeRO, Megatron-LM TP/PP)
  • Experience diagnosing performance bottlenecks and failure modes (profiling, NCCL/collectives issues, hangs, OOMs, stragglers)
  • Understanding of hardware accelerators and networking topologies
  • Experience optimizing data pipelines for ML workloads
Job Responsibility:
  • Design and build core systems that make large training runs fast and reliable
  • Build scalable distributed training infrastructure for GPU clusters
  • Implement and tune parallelism/sharding strategies for evolving architectures
  • Optimize distributed efficiency (topology-aware collectives, comm/compute overlap, straggler mitigation)
  • Build data loading systems that eliminate I/O bottlenecks for multimodal datasets
  • Develop checkpointing mechanisms balancing memory constraints with recovery needs
  • Create monitoring, profiling, and debugging tools for training stability and performance
What we offer:
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year

Member of Technical Staff, Inference

We're looking for an ML infrastructure engineer to bridge the gap between resear...
Location:
United States
Salary:
240000.00 - 290000.00 USD / Year
Runway
Expiration Date
Until further notice
Requirements:
  • 4+ years of experience running ML model inference at scale in production environments
  • Strong experience with PyTorch and multi-GPU inference for large models
  • Experience with Kubernetes for ML workloads—deploying, scaling, and debugging GPU-based services
  • Comfortable working across multiple cloud providers and managing GPU driver compatibility
  • Experience with monitoring and observability for ML systems (errors, throughput, GPU utilization)
  • Self-starter who can work embedded with research teams and move fast
  • Strong systems thinking and pragmatic approach to production reliability
  • Humility and open-mindedness
Job Responsibility:
  • Productionize model checkpoints end-to-end: from research completion to internal testing to production deployment to post-release support
  • Build and optimize inference systems for large-scale generative models running on multi-GPU environments
  • Design and implement model serving infrastructure specialized for diffusion models and real-time diffusion workflows
  • Add monitoring and observability for new model releases—track errors, throughput, GPU utilization, and latency
  • Embed with research teams to gather training data, run preprocessing scripts, and support the model development process
  • Explore and integrate with GPU inference providers (Modal, E2E, Baseten, etc.)

Senior Principal Engineering Manager

Microsoft Research (MSR) is working to transform the future of artificial intell...
Location:
United States, Redmond
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of people management experience leading software engineering teams, including managing principal engineers
  • Experience building or operating infrastructure for large-scale distributed systems, cloud platforms, or artificial intelligence (AI)/machine learning(ML) workloads
  • Track record of driving execution on complex, multi-workstream infrastructure projects with clear milestones and accountability
  • Technical fluency in one or more of: large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), or High-Performance Compute (HPC) environments
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
  • Expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • A track record of strong cross-functional partnerships, including the ability to align on strategic direction, deliver joint accountabilities, and develop relationships with staff members with widely varied expertise
  • Experience scaling engineering teams through significant growth phases (hiring, onboarding, and integrating new engineers into a high-performing team)
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Job Responsibility:
  • Lead, mentor, and grow the engineering team that builds MSR’s AI research infrastructure
  • Recruit and develop exceptional engineering talent, building a diverse team - including hiring, onboarding, career development, and performance management
  • Drive execution across the team by setting clear goals, tracking milestones, managing dependencies, and ensuring accountability for delivering complex infrastructure projects on time and at high quality
  • Lead team culture and process changes, cultivating an AI-first mentality that accelerates our progress through agentic coding, automation, and skills development
  • Provide technical vision and judgment on the team's architecture, strategy, and roadmap — spanning supercomputer GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows — while empowering engineers to own deep technical details
  • Collaborate closely cross-discipline with engineers, program managers, and research and science teams to align priorities, resolve dependencies, and build better solutions together
  • Foster a team culture of operational excellence, continuous improvement, and high psychological safety where engineers are empowered to take ownership and innovate

Member of Technical Staff, Site Reliability Engineer (HPC)

As Microsoft continues to push the boundaries of AI, we are on the lookout for p...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering OR equivalent experience
  • Strong proficiency in Kubernetes, Docker, and container orchestration
  • Knowledge of CI/CD pipelines for Inference and ML model deployment
  • Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code
  • Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.)
  • Strong programming/scripting skills in Python, Go, or Bash
  • Solid knowledge of distributed systems, networking, and storage
  • Experience running large-scale GPU clusters for ML/AI workloads (preferred)
Job Responsibility:
  • Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of HPC clusters powering MAI model training and inference
  • Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into all aspects of HPC systems including GPU, clusters, storage and networking
  • Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in CPU+GPU environments
  • Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements
  • Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments
  • Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows
What we offer:
  • Competitive compensation, equity options, and comprehensive benefits

Member of Technical Staff, Software Engineer

Help build the infrastructure that powers training, evaluation, and data platfor...
Location:
Switzerland, Zürich
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Strong software engineering background building reliable, scalable production systems (Python preferred)
  • Hands‑on experience supporting large‑scale ML / LLM training, evaluation, or experimentation infrastructure
  • Operating GPU‑heavy workloads in cloud environments using Docker and Kubernetes (scheduling, utilization, isolation)
  • Designing and running data / compute pipelines and orchestration (e.g., Airflow, Argo) with object storage (Azure Blob / S3)
  • Platform reliability and operability: observability, metrics, logging, tracing, alerting (Prometheus, Grafana, OpenTelemetry)
Job Responsibility:
  • Design and build core platform services for scalable training and evaluation, including cluster orchestration, job scheduling, data and compute pipelines, and artifact management
  • Standardize containerized workflows by maintaining Docker images, CI/CD, and runtime configurations, and advocate for best practices in security, reproducibility, and cost efficiency
  • Implement end-to-end observability and operations through metrics, tracing, logging, dashboard development, monitoring, and automated alerts for model training and platform health (using Prometheus, Grafana, OpenTelemetry)
  • Architect and operate services on Azure cloud platforms, managing infrastructure-as-code (Terraform/Helm), secrets, networking, and storage
  • Enhance developer experience by creating tools, CLIs, and portals that simplify job submission, metrics analysis, and experiment management for generalist software engineering and research teams
  • Enforce security and compliance policies for data access, container hardening, and supply-chain integrity, and partner with security and privacy teams to maintain robust practices in multi-tenant environments and secret management
  • Collaborate cross-functionally with data, model, and product teams to align infrastructure roadmaps with training needs, evaluation protocols, and Copilot product goals