
Member of Technical Staff - GPU Infrastructure

Prime Intellect


Location:
United States, San Francisco

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infrastructure that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts. As our Solutions Architect for GPU Infrastructure, you'll be the technical expert who turns customer requirements into production-ready systems capable of training the world's most advanced AI models.

Job Responsibility:

  • Partner with clients to understand workload requirements and design optimal GPU cluster architectures
  • Create technical proposals and capacity planning for clusters ranging from 100 to 10,000+ GPUs
  • Develop deployment strategies for LLM training, inference, and HPC workloads
  • Present architectural recommendations to technical and executive stakeholders
  • Deploy and configure orchestration systems including SLURM and Kubernetes for distributed workloads
  • Implement high-performance networking with InfiniBand, RoCE, and NVLink interconnects
  • Optimize GPU utilization, memory management, and inter-node communication
  • Configure parallel filesystems (Lustre, BeeGFS, GPFS) for optimal I/O performance
  • Tune system performance from kernel parameters to CUDA configurations
  • Serve as primary technical escalation point for customer infrastructure issues
  • Diagnose and resolve complex problems across the full stack - hardware, drivers, networking, and software
  • Implement monitoring, alerting, and automated remediation systems (a minimal monitoring sketch follows this list)
  • Provide 24/7 on-call support for critical customer deployments
  • Create runbooks and documentation for customer operations teams
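
For a concrete sense of the monitoring bullet above, here is a minimal GPU health-check sketch using NVIDIA's NVML Python bindings (pynvml). The thresholds and the alert() hook are illustrative assumptions, not anything specified in this posting:

```python
# Minimal GPU health-check loop via NVML (pip install nvidia-ml-py).
import time

import pynvml

UTIL_FLOOR = 10    # percent; a sustained idle GPU may indicate a hung job (assumed threshold)
TEMP_CEILING = 85  # degrees C; the real limit depends on the SKU (assumed threshold)

def alert(msg: str) -> None:
    # Placeholder: wire this into your paging/alerting system.
    print(f"ALERT: {msg}")

def check_gpus() -> None:
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            if util < UTIL_FLOOR:
                alert(f"GPU {i}: utilization {util}% below floor")
            if temp > TEMP_CEILING:
                alert(f"GPU {i}: temperature {temp} C above ceiling")
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    while True:
        check_gpus()
        time.sleep(60)  # poll once a minute
```

In production this check would typically feed an exporter such as DCGM rather than print, but the shape of the loop is the same.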

Requirements:

  • 3+ years hands-on experience with GPU clusters and HPC environments
  • Deep expertise with SLURM and Kubernetes in production GPU settings
  • Proven experience with InfiniBand configuration and troubleshooting
  • Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack
  • Experience with infrastructure automation tools (Ansible, Terraform)
  • Proficiency in Python, Bash, and systems programming
  • Track record of customer-facing technical leadership
  • NVIDIA driver installation and troubleshooting (CUDA, Fabric Manager, DCGM)
  • Container runtime configuration for GPUs (Docker, Containerd, Enroot)
  • Linux kernel tuning and performance optimization
  • Network topology design for AI workloads
  • Power and cooling requirements for high-density GPU deployments

Nice to have:

  • Experience with 1000+ GPU deployments
  • NVIDIA DGX, HGX, or SuperPOD certification
  • Distributed training frameworks (PyTorch FSDP, DeepSpeed, Megatron-LM)
  • ML framework optimization and profiling
  • Experience with AMD MI300 or Intel Gaudi accelerators
  • Contributions to open-source HPC/AI infrastructure projects

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Member of Technical Staff - GPU Infrastructure

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location:
United States, San Mateo
Salary:
175,000 - 220,000 USD / year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility:
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling (see the Triton sketch after this list)
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
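
As a toy illustration of the Triton work named above, here is a minimal elementwise-add kernel. It is a generic tutorial-style sketch, not Fireworks' code; the block size is an assumption to be tuned per GPU:

```python
# Minimal Triton kernel: elementwise add. Requires a CUDA GPU.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)          # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
assert torch.allclose(add(x, y), x + y)
```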
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type: Full-time

Member of Technical Staff - Distributed Training Engineer

Our Training Infrastructure team is building the distributed systems that power ...
Location:
United States, San Francisco; Boston
Salary:
Not provided
Liquid AI
Expiration Date:
Until further notice
Requirements:
  • Hands-on experience building distributed training infrastructure (PyTorch Distributed DDP/FSDP, DeepSpeed ZeRO, Megatron-LM TP/PP)
  • Experience diagnosing performance bottlenecks and failure modes (profiling, NCCL/collectives issues, hangs, OOMs, stragglers)
  • Understanding of hardware accelerators and networking topologies
  • Experience optimizing data pipelines for ML workloads
Job Responsibility:
  • Design and build core systems that make large training runs fast and reliable
  • Build scalable distributed training infrastructure for GPU clusters (a minimal FSDP sketch follows this list)
  • Implement and tune parallelism/sharding strategies for evolving architectures
  • Optimize distributed efficiency (topology-aware collectives, comm/compute overlap, straggler mitigation)
  • Build data loading systems that eliminate I/O bottlenecks for multimodal datasets
  • Develop checkpointing mechanisms balancing memory constraints with recovery needs
  • Create monitoring, profiling, and debugging tools for training stability and performance
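
To make the infrastructure bullets above concrete, here is a minimal PyTorch FSDP sketch, assuming a placeholder model and a torchrun launch (e.g. torchrun --nproc_per_node=8 train.py). Real systems layer mixed precision, activation checkpointing, and sharded checkpoints on top:

```python
# Minimal FSDP training skeleton; the model and optimizer are placeholders.
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")  # NCCL for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096), torch.nn.GELU(),
        torch.nn.Linear(4096, 4096),
    ).cuda()

    # Shard parameters, gradients, and optimizer state across ranks.
    model = FSDP(model)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                      # stand-in for a real data loader
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).square().mean()      # dummy loss
        loss.backward()
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```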
What we offer:
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
Employment Type: Full-time

Member of Technical Staff, Inference

We're looking for an ML infrastructure engineer to bridge the gap between resear...
Location:
United States
Salary:
240,000 - 290,000 USD / year
Runway
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience running ML model inference at scale in production environments
  • Strong experience with PyTorch and multi-GPU inference for large models
  • Experience with Kubernetes for ML workloads: deploying, scaling, and debugging GPU-based services
  • Comfortable working across multiple cloud providers and managing GPU driver compatibility
  • Experience with monitoring and observability for ML systems (errors, throughput, GPU utilization)
  • Self-starter who can work embedded with research teams and move fast
  • Strong systems thinking and pragmatic approach to production reliability
  • Humility and open-mindedness
Job Responsibility:
  • Productionize model checkpoints end-to-end: from research completion to internal testing to production deployment to post-release support
  • Build and optimize inference systems for large-scale generative models running on multi-GPU environments
  • Design and implement model serving infrastructure specialized for diffusion models and real-time diffusion workflows
  • Add monitoring and observability for new model releases: track errors, throughput, GPU utilization, and latency (see the metrics sketch after this list)
  • Embed with research teams to gather training data, run preprocessing scripts, and support the model development process
  • Explore and integrate with GPU inference providers (Modal, E2E, Baseten, etc.)
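
As a sketch of the observability bullet above, the following uses the prometheus_client library to expose request, error, and latency metrics; the metric names and the simulated handler are illustrative assumptions:

```python
# Minimal inference-service metrics with prometheus_client.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Total inference requests")
ERRORS = Counter("inference_errors_total", "Failed inference requests")
LATENCY = Histogram("inference_latency_seconds", "End-to-end request latency")

def handle_request() -> None:
    with LATENCY.time():                      # records duration on exit
        REQUESTS.inc()
        try:
            time.sleep(random.uniform(0.05, 0.2))  # stand-in for a model forward pass
        except Exception:
            ERRORS.inc()
            raise

if __name__ == "__main__":
    start_http_server(9090)  # Prometheus scrapes :9090/metrics
    while True:
        handle_request()
```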
Employment Type: Full-time

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location:
United States, Mountain View
Salary:
139,900 - 274,800 USD / year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 8+ years of such experience, OR Bachelor's Degree in Computer Science or related technical field AND 12+ years of such experience, OR equivalent experience
  • Strong background in one or more of the following areas: AI accelerator or GPU architectures; distributed systems and large-scale AI training/inference; high-performance computing (HPC) and collective communications; ML systems, runtimes, or compilers; performance modeling, benchmarking, and systems analysis; hardware–software co-design for AI workloads
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility:
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps (a toy roofline sketch follows this list)
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
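
As a toy instance of the what-if modeling described above, the sketch below projects a GEMM's runtime with a simple roofline model (time is the max of the compute-bound and bandwidth-bound limits). All hardware numbers are illustrative placeholders, not any real product's specification:

```python
# Toy roofline "what-if" model for a candidate accelerator.
def kernel_time_s(flops: float, bytes_moved: float,
                  peak_flops: float, peak_bw: float) -> float:
    compute_s = flops / peak_flops
    memory_s = bytes_moved / peak_bw
    # A kernel cannot run faster than the slower of its two limits.
    return max(compute_s, memory_s)

# Hypothetical accelerator: 1 PFLOP/s dense, 3 TB/s HBM bandwidth (assumed).
PEAK_FLOPS = 1e15
PEAK_BW = 3e12

# GEMM: C[m,n] += A[m,k] @ B[k,n] in fp16 (2 bytes/element).
m = n = k = 8192
flops = 2 * m * n * k
bytes_moved = 2 * (m * k + k * n + m * n)

t = kernel_time_s(flops, bytes_moved, PEAK_FLOPS, PEAK_BW)
intensity = flops / bytes_moved  # FLOPs per byte
print(f"arithmetic intensity: {intensity:.0f} FLOP/B, projected time: {t * 1e6:.0f} us")
```

Swapping in projected peak numbers for a future part turns the same two-line model into an early feasibility check for the workload.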
Employment Type: Full-time

Member of Technical Staff - Inference

Prime Intellect is building the open superintelligence stack - from frontier age...
Location:
United States, San Francisco
Salary:
Not provided
Prime Intellect
Expiration Date:
Until further notice
Requirements:
  • 3+ years building and running large‑scale ML/LLM services with clear latency/availability SLOs
  • Hands‑on with at least one of vLLM, SGLang, TensorRT‑LLM
  • Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo
  • Deep understanding of prefill vs. decode, KV‑cache behavior, batching, sampling, speculative decoding, parallelism strategies
  • Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end‑to‑end
  • Python: Systems tooling and backend services
  • PyTorch: LLM Inference engine development and integration, deployment readiness
  • AWS/GCP service experience, cloud deployment patterns
  • Running infrastructure at scale with containers on Kubernetes
  • GPU architecture, CUDA runtime, NCCL, and InfiniBand
Job Responsibility:
  • Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets
  • Design placement and scheduling algorithms for heterogeneous accelerators
  • Implement multi‑region/zone failover and traffic shifting for resilience and cost control
  • Build autoscaling, routing, and load balancing to meet throughput/latency SLOs
  • Optimize model distribution and cold-start times across clusters
  • Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT‑LLM (see the vLLM sketch after this list)
  • Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management, and other axes for maximum performance
  • Profile kernels, memory bandwidth, and transport; apply techniques such as quantization and speculative decoding
  • Develop reproducible performance suites (latency, throughput, context length, batch size, precision)
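
For concreteness, here is a minimal offline-generation sketch with vLLM, one of the frameworks named above. The model name, tensor-parallel degree, and sampling parameters are illustrative assumptions, not Prime Intellect's deployment:

```python
# Minimal vLLM offline generation (pip install vllm); needs CUDA GPUs.
from vllm import LLM, SamplingParams

# Illustrative model and parallelism; choose per your fleet.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = ["Explain KV-cache reuse in one paragraph."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```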
What we offer:
  • Competitive compensation with significant equity incentives
  • Flexible work arrangement (remote or San Francisco office)
  • Full visa sponsorship and relocation support
  • Professional development budget
  • Regular team off-sites and conference attendance
  • Opportunity to shape decentralized AI and RL at Prime Intellect
Employment Type: Full-time

Member of Technical Staff - Full Stack

Prime Intellect is building the open superintelligence stack - from frontier age...
Location:
United States, San Francisco
Salary:
Not provided
Prime Intellect
Expiration Date:
Until further notice
Requirements:
  • Strong Python backend development (FastAPI, async)
  • Modern frontend development (TypeScript, React/Next.js, Tailwind)
  • Experience building developer tools and dashboards
  • RESTful API design and implementation
  • Systems programming experience with Rust
  • Infrastructure automation (Ansible, Terraform)
  • Container orchestration (Kubernetes)
  • Cloud platform expertise (GCP preferred)
  • Observability tools (Prometheus, Grafana)
Job Responsibility:
  • Build intuitive web interfaces for AI workload management and monitoring
  • Develop REST APIs and backend services in Python (a minimal FastAPI sketch follows this list)
  • Create real-time monitoring and debugging tools
  • Implement user-facing features for resource management and job control
  • Design and implement distributed training infrastructure in Rust
  • Build high-performance networking and coordination components
  • Create infrastructure automation pipelines with Ansible
  • Manage cloud resources and container orchestration
  • Implement scheduling systems for heterogeneous hardware (CPU, GPU, TPU)
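
A minimal sketch of the kind of Python backend endpoint described above, using FastAPI with an in-memory stand-in for a real job store; the Job schema and route are hypothetical:

```python
# Minimal FastAPI job-status endpoint (pip install fastapi uvicorn).
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class Job(BaseModel):
    id: str
    status: str       # e.g. "queued", "running", "failed"
    gpu_count: int

# Stand-in for a real database or scheduler query.
JOBS: dict[str, Job] = {
    "job-1": Job(id="job-1", status="running", gpu_count=64),
}

@app.get("/jobs/{job_id}")
async def get_job(job_id: str) -> Job:
    job = JOBS.get(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="job not found")
    return job
```

Run with `uvicorn app:app` and query `/jobs/job-1`; a real service would back this with the scheduler state rather than a dict.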
What we offer:
  • Competitive compensation with significant equity incentives
  • Flexible work arrangement (remote or San Francisco office)
  • Full visa sponsorship and relocation support
  • Professional development budget for courses and conferences
  • Regular team off-sites and conference attendance
  • Opportunity to shape the future of decentralized AI development
Employment Type: Full-time

Member of Technical Staff, Synthetic Data

As a Machine Learning Engineer specializing in synthetic data, you will play a p...
Location:
Not provided
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with LLMs through work projects, open-source contributions or personal experimentation
  • Familiarity with LLM inference frameworks such as vLLM and TensorRT
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
Job Responsibility:
  • Design and build scalable inference pipelines that run on large GPU clusters
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance (see the filtering sketch after this list)
  • Research and implement innovative synthetic data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models
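
As a small illustration of the data-quality work described above, here is a heuristic filter-and-dedup sketch in pandas. The thresholds and column names are assumptions for illustration, not Cohere's pipeline:

```python
# Heuristic quality filtering for synthetic text samples with pandas.
import pandas as pd

df = pd.DataFrame({
    "text": [
        "A well-formed synthetic training example about GPUs.",
        "aaaa aaaa aaaa aaaa",
        "short",
    ]
})

df["n_chars"] = df["text"].str.len()
df["n_unique_words"] = df["text"].str.split().map(lambda ws: len(set(ws)))

# Drop degenerate samples: too short or too repetitive (assumed thresholds).
clean = df[(df["n_chars"] >= 20) & (df["n_unique_words"] >= 4)]
# Deduplicate exact matches before mixing into a training set.
clean = clean.drop_duplicates(subset="text")
print(clean["text"].tolist())
```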
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment Type: Full-time

Member of Technical Staff - Sovereign AI

Our mission is to scale intelligence to serve humanity. We’re training and deplo...
Location:
Canada, Toronto
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Canadian citizenship and eligibility for security clearance (required for this role)
  • Extremely strong software engineering skills
  • Proficiency in Python and related ML frameworks
  • Experience training, evaluating, and using (as an end-user) LLMs
  • Experience using large-scale distributed (GPU) LLM training strategies
Job Responsibility:
  • Design and implement novel research ideas, ship state-of-the-art models to production, and maintain deep connections to academia and the government
  • Design, build, and scale agentic AI systems serving mission-critical use cases
  • Research, implement, and experiment with ideas on our supercompute and data infrastructure
  • Learn from and work with the best researchers in the field
  • Execute across the full AI stack and ship products to serve public interest
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment Type: Full-time