Prime Intellect is building the open superintelligence stack, from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.
Job Responsibilities:
Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets
Design placement and scheduling algorithms for heterogeneous accelerators
Implement multi-region/zone failover and traffic shifting for resilience and cost control
Build autoscaling, routing, and load balancing to meet throughput/latency SLOs
Optimize model distribution and cold-start times across clusters
Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, and TensorRT-LLM
Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance
Profile kernels, memory bandwidth, and transport
Apply techniques such as quantization and speculative decoding
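To give a flavor of the placement and scheduling work above, here is a minimal sketch of a greedy best-fit placement heuristic for a heterogeneous GPU fleet. The `Accelerator` fields, the load proxy, and the tie-breaking policy are illustrative assumptions, not Prime Intellect's actual scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Accelerator:
    """One device in the fleet (fields are illustrative assumptions)."""
    name: str          # hypothetical identifier, e.g. "h100-us-east-0"
    vram_gb: float     # total device memory
    used_gb: float     # memory already committed to other models
    active_reqs: int   # in-flight requests, used as a crude load proxy


def place(model_gb: float, fleet: List[Accelerator]) -> Optional[Accelerator]:
    """Greedy best-fit placement: among devices with enough free memory,
    pick the least-loaded one (fewest in-flight requests), breaking ties
    in favor of the device with the most free memory."""
    candidates = [a for a in fleet if a.vram_gb - a.used_gb >= model_gb]
    if not candidates:
        return None  # no device can host the model; caller must queue or scale up
    return min(candidates,
               key=lambda a: (a.active_reqs, -(a.vram_gb - a.used_gb)))


fleet = [
    Accelerator("a100-0", 80.0, 60.0, 2),  # only 20 GB free
    Accelerator("h100-0", 80.0, 10.0, 5),  # 70 GB free, heavily loaded
    Accelerator("h100-1", 80.0, 20.0, 1),  # 60 GB free, lightly loaded
]
print(place(40.0, fleet).name)  # -> h100-1 (fits and is least loaded)
```

A real scheduler would also weigh accelerator generation, interconnect topology, and colocation with cached model weights; this sketch only shows the memory-feasibility and load-balancing core of the problem.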
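Prefix caching, one of the tuning axes listed above, can be illustrated with a token-level sketch: count how many leading tokens of a new prompt match a previously served request, so their KV-cache entries can be reused instead of recomputed. The flat list-of-sequences cache and per-token matching are simplifying assumptions; engines such as vLLM match on fixed-size token blocks.

```python
from typing import List, Sequence


def longest_cached_prefix(prompt: Sequence[int],
                          cache: List[Sequence[int]]) -> int:
    """Return the number of leading prompt tokens whose KV-cache entries
    could be reused from an earlier request sharing the same prefix.
    (Illustrative sketch; real engines index cached blocks by hash
    rather than scanning every past request.)"""
    best = 0
    for cached in cache:
        n = 0
        for a, b in zip(prompt, cached):
            if a != b:
                break
            n += 1
        best = max(best, n)
    return best


# A new prompt sharing a 3-token system-prompt prefix with a cached request:
print(longest_cached_prefix([7, 8, 9, 1], [[7, 8, 9, 4], [7, 5]]))  # -> 3
```

The payoff is that prefill for the matched prefix is skipped entirely, which matters most for long shared system prompts and multi-turn chat.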