Prime Intellect is building the open superintelligence stack, from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.
Job Responsibilities:
Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets
Design placement and scheduling algorithms for heterogeneous accelerators
Implement multi-region/zone failover and traffic shifting for resilience and cost control
Build autoscaling, routing, and load balancing to meet throughput/latency SLOs
Optimize model distribution and cold-start times across clusters
Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, and TensorRT-LLM
Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance
Profile kernels, memory bandwidth, and transport
Apply techniques such as quantization and speculative decoding
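To give a flavor of the placement and scheduling work above, here is a minimal sketch of a greedy best-fit placement heuristic for a heterogeneous GPU fleet. The `Accelerator` fields, the load proxy, and the tie-breaking policy are illustrative assumptions, not Prime Intellect's actual scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Accelerator:
    """One device in the fleet (fields are illustrative assumptions)."""
    name: str          # hypothetical identifier, e.g. "h100-us-east-0"
    vram_gb: float     # total device memory
    used_gb: float     # memory already committed to other models
    active_reqs: int   # in-flight requests, used as a crude load proxy


def place(model_gb: float, fleet: List[Accelerator]) -> Optional[Accelerator]:
    """Greedy best-fit placement: among devices with enough free memory,
    pick the least-loaded one (fewest in-flight requests), breaking ties
    in favor of the device with the most free memory."""
    candidates = [a for a in fleet if a.vram_gb - a.used_gb >= model_gb]
    if not candidates:
        return None  # no device can host the model; caller must queue or scale up
    return min(candidates,
               key=lambda a: (a.active_reqs, -(a.vram_gb - a.used_gb)))


fleet = [
    Accelerator("a100-0", 80.0, 60.0, 2),  # only 20 GB free
    Accelerator("h100-0", 80.0, 10.0, 5),  # 70 GB free, heavily loaded
    Accelerator("h100-1", 80.0, 20.0, 1),  # 60 GB free, lightly loaded
]
print(place(40.0, fleet).name)  # -> h100-1 (fits and is least loaded)
```

A real scheduler would also weigh accelerator generation, interconnect topology, and colocation with cached model weights; this sketch only shows the memory-feasibility and load-balancing core of the problem.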
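Prefix caching, one of the tuning axes listed above, can be illustrated with a token-level sketch: count how many leading tokens of a new prompt match a previously served request, so their KV-cache entries can be reused instead of recomputed. The flat list-of-sequences cache and per-token matching are simplifying assumptions; engines such as vLLM match on fixed-size token blocks.

```python
from typing import List, Sequence


def longest_cached_prefix(prompt: Sequence[int],
                          cache: List[Sequence[int]]) -> int:
    """Return the number of leading prompt tokens whose KV-cache entries
    could be reused from an earlier request sharing the same prefix.
    (Illustrative sketch; real engines index cached blocks by hash
    rather than scanning every past request.)"""
    best = 0
    for cached in cache:
        n = 0
        for a, b in zip(prompt, cached):
            if a != b:
                break
            n += 1
        best = max(best, n)
    return best


# A new prompt sharing a 3-token system-prompt prefix with a cached request:
print(longest_cached_prefix([7, 8, 9, 1], [[7, 8, 9, 4], [7, 5]]))  # -> 3
```

The payoff is that prefill for the matched prefix is skipped entirely, which matters most for long shared system prompts and multi-turn chat.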