
Research Engineer - Distributed Training

Prime Intellect

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Building Open Superintelligence Infrastructure. Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infrastructure that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts. As a Research Engineer working on Distributed Training, you'll play a crucial role in shaping our technological direction, focusing on our decentralized AI training stack. If you love scaling things and maximizing training efficiency, this role is for you.

Job Responsibility:

  • Lead and participate in novel research to build a massive-scale, highly reliable, and secure decentralized training orchestration solution
  • Optimize the performance, cost, and resource utilization of AI workloads by leveraging the most recent advances in compute and memory optimization techniques
  • Contribute to the development of our open-source libraries and frameworks for distributed model training
  • Publish research in top-tier AI conferences such as ICML and NeurIPS
  • Distill highly technical project outcomes into approachable technical blog posts for our customers and developers
  • Stay up-to-date with the latest advancements in AI/ML infrastructure, tools, and decentralized training research, and proactively identify opportunities to enhance our platform's capabilities and user experience

Requirements:

  • Strong background in AI/ML engineering, with extensive experience in designing and implementing end-to-end pipelines for training and deploying large-scale AI models
  • Deep expertise in distributed training techniques, frameworks (e.g., PyTorch Distributed, DeepSpeed, MosaicML’s LLM Foundry), and tools (e.g. Ray) for optimizing the performance and scalability of AI workloads
  • Experience in large-scale model training, including distributed training techniques such as data, tensor, and pipeline parallelism
  • Solid understanding of MLOps best practices, including model versioning, experiment tracking, and continuous integration/deployment (CI/CD) pipelines
  • Passion for advancing the state-of-the-art in decentralized AI model training and democratizing access to AI capabilities for researchers, developers, and businesses worldwide
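For illustration, the data-parallelism technique named in the requirements above can be sketched with PyTorch's DistributedDataParallel. This is a minimal, hypothetical toy model with a single-process gloo process group, not Prime Intellect's actual stack; real jobs launch one process per GPU via torchrun and use the nccl backend.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group for illustration only; a real run sets
# MASTER_ADDR/MASTER_PORT per cluster and uses one rank per GPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(16, 4)  # toy model standing in for a large network
ddp_model = DDP(model)          # wraps the model; gradients are all-reduced after backward

x = torch.randn(8, 16)          # each rank would see its own shard of the global batch
loss = ddp_model(x).sum()
loss.backward()                 # gradient synchronization happens here (a no-op with one rank)

dist.destroy_process_group()
```

Tensor and pipeline parallelism instead split the model itself across devices; DDP shown here replicates the model and splits the data.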

What we offer:
  • Competitive compensation, including equity incentives, aligning your success with the growth and impact of Prime Intellect
  • Flexible work arrangements, with the option to work remotely or in-person at our offices in San Francisco
  • Visa sponsorship and relocation assistance for international candidates
  • Quarterly team off-sites, hackathons, conferences and learning opportunities
  • Opportunity to work with a talented, hard-working and mission-driven team, united by a shared passion for leveraging technology to accelerate science and AI

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Research Engineer - Distributed Training

AI Research Engineer, Scaling

As a Research Engineer focused on Scaling, you will design and build robust infr...
Location
United States, Palo Alto
Salary:
180000.00 - 300000.00 USD / Year
1X Technologies
Expiration Date
Until further notice
Requirements
  • Strong programming experience in Python and/or C++
  • Deep intuitive understanding of training and inference speed bottlenecks and scaling laws
  • A mindset aligned with extremely high scaling: belief that scale is foundational to enabling humanoid robotics
  • Degree in Computer Science or a related field
  • Experience with distributed training frameworks (e.g., TorchTitan, DeepSpeed, FSDP/ZeRO), multi-node debugging, and experiment management
  • Proven skills in optimizing inference performance using graph compilers, batching/scheduling, and serving systems like TensorRT or equivalents
  • Familiarity with quantization strategies (PTQ, QAT, INT8/FP8) and tools such as TensorRT and bitsandbytes
  • Experience developing or tuning CUDA or Triton kernels with understanding of hardware-level optimization (vectorization, tensor cores, memory hierarchies)
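As an aside, the PTQ strategy named in the requirements above can be sketched with PyTorch's dynamic quantization. This is a hypothetical toy model for illustration, not 1X's actual policy network:

```python
import torch
import torch.nn as nn

# Toy float32 model standing in for an inference network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamic post-training quantization: Linear weights are converted to
# int8 offline; activations are quantized on the fly at inference time,
# so no calibration data is needed.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
out = qmodel(x)  # int8 matmuls internally, float32 output
```

QAT, by contrast, simulates quantization during training so the model learns to tolerate the reduced precision.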
Job Responsibility
  • Own and lead scaling of distributed training and inference systems
  • Ensure compute resources are optimized to make data the primary constraint
  • Enable massive training runs (1000+ GPUs) using robot data, with robust fault tolerance, experiment tracking, and distributed operations
  • Optimize inference throughput for datacenter use cases such as world models and diffusion engines
  • Reduce latency and enhance performance for on-device robot policies using techniques such as quantization, scheduling, and distillation
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location
United States
Salary:
210000.00 - 309000.00 USD / Year
Assembly
Expiration Date
Until further notice
Requirements
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer
  • Competitive equity grants
  • 100% employer-paid benefits
  • Flexibility of being fully remote

Research Engineer, Scaling

As a Research Engineer, Scaling, you will design and build infrastructure to sup...
Location
United States, Palo Alto
Salary:
180000.00 - 300000.00 USD / Year
1X Technologies
Expiration Date
Until further notice
Requirements
  • Strong programming experience in Python and/or C++
  • Deep intuitive understanding of what affects training or inference speed: from bottlenecks to scaling laws
  • A mindset aligned with extremely high scaling: belief that scale is foundational to enabling humanoid robotics
  • Degree in Computer Science or a related field
  • Hands‑on experience with distributed training frameworks (e.g., TorchTitan, DeepSpeed, FSDP/ZeRO), multi‑node debugging, experiment management
  • Proven skills optimizing inference performance: graph compilers, batching/scheduling, serving systems (e.g., using TensorRT or equivalents)
  • Familiarity with quantization strategies (PTQ, QAT, INT8/FP8) and tools like TensorRT and bitsandbytes
  • Experience writing or tuning CUDA or Triton kernels, with an understanding of hardware features like vectorization, tensor cores, and memory hierarchies
Job Responsibility
  • Own and lead scaling of both distributed training and inference systems
  • Ensure compute resources are sufficient so that data, not hardware, is the limiter
  • Enable massive training at scale (1000+ GPUs) on robot data, handling fault tolerance, experiment tracking, distributed operations, and large datasets
  • Optimize inference throughput in datacenter contexts (e.g., for world models and diffusion engines)
  • Reduce latency and optimize performance for on‑device robot policies through techniques like quantization, scheduling, distillation, etc.
What we offer
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays

Research Engineer AI

The role involves conducting high-quality research in AI and HPC, shaping future...
Location
United Kingdom, Bristol
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • A good working knowledge of AI/ML frameworks (at least TensorFlow and PyTorch), as well as data preparation, handling, and lineage control, and model deployment, particularly in distributed environments
  • At least a B.Sc. equivalent in a Science, Technology, Engineering or Mathematical discipline
  • Development experience in compiled languages such as C, C++ or Fortran and experience with interpreted environments such as Python
  • Parallel programming experience, with relevant programming models such as OpenMP, MPI, CUDA, OpenACC, HIP, PGAS languages is highly desirable
Job Responsibility
  • Perform world-class research while also shaping products of the future
  • Enable high performance AI software stacks on supercomputers
  • Provide new environments/abstractions to support application developers to build, deploy, and run AI applications taking advantage of leading-edge hardware at scale
  • Manage modern data-intensive AI training and inference workloads
  • Port and optimize workloads of key research centers like the AI safety institute
  • Support onboarding and scaling of domain-specific applications
  • Foster collaboration with the UK and European research community
What we offer
  • Health & Wellbeing benefits that support physical, financial and emotional wellbeing
  • Career development programs catered to achieving career goals
  • Unconditional inclusion in the workplace
  • Flexibility to manage work and personal needs

Senior Principal Machine Learning Engineer - LLM Post-Training and Optimization

Atlassian is seeking a highly skilled and experienced Senior Principal Machine L...
Location
United States, Mountain View
Salary:
243100.00 - 407200.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • Ph.D. or Master’s degree in Computer Science, Machine Learning, Artificial Intelligence, or a related field
  • 8+ years of experience in machine learning, with a focus on large-scale model development and optimization
  • Deep expertise in LLM and transformer architectures (e.g., GPT, BERT, T5)
  • Strong proficiency in Python and ML frameworks such as PyTorch, JAX, or TensorFlow
  • Experience with distributed training techniques and large-scale data processing pipelines
  • Proven track record of deploying machine learning models in production environments
  • Familiarity with model optimization techniques, including quantization, pruning, and knowledge distillation
  • Strong problem-solving skills and ability to work in a fast-paced, collaborative environment
  • Excellent communication skills and ability to translate technical concepts for diverse audiences
Job Responsibility
  • Lead the fine-tuning and post-training optimization of large language models (LLMs) for diverse applications
  • Develop and implement techniques for model compression, quantization, pruning, and knowledge distillation to optimize performance and reduce computational costs
  • Conduct research on advanced techniques in transfer learning, reinforcement learning, and prompt engineering for LLMs
  • Design and execute rigorous benchmarking and evaluation frameworks to assess model performance across multiple dimensions
  • Collaborate with infrastructure teams to optimize LLM deployment pipelines, ensuring scalability and efficiency in production environments
  • Stay at the forefront of advancements in LLM technologies, sharing insights, driving innovation within the team, and leading agile development
  • Mentor other team members, facilitate workshops within and across teams, and foster a culture of technical excellence and continuous learning
What we offer
  • Health coverage
  • Paid volunteer days
  • Wellness resources

Member of Technical Staff, AI Training Infrastructure

As a Training Infrastructure Engineer, you'll design, build, and optimize the in...
Location
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
  • 3+ years of experience with distributed systems and ML infrastructure
  • Experience with PyTorch
  • Proficiency in cloud platforms (AWS, GCP, Azure)
  • Experience with containerization, orchestration (Kubernetes, Docker)
  • Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Job Responsibility
  • Design and implement scalable infrastructure for large-scale model training workloads
  • Develop and maintain distributed training pipelines for LLMs and multimodal models
  • Optimize training performance across multiple GPUs, nodes, and data centers
  • Implement monitoring, logging, and debugging tools for training operations
  • Architect and maintain data storage solutions for large-scale training datasets
  • Automate infrastructure provisioning, scaling, and orchestration for model training
  • Collaborate with researchers to implement and optimize training methodologies
  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
  • Troubleshoot complex performance issues in distributed training environments
What we offer
  • Meaningful equity in a fast-growing startup
  • Comprehensive benefits package

Vice President of Product and Engineering

Our client is building the next generation network security platform and is look...
Location
United States
Salary:
Not provided
80Twenty
Expiration Date
Until further notice
Requirements
  • Proven Leadership: 10+ years in product and/or engineering leadership roles, with experience guiding both disciplines in high-growth environments
  • Product Expertise: Deep understanding of customer discovery, product-market fit, and translating vision into detailed product requirements
  • Technical Depth: Strong background in cybersecurity, networking, distributed systems, cryptography, high-speed packet processing, or protocol design
  • Startup DNA: Comfortable working in an early-stage environment—rolling up your sleeves, making fast decisions, and scaling teams from zero to hundreds
  • Builder’s Mindset: Passion for elegant design, user experience, and code quality
  • Communicator & Partner: Ability to convey complex concepts to customers, investors, and cross-functional stakeholders
Job Responsibility
  • Strategic Technical Leadership: Define and own the product operations strategy and execution across all components
  • Participate in product discovery, market research, and customer engagement to ensure the roadmap aligns with market needs and company objectives
  • Translate customer and market insights into clear product requirements, technical specifications, and user stories
  • Balance near-term deliverables with long-term strategic investments
  • Execution & Delivery: Oversee architecture and technical design to ensure secure, scalable, and simple solutions
  • Drive the full engineering lifecycle from prototype to GA release, ensuring world-class quality and performance
  • Establish DevSecOps practices and continuous delivery pipelines
  • Ensure all core technology meets regulatory, security, and performance requirements
  • Team Building & Culture: Recruit, mentor, and inspire exceptional product managers and engineers
  • Build a culture of collaboration between product and engineering teams that reflects their DNA of simplicity, security, and speed
What we offer
  • Equity & Ownership: Early equity stake and a leadership seat shaping both product and engineering strategy for a company built for rapid scale and long-term impact
  • Ground Floor Opportunity: Shape a platform aiming to redefine Internet trust and disrupt a $60B+ network security market
  • Cutting-Edge Technology: Work on cryptographic identity, trust-based routing, and AI-powered security at Internet scale

Machine Learning Systems Engineer

We’re looking for a Machine Learning Systems Engineer to strengthen the performa...
Location
United States, Bala Cynwyd (Philadelphia Area), Pennsylvania
Salary:
Not provided
Susquehanna International Group
Expiration Date
Until further notice
Requirements
  • Experience with large-scale ML training pipelines and distributed training frameworks
  • Strong software engineering skills in Python
  • Passion for diving deep into systems implementations and understanding fundamentals to improve their performance and maintainability
  • Experience improving resource efficiency across distributed computing environments by leveraging profiling, benchmarking, and implementing system-level optimizations
Job Responsibility
  • Collaborate with researchers to enable them to develop systems-efficient models and architectures
  • Apply the latest techniques to achieve high hardware efficiency in our internal training runs
  • Create tooling to help researchers distribute their training jobs more effectively
  • Profile and optimize our training runs