Cerebras Systems builds the world's largest AI chip, 56 times larger than the largest GPU. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs. Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference. Thanks to this groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
Job Responsibilities:
Design, implement, and optimize state-of-the-art transformer architectures for NLP and computer vision on Cerebras hardware
Research and prototype novel inference algorithms and model architectures that exploit the unique capabilities of Cerebras hardware, with emphasis on speculative decoding, pruning/compression, sparse attention, and sparsity
Train models to convergence, perform hyperparameter sweeps, and analyze results to inform next steps
Bring up new models on the Cerebras system, validate functional correctness, and troubleshoot any integration issues
Profile and optimize model code using Cerebras tools to maximize throughput and minimize latency
Develop diagnostic tooling or scripts to surface performance bottlenecks and guide optimization strategies for inference workloads
Collaborate across teams, including software, hardware, and product, to drive projects from inception through delivery
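To illustrate the kind of inference algorithm work named above, here is a minimal, self-contained sketch of greedy speculative decoding. The toy "models" are deterministic next-token functions standing in for a small draft model and a large target model; they are illustrative assumptions only, not Cerebras APIs or production code.

```python
def speculative_decode(target, draft, prompt, num_tokens, k=4):
    """Generate num_tokens tokens: the cheap draft model proposes up to k
    tokens per step, and the target model verifies them in one pass."""
    seq = list(prompt)
    while len(seq) - len(prompt) < num_tokens:
        # Draft proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # Target verifies the proposals position by position.
        accepted = []
        for i in range(len(proposal)):
            t = target(seq + accepted)
            if t == proposal[i]:
                accepted.append(t)   # draft matched: keep it, continue
            else:
                accepted.append(t)   # mismatch: take target's token, stop
                break
        seq.extend(accepted)
    return seq[:len(prompt) + num_tokens]

# Toy deterministic "models" over integer tokens 0-9 (hypothetical).
def target_model(seq):
    return (seq[-1] * 2 + 1) % 10

def draft_model(seq):
    # Approximates the target but drifts on even-length contexts.
    t = (seq[-1] * 2 + 1) % 10
    return t if len(seq) % 2 else (t + 1) % 10

out = speculative_decode(target_model, draft_model, [3], 6)
```

Because every accepted token is the target model's own greedy prediction on the current prefix, the output is identical to decoding with the target alone; the draft model only lets multiple positions be verified per step, which is where the speedup comes from in real systems.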
Requirements:
Bachelor’s degree in Computer Science, Software Engineering, Computer Engineering, Electrical Engineering, or a related technical field AND 7+ years of ML software development experience
OR Master’s degree in Computer Science or related technical field AND 4+ years of software development experience
OR PhD in Computer Science or related technical field with 2+ years of relevant research or industry experience
OR Equivalent practical experience
4+ years of experience testing, maintaining, or launching software products, including 2+ years of experience with software design and architecture
3+ years of experience in software development focused on machine learning (e.g., deep learning, large language models, or computer vision)
Strong programming skills in Python and/or C++
Experience with Generative AI and Machine Learning systems
Evidence of research impact in machine learning, such as publications at top conferences (NeurIPS, ICLR, ICML, ACL, EMNLP, MLSys) or comparable contributions to widely used open-source projects or high-quality preprints
Nice to have:
Master’s degree or PhD in Computer Science, Computer Engineering, or a related technical field
Experience independently driving complex ML or inference projects from prototype to production-quality implementations
Hands-on experience with relevant ML frameworks such as PyTorch, Transformers, vLLM, or SGLang
Experience with large language models, mixture-of-experts models, multimodal learning, or AI agents
Experience with speculative decoding, neural network pruning and compression, sparse attention, quantization, sparsity, post-training techniques, and inference-focused evaluations
Familiarity with large-scale model training and deployment, including performance and cost trade-offs in production systems
Triton/CUDA experience is a big plus
What we offer:
Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy our simple, non-corporate work culture that respects individual beliefs