Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and lets machine learning users run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.
Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.
Thanks to its wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
Job Responsibilities:
Contribute to the end-to-end bring-up of ML models on Cerebras CSX systems
Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning
Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization
Propose and prototype improvements to tools, APIs, and automation flows to accelerate future bring-ups
Requirements:
Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field
Comfort navigating the full AI toolchain, including Python modeling code, compiler IRs, and performance profiling
Strong debugging skills across performance, numerical accuracy, and runtime integration
Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion)
Proficiency in C/C++ programming and experience with low-level optimization
Proven experience in compiler development, particularly with LLVM and/or MLIR
Strong background in optimization techniques, particularly those involving NP-hard problems
What we offer:
Competitive salary and benefits package
Opportunities for professional growth and career advancement
A dynamic and innovative work environment
The chance to work on cutting-edge technologies and make a significant impact on the future of AI