As a Manager, Kernel Software, you will lead a team of engineers at the intersection of hardware and software, developing high-performance solutions for cutting-edge AI and HPC workloads. You will collaborate with leaders from industry and academia to co-design software that fully harnesses the capabilities of our custom, massively parallel processor architecture. In this dual-role position, you will guide the technical roadmap, oversee the design and optimization of deep learning operations, and ensure the delivery of robust, high-performing kernel libraries. You will also manage and mentor a team of talented engineers, supporting their growth and fostering a culture of technical excellence, collaboration, and innovation. Your leadership will directly impact our ability to scale training workloads and deliver breakthroughs in performance and efficiency.
Job Responsibilities:
Lead the design and development of high-performance ML and linear algebra kernels for the Cerebras WSE using parallel programming techniques
Guide a team building optimized low-level routines in assembly and a domain-specific C-like language
Use performance modeling to inform design and optimization decisions
Drive test development to ensure correctness and performance of kernel libraries
Evolve kernel architecture to support emerging ML models and workloads
Collaborate with hardware architects to influence future system design
Mentor engineers and foster a high-performing, collaborative team culture
Requirements:
Bachelor’s, Master’s, PhD, or foreign equivalent in Computer Science, Computer Engineering, Mathematics, or a related field
Proven experience leading technical teams, including mentoring engineers, setting technical direction, and driving execution
Strong understanding of hardware architecture concepts and willingness to dive into new system architectures
Proficiency in C++ and Python
Experience with low-level systems programming
Familiarity with library/API development best practices and performance optimization
Excellent debugging skills across complex, layered software stacks
Nice to have:
Experience leading teams in kernel development, performance optimization, or low-level systems programming
Strong background in parallel algorithms and distributed memory systems
Hands-on experience with accelerators such as GPUs, FPGAs, or other custom hardware
Familiarity with machine learning workloads and frameworks like TensorFlow and PyTorch
Understanding of HPC kernels and strategies for optimizing them on modern architectures
What we offer:
Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Join a simple, non-corporate work culture that respects individual beliefs