Engineers on the inference performance team operate at the intersection of hardware and software, driving end-to-end model inference speed and throughput. Their work spans low-level kernel performance debugging and optimization, system-level performance analysis, performance modeling and estimation, and the development of tooling for performance projection and diagnostics.
Job Responsibilities:
Build performance models (kernel-level and end-to-end) to estimate the performance of state-of-the-art and customer ML models
Optimize and debug our kernel microcode and compiler algorithms to improve ML model inference speed, throughput, and compute utilization on the Cerebras WSE
Debug and characterize runtime performance at the system and cluster level
Develop tools and infrastructure to help visualize performance data collected from the Wafer Scale Engine and our compute cluster
Requirements:
Bachelor's, Master's, or PhD in Electrical Engineering or Computer Science
Strong background in computer architecture
Exposure to and understanding of low-level deep learning / LLM math
Strong analytical and problem-solving mindset
3+ years of experience in a relevant domain (Computer Architecture, CPU/GPU Performance, Kernel Optimization, HPC)
Experience working on CPU/GPU simulators
Exposure to performance profiling and debugging at any stage of the system pipeline
Comfort with C++ and Python
What we offer:
Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs