Join Cerebras as a Performance Engineer on our innovative Runtime Team. Our groundbreaking CS-3 system, hosted by a distributed cluster of modern, powerful x86 machines, has set new benchmarks in high-performance ML training and inference. It leverages a dinner-plate-sized chip with 44 GB of on-chip memory to surpass traditional hardware capabilities. This role will challenge and expand your expertise in optimizing AI applications and managing computational workloads, primarily on the x86 machines that run our Runtime driver.
Job Responsibilities:
Focus on CPU and memory subsystem optimizations for our Runtime software driver
Develop and enhance algorithms for efficient data movement, local data processing, job submission, and synchronization between various software and hardware components
Optimize our workloads using advanced CPU features like AVX instructions, prefetch mechanisms, and cache optimization techniques
Perform performance profiling and characterization using tools such as AMD uProf, and reduce OS-level overheads
Influence the design of Cerebras' next-generation AI architectures and software stack by analyzing the integration of advanced CPU features
Engage directly with the AI and ML developer community to understand their needs and solve contemporary challenges
Collaborate with multiple teams within Cerebras, including architecture, research, and product management
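To give a concrete flavor of the cache optimization work listed above, here is a minimal, illustrative sketch (not Cerebras code; the function name and the default block size are assumptions for illustration): a cache-blocked matrix transpose, a classic technique for keeping both source and destination accesses resident in cache.

```cpp
// Illustrative sketch only, not Cerebras code. Demonstrates loop tiling
// (cache blocking): visiting the matrix in small square tiles so that
// the strided destination writes stay within cache-sized working sets.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose an n x n row-major matrix in `block` x `block` tiles.
// The default block size of 64 is an assumed starting point; in
// practice it would be tuned against the target cache hierarchy.
std::vector<float> transpose_blocked(const std::vector<float>& src,
                                     std::size_t n,
                                     std::size_t block = 64) {
    std::vector<float> dst(n * n);
    for (std::size_t ii = 0; ii < n; ii += block)
        for (std::size_t jj = 0; jj < n; jj += block)
            // Inner loops stay inside one tile, so reads of src and
            // writes to dst each touch a small, cache-friendly region.
            for (std::size_t i = ii; i < std::min(ii + block, n); ++i)
                for (std::size_t j = jj; j < std::min(jj + block, n); ++j)
                    dst[j * n + i] = src[i * n + j];
    return dst;
}
```

The same tiling idea generalizes to the data-movement and local-processing kernels described above, and pairs naturally with software prefetch and vectorization once the access pattern is cache-friendly.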
Requirements:
BS, MS, or PhD in Computer Science, Computer Engineering, or a related field
5+ years of relevant experience in performance engineering, particularly in optimizing algorithms and software design
Strong proficiency in C/C++ and familiarity with Python or other scripting languages
Demonstrated experience with memory subsystem optimizations and system-level performance tuning
Experience with distributed systems is highly desirable
Familiarity with compiler technologies (e.g., LLVM, MLIR) and with PyTorch and other ML frameworks
What we offer:
Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs