Kernel Optimization Engineer

Cerebras Systems

Location:
United Arab Emirates, Dubai

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Job Responsibility:

  • Develop design specifications for new machine learning and linear algebra kernels and map them to the Cerebras WSE system using various parallel programming algorithms
  • Develop and debug a kernel library of highly optimized routines in low-level assembly and a custom C-like domain-specific language (CSL), implementing algorithms optimized for the Cerebras hardware system
  • Use mathematical models and analysis to measure software performance and inform design decisions
  • Develop and integrate unit and system testing methodologies to verify correct functionality and performance of kernel libraries
  • Study emerging trends in machine learning applications and help evolve the kernel library architecture to address the computational challenges of state-of-the-art neural networks
  • Interact with chip and system architects to optimize instruction sets, microarchitecture, and I/O of next-generation systems

Requirements:

  • Bachelor’s, Master’s, or PhD degree, or foreign equivalent, in Computer Science, Computer Engineering, Mathematics, or a related field
  • Understanding of hardware architecture concepts; must be comfortable learning the details of a new hardware architecture
  • Skilled in the C++ and Python programming languages
  • Good knowledge of library and/or API development best practices
  • Strong debugging skills, including experience debugging complex software stacks

Nice to have:

  • Experience in kernel development and/or testing
  • Familiarity with parallel algorithms and distributed memory systems
  • Experience in programming accelerators such as GPUs and FPGAs
  • Familiarity with neural networks and machine learning frameworks such as TensorFlow and PyTorch
  • Familiarity with HPC kernels and their optimization

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Enjoy a simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for Kernel Optimization Engineer

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices are essential
  • GPU Kernel Development & Optimization: Experienced in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Deep Learning Integration: Experienced in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Software Engineering: Skilled in Python and C++
  • Experience in debugging, performance tuning, and test design
Job Responsibility:
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer:
  • AMD benefits at a glance

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
  • Experience in GPU Kernel Development & Optimization for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in Deep Learning Integration into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Skilled in Python and C++, with experience in debugging, performance tuning, and test design
  • Solid experience in running large-scale workloads on heterogeneous compute clusters
Job Responsibility:
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer:
  • AMD benefits at a glance

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Master’s or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related fields
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
  • GPU Kernel Development & Optimization: Deep experience in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Deep Learning Integration: Strong experience in integrating optimized GPU performance into machine learning and LLM frameworks (e.g., vLLM, SGLang, TensorFlow, PyTorch)
  • End-to-End Solution Optimization: Understanding of the latest market trends in LLMs and multimodal models; solid hands-on end-to-end performance tuning experience in distributed inference (e.g., P/D disaggregation and large-EP) and RL
  • Software Engineering: Skilled in Python and C++, with experience in debugging, performance tuning, and test design
  • High-Performance Computing: Expert-level experience running large-scale workloads on heterogeneous computing clusters
Job Responsibility:
  • End-to-End Optimization: Build and optimize end-to-end distributed inference (e.g., P/D disaggregation and large-EP) and RL solutions on mainstream frameworks like vLLM and SGLang
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer:
  • AMD benefits at a glance

Sr. Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location:
China, Shanghai
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Ability to define goals, manage development efforts, and deliver high-quality solutions
  • Strong problem-solving skills
  • Proactive approach
  • Keen understanding of software engineering best practices
  • Experience in GPU kernel development & optimization for AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch)
  • Skilled in Python and C++
Job Responsibility:
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions

HFT Performance Engineer

We’re engineers driven by performance, reliability, and meaningful innovation in...
Location:
Slovakia; Czechia; Poland; United Kingdom; Gibraltar, Bratislava
Salary:
80000.00 - 150000.00 EUR / Year
Wincent
Expiration Date:
Until further notice
Requirements:
  • Senior Expertise (5+ years) with low-latency hardware and software
  • Strong background in the Linux kernel and network engineering, with deep networking knowledge
  • Understanding of market data feeds, order books, and exchange-level protocols
  • Experience with performance testing tools and scripting
Job Responsibility:
  • Optimize trading system performance – Improve latency, reliability, and efficiency across the full trading stack (hardware, network, and software)
  • Analyze and tune low-level systems – Perform kernel-level debugging, network stack optimization, and hardware benchmarking to enhance system speed
  • Monitor and profile live trading systems – Collect and analyze real-time performance data, create monitoring tools, and identify bottlenecks
  • Prototype and implement performance improvements – Test new ideas, optimize configurations, and deploy enhancements in live trading environments
  • Collaborate across technical teams – Work with trading, infrastructure, and software engineers to ensure consistent, measurable system improvements
What we offer:
  • Competitive, above-market compensation
  • Performance bonus every six months
  • Equity as part of compensation package
  • Opportunity to invest in flagship multi-strategy fund
  • Relocation support (if needed)
  • Flexible working time
  • Unlimited paid vacation
  • Culture of transparency, ownership, and autonomy
  • Full-time

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility:
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
  • Full-time

Senior Engineer - Kernel

The Senior Engineer - Systems (Kernel Sustaining) provides technical expertise a...
Location:
United States, Austin
Salary:
Not provided
Aptiv plc
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Electrical Engineering, or related field
  • 5+ years of software engineering experience
  • 3+ years of experience with embedded Linux or systems programming
  • Experience with C programming in production systems
  • Strong background in software development lifecycle
  • Strong proficiency in C programming
  • Solid understanding of Linux kernel architecture
  • Experience with embedded systems development
  • Knowledge of build systems (Yocto, Buildroot, or similar)
  • Strong debugging and problem-solving skills
Job Responsibility:
  • Maintain Linux kernel components, drivers, and subsystems
  • Address CVE vulnerabilities and security issues
  • Backport and integrate upstream kernel patches
  • Ensure kernel stability, performance, and compatibility
  • Write high-quality, maintainable kernel code following Linux standards
  • Debug and resolve complex kernel issues
  • Provide technical guidance and mentorship to junior engineers
  • Participate in code reviews and technical discussions
  • Contribute to architecture and design decisions
  • Drive technical improvements and best practices
What we offer:
  • Hybrid work model for workplace flexibility
  • Comprehensive health, dental, and life insurance
  • Short and long-term disability coverage
  • RRSP matching for financial security
  • Flexible time-off policies for work-life balance
  • Employee assistance program for mental well-being
  • Learning benefits, including a LinkedIn Learning subscription and seminars
  • Full-time

Senior Engineer - Linux Kernel

The Senior Engineer - Systems (Kernel Sustaining) provides technical expertise a...
Location:
Canada, Ottawa
Salary:
Not provided
Aptiv plc
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Electrical Engineering, or related field
  • 5+ years of software engineering experience
  • 3+ years of experience with embedded Linux or systems programming
  • Experience with C programming in production systems
  • Strong background in software development lifecycle
  • Strong proficiency in C programming
  • Solid understanding of Linux kernel architecture
  • Experience with embedded systems development
  • Knowledge of build systems (Yocto, Buildroot, or similar)
  • Strong debugging and problem-solving skills
Job Responsibility:
  • Maintain Linux kernel components, drivers, and subsystems
  • Address CVE vulnerabilities and security issues
  • Backport and integrate upstream kernel patches
  • Ensure kernel stability, performance, and compatibility
  • Write high-quality, maintainable kernel code following Linux standards
  • Debug and resolve complex kernel issues
  • Provide technical guidance and mentorship to junior engineers
  • Participate in code reviews and technical discussions
  • Contribute to architecture and design decisions
  • Drive technical improvements and best practices
What we offer:
  • Workplace Flexibility: Hybrid Work
  • Company-sponsored health, dental, and life insurance
  • Income protection through short and long-term disability coverage
  • Matching RRSP
  • Vacation and various time off policies to encourage work-life balance
  • Well-being programs: Employee assistance program, mental well-being through Unmind
  • Learning benefits: LinkedIn Learning subscription and seminars
  • Full-time