In this Frontend Architect position, you will develop advanced system architectures based on AI Storage Solutions, along with AI/ML Accelerator ASIC architecture specifications, for Sandisk’s next-generation products. You will drive, initiate, and analyze the frontend architecture of the AI/ML Accelerator product. As a Frontend Architect, you will help drive new architecture initiatives that leverage state-of-the-art frontend interfaces such as UCIe, PCIe, CXL, and UAL to integrate AI Storage Solutions with xPUs in a 3D package system. You will drive the frontend architecture based on AI Storage Solutions. You will exercise your technical expertise and excellent communication skills to collaborate with design and product planning teams, with an eye towards delivering innovative and highly competitive adaptive accelerator solutions. Typical activities include writing architecture specifications, working with other architects on the team, and working with RTL/DV/Simulation/Emulation/FW teams to evaluate architecture changes and assess the performance, power, area, and endurance of the product. You will work closely with excellent fellow engineers, tackle complex challenges, innovate, and develop products that will change the data-centric architecture paradigm.
Job Responsibilities:
Drive the SoC architecture, with a particular focus on I/O subsystems connected over UCIe, PCIe, UAL, or CXL
Define I/O subsystem and PCIe DMA architectures, including their interactions with internal embedded processor-subsystems, Network on Chip, Memory controllers, and FPGA fabric
Create flexible and modular I/O subsystem architectures that can be deployed in chiplet, monolithic, or 3D form factors
Work with customers and cross-functional teams to scope SoC requirements, analyze PPA tradeoffs, and define architectural requirements that meet the PPA and schedule targets
Define I/O subsystem and DMA hardware, software, and firmware interactions with embedded processing subsystems and SoC CPUs on the device side, and with host CPUs
Author architecture specifications in clear and concise language. Guide and assist pre-silicon design/verification and post-silicon validation during the execution phase
Improve AI/ML ASIC architecture performance through hardware and software co-optimization, post-silicon performance analysis, and influence on the strategic product roadmap
Analyze and characterize LLM workloads on our ASIC and on competitive datacenter and AI solutions to identify opportunities for performance improvement in our products
Architect one or more components of AI/ML accelerator ASICs, such as HBM, PCIe/UCIe/CXL, NoC, DMA, firmware interactions, NAND, xPU, and fabrics
Drive the AI Storage Solutions frontend system architecture with GPU/TPU/NPU/xPU to match or exceed next-generation HBM bandwidth
Architect memory-efficient inference/training systems using techniques such as pruning, quantization with the MX format, continuous batching/chunked prefill, and speculative decoding
Collaborate with internal and external stakeholders and ML researchers to disseminate results and iterate at a rapid pace
Requirements:
Bachelor's, Master's, or PhD in Computer/Electrical Engineering with 8+ years of hands-on architecture experience authoring specifications
Strong technical background architecting SoC and I/O subsystems involving PCIe and PCIe-DMA engines, UCIe, CXL, or UAL
Strong grasp of I/O subsystem microarchitecture and working knowledge of the PCIe/UCIe protocol specifications
Knowledge of I/O subsystem and DMA interactions with internal embedded processor subsystems (x86, RISC-V, or ARM) and external host CPUs
Good understanding of computer/graphics architecture, ML, and LLMs
Experience architecting GPU/TPU/xPU accelerator systems with an optimized high-bandwidth memory hierarchy and frontend architecture for multi-trillion-parameter LLM training/inference, including dense and Mixture of Experts (MoE) models with multiple modalities (text, vision, speech)
Deep experience optimizing large-scale ML systems and GPU architectures
Proficiency in principles and methods of microarchitecture, software, and hardware relevant to performance engineering
Multi-disciplinary experience, including familiarity with Firmware and ASIC design
Expertise in CUDA programming, GPU memory hierarchies, and hardware-specific optimizations
Proven track record of architecting large-scale distributed training systems
Nice to have:
Familiarity with and background in UCIe, CXL, NVLink, or UAL microarchitecture and protocols
Familiarity with high-speed networking: InfiniBand, RDMA, NVLink
Knowledge of bridging and ordering-rule enforcement between on-chip protocols such as AXI and off-chip protocols such as PCIe
Knowledge of ARM processors and AXI interconnects
Previous experience with NVMe storage systems, protocols, and NAND flash