Our customers’ system requirements are usually highly complex. Bringing together hardware and software systems design, Systems Development Engineering operates at the very cutting edge of technology to meet them. We design and develop electronic, electro-mechanical, and systems-oriented products, conduct feasibility studies on engineering proposals, and prepare installation, operation, and maintenance specifications and instructions. We’re proud to deliver programs and products to the highest quality standards, on time and within budget.
Job Responsibilities:
Lead bring‑up, configuration, and validation of system platforms supporting AI workloads (servers, GPU racks, accelerators, networking fabrics)
Work with BIOS/UEFI, BMC, firmware, drivers, and kernel subsystems to ensure system readiness for large‑scale AI deployments
Perform hardware–software co-validation of CPUs, GPUs, DPUs, NICs, accelerators, and memory subsystems under AI‑heavy workloads
Validate PCIe fabric behavior, NUMA topology, and data‑path efficiency for model training and inference
Diagnose complex issues across BIOS, firmware, OS, driver stack, container runtime, orchestration layer, and AI frameworks
Analyze system logs, kernel traces, hardware event telemetry, GPU health signals, and fabric diagnostics
Conduct root‑cause analysis of performance bottlenecks, training failures, model divergence, and hardware stability issues
Collaborate with silicon, firmware, OS, and AI software teams to resolve issues rapidly
Deploy and manage AI clusters: GPU servers, accelerators, high‑speed networking (InfiniBand, RoCE), and storage systems
Validate cluster readiness for distributed training, including bandwidth, latency, topology checks, and gradient‑sync performance
Work with orchestration systems and container runtimes (Kubernetes, Slurm, Ray, Docker, Singularity) to run and optimize AI pipelines
Partner with data center teams for rack integration, power/thermal analysis, and capacity planning
Execute and analyze standard AI benchmarks (MLPerf Training, MLPerf Inference, SPEC AI Benchmarks)
Build custom benchmarks for transformer models, LLMs, computer vision, multimodal models, and recommendation systems
Interpret results to provide optimization recommendations at the hardware, OS, driver, and framework levels
Document findings and drive improvements across the platform and AI software ecosystem
Requirements:
Bachelor’s or Master’s degree in Computer Engineering, Computer Science, Electrical Engineering, or a related field
5+ years of experience in system engineering, platform development, or hardware–software validation
Strong understanding of x86 system architecture, CPU/GPU/accelerator internals, memory systems, and I/O subsystems
What we offer:
Comprehensive Healthcare Programs
Award-Winning Financial Wellness Tools and Resources
Generous Leave of Absence for New Parents and Caregivers