We’re hiring engineers to scale and optimize OpenAI’s inference infrastructure across emerging GPU platforms. You’ll work across the stack, from low-level kernel performance to high-level distributed execution, and collaborate closely with research, infra, and performance teams to ensure our largest models run smoothly on new hardware. This is a high-impact opportunity to shape OpenAI’s multi-platform inference capabilities from the ground up, with a particular focus on advancing inference performance on AMD accelerators.
Job Responsibilities:
Own bring-up, correctness, and performance of the OpenAI inference stack on AMD hardware
Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems
Debug and optimize distributed inference workloads across memory, network, and compute layers
Validate correctness, performance, and scalability of model execution on large GPU clusters
Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks
Collaborate with partner teams to build, integrate, and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs
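For a sense of the collective patterns these libraries implement, here is a minimal pure-Python simulation of a ring all-reduce, the reduce-scatter plus all-gather scheme used by NCCL/RCCL. The list-based "ranks" and function names are hypothetical stand-ins for GPU buffers, not RCCL's actual API:

```python
# Toy simulation of a ring all-reduce over n "ranks" (plain Python lists
# standing in for per-GPU buffers). Real RCCL/NCCL implement this pattern
# over GPU memory and network links; everything here is illustrative only.

def ring_allreduce(buffers):
    """Sum all-reduce: every rank ends up with the elementwise sum."""
    n = len(buffers)
    assert n > 0 and all(len(b) == len(buffers[0]) for b in buffers)
    chunk = len(buffers[0]) // n          # each rank "owns" one chunk
    assert chunk * n == len(buffers[0])

    def sl(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) % n
    # to its ring neighbor, which accumulates it. Sends are snapshotted
    # first so this sequential loop mimics simultaneous transfers.
    for step in range(n - 1):
        msgs = [(r, (r - step) % n, list(buffers[r][sl((r - step) % n)]))
                for r in range(n)]
        for r, c, data in msgs:
            dst = (r + 1) % n
            for i, v in zip(range(sl(c).start, sl(c).stop), data):
                buffers[dst][i] += v

    # After reduce-scatter, rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2: all-gather circulates the reduced chunks around the ring.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n, list(buffers[r][sl((r + 1 - step) % n)]))
                for r in range(n)]
        for r, c, data in msgs:
            buffers[(r + 1) % n][sl(c)] = data
    return buffers
```

Each rank sends 2 × (n − 1) chunks regardless of ring size, which is why this pattern scales well for large payloads.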
Requirements:
Experience writing or porting GPU kernels using HIP, CUDA, or Triton
Familiarity with communication libraries like NCCL/RCCL
Experience working on distributed inference systems
Ability to solve end-to-end performance challenges across hardware, system libraries, and orchestration layers
Ability to thrive in a small, fast-moving team building new infrastructure from first principles
Nice to have:
Contributions to open-source libraries like RCCL, Triton, or vLLM
Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling
Prior experience deploying inference in other non-NVIDIA GPU environments
Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models
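As a toy sketch of the tensor parallelism mentioned above: a linear layer can be split column-wise so each device computes a slice of the output, with the full result recovered by concatenation. The shapes and helper names below are illustrative only, not part of any real serving stack. (For scale, 10B fp16 parameters occupy roughly 10e9 × 2 bytes ≈ 20 GB of weight memory alone, which is why models at this size are sharded across devices.)

```python
# Column-parallel split of a linear layer, using plain Python lists in
# place of GPU tensors. Each "shard" owns a column slice of the weight
# matrix; concatenating the partial outputs reproduces the full matmul.

def matmul(x, w):
    """x: length-k vector; w: k x m matrix (list of rows) -> length-m vector."""
    m = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(m)]

def column_shards(w, n):
    """Split a k x m weight matrix into n column slices of width m // n."""
    m = len(w[0])
    assert m % n == 0
    step = m // n
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(n)]

def tensor_parallel_forward(x, w, n):
    """Run the layer as n independent shard matmuls, then gather outputs."""
    partials = [matmul(x, shard) for shard in column_shards(w, n)]
    return [v for p in partials for v in p]   # concatenate shard outputs
```

Because each shard's matmul is independent, the only communication needed is the final gather of output slices, which maps directly onto the collective operations discussed above.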
What we offer:
Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible
Relocation support for eligible employees
Additional taxable fringe benefits, such as charitable donation matching and wellness stipends