Engineering Manager, Cloud Infrastructure Automation Job at OpenAI (San Francisco)

Job Description

This is a high-ownership leadership role with direct responsibility for production systems operating at extreme scale. The Cloud Infrastructure team builds and operates the foundational platform that powers OpenAI’s production AI systems. Its mission is to make infrastructure predictable and boring at massive scale, so that research and product teams can move fast without compromising safety, reliability, or efficiency.

Job Responsibilities

Build, lead, and grow high-performing infrastructure engineering teams
Own the evolution of OpenAI’s Kubernetes platform, including cluster lifecycle, upgrades, configuration standards, and safety mechanisms
Set and enforce platform-level reliability goals (SLIs/SLOs), ensuring reliability is designed into the system
Drive infrastructure automation across provisioning, upgrades, remediation, and fleet consistency using Terraform and internal tooling
Reduce operational toil and incident frequency through better abstractions, guardrails, and self-healing systems
Establish clear ownership boundaries, technical direction, and execution discipline

Requirements

Significant experience managing infrastructure or platform engineering teams
Deep hands-on understanding of Kubernetes at scale and distributed systems
Experience operating production infrastructure with strict reliability, latency, and security requirements
Ability to balance technical depth with organizational leadership and long-term strategy
Strong track record of hiring, developing, and retaining senior engineers
Comfort operating in ambiguous, fast-moving environments and creating clarity for others

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible
Relocation support for eligible employees
Additional taxable fringe benefits, such as charitable donation matching and wellness stipends
