Engineering Manager, Cloud Infrastructure Automation

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

293000.00 - 385000.00 USD / Year

Job Description:

This is a high-ownership leadership role with direct responsibility for production systems operating at extreme scale. The Cloud Infrastructure team builds and operates the foundational platform that powers OpenAI’s production AI systems. Its mission is to make infrastructure predictable and boring at massive scale, so that research and product teams can move fast without compromising safety, reliability, or efficiency.

Job Responsibility:

  • Build, lead, and grow high-performing infrastructure engineering teams
  • Own the evolution of OpenAI’s Kubernetes platform, including cluster lifecycle, upgrades, configuration standards, and safety mechanisms
  • Set and enforce platform-level reliability goals (SLIs/SLOs), ensuring reliability is designed into the system
  • Drive infrastructure automation across provisioning, upgrades, remediation, and fleet consistency using Terraform and internal tooling
  • Reduce operational toil and incident frequency through better abstractions, guardrails, and self-healing systems
  • Establish clear ownership boundaries, technical direction, and execution discipline

Requirements:

  • Significant experience managing infrastructure or platform engineering teams
  • Deep hands-on understanding of Kubernetes at scale and distributed systems
  • Experience operating production infrastructure with strict reliability, latency, and security requirements
  • Ability to balance technical depth with organizational leadership and long-term strategy
  • Strong track record of hiring, developing, and retaining senior engineers
  • Comfort operating in ambiguous, fast-moving environments and creating clarity for others

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Engineering Manager, Cloud Infrastructure Automation

Systems Engineer III – Cloud & Infrastructure

This role involves analyzing, designing, implementing, and maintaining complex s...
Location:
United States, San Ramon
Salary:
Not provided
Robert Half
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in a related field or equivalent experience
  • 6+ years of experience in system engineering, infrastructure design, and support in a corporate environment
  • Expertise in cloud platforms such as Microsoft Azure, AWS, or Google Cloud Platform
  • Strong networking knowledge (DNS, DHCP, TCP/IP, firewalls, VPNs, load balancers)
  • Experience with IaaS and PaaS solutions
  • Proficiency with Infrastructure as Code (IaC) tools like Terraform or ARM/Bicep
  • Experience with CI/CD tools (Azure DevOps, Jenkins, GitHub Actions)
  • Understanding of database administration (SQL Server, MySQL, PostgreSQL)
  • Knowledge of system security, IAM, encryption, and compliance frameworks (ISO, NIST, SOC 2)
  • Experience with monitoring and logging tools
Job Responsibility:
  • Design, implement, and maintain system infrastructure solutions for performance, scalability, and security
  • Develop and manage system configurations, automation scripts, and deployment pipelines
  • Analyze and resolve complex infrastructure issues while adhering to best practices
  • Provide Level III support for production systems, including root cause analysis and performance tuning
  • Collaborate with software and IT teams to optimize application performance and deployment strategies
  • Manage cloud-based and hybrid infrastructure solutions, including Microsoft Azure and AWS
  • Implement monitoring, logging, and alerting solutions for system health and availability
  • Maintain security and compliance documentation, ensuring adherence to industry standards
  • Research and implement new technologies to improve system automation, performance, and security
  • Contribute to process improvement initiatives and best practices
What we offer:
  • Medical, vision, dental, and life and disability insurance
  • Eligibility to enroll in company 401(k) plan
  • Free online training
  • Full-time

Lead Software Engineer - Cloud Infrastructure

As the Lead Software Engineer - Cloud Infrastructure, you will collaborate with ...
Location:
United States
Salary:
180000.00 - 225000.00 USD / Year
Corelight
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science or related fields, or equivalent experience
  • 10+ years of professional experience in cloud infrastructure engineering or related roles
  • Strong programming skills in languages such as Bash, Python, Go
  • Experience with infrastructure-as-code (IaC) tools such as Terraform, CloudFormation
  • Proficiency in scripting/programming languages such as Python, Bash, or PowerShell
  • Experience with automation tools like Jenkins, GitLab, and Ansible/Chef
  • Understanding of networking concepts, security best practices, and cloud-native architectures
  • Experience with cloud platforms like AWS, Azure, or Google Cloud
  • Strong communication and collaboration skills
  • Experience with Observability tools such as Prometheus, Grafana, ELK stack, or similar
Job Responsibility:
  • Design, deploy, and maintain cloud infrastructure solutions on platforms such as AWS, Azure, or Google Cloud Platform (GCP)
  • Develop automation scripts and tools to streamline provisioning, configuration, and management of cloud resources
  • Collaborate with software development teams to integrate cloud services into applications and workflows
  • Implement monitoring and alerting systems to ensure the performance, availability, and security of cloud environments
  • Optimize resource utilization and cost efficiency through continuous monitoring, analysis, and optimization of cloud infrastructure
  • Stay current with emerging technologies and best practices in cloud computing, DevOps, and infrastructure automation
  • Participate in the resolution of production incidents and contribute to post-mortem analysis and improvement efforts.
What we offer:
  • Equity
  • Additional benefits
  • Full-time

Infrastructure & Cloud Engineer

We are offering an exciting opportunity for an Infrastructure & Cloud Engineer i...
Location:
United States, New York
Salary:
Not provided
Robert Half
Expiration Date:
Until further notice
Requirements:
  • Minimum of 5 years of experience in Infrastructure and Cloud Engineering or related roles
  • Proficiency in Power Automate for automating repetitive tasks and workflow creation
  • Extensive experience with Office 365 for business productivity tools management
  • Demonstrable expertise in Azure for building, deploying, and managing applications
  • Familiarity with Entra ID for identity and access management
  • Solid understanding of Endpoint Security principles to protect corporate data
  • Experience with Windows Server for managing network infrastructure
  • Proven track record in Cloud migration, moving on-premise infrastructure to cloud environments
  • Prior experience in a non-profit organization will be an advantage
Job Responsibility:
  • Design, implement, and manage our Azure infrastructure ensuring optimal performance, availability, and scalability
  • Oversee the Microsoft 365 environment including Office 365, Entra ID, Intune, and Endpoint Protection
  • Plan and execute migrations between on-premises and cloud platforms while ensuring data integrity and minimal disruption
  • Manage Windows Server environments in compliance with security best practices
  • Implement and maintain endpoint protection and device management policies using Intune and related tools
  • Monitor and optimize system performance, reliability, and security
  • Leverage tools such as Power Automate to streamline and automate workflows
  • Identify opportunities to enhance infrastructure efficiency and implement innovative solutions
  • Collaborate with cross-functional teams to understand organizational needs and deliver tailored technology solutions
  • Provide guidance, training, and support to internal teams on IT infrastructure and cloud technologies
What we offer:
  • medical
  • vision
  • dental
  • life and disability insurance
  • eligibility to enroll in company 401(k) plan
  • Full-time

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs.
Job Responsibility:
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs.
What we offer:
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace.
  • Full-time

Cloud Engineering Manager - FinOps

This role combines technical expertise, leadership, and operational excellence t...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Proven expertise in cloud platforms (e.g., AWS, Azure, Google Cloud) and cloud-native technologies
  • Strong knowledge of FinOps principles and cloud financial management, including cost optimization, forecasting, and governance
  • Experience with application development frameworks (e.g., Node.js, Python, Java) and modern software engineering practices
  • Familiarity with cloud monitoring and cost management tools, such as AWS Cost Explorer, Azure Cost Management, or third-party FinOps platforms (e.g., CloudHealth, Apptio)
  • Proficiency in containerization and orchestration technologies such as Docker and Kubernetes
  • Demonstrated success in leading engineering teams, managing priorities, and delivering complex projects on time and within budget
  • Strong collaboration skills, with the ability to work effectively across engineering, finance, and business teams
  • Exceptional ability to communicate technical concepts to non-technical stakeholders and align engineering efforts with business goals
  • Bachelor’s or master’s degree in computer science, engineering, information systems, or related field
  • Typically 7-10 years of experience, including 0-2 years of people management experience
Job Responsibility:
  • Lead and inspire a team of cloud engineers focused on FinOps application development, fostering a culture of innovation, collaboration, and continuous improvement
  • Drive the design, development, and implementation of cloud engineering applications that enable visibility, optimization, and governance of cloud costs and usage
  • Architect scalable, secure, and resilient solutions that align with FinOps principles (e.g., cost optimization, forecasting, usage analytics)
  • Collaborate with product managers and business stakeholders to define requirements, prioritize features, and deliver value-driven solutions
  • Ensure seamless integration of FinOps applications with existing HPE cloud platform tools and systems
  • Lead efforts to optimize cloud infrastructure costs and usage patterns across HPE's cloud platforms, leveraging advanced analytics and automation
  • Establish and enforce engineering best practices, including CI/CD pipelines, DevSecOps principles, and automated testing frameworks
  • Monitor and improve application performance, reliability, and scalability through proactive measures and robust incident management
  • Collaborate with finance teams to ensure compliance with cloud spending policies and reporting requirements
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Full-time

Morpheus Cloud Support Engineer

As a Morpheus Cloud Support Engineer, you will provide technical assistance and ...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • At least 5 years of proven experience as a Cloud Support Engineer or in a similar position
  • At least 5 years of experience in Morpheus Cloud Management Platform
  • Bachelor’s degree in Computer Science, Information Technology, or a related field
  • Strong understanding of cloud systems, including VMware, KVM, AWS, and Azure
  • Experience with cloud infrastructure as code (IaC) technologies such as Terraform or CloudFormation
  • Experience with containerization and orchestration systems such as Docker and Kubernetes
  • Excellent problem-solving and troubleshooting abilities
  • Strong communication skills, with the ability to clearly convey technical information to both technical and non-technical stakeholders
  • Hands-on experience in Morpheus Cloud Management Platform
  • Proficiency with Windows Server, Ubuntu, RHEL, HPE VME, CentOS
Job Responsibility:
  • Provide technical assistance with cloud infrastructure and services in Morpheus CMP
  • Monitor and maintain infrastructure systems to guarantee their availability and performance
  • Troubleshoot and address issues with cloud infrastructure
  • Work with the development and operations teams to optimize cloud solutions
  • Assist with the deployment and setup of cloud resources
  • Develop and maintain comprehensive documentation for cloud systems, including architecture diagrams, operational procedures, and troubleshooting guides
  • Analyze cloud system performance metrics and logs to identify trends, forecast needs, and recommend improvements or upgrades
  • Collaborate with Product Managers, Developers, and Operations to understand requirements and use cases, and transform them into tests
  • Handle P1 situations in Cloud Infra
  • Provide technical and architectural leadership for the infrastructure Engineering teams and Operations roles
What we offer:
  • Comprehensive suite of benefits for physical, financial, and emotional wellbeing
  • Career development programs
  • Inclusive work culture
  • Full-time

Senior Software Engineer - Cloud Infrastructure

The Cloud Infrastructure Engineering team builds and manages the foundational bl...
Location:
Australia
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
  • Software development experience in Go, C/C++, Java, or another OOP language
  • Experience with cloud technologies such as AWS, Azure, or GCP, including infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
  • Experience developing cloud infrastructure services, preferably with Kubernetes
  • Experience developing cloud-native edge or service mesh services, preferably with Envoy and Istio
  • Experience leading and shipping large scope technical projects in collaboration with multiple experienced engineers
  • Understanding of network topologies, protocols, and security principles, such as VPNs, firewalls, and load balancers
  • Knowledge of cloud security best practices, including encryption, access controls, and compliance standards like SOC2 and GDPR
  • You have excellent communication skills and the ability to work well within a global team
  • You are a strong problem-solver and have solid production debugging skills
Job Responsibility:
  • Architect and build a robust, scalable, and highly available distributed infrastructure
  • Build a cutting-edge cloud-native platform on top of the public cloud, and automate our cloud resource management
  • Work closely with our ClickHouse core database development team and security team, partnering with them to produce the SaaS offering
  • Work on routing and traffic components to improve the reliability and scalability of our cloud service
  • Systematically improve availability by applying industry and distributed systems best practices
  • Design and build security components & tooling: firewall, PKI and certificate infra, zero trust network, etc.
  • Improve performance and cost efficiency of our infrastructure
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Senior Software Engineer - Cloud Infrastructure

About ClickHouse: Recognized on the 2025 Forbes Cloud 100 list, ClickHouse is on...
Location:
Singapore
Salary:
Not provided
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
  • Software development experience in Go, C/C++, Java, or another OOP language
  • Experience with cloud technologies such as AWS, Azure, or GCP, including infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
  • Experience developing cloud infrastructure services, preferably with Kubernetes
  • Experience developing cloud-native edge or service mesh services, preferably with Envoy and Istio
  • Experience leading and shipping large scope technical projects in collaboration with multiple experienced engineers
  • Understanding of network topologies, protocols, and security principles, such as VPNs, firewalls, and load balancers
  • Knowledge of cloud security best practices, including encryption, access controls, and compliance standards like SOC2 and GDPR
  • You have excellent communication skills and the ability to work well within a global team
  • You are a strong problem-solver and have solid production debugging skills
Job Responsibility:
  • Architect and build a robust, scalable, and highly available distributed infrastructure
  • Build a cutting-edge cloud-native platform on top of the public cloud, and automate our cloud resource management
  • Work closely with our ClickHouse core database development team and security team, partnering with them to produce the SaaS offering
  • Work on routing and traffic components to improve the reliability and scalability of our cloud service
  • Systematically improve availability by applying industry and distributed systems best practices
  • Design and build security components & tooling: firewall, PKI and certificate infra, zero trust network, etc.
  • Improve performance and cost efficiency of our infrastructure
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites