Datacenter Hardware Operations Lead Job at OpenAI (San Francisco)

Datacenter Hardware Operations Technician, AI Compute Infrastructure - Stargate

OpenAI, in close collaboration with our capital partners, is embarking on a jour...

Location

United States, Abilene, Texas

Salary:

86400.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

7+ years of experience in datacenter hardware operations, hardware engineering, or large-scale server maintenance
At least 2 years in a senior or lead technician capacity
Deep knowledge of high-density server hardware, including x86 platforms, GPUs, storage devices, and power/cooling systems
Excel at diagnosing hardware issues, coordinating complex repairs, and maintaining strong working relationships across organizations
Comfortable setting technical expectations and validating outcomes through collaboration, not direct management
Adapt quickly to changing operational conditions and enjoy solving problems at both the strategic and on-site levels
Communicate clearly and build trust across partner teams, vendors, and internal engineering stakeholders
Willing to be based full-time at a partner-operated campus

Job Responsibility

Serve as OpenAI’s primary on-site hardware contact, collaborating with Oracle teams and vendors to plan and coordinate maintenance, repairs, and lifecycle activities
Share technical requirements and verify that work performed supports OpenAI’s compute needs and agreed quality targets
Coordinate schedules, spare-parts planning, and issue escalation with partner teams to minimize downtime and keep operations running smoothly
Work with OpenAI fleet-health engineers to translate software-detected issues into on-site hardware actions in partnership with Oracle
Track hardware trends and provide joint recommendations with partner teams for design or operational improvements
Prepare documentation and runbooks that capture joint best practices and can be applied at additional campuses
Offer technical guidance and context to partner personnel while respecting their operational ownership
Collaborate with supply-chain teams to plan spares and manage hardware lifecycle activities

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Critical Environment Electrical Engineer

In alignment with our Microsoft values, we are committed to cultivating an inclu...

Location

South Korea, Busan

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

10+ years of technical engineering experience OR Bachelor's degree in Electrical Engineering or related field AND 5+ years of technical engineering experience OR Master's degree in Electrical Engineering or related field AND 3+ year(s) of technical engineering experience
Bachelor's degree in Electrical Engineering or related field AND 10+ years of technical engineering experience OR Master's degree in Electrical Engineering or related field AND 7+ years of technical engineering experience
Experience in operating or constructing data centers or substations from an electrical engineering perspective or maintaining large-scale electrical facilities
Possession of relevant certifications in electrical safety, electrical equipment switching or operation, etc.
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Design: Develop basic knowledge of industry trends, competitors' products, customer experience, and advances in various engineering fields
Consult with other engineers to help understand and apply advanced concepts
Assist in the creation and execution of robust, scalable, secure, and extensible designs and/or verification plans for the electrical engineering aspects of a well-defined feature or product extension
Participate in creating intellectual property
May file patents within personal area of scope and complete necessary patent documentation
Implement known solutions to common technical and/or design challenges, with some guidance from other engineers
Read device specification sheets and interpret common details required to design and test various hardware features
Develop design documentation which could include drawings, specifications, schematics, or diagrams for features of well-defined products with guidance and direction from a manager or other engineers
Review supplier or partner technical sheets for available parts and choose appropriate parts needed to support product development
Testing and Verification: Ensure that test requirements are included in documentation and specifications for a feature or product

Fulltime

Mechanical Engineer

Mechanical Engineer (IC3) is responsible for ensuring the availability, reliabil...

Location

Netherlands, Middenmeer

Salary:

68400.00 - 102600.00 EUR / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Proven technical engineering experience OR Bachelor's degree in Mechanical Engineering, or related field AND Proven technical engineering experience OR Master's degree in Mechanical Engineering, or related field AND Proven technical engineering experience OR Doctorate degree in Mechanical Engineering, or related field OR Proven technical engineering experience
Holds a valid driving license that enables travel to and from the worksite, where required for the role
Ability to reliably reach the worksite within 60 minutes to meet operational and on-call requirements
Ability to meet Microsoft, customer and/or government security screening requirements
Microsoft Cloud Background Check

Job Responsibility

Own mechanical system performance across assigned AMS sites (cooling, fuel, water, generators, HVAC)
Ensure high availability and resilience of CE infrastructure aligned to Microsoft standards
Act as technical authority for risk identification and mitigation
Support uptime objectives and prevent SEV events through proactive actions
Provide engineering support to CE operations teams (CET / Shift / Site teams)
Own and approve SOP / MOP / EOP execution, CAB technical reviews, Permit-to-work approvals
Perform incident management (AIR tickets, break-fix oversight), maintenance review (PM & CM quality and trends), daily risk reviews and DCAT checks
Lead Root Cause Analysis (RCA) and incident investigations
Own technical input for SEV incidents and High-risk failures (HRI)
Drive corrective and preventive actions across sites

Fulltime

Mechanical Engineer - CTJ - Poly

Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our c...

Location

United States, San Antonio

Salary:

102100.00 - 202200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Mechanical Engineering, or related field AND 1+ year(s) related technical engineering experience OR Bachelor's Degree in Mechanical Engineering, or related field AND 2+ years related technical engineering experience OR equivalent experience.
Active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph.
Must pass Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Verification of U.S. citizenship.

Job Responsibility

Empower a culture of safety, security, and compliance in all aspects of data center activities
Understand the design and functionality of your datacenter
Work alongside a team of industry professionals across the IT and CE spectrum to support 24x7x365 on-site datacenter operations and mechanical infrastructure
Act as the technical authority for on-site operations of large-scale mechanical systems and designs
Collaborate with the Operations and Design team to develop data center mechanical designs from project conception to IFC for new DC projects and major infrastructure upgrades
Establish and coordinate maintenance and safety procedures for redundant high-capacity cooling systems and datacenter backup electrical systems
Steward vital infrastructure supporting deployments of Microsoft's online services
Assist with supplier selection and oversee mechanical supplier execution
Learn, live, and coach the One Microsoft culture and values
Lead through change by bringing clarity, generating energy, and delivering success

Fulltime

Senior Finance Manager - Datacenter Cost out/Mantis

Empower every person and organization on the planet to achieve more. That’s what...

Location

United States, Redmond

Salary:

97600.00 - 188400.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Business Administration, Accounting, Finance, Economics, Data Science or related field AND 2+ years experience in financial analysis, accounting, controllership or finance, or related field OR Bachelor's Degree in Business Administration, Accounting, Finance, Economics, Data Science or related field AND 4+ years experience in financial analysis, accounting, controllership or finance, or related field OR equivalent experience.

Job Responsibility

Serve as the finance owner and product lead for the Microsoft datacenter cost planning and modeling toolset, partnering with Mantis Engineering (CSCP) to build a world-class platform for forecasting, scenario modeling, and long-range planning
Translate finance requirements into engineering specifications: define data models, calculation logic, driver hierarchies, scenario frameworks, and version-control needs
Drive the product roadmap with Mantis — prioritizing capabilities against finance-cycle deadlines
Lead UAT, model validation, and rollout to the Finance community; ensure adoption through training, documentation, and feedback loops
Establish governance (model integrity, change control, audit trails, and reconciliation standards) across the full data-to-reporting lifecycle, ensuring impacts are understood and financial accuracy is maintained
Partner with Finance Data & Engineering (FD&E) to build analytic tools (Power BI, semantic models, data marts) that make Datacenter cost data accessible and transparent across finance and engineering stakeholders
Define the metrics, dimensions, and views that surface unit economics, cost drivers, and variance insights at multiple levels (workload, service, region, hardware generation)
Champion data quality, lineage, and consistency across planning, actuals, and reporting layers; close gaps between source systems and finance-ready data

What we offer

Certain roles may be eligible for benefits and other compensation.

Fulltime

Data Center Engineer II

We're looking for an experienced and independent Data Center Engineer to join ou...

Location

United States, San Francisco

Salary:

169000.00 - 232000.00 USD / Year

Adyen

Expiration Date

Until further notice

Requirements

Experienced in data center engineering as well as operations
Proven experience in the design and delivery of data center infrastructure
Hands-on experience managing modern environments
Strong project management capabilities, with the ability to handle multiple complex initiatives simultaneously
Exceptional attention to detail and organizational skills with a strong sense of ownership and follow-through
Strong communication and collaboration skills to work across technical and vendor teams across regions and timezones
Experienced in DCIM and asset management on a large scale

Job Responsibility

Global Project Leadership: Lead the end-to-end rollout of new data centers and large-scale expansion projects, including the coordination and deployment of High-Performance Computing (HPC) clusters across global sites
Take ownership of our datacenter footprint in the USA
Infrastructure Design & Engineering: Collaborate with partners to oversee the design and installation of critical infrastructure and advanced thermal management
End-to-End Hardware Lifecycle: Manage the full equipment journey from initial Purchase Order (PO) through delivery, installation, and eventual decommissioning/End-of-Life (EoL) processing
Supply Chain & Logistics Coordination: Partner with Procurement to track orders, manage inventory, and synchronize on-site logistics with local DC-ops teams or remote hands
Data Integrity & DCIM Management: Serve as the guardian of the Source of Truth by ensuring all physical changes, assets, and project updates are 100% accurately reflected in the DCIM software
Cross-Functional Transition: Act as the bridge between installation and production by communicating with Platform and Network teams to ensure systems are successfully handed over and integrated
Maintenance & Technical Oversight: Plan and oversee engineering work-streams, including coordinating repairs, replacements, and maintenance executed by third parties
Vendor & Stakeholder Management: Serve as the primary point of contact for DC vendors and internal stakeholders to resolve issues, facilitate audits, and support year-over-year platform growth
Hands-on Field Operations: Maintain a boots-on-the-ground approach by performing direct data center hardware work and traveling globally to provide on-site project oversight

Fulltime

Sr. Manufacturing Test Engineering Manager - Hyperscale Systems

We are seeking a highly experienced Manufacturing Test Engineering Manager to le...

Location

United States, Secaucus

Salary:

Not provided

Sanmina

Expiration Date

Until further notice

Requirements

Bachelor's degree in Engineering, Computer Science, Data Science, or a related field. Advanced degree preferred.
10+ years of experience in manufacturing test engineering, with 5+ years in a leadership role.
Experience in a hyperscale datacenter, server OEM/ODM, or cloud infrastructure company.
Familiarity with global manufacturing operations, including working with CM/JDM/ODM partners.
Experience managing a team of Engineers in an NPI environment
Knowledge of Hyperscale System Architecture, including: Intel-based architecture, Hardware Management (BMC, SMC, etc.), Storage, memory, GPU, networking, Liquid cooling systems, Factory Network Infrastructure design
Understanding test development at multiple levels: board, module, server, rack, cluster.
Background in diagnostics, fault isolation tools, and root cause methodologies.
Experience with high-volume production test.

Job Responsibility

Lead a team of 20-40 that develops and executes comprehensive test strategies for all levels of hyperscale product assembly: module, server, rack, and cluster.
Define and implement Best-in-Class manufacturing test practices and create a clear roadmap to achieve them.
Nurture outside-the-box thinking and create a risk-taking culture to enable the creation of leading-edge test processes and utilities.
Drive continuous improvement through data analytics, root cause analysis, and TTF (Time to Failure) insights.
Provide mentorship and technical leadership to a global team of manufacturing test engineers.
Lead the definition and implementation of strategies for manual, automated, and system-level testing of products from NPI (New Product Introduction) to mass production.
Collaborate with Customers and Partners (AMD) to develop Test Requirements and Product Test Plans.
Orchestrate the collaborative development of manufacturing tests with resources within your team, with Global Test Engineering teams, and with cross-functional ZT teams

What we offer

Competitive base salary
Performance-based annual bonus eligibility
401(k) retirement savings plan
Tuition reimbursement for eligible education programs
Comprehensive medical, dental, and vision coverage
Mental health resources and employee wellness support programs
Company-paid life and disability insurance
Paid time off (PTO) and company-paid holidays
Parental leave and family care support programs
Structured training programs and on-the-job learning opportunities

Fulltime

Senior Infrastructure & Cloud Engineer

We are looking for a Senior Infrastructure & Cloud Engineer to join our Infrastr...

Location

Spain, Madrid

Salary:

Not provided

Fever

Expiration Date

Until further notice

Requirements

Professional working proficiency in English (C1 or higher) — mandatory
Professional working proficiency in French (C1 or higher) — mandatory
Ability to communicate effectively with international and French-speaking stakeholders, both written and verbal
Ability to create and maintain technical documentation in both languages
5+ years of hands-on experience in Infrastructure Engineering, Systems Engineering, Cloud Engineering, or similar roles
3+ years of production experience working with AWS environments
Proven experience designing, implementing, and operating hybrid cloud architectures
Previous experience in a senior-level position, leading technical initiatives and mentoring engineers
Experience working in international and multicultural environments
Strong knowledge of core AWS services: EC2, VPC, S3, IAM, RDS, Route 53, CloudFront, ELB / ALB

Job Responsibility

Design, deploy, and maintain secure, scalable, and resilient infrastructure across AWS and on-premise environments
Lead the implementation and optimization of hybrid cloud architectures
Manage and optimize AWS services including EC2, VPC, S3, IAM, RDS, Route 53, CloudFront, ECS, EKS, Lambda, API Gateway, and Step Functions
Design and operate secure connectivity solutions between cloud and datacenter environments using AWS Direct Connect, Transit Gateway, VPNs, and VPC Peering
Manage Linux and Windows-based infrastructure platforms and associated services
Administer virtualization platforms such as VMware vSphere, ESXi, Hyper-V, or KVM
Manage enterprise storage environments, including NetApp solutions, backup, replication, and disaster recovery strategies
Implement Infrastructure as Code using Terraform and/or CloudFormation
Develop automation and operational tooling using Python, Bash, and PowerShell
Build and maintain CI/CD pipelines to support infrastructure and platform deployments

What we offer

40% discount on all Fever events and experiences
Home office friendly anywhere in Spain
Responsibility from day one and professional and personal growth
Great work environment with a young, diverse team of talented people to work with
Health insurance and other benefits, such as flexible remuneration with a 100% tax exemption through Cobee's platform
English Lessons
Gympass Membership
Option to receive part of your salary in advance through Payflow
Attractive compensation package consisting of base salary and the potential to earn a significant bonus for top performance

Fulltime
