Software Engineer, Fleet Hardware Health

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

230000.00 - 490000.00 USD / Year

Job Description:

As a software engineer on the Fleet Hardware team, you will be responsible for the reliability and uptime of OpenAI’s entire compute fleet. Minimizing hardware failure is key to research training progress and stable services, as even a single hardware hiccup can cause significant disruptions. With increasingly large supercomputers, the stakes continue to rise. Being at the forefront of technology means that we are often the first to troubleshoot these state-of-the-art systems at scale. This is a unique opportunity to work with cutting-edge technologies and devise innovative solutions to maintain the health and efficiency of our supercomputing infrastructure. Our team empowers strong engineers with a high degree of autonomy and ownership, as well as the ability to effect change. This role requires a keen focus on comprehensive system-level investigations and the development of automated solutions. We want people who go deep on problems, investigate as thoroughly as possible, and build automation for detection and remediation at scale.

Job Responsibility:

  • Build and maintain automation systems for provisioning and managing server fleets
  • Develop tools to monitor server health, performance, and lifecycle events
  • Collaborate with clusters, networking, and infrastructure teams
  • Partner with external operators to ensure a high level of quality
  • Identify and fix performance bottlenecks and inefficiencies
  • Continuously improve automation to reduce manual work

Requirements:

  • Experience managing large-scale server environments
  • A balance of strengths in building and operationalizing
  • Proficiency in Python, Go, or similar languages
  • Strong Linux, networking, and server hardware knowledge
  • Comfort digging into noisy data with SQL, PromQL, and Pandas or any other tool

Nice to have:

  • Experience with low-level details of hardware components, protocols, and associated Linux tooling (e.g., PCIe, InfiniBand, networking, power management, kernel perf tuning)
  • Knowledge of hardware management protocols (e.g., IPMI, Redfish)
  • High-performance computing (HPC) or distributed systems experience
  • Prior experience developing, managing, or designing hardware
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana)

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided
  • Equity
  • Performance-related bonus(es) for eligible employees

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Software Engineer, Fleet Hardware Health

Head of Factory Software & Vehicle Diagnostics

At Mach Industries, we are designing and building the world’s most advanced prod...
Location:
United States, Huntington Beach
Salary:
170000.00 - 250000.00 USD / Year
Mach Industries
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Electrical Engineering, Mechanical Engineering, Robotics, or a related engineering field
  • 10+ years of experience in software engineering, controls engineering, automated testing, manufacturing software, or firmware systems
  • 5+ years of experience leading technical teams or engineering organizations
  • Proven track record of shipping production-critical software or managing large-scale automated test systems
  • Strong systems-level thinking across software, hardware, networks, and manufacturing workflows
  • Deep expertise in one or more of the following areas:
      • Manufacturing Execution Systems (MES)
      • PLCs and industrial controls (Beckhoff, Siemens, B&R, Allen-Bradley)
      • Firmware flashing, bootloaders, and secure signing
      • Vehicle or embedded diagnostics (CAN, LIN, Ethernet, UDS, custom protocols)
      • Test automation frameworks, HIL systems, or end-of-line validation
Job Responsibility
  • Build, lead, and develop a cross-functional organization including manufacturing software engineers, controls engineers, firmware-tools engineers, diagnostic engineers, and data platform engineers
  • Own the end-to-end architecture for factory software, including MES-like systems, build tracking, serialization, and production workflow tools
  • Lead the design and implementation of vehicle flashing, commissioning, and diagnostics pipelines inside the factory
  • Define and deliver the vehicle–factory communication framework (CAN, Ethernet, custom protocols, telemetry ingestion, APIs)
  • Oversee all end-of-line (EOL) software, automated test stands, calibration systems, and data acquisition infrastructure
  • Partner with manufacturing engineering, build engineering, design engineering, flight software, and NPI teams to integrate software tools and processes across the vehicle lifecycle
  • Implement highly reliable production-grade software with redundancy, observability, and real-time data health monitoring
  • Drive rapid iteration and continuous improvement of test coverage, automation, and factory efficiency
  • Own uptime, performance, and correctness for all software critical to production and test operations
  • Establish coding standards, architecture strategies, and long-range roadmaps for factory software and diagnostics
What we offer
  • Offers Equity
  • healthcare
  • dental and vision plans
  • retirement savings
  • paid time off
  • funds for continuing education, training, and career growth
Employment Type:
Full-time

Datacenter Hardware Operations Technician, AI Compute Infrastructure - Stargate

OpenAI, in close collaboration with our capital partners, is embarking on a jour...
Location:
United States, Abilene, Texas
Salary:
86400.00 - 228000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in datacenter hardware operations, hardware engineering, or large-scale server maintenance
  • At least 2 years in a senior or lead technician capacity
  • Deep knowledge of high-density server hardware, including x86 platforms, GPUs, storage devices, and power/cooling systems
  • Excel at diagnosing hardware issues, coordinating complex repairs, and maintaining strong working relationships across organizations
  • Comfortable setting technical expectations and validating outcomes through collaboration, not direct management
  • Adapt quickly to changing operational conditions and enjoy solving problems at both the strategic and on-site levels
  • Communicate clearly and build trust across partner teams, vendors, and internal engineering stakeholders
  • Willing to be based full-time at a partner-operated campus
Job Responsibility
  • Serve as OpenAI’s primary on-site hardware contact, collaborating with Oracle teams and vendors to plan and coordinate maintenance, repairs, and lifecycle activities
  • Share technical requirements and verify that work performed supports OpenAI’s compute needs and agreed quality targets
  • Coordinate schedules, spare-parts planning, and issue escalation with partner teams to minimize downtime and keep operations running smoothly
  • Work with OpenAI fleet-health engineers to translate software-detected issues into on-site hardware actions in partnership with Oracle
  • Track hardware trends and provide joint recommendations with partner teams for design or operational improvements
  • Prepare documentation and runbooks that capture joint best practices and can be applied at additional campuses
  • Offer technical guidance and context to partner personnel while respecting their operational ownership
  • Collaborate with supply-chain teams to plan spares and manage hardware lifecycle activities
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Software Engineer I

We’re searching for a Software Engineer to join our Vehicle Platforms team. This...
Location:
United States, Mountain View
Salary:
116000.00 - 174000.00 USD / Year
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field
  • Strong proficiency in C++: Solid understanding of data structures, algorithms, and memory management (academic or project-based)
  • Hands-on Systems Experience: At least 1–2 years of relevant experience, which can be demonstrated through multiple internships, significant academic research, or professional work involving hardware-software interaction
  • Linux Fundamentals: Experience working in a Linux environment with an understanding of command-line tools and system basics
  • Exposure to Networking Protocols: Familiarity with the basics of data transport, such as TCP/IP, UDP, or serial communication
Job Responsibility
  • Integrate Core Hardware Components: Write and maintain C++ interfaces and drivers that integrate Lidars, Radars, Cameras, and other embedded devices into the Aurora Driver stack
  • Support Platform Bring-up: Participate in the initial software "bring-up" of new vehicle platforms, ensuring the onboard compute and sensors are correctly configured and communicating
  • Optimize Onboard Performance: Profile and optimize code to ensure efficient use of limited CPU, GPU, and memory resources on the vehicle
  • Monitor Hardware Health: Develop and refine software tools that track the real-time health and telemetry of our hardware components to ensure safe fleet operations
  • Validate via HIL Infrastructure: Utilize Hardware-in-the-Loop (HIL) environments to test and verify your code changes against real-world hardware before deployment to the road
What we offer
  • annual bonus
  • equity compensation
  • benefits
Employment Type:
Full-time

Staff Fleet Operations Robot Captain, Atlas

The Fleet Operations Robot Captain is the mission lead for robot health, uptime,...
Location:
United States, Waltham
Salary:
116000.00 - 160000.00 USD / Year
Boston Dynamics
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Robotics, Computer Science, Electrical Engineering, or a related technical field preferred
  • 5 years of experience in robotics operations, systems engineering, or a highly technical site reliability role
  • Proven track record of troubleshooting complex electromechanical systems
  • Experience in a high-intensity R&D environment where hardware availability is a critical path to project success
  • Strong understanding of robotics systems, including software stacks, controls, networking, and hardware interfaces
  • Proficiency in reading and interpreting Real-Time (RT) code, Linux system logs, and networking telemetry to troubleshoot controls
  • Experience with issue-tracking and work-management systems such as JIRA, and with data-driven monitoring tools and dashboards such as Tableau
  • Ability to stay calm and organized while managing multiple high-priority streams of work
  • Strong understanding of safety best practices in high-energy or mobile robotics environments
Job Responsibility
  • Act as the first responder for robot hardware and software issues, performing system-level triage to localize failures and determine appropriate escalation paths
  • Analyze logs, telemetry, and system behavior to narrow issues to specific subsystems or components
  • Create high-quality issue reports and tickets that enable subject-matter experts to root cause problems efficiently
  • Identify and escalate fleet-level blockers or safety risks that impact robot availability or operational continuity
  • Maintain the real-time health, configuration, and connectivity of all robots in the fleet
  • Distinguish between hardware availability and software-induced downtime to ensure accurate reporting and planning
  • Ensure fleet status, availability, and constraints are clearly communicated to stakeholders to avoid conflicts or idle time
  • Own and prioritize the daily experiment and test queue to maximize robot utilization during core operating hours
  • Balance risk by sequencing experiments appropriately, enabling progress while minimizing extended downtime
  • Provide clear visibility into what is running now, what is next, and expected completion timelines for all active users
What we offer
  • medical
  • dental
  • vision
  • 401(k)
  • paid time off
  • annual bonus structure
Employment Type:
Full-time

Senior Flight Sim Engineer

Ryanair is currently recruiting for a Senior Flight SIM Engineer to join our tea...
Location:
United Kingdom, East Midlands Airport
Salary:
Not provided
Ryanair - Europe's Favourite Airline
Expiration Date
Until further notice
Requirements
  • Recognized Apprenticeship or Adult Training in electronic and avionic engineering
  • A background in aviation/flight simulation
  • Specialization in Digital Flight Simulation, Complex Data Networking, and Process Control using Distributed Microprocessing
  • At least 4-6 years’ experience in military and/or civilian Flight Simulation for the experienced positions
Job Responsibility
  • Perform daily readiness checks to ensure FSTD is ready for daily operation
  • Maintain Flight Simulator Equipment to EASA FSTD-A Level ‘D’ Standard
  • Maintain the Quality System in accordance with EASA Requirements
  • Review and implement aircraft and vendor service bulletins
  • Install and verify aircraft loadable software
  • Liaise with aircrew at the technical level on aircraft systems to resolve defects
  • Liaise with technical inspectors at initial and recurrent qualifications
  • Liaise with FSTD vendors to rectify defects requiring vendor support
  • Run and check the scheduled Quality Test Guide (QTG) Tests
  • Design and embody modifications on the simulators to maintain compatibility with Ryanair fleet using hardware and software engineering skills
What we offer
  • Competitive salary
  • Discounted and unlimited travel to over 230 destinations
  • Death in Service Benefit – Up to 2 times annual basic salary
  • Unrivalled career progression
Employment Type:
Full-time

Tech Support Admin Assoc

Ensures HIT environment is functioning at an optimal level and end-users’ needs ...
Location:
United States
Salary:
26.55 - 39.85 USD / Hour
Advocate Health Care
Expiration Date
Until further notice
Requirements
  • Bachelor's degree or equivalent experience in Computer Science, Information Systems, Engineering, or related field
  • 1 year of experience in a complex IT operating environment
  • Must have excellent interpersonal and technical skills
  • Must troubleshoot problems accurately and possess a positive attitude to deal with a variety of situations
  • Excellent written and oral communication skills
  • Strong customer service skills
  • Excellent problem-solving skills
  • Ability to lift up to 35 pounds without assistance
  • Must be able to travel to various Advocate Health locations
  • 24/7 on-call support required
Job Responsibility
  • Ensures HIT environment is functioning at an optimal level and end-users’ needs are met
  • Provides end-user support including training on new device capability, basic device operations, accessing network resources, and device security best practices
  • Follows procedures for managing tickets including timely acknowledgment, appropriate communication with complete resolution documentation
  • Ensures that technology problems and service requests are resolved in accordance with service level objectives and information systems policies
  • Contribute to Endpoint Fleet Technology Management for any Advocate Health Device
  • Analysis, configuration, installation, maintenance, upgrades and retirement of hardware and software which requires 24/7 support in addition to business travel
  • Ensures compliance with Advocate Health HIT standards
  • Preemptively identifies variations from standards and potential technology issues
  • Participate in root cause analysis, engage other Advocate Health Teams and vendors, as needed, to resolve identified issues
  • Perform software installation using defined Advocate Health processes and tools
What we offer
  • Paid Time Off programs
  • Health and welfare benefits such as medical, dental, vision, life, and Short- and Long-Term Disability
  • Flexible Spending Accounts for eligible health care and dependent care expenses
  • Family benefits such as adoption assistance and paid parental leave
  • Defined contribution retirement plans with employer match and other financial wellness programs
  • Educational Assistance Program
Employment Type:
Full-time

IT Technical Support Administrator

Location:
United States
Salary:
26.55 - 39.85 USD / Hour
Advocate Health Care
Expiration Date
Until further notice
Requirements
  • Bachelor's degree or equivalent experience in Computer Science, Information Systems, Engineering, or related field
  • 1 year of experience in a complex IT operating environment
  • Must have excellent interpersonal and technical skills
  • Must troubleshoot problems accurately and possess a positive attitude to deal with a variety of situations
  • Excellent written and oral communication skills
  • Strong customer service skills
  • Excellent problem-solving skills
  • Ability to lift up to 35 pounds without assistance
  • Must be able to travel to various Advocate Health locations
  • 24/7 on-call support required
Job Responsibility
  • Ensures HIT environment is functioning at an optimal level and end-users’ needs are met
  • Provides end-user support including training on new device capability, basic device operations, accessing network resources, and device security best practices
  • Follows procedures for managing tickets including timely acknowledgment, appropriate communication with complete resolution documentation
  • Ensures that technology problems and service requests are resolved in accordance with service level objectives and information systems policies
  • Contribute to Endpoint Fleet Technology Management for any Advocate Health Device
  • Analysis, configuration, installation, maintenance, upgrades and retirement of hardware and software which requires 24/7 support in addition to business travel
  • Ensures compliance with Advocate Health HIT standards
  • Preemptively identifies variations from standards and potential technology issues
  • Participate in root cause analysis, engage other Advocate Health Teams and vendors, as needed, to resolve identified issues
  • Perform software installation using defined Advocate Health processes and tools
What we offer
  • Paid Time Off programs
  • Health and welfare benefits such as medical, dental, vision, life, and Short- and Long-Term Disability
  • Flexible Spending Accounts for eligible health care and dependent care expenses
  • Family benefits such as adoption assistance and paid parental leave
  • Defined contribution retirement plans with employer match and other financial wellness programs
  • Educational Assistance Program
Employment Type:
Full-time