Hardware Systems Engineer, NPI AI

Meta

Location:
United States, Menlo Park

Contract Type:
Not provided

Salary:

173000.00 - 245000.00 USD / Year

Job Description:

Hardware Systems Engineers in RTP work closely with Hardware/Software co-design teams, hardware designers, networking teams, system manufacturers, component vendors, capacity engineering, production engineering, production services, and data center operations teams to enable new systems that will be deployed in our production data centers. Ramping to production and solving data center scaling and deployment challenges requires us to take a systems-based approach to the new product introduction (NPI) phase.

Job Responsibility:

  • Drive and execute end-to-end system validation strategy (hardware and software), with a focus on various AI/HPC hardware systems in datacenter applications
  • Lead the bring-up, validation, and deployment of cutting-edge hardware systems in large-scale deployments, with active hands-on participation
  • Explore new use cases with customer teams and identify related test methodologies/test cases accordingly
  • Investigate and troubleshoot complex failures potentially related to hardware systems with cross-functional teams, which may involve different stacks such as silicon, firmware, and software
  • Triage failures and continue root-causing them while driving project development work forward
  • Identify gaps and opportunities to improve test process and test methodologies across the NPI space
  • Guide automation efforts and data analysis for NPI projects through engagement with related cross-functional teams
  • Communicate project progress and assessments to related internal and external teams

Requirements:

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 8+ years of experience in hands-on SW, FW or HW engineering to build any of the following products (AI Silicon, GPUs, TPUs, Autonomous cars, AI servers)
  • Experience in one or more domains such as: ASIC development (silicon design, bring-up, characterization, validation), board-level debug, firmware validation, system validation
  • Experience leading silicon or system troubleshooting and debugging
  • Experience in developing test specifications, procedures, and debug guides for test solutions

Nice to have:

  • Proficiency in High-Performance Computing (HPC) or AI system architecture at rack level and at scale
  • 5+ years of experience with one or more of the following modules/domains: PCIe, NVLink, Networking, Flash, Memory, CPU, GPU, TPU, DRAM (DDR4/5 or HBM), AI silicon/AI accelerators
  • Hands-on experience in software, firmware, and hardware engineering to develop systems/products for datacenter applications such as video processing, AI/ML, and networking
  • Experience with definition of HW/SW interface requirements for Telemetry, Diagnostics, Debugging
  • Proficiency in Linux environment and server system management
  • Experience with debugging tools for SoCs (e.g., JTAG, GDB, Trace32) and knowledge of common bus protocols such as I2C, SPI, USB, and PCIe
  • Experience in using continuous integration and version control tools for system development and testing
  • Experience integrating lab tools for automated workflows and managing large-scale deployments
What we offer:
  • bonus
  • equity
  • benefits

Additional Information:

Job Posted:
January 30, 2026

Similar Jobs for Hardware Systems Engineer, NPI AI

Senior Hardware Engineering Project Manager

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location:
United States, Boston
Salary:
130000.00 - 185000.00 USD / Year
Whoop
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Electrical Engineering, Mechanical Engineering, or a related field; advanced degree preferred
  • 7+ years of experience managing complex hardware development programs
  • Proven success launching at least one consumer hardware product involving cross-functional integration of mechanical and electrical systems
  • Demonstrated ability to manage supplier timelines and component development workflows from DFM/DFX through ramp
  • Skilled in organizing and driving execution across multiple parallel workstreams in fast-paced environments
  • Strong communication and stakeholder management skills with the ability to influence at all levels
  • Comfortable with international travel (up to 15%) to support engineering builds and supplier interactions
  • Strong commitment to embracing and leveraging AI tools in day-to-day tasks, ensuring AI-assisted work aligns with the same high-quality standards as personal contributions
Job Responsibility
  • Lead and align cross-functional hardware teams—including Electrical, Mechanical, Firmware, and Compliance Engineering—to deliver new WHOOP products from concept through launch
  • Manage full lifecycle NPI projects, ensuring alignment with technical performance, budget, and schedule targets
  • Develop and maintain detailed project schedules across hardware domains such as PCBAs, plastics, batteries, haptics, wireless components, and engineering test fixtures
  • Coordinate with key stakeholders in Manufacturing, Supply Chain, Quality, Data Science, Signal Processing, Industrial Design, and Product Management to define milestones, remove blockers, and maintain execution momentum
  • Own the execution of hardware builds from early prototypes through Design Validation Testing (DVT), including hands-on support at manufacturing sites
  • Partner with hardware technical leads to plan and prioritize validation activities—from small-scale experiments to full beta testing cycles
  • Manage risk and issue tracking frameworks, ensuring timely resolution and clear communication with stakeholders
  • Lead through ambiguity and change, guiding teams through tactical pivots while keeping DRIs aligned and progress on track
  • Identify and lead process improvement initiatives within WHOOP’s Hardware Product Development framework to drive organizational effectiveness and executional consistency
  • Communicate status updates and influence decisions at the executive level
What we offer
  • competitive base salaries
  • meaningful equity
  • benefits
  • generous equity package
  • Full-time

Electrical Engineer - Systems

The Scaling team works on the design of our AI supercomputers, doing everything ...
Location:
United States, San Francisco
Salary:
225000.00 - 445000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements
  • At least 10 years of industry experience, including experience designing hardware systems for data center applications; experience in EE circuit design, CPU/GPU/TPU hardware system design, board bring-up, system design, integration, and system bring-up
  • Master's degree in Electrical Engineering, Computer Engineering, Physics, a related field, or equivalent practical experience
  • Have a strong bias toward action, and won’t take no for an answer
  • Have experience with and good knowledge of system design in the mechanical and product design areas, from xPU and board level through rack level to data center level
  • Have a strong intrinsic desire to learn and fill in missing skills, and an equally strong talent for sharing that information clearly and concisely with others
  • Are comfortable with ambiguity and rapidly changing conditions
Job Responsibility
  • Work on Machine Learning/AI hardware systems projects to craft the solutions for current and future data center deployments
  • Work with the hardware team on test vehicles and bring-up board design, evaluating end-to-end system design trade-offs
  • Lead EE circuit-level design and work with power, thermal, and mechanical teams to drive AI hardware system design
  • Work with product teams to ensure that systems meet their goals, and with ASIC/FPGA, Software, and Verification teams to ensure proper verification of features
  • Work with the manufacturing teams to ensure that designs are manufacturable and ready for volume production, and with the field teams to support systems that are deployed in the data center
  • Gather system requirements, define architecture, execute hardware design, and perform product validation
  • Lead the system bring-up, validation, NPI, deployment, and sustaining of hardware solutions
  • Work cross-functionally with Hardware, Software, Mechanical, Thermal, Validation, Manufacturing, and external vendors
  • Drive system development from concept through production
  • Lead debug and root cause analysis of deployed systems
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Full-time

System Quality Assurance Test Lead

Leads the design, development, execution, and optimization of system integration...
Location:
Taiwan, Taipei
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Computer Engineering, Information Systems, or equivalent
  • Typically 5–8+ years of experience in server system integration, hardware/software validation, or test engineering, with demonstrated leadership responsibilities
  • Strong background in server system architecture, hardware platforms, and HW/SW integration testing
  • Proven experience leading system integration or platform-level testing efforts in NPI or product development environments
  • Solid programming and scripting skills (Python preferred) for test automation, data analysis, and tool development
  • Hands-on experience designing and deploying test automation frameworks, infrastructure, and equipment
  • Experience or strong exposure to AI-assisted testing, automation, log analysis, or data-driven quality improvement approaches
  • Deep familiarity with server operating systems and virtualization platforms, including Windows Server, Red Hat, SUSE, Ubuntu, and VMware
  • Strong problem-solving, debugging, and root cause analysis skills across hardware, firmware, and software layers
  • Excellent communication and collaboration skills
Job Responsibility
  • Lead server system integration testing covering hardware, firmware, BIOS, OS, drivers, virtualization, and management software during product development and NPI phases
  • Define and own overall system integration test strategy, test plans, and test coverage, with a strong focus on risk-based testing
  • Review, design, and approve test cases, test methodologies, and test infrastructure, ensuring alignment with product requirements and architecture
  • Drive automation strategy, including Python-based test scripting, framework development, and CI/CD test integration, to improve efficiency and scalability
  • Leverage AI-assisted tools or data analytics techniques to enhance failure analysis, log analysis, test optimization, and issue triage
  • Lead failure analysis and root cause investigations across HW/SW boundaries; work closely with BIOS, firmware, OS, and hardware development teams to drive timely issue resolution
  • Analyze test results and quality metrics, provide data-driven insights, and proactively identify quality risks and improvement opportunities
  • Act as the primary technical interface between system integration, platform architecture, hardware engineering, and software development teams
  • Mentor and provide technical guidance to junior engineers
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Technical Program Manager

At Hewlett Packard Enterprise (HPE), we're at the forefront of the AI and superc...
Location:
United States, Houston; Chippewa Falls; Ft. Collins; San Jose; Remote
Salary:
119500.00 - 275000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Electrical Engineering, or a related technical field
  • 5+ years of experience in technical program management, software engineering, or hardware engineering
  • Proven experience managing the New Product Introduction (NPI) lifecycle for complex hardware and software products
  • Demonstrated experience leading cross-functional teams to deliver products on schedule
  • Agile, PMP, or other program management certifications are a plus
  • HW, SW, Six Sigma, etc.
Job Responsibility
  • End-to-End Program Leadership: Own and manage the entire lifecycle of complex HPC & AI programs, from initial concept and architectural definition through development, validation, and customer delivery. Define program scope, deliverables, and success metrics
  • Cross-Functional Execution: Lead and align a diverse team of hardware engineers (silicon, systems, networking), software developers (firmware, OS, AI frameworks), product managers, supply chain experts, and marketing teams. Foster a collaborative environment to ensure seamless execution
  • Technical Roadmapping & Scheduling: Create and maintain integrated master schedules that track hardware development milestones (e.g., CPU/GPU integration, system design, validation) alongside software release cadences. Identify and manage the critical path and interdependencies
  • Risk & Dependency Management: Proactively identify technical and logistical risks, develop mitigation strategies, and manage complex dependencies between internal teams and external partners (e.g., NVIDIA, AMD, Intel). You're not just tracking risks; you're actively solving them
  • Stakeholder Communication: Serve as the central point of communication for your programs. Clearly and concisely report on status, risks, and decisions to executive leadership and key stakeholders. Translate complex technical issues into clear business impact
  • Drive Technical Decisions: Leverage your deep technical knowledge of HPC/AI architectures—including accelerators (GPUs), high-speed interconnects, liquid cooling, and system management software—to facilitate technical trade-offs and drive architectural decisions that align with program goals
  • Manage complex projects following defined PLM process and governance, utilize PDP tools, and implement best practices across each phase of PLM
  • Create and manage high confidence program schedules with clear dependencies, critical path, and systematic methodology to communicate program status. Manage risks and mitigations, and re-plan as events warrant
  • Provide clear, timely and objective communication to executive management and other stakeholders
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Full-time

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location:
United States, Santa Clara
Salary:
210400.00 - 315600.00 USD / Year
AMD
Expiration Date
Until further notice
Requirements
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Candidates with solid, hands-on expertise in one or more of three domains (compute, network, storage) are preferred
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers across diverse technical disciplines in avenues such as proofs of concept, competitive evaluations, and early field trials
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, with hands-on experience in RoCEv2 design, VXLAN-EVPN, BGP, and lossless fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on Network deployment expertise and proven track record of delivering large projects on time. Cisco, Juniper or Arista experience is preferred
  • Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
  • Excellent communication skills with audiences ranging from engineers to mid-level management to C-level executives
Job Responsibility
  • Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Failure Analysis Test Engineer

As part of the Manufacturing Test Engineering team, the Failure Analysis Test El...
Location:
United States, San Jose
Salary:
175000.00 - 225000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Electrical Engineering or a related field
  • 5+ years of experience in manufacturing, product, or electrical engineering
  • Strong knowledge of system hardware architectures and PCB fabrication/assembly processes
  • Strong understanding of electronic circuits, schematics, and PCB layouts
  • Experience with DFX/DFM, yield assessment, and board-level debug
  • Hands-on experience with PCB CAD/CAE tools (e.g., Allegro, Valor) and understanding of design-to-build flows
  • Proven ability to work across NPI and sustaining environments, from proto bring-up to ramp
  • Collaborative mindset to work effectively with cross-functional teams and suppliers
  • 20% international travel expectation to Taiwan and other future factory locations
Job Responsibility
  • Own the electrical debug and failure analysis engineering side of our AI systems, ensuring seamless product introduction from NPI through high-volume ramp
  • Collaborate with hardware, ASIC, diagnostics, test, and reliability engineering to drive diagnosability, manufacturing predictability, yield process and material improvements, and product reliability
  • Focus on PCBA and L10 Server diagnosability and efficiency of repair, and sustaining improvements
  • Troubleshooting & Repair: Diagnose and repair electronic assemblies (PCBA) down to the component level
  • Test Execution: Set up and operate test equipment (oscilloscopes, spectrum analyzers, network analyzers, DMMs)
  • Documentation & Reporting: Maintain detailed records of failure data, repair actions, and update debug procedures
  • Cross-Functional Support: Collaborate with design and production teams to resolve recurring issues and improve product yield
  • Partner with Component Engineering, Supply Chain, and Hardware teams to influence component selection and mitigate manufacturing risk early in the design cycle
  • Perform board bring-up and debug, applying DOE/EFA and hands-on lab work
  • Review and influence Process FMEA and Design FMEA, providing diagnostics input
What we offer
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Plus Significant Equity
  • Full-time

Technical Program Manager, Hardware Systems

The Compute team works on the design of our AI supercomputers, doing everything ...
Location:
United States, San Francisco
Salary:
207000.00 - 335000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements
  • Impressive track record of leading complex projects from concept to launch
  • Ability to work with cross-functional teams to ensure successful execution of programs
  • Creative, detail-oriented, and self-motivated individual
  • Stay up to date on industry trends and technologies
  • Experience driving development timelines for new platform introduction and managing internal review processes
  • Design thermal management hardware including material selection, heat sinks, heat exchangers, air movers, pumps, and all supporting equipment
  • Comfortable with ambiguity and rapidly changing conditions
Job Responsibility
  • Drive technical initiatives to ensure product success from concept to launch
  • Lead technical program management of next-gen AI hardware system development
  • Lead the team to establish NPI product development process, defining clear milestones and deliverables
  • Drive internal process improvements across multiple teams and functions
  • Provide hands-on program management during analysis, design, development, testing, implementation and post implementation phases
  • Own overall program success spanning the end to end development of the hardware product
  • Develop and manage overall program requirement, scope, schedules, and deliverables with engineering teams, partners and stakeholders
  • Communicate effectively with cross-functional teams to ensure successful execution of programs
  • Utilize problem-solving skills to resolve issues and overcome obstacles, perform risk assessment, risk mitigation and change management on projects
  • Manage multiple projects simultaneously
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Full-time

Product Quality Engineer

OpenAI’s Hardware organization develops silicon and system-level solutions desig...
Location:
United States, San Francisco
Salary:
123000.00 - 285000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements
  • Impressive track record of leading complex projects from concept to production launch
  • Ability to work with cross-functional teams to ensure successful execution of programs
  • Creative, detail-oriented, and self-motivated individual
  • Stay up to date on industry trends and product quality engineering of high-performance computing systems
  • Experience driving development timelines for new platform introduction and managing internal review processes
  • Significant experience driving creativity, quality, reliability, and schedule at multinational JDM/CM vendors
  • Deeply familiar with mechanical and electrical Design For Quality (DFQ), Excellence (DFX), Manufacturability (DFM), Assembly (DFA), Test (DFT), and Design for Reliability (DFR)
  • Experience with 8D and various troubleshooting/root cause analysis methodologies, such as 5 Whys and fishbone diagrams
  • Excited by capturing, analyzing, and presenting key quality metrics data
  • Keen on root cause corrective actions
Job Responsibility
  • Define product quality and reliability targets and drive initiatives to ensure product success from concept to launch
  • Lead and develop the overall product quality and reliability strategy and process for next-gen AI hardware system development
  • Lead the team to establish the NPI product quality control process, define the reliability qualification strategy, set clear milestones and deliverables, and drive internal process improvements across multiple teams and functions
  • Drive the design and execution of comprehensive quality and reliability assurance plans, testing protocols, and validation processes
  • Champion the use of advanced data analytics and statistical methods for analysis and monitoring of key quality metrics
  • Lead failure analysis, root cause identification, and corrective actions for system failures
  • Collaborate with suppliers to ensure that all sub-components meet quality standards and performance specifications
  • Visit vendors' facilities and review their manufacturing environment to assess and improve their procedures
  • Lead component supplier and JDM/CM quality control audits and performance reviews
  • Own overall product quality process development of the hardware product
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Full-time