
Debug Program Manager

United States, Austin · Employment contract · 162640.00 - 243960.00 USD / Year · Job Posted May 04, 2026

Job Description

This role serves as the debug execution backbone of AMD’s Customer Program Management organization, driving complex silicon, system, and fleet-level issues to resolution across all major customer segments. The Debug Program Manager plays a critical role in ensuring customer success, product quality, and large-scale deployment confidence through disciplined, end‑to‑end debug execution. This is a high-visibility, high-impact position requiring deep technical expertise and strong cross-functional program leadership.

Job Responsibility

  • Debug Program Leadership - Lead debug execution across hyperscale, OEM, HPC, and enterprise customer programs. Own high‑impact, cross‑customer and systemic issues and maintain visibility into top risks and trends.
  • Customer Program Integration - Partner with Customer Program Managers to align debug execution with customer deliverables, platform readiness, and deployment schedules. Support escalations and executive‑level customer engagements.
  • Technical Debug Coordination - Drive cross‑functional debug efforts across design, validation, product engineering, and failure analysis. Align pre‑ and post‑silicon debug strategies and connect lab debug to real‑world customer environments.
  • Field Failure & Fleet Quality Management - Lead resolution of field failures, fleet anomalies, and data center reliability issues. Aggregate fleet, RMA, and production signals and feed learnings back into design, validation, and manufacturing.
  • Governance & Process Improvement - Own debug tracking, prioritization, risk management, and executive reporting. Apply structured methodologies (8D, CAPA, FMEA) and drive continuous improvement in execution speed and consistency.

Requirements

  • 12+ years of experience in the semiconductor industry
  • Deep hands-on experience with silicon debug (pre‑silicon and post‑silicon)
  • Strong background in product engineering, validation, failure analysis, or customer engineering
  • Proven experience managing complex debug programs across multiple customer segments
  • Strong program management skills with ability to drive execution across global, cross-functional teams
  • Excellent written and verbal communication skills, including executive-level engagement
  • Bachelor’s degree in Electrical Engineering, Computer Engineering, Computer Science, or related field required

Nice to have

  • Experience supporting data center, hyperscale, OEM, HPC, or enterprise AI deployments
  • Deep understanding of data center system architecture (CPU, GPU, memory, I/O, RAS, hotplug)
  • Familiarity with manufacturing and test flows
  • Knowledge of reliability and quality metrics (yield, DPM, FIT)
  • Advanced degree preferred

Similar Jobs for Debug Program Manager

8 matching positions

Technical Program Manager - Manufacturing Test and Readiness - Data Center GPU

We are currently looking for a Manufacturing Technical Program Manager who will ...
Location: United States, Austin
Salary: 136320.00 - 204480.00 USD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Master's degree in engineering, preferably Electrical Engineering
  • Experience managing complex Software/Hardware/FW program execution from new product introduction through production
  • Strong background in ASIC, Validation, SW/HW/FW
  • Experience leading strategic tech initiatives and optimizing operations in line with business goals
  • Proven track record in boosting team productivity, driving cross-functional execution, and enhancing stakeholder engagement among cross-functional teams
  • Knowledge of tools: Office365, JIRA system, Kanban, Confluence, Power BI, AGILE Methodology
  • AGILE project management, PMP certification
  • Demonstrated ability to lead and influence cross-organizational teams, peers, and senior leaders in the organization
Job Responsibility
  • Work closely with validation, platform, program, MFG teams to understand components of test program and define an overall test program for the AMD data center GPU products
  • Work with engineering teams to track status and summarize issues/updates
  • Interact with and guide a wide variety of internal and external teams in support of project milestones
  • Active management of project risks including early identification of key risks
  • Lead structured debug of manufacturing failures, coordinating with ASIC, FW, SW, validation, and platform domain leads and tracking systemic vs. random defects to enable data-driven decisions on test escapes and yield improvement
  • Architect the manufacturing test program to identify and eliminate redundant or low-value test coverage while ensuring high defect screening efficiency
  • Drive test time reduction strategies through content optimization, parallelization, and intelligent binning
  • Define metrics and support ad-hoc operational activities such as test coverage improvements, test time optimization, DPPM and yield improvements, debug, and bring-up planning at the contract manufacturer, with a high sense of urgency and responsibility for overall program success
Employment type: Full-time

System Design & Debug Manager – AI Customer Engineering

This role serves as the debug execution backbone of AMD's AI Customer Engineerin...
Location: United States, Santa Clara
Salary: 186080.00 - 279120.00 USD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Deep understanding of data center system architecture (CPU, GPU, FPGA, memory, connectivity, RAS, hotplug)
  • Familiarity with hardware bring up, validation, manufacturing, and test flows
  • Knowledge of reliability and quality metrics (yield, DPM, FIT)
  • Years of proven experience in the semiconductor industry
  • Deep hands-on experience with silicon debug (pre-silicon and post-silicon)
  • Strong background in product development, debug tools, validation, failure analysis, or customer engineering
  • Proven experience managing complex debug programs across multiple customer segments
  • Strong functional team and project management skills with ability to drive execution across global, cross-functional teams
  • Excellent written and verbal communication skills, including executive-level engagement
  • Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, or related field required
Job Responsibility
  • Debug Program Leadership - Lead debug execution across hyperscale, OEM, HPC, and enterprise customer programs. Own high-impact, cross-customer and systemic issues and maintain visibility into top risks and trends
  • Customer Program Integration - Partner with Customer Program Managers to align debug execution with customer deliverables, platform readiness, and deployment schedules. Support escalations and executive-level customer engagements
  • Technical Debug Coordination - Drive cross-functional debug efforts across design, validation, product engineering, and failure analysis. Align pre- and post-silicon debug strategies and connect lab debug to real-world customer environments
  • Field Failure & Fleet Quality Management - Lead resolution of field failures, fleet anomalies, and data center reliability issues. Aggregate fleet, RMA, and production signals and feed learnings back into design, validation, and manufacturing
  • Governance & Process Improvement - Own debug tracking, prioritization, risk management, and executive reporting. Apply structured methodologies (8D, CAPA, FMEA) and drive continuous improvement in execution speed and consistency
Employment type: Full-time

Program Manager, Engineering

The GSS Quality Assurance team drives the Quality Assurance strategy for various...
Location: India, Hyderabad
Salary: Not provided
Company: Uber
Expiration Date: Until further notice
Requirements
  • Must have completed a Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field
  • Minimum 3 years of experience in User Acceptance Testing (UAT) or software testing
  • Familiarity with test case creation and execution, bug reporting, and bug analysis in pre- and post-test phases
  • Strong ability to analyze business requirements and identify test scenarios to build a robust test plan
  • Excellent verbal and written communication skills to collaborate with stakeholders, developers, and business users
  • Knowledge of the software development life cycle (SDLC), databases, and testing methodologies
Job Responsibility
  • Review & Understand Requirements: Analyze business requirements, Product Requirement Documents (PRD), and flow charts to create relevant test scenarios
  • Collaborate with stakeholders to clarify requirements and improve product testing quality
  • Create & Execute Test Cases: Design effective test plans and test cases based on business workflows and uncover product edge cases so that no defects leak
  • Stakeholder Communication: Regularly interact with product managers, developers, and stakeholders to discuss testing progress and critical issues
  • Test Automation & Data Validation: Use JavaScript, JSON, and SQL to validate code logic and extract test data
  • Identification & Reporting: Identify, document, and track bugs and defects. Work closely with developers to troubleshoot and ensure timely fixes
  • Document steps to reproduce issues, expected vs. actual results, and severity levels to assist developers in debugging
  • Scrum Meetings: Come to each meeting prepared with all the necessary information to contribute effectively and avoid back-and-forth communication
Employment type: Full-time

Staff System Development Technical Program Manager

As part of the Data Center Platform Engineering organization, the Staff Product ...
Location: United States, Austin, Texas or Secaucus, New Jersey
Salary: 136320.00 - 204480.00 USD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Proven experience owning system-level hardware programs across the full development lifecycle, from architecture definition and early prototyping through production ramp and sustaining
  • Strong technical depth in hardware systems
  • Demonstrated ability to manage cross-functional execution across hardware engineering, BMC/BIOS, manufacturing, quality, operations, and supply chain teams
  • Experience driving design for manufacturability
Job Responsibility
  • Own system-level technical program strategy and execution across feature delivery, product quality, and continuity of supply
  • Lead triage and debug activities, driving root-cause identification and issue resolution to closure
  • Drive cross-functional alignment between Engineering, Product Management, Sales, Operations, and Supply Chain
Employment type: Full-time

Principal Technical Program Manager

Global-e is the world’s leading platform to enable and accelerate global, direct...
Location: United States, Hoboken
Salary: Not provided
Company: Global-e
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in technical program or project management, with a track record of delivering complex initiatives on time
  • Hands-on experience working directly with software engineering teams; you understand how software gets built
  • Deep fluency with agile methodologies (Scrum, Kanban, SAFe) and the judgment to know when to apply which
  • Experience navigating cross-functional programs spanning engineering, product, and business stakeholders
  • Strong proficiency with program management and engineering tools (JIRA, GitHub, GSuite, and equivalent)
  • Experience working across distributed, international teams is a strong plus
  • Background in B2B SaaS, e-commerce, or platforms is a plus
Job Responsibility
  • Define and drive the execution strategy for our most complex, high-impact cross-functional programs—bringing clarity to ambiguous goals, aligning teams around a shared plan, and making sure that plan actually lands
  • Identify the systems, teams, and dependencies involved in delivering new capabilities; build the overall plan others rally around; track it with precision and adapt it without drama when reality changes
  • Develop a deep understanding of Global-e's technical architecture and business model—well enough to ask sharp questions, challenge assumptions, and translate trade-offs between engineering, product, and commercial stakeholders without losing signal
  • Own risk—actively uncover unknowns before they become problems, make the call on when to escalate, and drive issues through to resolution rather than just flagging them
  • Build trusted relationships across levels and functions—from engineers to executives, from Dublin to Tel Aviv—so you can drive alignment and move fast without relying on authority
  • Balance strategic and tactical: be the person who can help define where we’re going in one conversation and debug why a critical path is slipping in the next
  • Shape how we work – not by imposing process, but by observing where friction lives, proposing lightweight interventions, and knowing when to get out of the way of teams that are already working well
  • Keep senior leadership informed with crisp, well-framed updates – make the complex digestible without losing accuracy
Employment type: Full-time

Technical Program Manager- AI Cluster Validation

We are seeking a Technical Program Manager to lead execution of AI cluster engin...
Location: United States, Austin
Salary: 162640.00 - 243960.00 USD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Experience leading complex hardware or AI infrastructure programs with ownership across bring-up, validation, and deployment phases
  • Strong technical understanding of GPU-based AI systems, rack architectures, and datacenter infrastructure
  • Proven ability to manage ambiguity, drive debug execution, and lead cross-functional teams without direct authority
  • Strong written and verbal communication skills, including executive-level status reporting
  • Proficiency with program management and execution tools (Jira, Confluence, dashboards, Excel/PowerPoint)
  • Bachelor's or master's degree in systems, EE, CS, or related engineering discipline
  • PMP, Scrum Master, or equivalent program management training
Job Responsibility
  • Define, plan, and drive program plans for AI infrastructure systems validation and readiness, including server integration, rack bring-up, and cluster-scale deployment readiness
  • Create and maintain core PM artifacts: schedules, dependency maps, resource forecasts, risk/issue logs, and program dashboards/status reports
  • Identify and drive mitigation plans for issues/risks, including cross-team escalations and corrective actions across multiple engineering areas
  • Drive regular execution reviews with engineering teams and provide concise, data-driven updates to senior leadership
  • Own program execution for GPU-based AI platforms, spanning system bring-up, qualification, scale readiness, and deployment validation across server, rack, and cluster levels
  • Drive alignment across GPU, CPU, firmware, BIOS/BMC, and system teams to ensure readiness for scale testing and customer workloads
  • Track platform issues and debug dependencies; ensure risks are clearly documented, owned, and mitigated
  • Own program planning and execution for multi-node and multi-rack scale testing, including test strategy, scheduling, coverage tracking, and readiness gates
  • Lead end-to-end delivery of rack-level AI solutions, including compute trays, switch trays, cabling, power, cooling, and management infrastructure
Employment type: Full-time

Operations Program Manager, AI Infrastructure

OpenAI’s Hardware organization develops silicon and system-level solutions desig...
Location: United States, San Francisco
Salary: 177000.00 - 285000.00 USD / Year
Company: OpenAI
Expiration Date: Until further notice
Requirements
  • 8+ years of experience in Operations, Engineering, Program Management, or equivalent, within hardware development, manufacturing, or supply chain domains (compute, networking, datacenter, or similarly complex systems)
  • Proven track record leading complex hardware NPI programs end-to-end, from early bring-up through production ramp
  • Strong understanding of manufacturing and supply chain fundamentals, including BOM management, ECO/MCO processes, build readiness, factory test, quality controls, and material planning
  • Demonstrated ability to lead cross-functional teams, influence senior stakeholders, and drive decisions in ambiguous, time-compressed environments
  • Exceptional written and verbal communication skills, with the ability to distill complex issues for executive and external audiences
Job Responsibility
  • Act as the single-threaded owner for operational readiness across NPI and ramp, accountable for outcomes from early bring-up through sustained production
  • Translate OpenAI’s infrastructure strategy and engineering objectives into clear operating plans, execution priorities, and decision frameworks
  • Drive alignment across Engineering, Operations, Strategic Sourcing, Finance, Capacity Planning, and Executive stakeholders by framing tradeoffs, risks, and recommendations
  • Proactively identify inflection points where decisions or investments are required to protect long-term scale, reliability, or cost targets
  • Influence operational strategy with manufacturing partners by setting expectations on execution rigor, accountability, and continuous improvement
  • Drive overall NPI build readiness, including material accountability, manufacturing and test readiness, product data availability, factory infrastructure, and qualification plans
  • Lead transition activities from NPI to mass production, partnering closely with Sustaining Operations teams to ensure seamless ownership transfer
  • Translate engineering requirements into actionable, factory-ready plans with tier-1 manufacturing and integration partners
  • Lead cross-functional build and debug cadences; ensure issues are clearly owned, aggressively driven, and formally closed with root cause and prevention
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment type: Full-time

Robotics Program Manager, Product Data Operations

Meta is seeking a Program Manager to lead our teleoperation and robotics data co...
Location: United States, Burlingame
Salary: 152000.00 - 214000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in a directly related field, or equivalent practical experience
  • 10+ years of project management or operations management experience
  • 5+ years of experience in technical project management, specifically within robotics, autonomous vehicles, or hardware-in-the-loop (HIL) environments
  • Direct Robotics Ops Experience: Proven track record of managing teleoperation and/or collections with complex hardware. You will be working around active robotic cells and specialized sensors
  • Technical Depth: proficiency in Unix/Linux environments and the ability to write/debug scripts (Python, Bash) to automate tasks or troubleshoot data pipelines
  • Stakeholder Management: Experience navigating the tension between "perfect" research data and "high-volume" operational reality, and the ability to influence research and engineering leaders
  • Onsite Leadership: Commitment to being onsite daily in Burlingame to drive physical operations and manage the local environment
Job Responsibility
  • Floor Operations and Oversight: Provide on-site leadership for Burlingame-based data collection, including floor oversight, creating SOPs and ensuring adherence, and defining and implementing processes to scale throughput and maintain data quality as data demands increase
  • Collection Ownership: Serve as the single point of truth for data quality and platform health for the site from a collections perspective. You will define the KPIs for "success" and ensure the embodiment is always optimized for collection
  • Researcher Partnership & Feedback Loops: Act as the primary on-site interface for AI researchers. You will translate research goals into operational protocols and provide high-signal "closed-loop" feedback to improve broad data and model performance
  • Strategic Communication & Visibility: Proactively surface technical blockers, hardware failures, and process bottlenecks to leadership. You will manage expectations across engineering and research stakeholders, ensuring the roadmap remains on track
  • Technical Debugging: You are expected to dive into the logs to understand the root causes of any regression in throughput or data quality (leveraging Unix/Linux expertise and Python/Bash scripting) and to recommend or implement quick fixes or process changes
What we offer
  • bonus
  • equity
  • benefits
Employment type: Full-time