System Design & Debug Manager – AI Customer Engineering

United States, Santa Clara
Employment contract
186080.00 - 279120.00 USD / Year
Job Posted May 29, 2026

Job Description

This role serves as the debug execution backbone of AMD's AI Customer Engineering organization, driving complex silicon, system, and fleet-level issues to resolution across all major customer segments. The System Design & Debug Manager plays a critical role in ensuring customer success, product quality, and large-scale deployment confidence through disciplined, end-to-end debug execution. This is a high-visibility, high-impact position requiring deep technical expertise and strong cross-functional program leadership.

Job Responsibility

  • Debug Program Leadership - Lead debug execution across hyperscale, OEM, HPC, and enterprise customer programs. Own high-impact, cross-customer and systemic issues and maintain visibility into top risks and trends
  • Customer Program Integration - Partner with Customer Program Managers to align debug execution with customer deliverables, platform readiness, and deployment schedules. Support escalations and executive-level customer engagements
  • Technical Debug Coordination - Drive cross-functional debug efforts across design, validation, product engineering, and failure analysis. Align pre- and post-silicon debug strategies and connect lab debug to real-world customer environments
  • Field Failure & Fleet Quality Management - Lead resolution of field failures, fleet anomalies, and data center reliability issues. Aggregate fleet, RMA, and production signals and feed learnings back into design, validation, and manufacturing
  • Governance & Process Improvement - Own debug tracking, prioritization, risk management, and executive reporting. Apply structured methodologies (8D, CAPA, FMEA) and drive continuous improvement in execution speed and consistency

Requirements

  • Deep understanding of data center system architecture (CPU, GPU, FPGA, memory, connectivity, RAS, hotplug)
  • Familiarity with hardware bring up, validation, manufacturing, and test flows
  • Knowledge of reliability and quality metrics (yield, DPM, FIT)
  • Proven years of experience in the semiconductor industry
  • Deep hands-on experience with silicon debug (pre-silicon and post-silicon)
  • Strong background in product development, debug tools, validation, failure analysis, or customer engineering
  • Proven experience managing complex debug programs across multiple customer segments
  • Strong functional team and project management skills with ability to drive execution across global, cross-functional teams
  • Excellent written and verbal communication skills, including executive-level engagement
  • Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, or related field required

Nice to have

Master's degree preferred

Similar Jobs for System Design & Debug Manager – AI Customer Engineering

8 matching positions

Forward Deployed AI Engineering Manager, Enterprise

As a Forward Deployed AI Engineering Manager on our Enterprise team, you'll be t...
Location
United States, San Francisco; New York
Salary:
216000.00 - 270000.00 USD / Year
Scale
Expiration Date
Until further notice
Requirements
  • 5+ years of software engineering experience
  • 2+ years of Management experience with strong fundamentals in data structures, algorithms, and system design
  • Production Python expertise with experience in modern ML/AI frameworks (e.g., LangChain, LlamaIndex, HuggingFace, OpenAI API)
  • Experience with cloud platforms (AWS, GCP, or Azure) and modern data infrastructure
  • Strong problem-solving skills with the ability to navigate ambiguous requirements and rapidly iterate toward solutions
  • Excellent communication skills with the ability to explain complex technical concepts to both technical and non-technical audiences
Job Responsibility
  • Partner directly with enterprise customers to understand their technical infrastructure, data pipelines, and business requirements
  • Design and implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs)
  • Build robust data connectors and ETL pipelines to ingest, process, and prepare customer data for AI workflows
  • Deploy and configure AI models and agents within customer security and compliance boundaries
  • Develop production-grade AI agents tailored to customer use cases across domains like customer support, data analysis, content generation, and workflow automation
  • Architect multi-agent systems that orchestrate between different models, tools, and data sources
  • Implement evaluation frameworks to measure agent performance and iterate toward business objectives
  • Design human-in-the-loop workflows and feedback mechanisms for continuous agent improvement
  • Create sophisticated prompt engineering strategies optimized for customer-specific domains and data
  • Build and maintain prompt libraries, templates, and best practices for customer use cases
What we offer
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Equity grant
  • Commuter stipend
Employment type: Full-time

Engineering Manager, Managed AI Platforms

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...
Location
United States, San Francisco; Sunnyvale
Salary:
210000.00 - 255000.00 USD / Year
Crusoe
Expiration Date
Until further notice
Requirements
  • Proven Leadership: Demonstrate a strong track record of people management, leading with empathy, and fostering a sustainable workload for your teams
  • 4+ years of experience managing high-performing engineering teams
  • Strong track record of hiring, developing, and retaining engineering talent
  • Technical Acumen: Possess the capability to manage a team where problems, opportunities, and strategies may not be fully defined, driving clarity and direction
  • This is a hands-on keyboard leadership role. This role requires the leader to be involved in architecture decisions and to be regularly making technical contributions
  • Expertise in LLM-based agentic systems is a requirement
  • Cross-Functional Collaboration: Exhibit excellent technical communication skills, both verbal and written, to effectively collaborate across diverse roles and functions
  • This team will have customers across the engineering organization; the role requires the ability to field multiple requests from multiple teams while still fulfilling a unified mission
  • Global Scale Experience: Experience building and operating global services at scale. This architecture will be adopted by teams around the world
  • Organizational Prowess: Be extremely organized, capable of managing complex initiatives and team priorities effectively
Job Responsibility
  • Agent scaffolding: Enable secure, scalable agentic development inside of Crusoe
  • Integrate a unified observability layer to debug and observe agent behavior
  • Secure architecture that ensures proper data handling and sandboxing for headless agent execution
  • Designing on-ramps for internal teams to deploy and operate their own agents the "Crusoe" way
  • Workflow optimization: Play a critical role in the development of internal tooling and AI-enabled workflows
  • Architect the next generation of developer tooling built on the internal Crusoe platform
  • Build lowest-common-denominator flexible solutions that leave room for a wide range of use cases
  • Push the envelope: Stay plugged into the latest in the AI enablement space
  • In what new and innovative ways can Crusoe employ agentic workflows across engineering and beyond?
  • How can Crusoe leverage its infrastructure as a cloud provider to retain an edge in the agent space?
What we offer
  • Industry competitive pay
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
Employment type: Full-time

Principal Software Engineering Manager - Data Science & Engineering

The MSRC Data Science team is responsible for building data pipelines, data minin...
Location
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Leads team on the disciplined use and improvement of artificial intelligence (AI) tools and practices across the software development lifecycle (SDLC)
  • Guides team on proactively taking responsibility for the content of their AI-generated requirements, design documents, code, and other assets, and assisting other members of the team to do the same
  • Leads team on incorporating Responsible AI practices into the SDLC to ensure appropriate controls over AI-generated assets
  • Coaches team on applying SDLC and engineering health measures (e.g., Accelerate, SPACE framework, Engineering System Success Playbook [ESSP]) to guide improvements to processes and practices, especially those involving AI
  • Leads team on experimenting with AI tools and practices to improve their own capabilities, and providing recommendations on how to adopt them to others
  • Reviews debugging tools, tests, logs, telemetry, and other methods, and acts as an expert for others to proactively verify assumptions while developing code before issues occur across products in production
  • Guides team to perform machine learning/data extraction, transformation, and loading (ETL) pipelines (e.g., data collection, cleaning) based on data prepared
  • Guides the architecture of scalable pipelines and datasets
  • Influences the direction of the team
  • Begins to anticipate potential data pipeline issues and provides solutions
Employment type: Full-time

Principal Software Engineering Manager - Graphics

Would you like to lead the next wave of innovation for Windows and build breakth...
Location
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Solid expertise in graphics pipelines, rendering engines, or composition systems.
  • Significant interest and experience with AI-powered, agentic coding tools and platforms
  • Solid design, coding, debugging, teamwork, partnership and communication skills
  • Experience in leading large dev teams to achieve complex goals on time and on budget.
  • Proven ability to find a shippable solution given conflicting and ambiguous requirements.
  • Excellent technical skills in driving design and architecture with cross-product and service dependencies.
  • Ability to drive innovation with customer obsession.
  • Experience with cross group design and coordination is an advantage.
  • You must be self-driven, curious to learn, proactive, and result oriented.
Job Responsibility
  • Influence and align the product vision by collaborating with customers, partners, product management, and engineering teams.
  • Managing a team of high-caliber Software Engineers, ensuring project and development excellence and technical leadership.
  • Deliver high quality results with full ownership and take the product to next level.
  • Own career development of team through active coaching.
  • Create a strong team culture of engineering excellence, customer passion, collaboration, diversity and inclusion. And of course, having fun too!
  • Hire and develop the best!
Employment type: Full-time

Software Engineering Manager

Build and lead a new engineering team in Noida that will own and operate the too...
Location
India, Noida
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science or a related technical field, and 9+ years of hands-on software engineering experience
  • 3+ years of technical leadership experience (e.g., engineering manager, team lead, or equivalent), including mentoring and driving execution across multiple workstreams
  • Proven ability to design, build, debug, and operate distributed systems with a strong focus on reliability, security, and privacy
  • Hands-on coding experience in one or more of: C#, Java, JavaScript, Python, and/or React
  • Experience with data platforms and services such as Synapse, Azure Data Explorer (ADX), Cosmos DB, and/or SQL
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including successful completion of the Microsoft Cloud Background Check upon hire and every two years thereafter
Job Responsibility
  • Provide technical and people leadership for a new team responsible for SOX compliance tooling, including hiring, onboarding, coaching, and performance development
  • Own execution and prioritization across taking over existing systems/tools and delivering new capabilities as requirements evolve
  • Define architecture and drive design reviews for scalable, reliable, and secure distributed systems that support SOX control execution and evidence generation
  • Ensure operational excellence: on-call readiness, operational health metrics, incident response, and continuous improvements to reliability and resiliency
  • Build security monitoring, auditing, and reporting capabilities that enable transparency, accountability, and continuous compliance at global scale
  • Implement robust control mechanisms aligned to access/change/operations controls while meeting regulatory, security, privacy, and SOX requirements
  • Partner with cross-functional stakeholders across engineering, compliance, and audit to define requirements and deliver auditor-ready artifacts
  • Contribute to long-term technical strategy and roadmap, ensuring compliance is embedded by design
  • Maintain a high engineering bar through coding standards, testing practices, code reviews, AI adoption, and maintainable design
  • Use data-driven insights to improve governance workflows and reduce toil for engineering teams executing controls
Employment type: Full-time

Senior Engineering Manager

The Windows Enterprise & Security (ENS) team builds foundational capabilities th...
Location
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience leading software engineering teams delivering large-scale, production systems
  • Solid technical background in systems, platform, or security-oriented engineering
  • Proven ability to hire, develop, and retain high-performing engineering talent
  • Experience driving execution across complex, cross-team dependencies in a global environment
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Lead multiple engineers delivering Windows enterprise and security features that ship broadly across the Windows ecosystem
  • Own end-to-end execution: roadmap planning, technical design, delivery, live-site readiness, and sustained quality
  • Balance feature delivery with security, reliability, and operational excellence, ensuring a consistently high engineering bar
  • Drive disciplined execution across multiple parallel workstreams, aligning priorities with ENS and Windows leadership
  • Hire, onboard, and develop engineers across levels, building a strong, inclusive, and growth-oriented engineering culture
  • Coach engineers and managers on technical depth, ownership, collaboration, and career progression
  • Foster a culture of accountability, learning, and customer obsession, aligned with Microsoft values
  • Create clarity through goals, expectations, and feedback loops, especially in a fast-growing IDC environment
  • Champion AI-assisted and agentic engineering workflows across the team
  • Establish guardrails and best practices for responsible use of AI tools—ensuring correctness, security, privacy, and compliance
Employment type: Full-time

Principal Software Engineering Manager

Within AI Platform, the Azure AI Search team powers rich knowledge base experien...
Location
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years of people management experience, including hiring, coaching, performance management, and building high-trust, high-performing teams
  • Experience owning end-to-end customer and developer experiences across one or more product surfaces including defining requirements and driving delivery
  • Experience with distributed systems and production operations (reliability, incident response, observability/telemetry, and safe release practices)
  • Experience designing and delivering secure services, including identity/access patterns and privacy/compliance considerations
  • Demonstrated use of AI-assisted engineering tools to improve SDLC quality and velocity, including responsible use of AI-generated assets
  • Strong customer empathy with a track record of using qualitative and quantitative feedback to iterate product experiences
Job Responsibility
  • Leads the disciplined adoption and continuous improvement of AI tools and Responsible AI practices across the SDLC, ensuring accountability for AI-generated assets and using engineering health metrics to drive measurable process improvements and share learnings
  • Leads engineering excellence for production services by driving diagnosability and incident prevention (debugging, telemetry, retrospectives), strengthening secure and privacy-preserving operations (least privilege), and raising code quality through timely, high-signal reviews, automated analysis, and best practices (including GenAI) to deliver secure, maintainable, high-performing code while proactively managing blockers and risks
  • Managers deliver success through empowerment and accountability by modeling, coaching, and caring. Model: Live our culture. Embody our values. Practice our leadership principles. Coach: Define team objectives and outcomes. Enable success across boundaries. Help the team adapt and learn. Care: Attract and retain great people. Know each individual’s capabilities and aspirations. Invest in the growth of others
  • Leads cross-group planning and execution (project/release/work management) by breaking long-term vision into milestones, driving estimation and capacity planning, and ensuring secure, compliant delivery with operational readiness (flighting, rollback, and disaster recovery)
  • Partners with internal and external stakeholders to validate user requirements and feasibility, incorporates customer insights and success metrics (including accessibility/globalization), and advocates for customer security and privacy needs across the solution
Employment type: Full-time

Principal Software Engineering Manager

Microsoft Teams is a mission critical collaboration platform used by hundreds of...
Location
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 12+ years of software engineering experience building and operating large-scale services, including distributed systems
  • 12+ years leading engineering teams, with solid hiring, coaching, and delivery outcomes
  • Proven track record delivering complex backend/platform capabilities in areas such as identity/auth, messaging, media, data platforms, networking, or developer platforms
  • Solid expertise in system design, architecture, and cloud-scale engineering (multi-region deployment, resiliency patterns, and observability)
  • Technical depth in modern backend development (for example: C#, Java, C++, Go), distributed data stores, messaging, and cloud platforms (for example: Azure/AWS/GCP)
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Technical Leadership & Architecture: Own the architecture and evolution of core Teams services, building highly available, low-latency distributed systems
  • Lead design reviews and set standards for reliability, scalability, security, privacy, and performance (SLAs, threat modeling, capacity planning, and performance budgets)
  • Identify and mitigate systemic technical risks; drive simplification to reduce operational load and improve platform sustainability
  • Stay hands-on where it matters—deep dives, debugging, and prototypes—while empowering teams to execute independently
  • Execution & Delivery: Be accountable for shipping core platform capabilities that support millions of concurrent users with high availability and low latency
  • Drive predictable execution using agile practices, solid program management, and data-driven prioritization (quality, cost, and customer impact)
  • Own live-site excellence: SLAs, alerts, incident response, post-incident learning, and automation to prevent recurrence
  • Balance feature delivery with investments in platform robustness, scalability, and engineering efficiency
  • People & Organization Leadership: Lead, hire, mentor, and grow a team of software engineers
Employment type: Full-time