GPU Hardware Security Architect

United States, Santa Clara · Employment contract · 232000.00 - 348000.00 USD / Year · Job Posted June 16, 2026

Job Description

We are seeking a self-motivated GPU Hardware Security Architect to join our growing GPU Architecture team. In this role, you will help define AMD’s next-generation GPU security architecture across AI/ML, Compute, and Discrete Gaming products. The position will drive security features that protect register access, firmware interfaces, configuration controls, and sensitive customer data. Responsibilities include threat modeling, access-path analysis, security architecture, and cross-functional execution support, and require strong technical depth, problem-solving ability, and communication skills.

Job Responsibility

  • Investigate and architect next‑generation GPU IP features that enhance protection of register access and sensitive customer data
  • Research and quantify attack vectors through which bad actors can gain access to customer data or GPU configuration
  • Use AI tools to identify and resolve security issues
  • Ensure AMD's security offerings meet industry standards and anticipate industry trends
  • Work with AMD SoC and other IP teams to track trends and development directions for GPU security
  • Write and deliver architectural specifications to development teams (HW, SW, firmware, etc.)
  • Architect new GPU algorithms to improve GPU security without compromising performance
  • Provide technical and cross-functional debug support to execution teams
  • Perform design and threat analysis of firmware and hardware
  • Deliver architecture specifications and/or review proposals from internal/external sources
  • Guide execution teams in comprehending and following GPU security guidelines
  • Collaborate with internal AMD teams to deliver best-in-class security solutions

Requirements

  • Relevant work experience focused on computer architecture and security
  • Strong understanding of factors influencing register and firmware access protections at chip, system, and product levels
  • Thorough knowledge of RTL design and/or verification
  • Proven track record of providing and following through on pragmatic security requirements
  • Expert at tackling multi-variable problems via system-level modeling, testing and characterization, trend analysis/projection, and model verification
  • Knowledge of computing and graphics architecture
  • A drive to continuously learn and expand architectural breadth and depth
  • Understanding of GPU security/power/performance, SW and FW access, and system-level trade-offs
  • Knowledge of graphics shader behavior is a benefit
  • Knowledge of machine learning and AI usage is a benefit
  • Familiarity with threat modeling tools
  • Accomplished listener who can analyze, abstract, communicate, and converge on the best ideas
  • Ability to develop strong partnerships over time with program stakeholders
  • Excellent verbal, written, and presentation skills
  • Excellent interpersonal, organizational, analytical, planning, and teamwork skills
  • Bachelor’s, Master’s, or PhD in Computer Engineering or Electrical Engineering

Nice to have

  • Knowledge of graphics shader behavior
  • Knowledge of machine learning and AI usage
  • Familiarity with threat modeling tools

Similar Jobs for GPU Hardware Security Architect

8 matching positions

Senior AI Hardware Architect

Join the Systems Planning and Architecture (SPARC) team within Microsoft’s Azure...
Location: United States, Mountain View
Salary: 119800.00 - 234700.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements for this role
  • Passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Lead performance analysis, profiling, and benchmarking across GPU and in-house AI accelerator architectures, applying rigorous data and statistical analysis to identify complex performance bottlenecks, root causes, and optimization opportunities across hardware, software, and system layers
  • Run and analyze end-to-end AI models on production-like serving infrastructure, performing deep dives into modern AI serving stacks (e.g., optimized LLM serving frameworks, schedulers, runtimes, and memory management systems) to understand performance behavior, scalability limits, and system-level trade-offs
  • Provide data-driven recommendations and architectural trade-offs to senior technical leadership, balancing performance, complexity, cost, quality, reliability, and development timelines to inform accelerator and system architecture decisions
  • Develop and implement technical solutions to complex performance, quality, and design challenges, including kernel-level optimization, architectural tuning, and system-level performance improvements across multiple products or feature areas
  • Correlate on-silicon measurements, software traces, and kernel execution behavior with architectural models and simulators, ensuring alignment between measured performance and architectural intent, and identifying gaps that drive future design enhancements
  • Design, build, and evolve data correlation, analysis, and visualization tools and workflows that scale performance insight, accelerate debugging, and improve clarity and communication of optimization opportunities across teams
  • Lead and contribute to design and performance documentation, including architecture reviews, performance reports, functional specifications, and customized analyses
  • Communicate progress, risks, and recommendations within and across teams, and help identify and mitigate significant project risks
Employment type: Full-time

AI Safety Business Development Manager

The Business Development Senior Manager entails responsibilities for working wit...
Location: United States, Santa Clara
Salary: 186720.00 - 280080.00 USD / Year
AMD (amd.com)
Expiration Date: Until further notice
Requirements
  • Expertise in SLMs/LLMs, transformer architectures, inference optimization, and on-device AI
  • Strong engineering background in Python, C/C++, or Rust
  • Experience with safety tuning, RLHF/RLAIF, policy enforcement layers, and red-teaming
  • Understanding of provenance, trusted compute, and security architectures
  • Strong command of AI risk frameworks and global regulatory landscape
  • Ability to convert governance requirements into technical controls and partner-ready solutions
  • Demonstrated cross-functional leadership across hardware, software, policy, and research teams
  • Bachelor's or Master's degree in Computer Engineering/Electrical Engineering preferred
Job Responsibility
  • Lead global Trustworthy AI strategy aligned with ISO/IEC 42001, NIST RMF, EU AI Act
  • Design and implement AI safety guardrails, dialogue governance, jailbreak defense, content moderation, and accuracy assurance pipelines
  • Drive program management for safe-by-design development across hardware, software, and model teams
  • Translate silicon-level trust features into differentiated safety solutions for global partners
  • Define 'trusted inference mode' by integrating hardware signals with software guardrails
  • Build research and policy collaborations with universities, governments, AAAI, standards bodies, NGOs, and industry alliances
  • Represent the organization in global forums, conferences, and executive briefings
  • Architect safe AI deployment patterns for national AI strategies, enterprise rollouts, and sovereign AI initiatives
What we offer
  • AMD benefits at a glance
Employment type: Full-time

Principal Product Manager/Architect - Foundry Inference Platform (CoreAI)

We are seeking a Principal Product Manager/Architect to define and guide the tec...
Location: United States, Redmond
Salary: 163000.00 - 296400.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree AND 10+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility
  • 1. Product Reliability: Own the product direction for Microsoft Foundry inference, with a primary mandate to make the platform the most reliable enterprise inferencing service available. This includes defining architectural standards for global serving, multi-region resiliency, automated failover, and platform-managed disaster recovery
  • Drive architectural alignment across global routing, capacity pooling, observability, and control plane abstractions to ensure consistent availability, predictable recovery behavior, and simplified customer operations at scale
  • Partner with engineering, infrastructure, and security leaders to ensure reliability targets, SLAs, SLOs and recovery objectives are designed into the platform by default
  • 2. GPU Fleet Efficiency & Capacity: Set the product direction for GPU fleet efficiency and capacity management, guiding platform-level design decisions that maximize utilization, minimize fragmentation, and accelerate time-to-monetization of new hardware and models
  • This includes shaping the architecture for global capacity pooling, intelligent scheduling, fungibility across workloads, automated demand forecasting, and software-defined allocation
  • The Product Manager/Architect is expected to influence architectural investments across inference utilization, model serving, and hardware/system performance
  • 3. Strategic Customer & Innovation Engagement: Act as a senior technical advisor and architect for Foundry’s most innovative and strategic customers
  • Engage directly with customers on deep technical challenges, including large-scale model migrations, reliability-sensitive production deployments, and advanced serving architectures
  • Support competitive and strategic initiatives by articulating Foundry’s architectural advantages, turning bespoke requests into scalable features
  • 4. Cross-Company Technical Leadership: Serve as a unifying architectural voice across product management, engineering, infrastructure, and partner teams
Employment type: Full-time

Director, Product Manager (Artificial Intelligence Hardware)

Do you want to be at the forefront of innovating the latest hardware designs to ...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 3+ years of experience working on AI systems as an architect or a product manager
  • Strong understanding of several of the below areas: GPU Architecture, Scale Up and Scale out networking, Data Center Specifications, Model Architecture, Workloads and System Reliability
  • 7+ years of technical product management experience, including products within datacenter Hardware systems and/or Cloud infrastructure
  • 7+ years experience creating product roadmap(s) from conception to launch, driving end-to-end program execution, defining product go-to-market strategy, and leading program direction discussions
  • Understanding and passion for technical product management including presentation skills and written communication
  • Ability to build relationships and influence in a matrix organization
  • Experience managing cross-team deliverables, including program costs, schedules, and risk and issue mitigation, and establishing processes and frameworks for large-scale collaboration
Job Responsibility
  • Collaborate with customers and partner organizations to define future generations of Artificial Intelligence (AI) Hardware for Azure at Microsoft
  • Lead the strategic product vision, roadmap and product requirements for our next generations of AI hardware platforms
  • Identify and prioritize customer needs, market opportunities, and competitive gaps, and translate them into clear and actionable product requirements and specifications
  • Drive executive decision making for new investments, including competitive analysis, program goals and business requirements, architectural concepts, risk management strategies, financial analysis, schedule and hardware strategy
  • Lead technical programs from concept to execution, collaborating with architecture, engineering and business teams to develop and drive end-to-end product development
  • Develop and maintain a high level of technical proficiency in AI workload requirements, AI technology landscape and AI Industry roadmaps
  • Engage with senior leadership, highlighting risks across functional teams and providing recommendations to support product level decisions
  • Operate effectively in ambiguity. Apply process where it creates value, and design process where it’s needed. Recognize the situations where each approach is most appropriate
Employment type: Full-time

Principal Supercomputing Operations Software Engineer

Microsoft Azure’s Artificial Intelligence and High Performance Computing (AI/HPC...
Location: United States, Multiple Locations
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 6+ years of experience operating large‑scale distributed systems, high‑performance computing (HPC), or artificial intelligence (AI) infrastructure in production environments
  • Demonstrated ownership of mission‑critical production infrastructure with direct impact on service availability, GPU workloads, and customer SLAs
  • Hands‑on experience operating and debugging interconnect fabrics supporting large‑scale compute workloads
  • Strong Linux systems knowledge with experience debugging low‑level infrastructure issues across operating systems, drivers, and services
  • Proven ability to reason across hardware, firmware, drivers, and software stacks to diagnose and resolve complex production issues
Job Responsibility
  • Serve as the technical authority and DRI for InfiniBand and GPU interconnect fabric operations across large-scale AI supercomputing environments, ensuring sustained GPU availability, training stability, and SLA compliance
  • Lead and orchestrate complex, high-severity fabric incidents end to end, including detection, triage, mitigation, recovery, and root cause analysis, making high-impact decisions under ambiguity
  • Perform deep, multi-layer systems debugging across InfiniBand, Subnet Manager, GPU interconnect, PCIe, GPUs, firmware, drivers, and OS layers to identify true root causes at fleet scale
  • Drive operational excellence and systemic prevention by identifying recurring failure patterns, defining reliability models and failure domains, and authoring authoritative TSGs, playbooks, and escalation frameworks adopted across teams
  • Architect and drive automation, telemetry, diagnostics, and tooling that materially improve detection, observability, debuggability, and mean time to mitigation, raising the operational bar for interconnect fabrics across the platform
Employment type: Full-time

Senior Principal System Solution Architect

As Microsoft's cloud business continues to grow the ability to deploy new offeri...
Location: United States, Redmond
Salary: 163000.00 - 296400.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 9+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 11+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Technology Leadership – Drive concepts and definition for industry-leading platforms for the Azure data center, covering Compute, Storage, GPU, and AI accelerator-based solutions, with a strong focus on high-performance, low-latency networks at the forefront of density and speed
  • Cross-Functional Collaboration – Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure that meets performance, reliability, and deployment goals. Influence platform decisions across rack, chassis, and pod-level implementations
  • Technology Partnerships – Build strong relationships with our technology and development partners to drive leading-edge innovation into our next-generation products
  • Customer Focus – Partner across Microsoft teams and collaborate to deliver industry-leading products
  • Design Strategy – Champion innovative technical principles, design strategy, and forward-looking technologies related to industry trends
  • Architecture Clarity – Distill and articulate architectural tradeoffs for solution development, encompassing electrical, optical, signal integrity, mechanical, power, and thermal inputs, in terms of key metrics such as TCO, performance, schedule, and risk
  • Industry Influence – Drive and influence technology providers and design partners towards optimal components and solutions to meet the future requirements for Azure’s infrastructure
Employment type: Full-time

Senior AI Network Architect

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 3+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems
  • Proven expertise in system architecture across compute, networking, and accelerator domains
  • Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing
  • Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration
  • Familiarity with signal integrity modeling, link training, and physical layer optimization
Job Responsibility
  • Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms, with a focus on ultra-high bandwidth, low-latency backend networks
  • Drive system-level integration across compute, storage, and interconnect domains to support scalable AI training workloads
  • Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure that meets performance, reliability, and deployment goals
  • Influence platform decisions across rack, chassis, and pod-level implementations
  • Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers to co-develop differentiated solutions
  • Represent Microsoft in joint architecture forums and technical workshops
  • Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains
  • Frame decisions in terms of TCO, performance, scalability, and deployment risk
  • Lead design reviews and contribute to PRDs and system specifications
  • Shape the direction of hyperscale AI infrastructure by engaging with standards bodies (e.g., IEEE 802.3), influencing component roadmaps, and driving adoption of novel interconnect protocols and topologies
Employment type: Full-time

Principal Engineer

The Senior Data Center Operations Engineer is responsible for the bedrock of our...
Location: United States, Santa Clara
Salary: 147000.00 - 237500.00 USD / Year
Palo Alto Networks (paloaltonetworks.com)
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, IT, or equivalent experience
  • 5+ years of experience specifically operating Red Hat OpenShift (OCP) in a production environment
  • Deep experience racking/stacking and cabling high-density GPU systems (e.g., NVIDIA DGX or similar) and specialized AI/ML hardware
  • Advanced proficiency in Ansible or Pulumi for automating bare-metal provisioning and cluster configuration
  • Strong Python and Bash skills for developing custom health-check scripts and API integrations
  • Expert-level CoreOS and RHEL administration, including kernel tuning and systemd management
  • Solid understanding of BGP, VLAN tagging, LACP, and Load Balancing (F5/NGINX) essential for cluster ingress
  • Experience with vSphere or KVM, and persistent storage solutions like OpenShift Data Foundation (ODF) or Ceph
  • Familiarity with DCIM tools (NetBox) and monitoring stacks (ELK/Loki, etc.)
  • Ability to lift and move equipment up to 50 pounds (e.g., high-density 2U/4U servers)
Job Responsibility
  • Design and develop a scalable distributed management plane infrastructure to manage Palo Alto Networks’ next-generation network security solutions
  • Ensure 99.99% availability by architecting resilient physical layouts and automating the deployment, scaling, and self-healing capabilities of our production clusters
  • Monitor and maintain data center systems with a focus on 'Zero Single Point of Failure' (ZSPoF) architecture for OpenShift control planes and worker nodes
  • Implement and manage OpenShift 4.x clusters across multiple power and cooling zones to ensure 99.99% uptime
  • Design, test, and execute automated failover strategies and backup/restore procedures using tools like OADP (Velero) and Red Hat ACM
  • Perform routine maintenance and upgrades using GitOps (ArgoCD) and the Machine Config Operator to ensure zero-downtime node evacuations and patching
  • Resolve deep-stack hardware and software issues, from faulty GPU firmware to OpenShift SDN (OVN-Kubernetes) network latencies
  • Coordinate with vendors for specialized hardware (e.g., NVIDIA, Dell, Cisco) while maintaining strict security and firmware compliance
  • Optimize rack density for high-performance GPU clusters while managing thermal loads and power distribution (PDU) to prevent circuit-trip outages
  • Maintain accurate documentation and integrate hardware health metrics (IPMI/SNMP) into Prometheus/Grafana for proactive alerting
Employment type: Full-time