
Server Memory Validation Lead

China, Shanghai · Job Posted January 29, 2026

Job Description

As a Principal Member of Technical Staff (PMTS), you will play a pivotal role in enabling and validating memory interfaces for cutting-edge processor silicon. You will lead efforts to innovate, test, and debug within the memory sub-system of new processors, and you will work closely and continuously with memory vendors. A key part of the role is enabling, debugging, and qualifying these vendors for inclusion in the Approved Vendor List (AVL). The role requires advanced technical expertise, leadership, and strategic vision to drive efficiency and quality improvements.

Job Responsibility

  • Lead, mentor, and manage a high-performing team of validation engineers dedicated to memory sub-system validation on AMD server platforms
  • Provide strategic technical direction and leadership in the development of sophisticated test and validation plans for DDR interfaces
  • Design and implement comprehensive system-level memory sub-system validation test plans across AMD products, continually seeking innovation
  • Facilitate collaboration with silicon design teams, firmware, software, and automation teams to ensure smooth integration and validation processes
  • Interact and collaborate consistently with local memory vendors on enablement, debugging, and qualification for inclusion in the Approved Vendor List (AVL)
  • Deliver insightful project reports, analyze risks, and propose innovative solutions, ensuring timely resolution and strategic foresight
  • Use data analysis to continually refine processes, enhance efficiency, and drive quality improvements across validation methodologies
  • Identify strategic opportunities to elevate validation processes, enhance automation tools, and drive methodologies for improved efficiency and effectiveness

Requirements

  • Master's degree with 10+ years, or Bachelor's degree with 12+ years, of demonstrated expertise in developing and executing platform-level electrical and functional test plans
  • Extensive hands-on experience and profound expertise in debugging I/O interfaces such as DDR memory systems
  • Proven experience in vendor management and AVL qualification processes
  • Thorough familiarity with signal measurement equipment, schematics, and layout documentation
  • Exceptional written and verbal communication skills, adept at conveying complex technical information
  • Advanced programming skills in Python, Ruby, Perl, or similar languages
  • Demonstrated self-motivation, strategic thinking, and program management skills, with a focus on innovation and team leadership

Nice to have

DDR/GDDR/LPDDR memory test experience on electronic components such as processors is a big plus

What we offer

AMD benefits at a glance


Similar Jobs for Server Memory Validation Lead

8 matching positions

Microsoft SQL Server Analysis Services (SSAS) Technical Lead

Sopra Steria is a major player in Europe with 50,000 employees across nearly 30 ...
Location: India, Noida
Salary: Not provided
Company: Sopra Steria
Expiration Date: Until further notice
Requirements
  • Strong proficiency in DAX: complex measures, time intelligence, context transition, and row context handling
  • Solid experience with SSAS Tabular Models, Power BI Data Modeling
  • Translate business requirements into scalable and optimized data models and reports
  • Optimize DAX queries for performance and memory usage in large data models
  • Create and manage semantic data models using best practices in dimensional modeling
  • Collaborate with data engineers, analysts, and architects to ensure high data quality and governance
  • Proficiency in SQL/SQL server for data validation and backend query support
  • Experience working with large datasets and optimizing performance in enterprise environments
  • Data Modeling: Proficiency in designing data models that underpin reports and dashboards
  • DAX (Data Analysis Expressions): Mastery of DAX, a formula language used in Power BI for calculations and data manipulation
Job Responsibility
  • Translate business requirements into scalable and optimized data models and reports
  • Optimize DAX queries for performance and memory usage in large data models
  • Create and manage semantic data models using best practices in dimensional modeling
  • Collaborate with data engineers, analysts, and architects to ensure high data quality and governance
What we offer
  • Inclusive and respectful work environment
  • Open to people with disabilities
  • Full-time

Data Center Technician

As a Microsoft Data Center Technician (DCT), you will stage, set up and perform ...
Location: Norway, Oslo
Salary: 530,000 - 752,000 NOK per year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • High school diploma, GED, or equivalent and basic knowledge of computer hardware and components AND experience supporting IT equipment or related technology
  • Ability to work shifts, including shift assignments during non-standard business hours that may include evening, nighttime, weekends, and/or holidays
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
  • Valid driver’s license
  • Fluency in conversational English
  • Experience supporting IT equipment or related technology
  • Applicable certifications: CompTIA (A+, Server+, Network+), Basic Structure Cabling (BSC)
Job Responsibility
  • Performs diagnostics and troubleshooting following standard procedures, quickly identifies the cause(s) of issues, and replaces faulty components with minimal customer and business disruption
  • Performs post-execution quality checks and verifies that grounding, staging, labeling, and cabling are set up properly according to safety protocols, deployment standards, and planned Network Design Tasks (NDTs)
  • Decommissions hardware for simple changes and refreshes (e.g., memory upgrades, rebuilds) following standard procedures with minimal guidance
  • Follows procedures to communicate, report, and escalate incidents to appropriate Microsoft data center operations management units, Technician Leads, and engineering specialists
  • Assists and provides guidance to other technicians to complete challenging or complex tasks
  • Completes required training aligned to the role and workload
  • Observes more experienced technicians to gain hands-on experience and relevant on-the-job training
  • Contributes to a positive and effective team environment by sharing information with others, contributing to regular team meetings, asking questions, and staying apprised of the status of others' work
  • Has pride and a sense of accountability for the service quality, completeness, and resulting user experience
  • Displays accountability and ownership of the data center facilities
  • Full-time

Infrastructure Hardware Technical Program Manager (Server And Network Systems)

As an Infrastructure Hardware Technical Program Manager (Server and Network Syst...
Location: United States, Sunnyvale; Canada, Toronto
Salary: Not provided
Company: Cerebras Systems
Expiration Date: Until further notice
Requirements
  • B.S. or M.S. in Computer Science, Electrical/Computer Engineering, or equivalent experience
  • 8+ years in Technical Program Management (or similar delivery leadership) for server, network, or infrastructure platforms from concept through production
  • Experience coordinating complex server and/or datacenter network programs across OEM/ODMs, switch vendors, and internal engineering teams
  • Working knowledge of server architecture (CPU/NUMA, memory bandwidth, PCIe, NIC and storage IO) and enough networking fundamentals (leaf-spine fabrics, switch platforms, high-performance interconnects) to run effective technical reviews
  • Familiarity with Linux server fleet management (provisioning, firmware/BIOS, drivers, field triage)
  • Strong multi-team program execution skills: integrated plans, risk management, dependency tracking, and executive-level communication
  • Ability to operate in ambiguity and keep parallel server and network workstreams aligned
Job Responsibility
  • Own end-to-end program execution for server systems and network equipment in Cerebras clusters, including new platforms, refreshes, and major component/config changes
  • Drive requirements gathering and convert inputs into executable plans with clear milestones, readiness gates, and cross-functional deliverables
  • Represent Cluster Architecture in executive reviews, OKR cycles, and leadership/customer forums as needed
  • Build and manage integrated schedules across vendors and internal teams, track dependencies, critical path, and risks
  • Manage OEM/ODM and switch/vendor engagements (RFI/RFP, samples, escalations, roadmap alignment)
  • Partner with Compute / Server Platform / Network Architects to turn architectural decisions into qualification plans, acceptance criteria, and rollout strategies
  • Lead qualification and release readiness (lab/staging validation, regression tracking, go/no-go decisions)
  • Own risk and change management into production, including versioning, rollout sequencing, and stakeholder communication
  • Ensure operational readiness with deployment and fleet teams and maintain alignment with rack/physical DC owners on power, cooling, space, and cabling constraints
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs
  • Full-time

Senior Principal AI Infrastructure Architect

The Senior Principal AI Infrastructure Architect is a highly skilled and advance...
Location: Italy, Milano
Salary: Not provided
Company: NTT DATA
Expiration Date: Until further notice
Requirements
  • Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits
  • Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories
  • Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively
  • Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications
  • Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work
  • Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes
  • Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture
  • Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field
  • Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads
  • Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
Job Responsibility
  • Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients
  • Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency
  • Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
  • Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints
  • Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence
  • Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions
  • Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies
  • Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments
  • Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance
  • Develop As-Is, Vision, FMO and To-Be AI platform architectures, identify gaps and develop transition roadmaps
  • Full-time

Product Development Eng.

The Data Center Platform Engineering organization is excited to hire a highly sk...
Location: Taiwan
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Solid experience in system or SOC level hardware design & debug
  • Bring-up of complex computer systems that reside in Data Centers (CPU/GPU)
  • Strong system level debugging skill from HW, SW & Driver perspective
  • Experience in working directly with local ODM partner or key OEM/CSP customers and enabling them to production
  • Passionate team player, willing to do whatever it takes for business success, with a keen sense of urgency and strong drive
  • Significant experience in SoC and/or System bring-up, validation, verification and debug of complex issues
  • Strong system design experience with CPU, GPU, HSIO, memory, and power, capable of providing schematic and board reviews for partners and customers
  • Familiar with GPU & CPU system architecture and design is a plus
  • Familiar with HSIO debug & strong Signal Integrity knowledge is a plus
  • Be the go-to person for debugging of issues for a given system/platform/subsystem
Job Responsibility
  • Work closely with ODM/OEM partners on Data Center server solutions, providing support from the NPI development phase through mass production
  • Develop and execute comprehensive customer enablement plans for Data Center products
  • Lead and coordinate cross-functional engineering teams to deliver on customer requirements
  • Transform internal product deliverables into customer-ready solutions, aligning with customer product development needs
  • Plan and manage customer/partner training programs, ensuring readiness of documentation, collateral, hardware, and software tools, etc.
  • Orchestrate and facilitate diverse workshops to identify, assess, and mitigate customer product risks
  • Own and drive customer program readiness to ensure successful and timely product launches (TTM)
  • Collaborate effectively with global technical teams across North America and APAC time zones to deliver results
  • Provide design reviews and recommendations for partner/customer platforms built on AMD Data Center solutions
  • Provide leadership input/recommendations to design, validation & manufacturing test based on the SOC to platform design
What we offer
  • Benefits offered are described in AMD benefits at a glance.

Platform Engineer

We are looking for a Platform Engineer to support and enhance a large-scale Linu...
Location: United States, Duluth
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • Hands-on experience in DevOps or platform engineering with strong responsibility for Linux or UNIX infrastructure
  • Proven expertise working with Kubernetes and OpenShift in enterprise or multi-environment settings
  • Practical knowledge of automation and infrastructure-as-code tools such as Ansible and Terraform
  • Experience managing operating system patching, upgrades, system tuning, and infrastructure monitoring
  • Familiarity with identity and access management integrations, backup validation, and disaster recovery support
  • Understanding of container networking, ingress configuration, load balancing, and persistent storage concepts
  • Ability to support CI/CD processes for containerized applications and collaborate effectively with application teams
Job Responsibility
  • Administer a broad Linux server estate across production, quality assurance, and development environments, ensuring consistent performance and system stability
  • Plan and carry out operating system maintenance activities, including version upgrades, routine patching, and kernel updates with minimal service disruption
  • Track infrastructure health and fine-tune compute, memory, storage, and network performance to improve reliability and efficiency
  • Oversee account access controls and support authentication connectivity with enterprise directory services
  • Verify backup integrity and contribute to disaster recovery preparedness through regular review and testing activities
  • Manage Kubernetes and OpenShift platforms across environments, including deployment, day-to-day administration, and ongoing support
  • Lead cluster lifecycle activities such as installation, expansion, patching, version upgrades, and capacity adjustments
  • Configure and maintain core platform components such as namespaces, security settings, ingress, load balancing, and persistent storage resources
  • Assist application teams with container onboarding, issue resolution, and workload optimization while supporting CI/CD integration for container-based deployments
  • Monitor platform health with tools such as Prometheus, Grafana, and native OpenShift capabilities, and use findings to guide capacity planning
What we offer
  • Medical, vision, dental, and life and disability insurance
  • 401(k) plan

Platform Engineer

We are looking for a Platform Engineer to support and enhance a large-scale Linu...
Location: United States, Duluth
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • Hands-on experience in DevOps or platform engineering with strong responsibility for Linux or UNIX infrastructure
  • Proven expertise working with Kubernetes and OpenShift in enterprise or multi-environment settings
  • Practical knowledge of automation and infrastructure-as-code tools such as Ansible and Terraform
  • Experience managing operating system patching, upgrades, system tuning, and infrastructure monitoring
  • Familiarity with identity and access management integrations, backup validation, and disaster recovery support
  • Understanding of container networking, ingress configuration, load balancing, and persistent storage concepts
  • Ability to support CI/CD processes for containerized applications and collaborate effectively with application teams
Job Responsibility
  • Administer a broad Linux server estate across production, quality assurance, and development environments, ensuring consistent performance and system stability
  • Plan and carry out operating system maintenance activities, including version upgrades, routine patching, and kernel updates with minimal service disruption
  • Track infrastructure health and fine-tune compute, memory, storage, and network performance to improve reliability and efficiency
  • Oversee account access controls and support authentication connectivity with enterprise directory services
  • Verify backup integrity and contribute to disaster recovery preparedness through regular review and testing activities
  • Manage Kubernetes and OpenShift platforms across environments, including deployment, day-to-day administration, and ongoing support
  • Lead cluster lifecycle activities such as installation, expansion, patching, version upgrades, and capacity adjustments
  • Configure and maintain core platform components such as namespaces, security settings, ingress, load balancing, and persistent storage resources
  • Assist application teams with container onboarding, issue resolution, and workload optimization while supporting CI/CD integration for container-based deployments
  • Monitor platform health with tools such as Prometheus, Grafana, and native OpenShift capabilities, and use findings to guide capacity planning
What we offer
  • Medical
  • Vision
  • Dental
  • Life and disability insurance
  • 401(k) plan
  • Full-time

Systems Design Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great prod...
Location: China, Shanghai
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • BS or above in Electrical/Electronic/Power Engineering, or related field
  • 10+ years of hands-on experience in power hardware design, preferably for laptop platforms
  • Proficiency in mobile products, including battery technology/management, battery fuel gauge operation/calibration
  • Proficiency in charger and firmware-level settings for BIOS/UEFI and Embedded Controllers (EC)
  • Expertise in hardware power management, in-depth knowledge of System & SoC power states (ACPI, C-States, P-States) and low-power modes for peripherals (display, Wi-Fi, SSD, USB)
  • Proficiency in DC-DC converter topologies (buck, boost, multiphase), PWM controllers, MOSFETs, and power stages
  • Experience with simulation tools (e.g., SIMPLIS, SPICE, PowerDC, Ansys SIwave)
  • Strong knowledge of PCB layout considerations for power circuits (e.g., parasitic reduction)
  • PDN and PI analysis capability is a plus
  • Familiarity with lab equipment: oscilloscopes, spectrum analyzers, and power integrity test setups
Job Responsibility
  • Own the end-to-end power architecture for x86 platforms (desktop, notebook, or server), spanning APU/CPU, memory, chipset, PCIe, and auxiliary power rails
  • Design, simulate, and optimize high-efficiency DC-DC power solutions for battery life and high-performance CPU/APU platforms
  • Perform schematic design, PCB layout reviews, and component selection to meet power delivery requirements (e.g., voltage ripple, transient response, and efficiency)
  • Collaborate with cross-functional teams (Arch, EE, thermal, layout, SI/PI, etc.) to deliver the motherboard reference board on time
  • Conduct PDN (power delivery network) simulation and analysis to ensure power integrity
  • Handle system-level power optimization, from platform hardware (SoC, display, Wi-Fi, SSD, charger, connectivity, etc.) to firmware (BIOS/UEFI, EC), to ensure superior battery life
  • Perform full-stack idle power breakdown, measurement, and bottleneck identification
  • Own battery-life KPI improvements across platform/silicon/firmware/system layers
  • Improve VR/PDN light-load efficiency in cooperation with power architecture teams
  • Drive display idle power optimization through PSR, DCN, refresh-rate, and driver improvements
  • Full-time