Datacenter Power and Performance Modeling Engineer Job at AMD (Santa Clara)

Sr. Power Hardware Engineer

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the...

Location

Taiwan, Taipei

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor’s Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience
OR Master’s Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience
OR equivalent experience
3-5 years of working experience in datacenter and xPU power delivery design
Experience designing and optimizing high‑efficiency DC‑DC power converters for server or data‑center applications
Proven track record of designing multi-phase buck/buck-boost regulators, digital PWM controllers, bus converters, LDOs, PMICs, charge pumps, ADCs & DACs, etc.
Experience with Data Center 48V power conversion systems
Good knowledge of control theory, closed-loop compensation, stability analysis, phase margin, etc.
Strong knowledge of high-frequency magnetics design for power electronics
Experience designing printed circuit boards using schematic capture and layout tools such as Cadence

Job Responsibility

Support the architecture and design of state-of-the-art xPU power conversion/delivery systems serving a broad range of applications such as AI/machine learning, data center general-purpose computing, and storage
Participate in power delivery solution space analysis to drive team insights and move the architecture toward the optimal end-to-end power system implementation
Collaborate with suppliers, industry partners, project managers, and cross-functional engineering teams to define power system requirements, specifications, and solutions
Perform simulations, modeling, and validation of power conversion/delivery systems using tools such as SIMPLIS and SPICE
Support project execution from concept through production, including technical documentation and troubleshooting
Support creation of product specifications
Support testing/test planning of all solutions in all stages of development

Fulltime

Operations Planning and Analytics Engineer IV - VI

The electric utility industry is undergoing its most significant transformation ...

Location

United States, Tucker

Salary:

114200.00 - 183700.00 USD / Year

Georgia System Operations

Expiration Date

Until further notice

Requirements

B.S. in Electrical Engineering, Computer Science, Data Science, or related technical field with emphasis in power systems
E-IV: Six years (four years with P.E. License) in Power Systems with at least three years in operational planning/utility operations/energy markets along with one year of data science project work
E-V: Eight years (six years with P.E. License) in Power Systems with at least four years in operational planning/utility operations/energy markets including two years in data science project work
E-VI: Ten years (eight years with P.E. License) in Power Systems with at least six years in operational planning/utility operations/energy markets including two years in data science project work along with demonstrated leadership experience
Equivalent Education & Experience: Master's degree in electrical engineering or related technical field and specified years of experience
Licenses, Certifications and/or Registrations: EIT, PE, & PMP are applicable but not required
Special Skills: Power Systems Knowledge
Technical Programming & Data Science
Communication & Project Management

Job Responsibility

Developing advanced grid optimization models including solar production forecasting, N-1 contingency analysis, transmission congestion prediction, and economic dispatch optimization across generation and storage resources
Creating operational intelligence platforms that provide visibility into grid performance, member load patterns, production costs, and system economics through automated reporting and data visualization
Building analytical frameworks for the evolving grid to accommodate high renewable penetration, large datacenter loads, and distributed energy resource integration
Collaborating across the organization with operations planning, operations engineering, transmission planning, member services, and regional entities while representing GSOC in industry committees and technology working groups
Develop and maintain models for grid optimization including solar production forecasting, N-1 contingency analysis using full network power flow models, transmission congestion prediction, and pumped storage dispatch optimization
Coordinate with OPC to integrate generation assets with member-owned resources in dispatch optimization models
Create real-time visibility platforms for grid operations, member load patterns, production costs, and system economic performance
Develop automated reporting systems tracking operational KPIs, member cost allocation, and economic optimization results
Implement data visualization tools supporting both technical operations and executive decision making
Create automated single system dispatch optimization integrating fuel prices, generation and transmission constraints, and resource portfolios

What we offer

comprehensive medical, dental, and vision coverage
a strong retirement program
career development
flexible work schedules
wellness focus
supportive member of the community

Fulltime

System Thermal Design Engineer

AMD is seeking an experienced and highly motivated MTS System Thermal Design Eng...

Location

Malaysia, Penang

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

8+ years of relevant thermal engineering experience
Strong experience in system-level thermal design and validation for CPU/GPU platforms
Experience with server, datacenter, AI, or high-performance computing systems
Proven ability to lead technical activities and drive cross-functional collaboration
Strong problem-solving and debugging skills
System thermal design and optimization
Thermal chamber testing
Thermal instrumentation and thermocouple setup
Thermal characterization and workload validation
Thermal margin analysis

Job Responsibility

Lead system-level thermal design and validation activities for CPU/GPU server and AI platforms
Develop and execute thermal validation plans, methodologies, and thermal design strategies
Drive thermal characterization and thermal margin analysis across various workloads and environmental conditions
Lead thermal issue debugging, root-cause analysis, and corrective actions
Perform chamber testing, instrumentation setup, and thermal data analysis
Collaborate with BIOS, power, SI/PI, mechanical, platform, and architecture teams to optimize thermal solutions
Drive fan tuning and system thermal optimization efforts
Correlate lab measurements with simulation/CFD data
Support platform bring-up, qualification, and customer issue resolution
Mentor junior engineers and provide technical leadership within the team

Research Scientist, HW/SW Co-Design (PhD)

Our teams’ mission is to explore, develop and help productionize high performanc...

Location

United States, Menlo Park

Salary:

122000.00 - 181000.00 USD / Year

Senior Principal AI Infrastructure Architect

The Senior Principal AI Infrastructure Architect is a highly skilled and advance...

Location

Italy, Milano

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits
Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories
Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively
Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications
Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work
Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes
Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture
Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field
Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads
Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)

Job Responsibility

Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients
Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency
Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints
Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence
Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions
Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies
Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments
Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance
Develop As-Is, Vision, FMO and To-Be AI platform architectures, identify gaps and develop transition roadmaps

Fulltime

Cloud Infra Engineering Lead

The Infrastructure Technology Lead Analyst is a senior level position responsibl...

Location

India, Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

12-15 years of relevant experience in a Storage Operations role with sound knowledge of software defined storage and Cloud object storage
Proficient in software-defined storage such as Dell PowerFlex and PowerScale, and in IBM Cloud Object Storage and NetApp StorageGRID solutions
Experience working in Financial Services or a large complex and/or global environment
Sound knowledge of RHEL operating system, VTM remediations/patch installation, firmware upgrades, and troubleshooting experience in a complex software defined storage estate and Cloud object storage estate
Design testing approaches, complex processes, and reporting streams, and automate repetitive tasks using shell scripting, Perl scripting, C scripting, Ansible, and Python scripts
Provide technical/strategic direction and act as advisor/coach to lower-level analysts
Perform hardware capacity forecasting, planning and utilization monitoring
Analyze and apply patches, code upgrades, and enhancements, and perform management tool upgrades
Apply new technology and processes to improve system operation, supportability, recoverability, availability and performance
Ensure compliance to Citigroup Information Technology Management Policies (CITMP) and Standards

Job Responsibility

Create complex project and task plans for operational initiatives such as version upgrades and service improvement plans; perform impact analyses; work high-impact problems and projects; and provide resolution to restore services
Provide follow-the-sun operational support for SDS and cloud object storage
Provide Root Cause Analysis (RCA) post restoration of service
Design testing approaches, complex processes, reporting streams, and assist with the automation of repetitive tasks
Provide technical/strategic direction to team members
Review requirement documents, define hardware requirements and update processes and procedures as necessary
Ensure ongoing compliance with regulatory requirements
Responsible for applications dealing with the overall operating system
Conduct project related research
Operate with a limited level of direct supervision

Fulltime

Business Analyst II

Microsoft Cloud Operations + Innovation (CO+I) is the engine that powers our clo...

Location

United States, Redmond

Salary:

85100.00 - 169800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Mathematics, Analytics, Engineering, Computer Science, Marketing, Business, Economics or related field AND 1+ year(s) experience in data analysis and reporting, business intelligence, or business and financial analysis
OR Bachelor's Degree in Statistics, Finance, Mathematics, Analytics, Engineering, Computer Science, Marketing, Business, Economics or related field AND 2+ years experience in data analysis and reporting, business intelligence, or business and financial analysis
OR equivalent experience.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Partner with stakeholders across CO+I to gather business requirements and translate them into analytical frameworks, dashboards, and reporting solutions.
Analyze operational, financial, and performance data to identify trends, risks, and opportunities that inform leadership decisions.
Support executive review rhythms (e.g., fundamentals reviews, planning cycles) by developing insights, summaries, and data narratives.
Contribute to the design and evolution of platforms such as OKR tracking, budget planning, and datacenter insights hubs.
Build and maintain data models, queries, and reports using tools such as Power BI, Excel, Fabric, or similar analytics platforms.
Drive data quality and governance practices by collaborating with data owners and engineering partners.
Create clear documentation, visualizations, and presentations that communicate insights to both technical and non-technical audiences.
Identify opportunities to automate manual reporting processes and improve operational efficiency through scalable solutions.

Fulltime

Signal Integrity Engineer

OpenAI’s Hardware organization develops silicon and system-level solutions desig...

Location

United States, San Francisco

Salary:

225000.00 - 445000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

At least 10 years of industry experience
Experience with hardware system design and SerDes testing for data center applications
Good knowledge of and experience in system design across SI areas, from chip and SerDes to board and rack level
Experience with PCB, connector and cable design

Job Responsibility

Lead system signal integrity (SI) design for AI supercomputer products in data center applications
Collaborate with chip, package, board, rack, and system engineers and design partners to drive system SI design and develop innovative interconnect and high-speed technologies
Identify and evaluate new technologies and methodologies to improve signal and power integrity in product design, and contribute to the development of new products and technology by providing expertise in signal integrity
Perform simulation and modeling to identify and troubleshoot signal integrity issues
Lead system interconnect design, bring-up, and qualification
As the scope of the role and team grows, understand and influence roadmaps for hardware partners for our datacenter networks, racks, and buildings

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Datacenter Power and Performance Modeling Engineer
