Technical Program Manager, AI Infrastructure Job at Cerebras Systems (Sunnyvale)

Principal Technical Program Manager - AI Infrastructure

Microsoft is developing advanced AI infrastructure platforms that require deep i...

Location

United States, Redmond

Salary:

142800.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree AND 8+ years experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
6+ years of experience managing cross-functional and/or cross-team projects.
Ability to meet Microsoft, client, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Job Responsibility

Own end-to-end delivery from development through production readiness, including integrated planning across the software stack
Drive execution by managing dependencies, risks, and cross-team tradeoffs to keep delivery on track
Ensure platform and performance readiness (bring-up, key workloads, benchmarking, optimization)
Establish strong operating rhythm (reporting, alignment, and clear escalation paths) while improving tools and processes to increase predictability
Identify systemic gaps and act as the bridge across infrastructure, research, and product, driving alignment and translating complexity into clear, actionable updates

Full-time

Senior Technical Program Manager – AI Infrastructure, Site Operations

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. ...

Location

United States, Sunnyvale

Salary:

Not provided

Cerebras Systems

Expiration Date

Until further notice

Requirements

8+ years in Technical Program Management, Infrastructure Ops, or Data Center Ops
Experience leading large, cross-functional infrastructure programs
Strong understanding of data center power and cooling fundamentals, network and storage basics, and hardware-centric platforms
Proven ability to define and operationalize metrics
Strong written and executive-level communication skills

Job Responsibility

Own end-to-end technical programs for data center and site operations
Act as single-threaded owner across Hardware & Systems Engineering; AI Cloud Infrastructure & Operations; Network & Storage Engineering; and facilities, power, cooling, and colo partners
Drive site readiness for Cerebras Wafer-Scale Engine systems
Partner on installation, commissioning, change management, and break/fix workflows
Lead incident reviews and postmortems; ensure corrective actions are closed
Define and own operational metrics and KPIs, including: Availability and reliability

What we offer

Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
A simple, non-corporate work culture that respects individual beliefs

Senior Technical Program Manager, AI Execution Architecture

With more than 45,000 employees and partners worldwide, the Customer Experience ...

Location

United States, Redmond

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree AND 4+ years experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
2+ years of experience managing cross-functional and/or cross-team projects

Job Responsibility

Own program delivery end-to-end: Serve as a TPM for cross-functional programs, partnering closely with engineers, product managers, and leadership to drive day-to-day execution, ensuring release readiness through consistent application of milestones, dependency management, risk mitigation, and quality checkpoints as products and requirements evolve
Drive technical execution at scale: Operate across engineering and product teams to maintain integrated program plans, manage dependencies and resource tradeoffs, drive execution against milestones, and communicate progress, risks, and decisions to stakeholders and leadership
Establish and track success metrics for program health and efficiency, and ensure technical discussions result in clear actions, owners, and follow-through
AI-Augmenting: Apply AI to accelerate execution, using tools such as GitHub, GitHub Copilot, AI assistants, and low-code platforms to prototype solutions, automate workflows, and improve delivery outcomes
Operate with technical depth to understand system architecture, APIs, data flows, and integration patterns well enough to drive informed decisions and shape technical direction without owning production engineering
Build the playbook: This is an evolving discipline

Full-time

Technical Program Manager - AI Cluster Validation

We are seeking a Technical Program Manager to lead execution of AI cluster engin...

Location

United States, Austin

Salary:

162640.00 - 243960.00 USD / Year

AMD

Expiration Date

Until further notice

Requirements

Experience leading complex hardware or AI infrastructure programs with ownership across bring-up, validation, and deployment phases
Strong technical understanding of GPU-based AI systems, rack architectures, and datacenter infrastructure
Proven ability to manage ambiguity, drive debug execution, and lead cross-functional teams without direct authority
Strong written and verbal communication skills, including executive-level status reporting
Proficiency with program management and execution tools (Jira, Confluence, dashboards, Excel/PowerPoint)
Bachelor's or master's degree in systems, EE, CS, or related engineering discipline
PMP, Scrum Master, or equivalent program management training

Job Responsibility

Define, plan, and drive program plans for AI infrastructure systems validation and readiness, including server integration, rack bring-up, and cluster-scale deployment readiness
Create and maintain core PM artifacts: schedules, dependency maps, resource forecasts, risk/issue logs, and program dashboards/status reports
Identify and drive mitigation plans for issues/risks, including cross-team escalations and corrective actions across multiple engineering areas
Drive regular execution reviews with engineering teams and provide concise, data-driven updates to senior leadership
Own program execution for GPU-based AI platforms, spanning system bring-up, qualification, scale readiness, and deployment validation across server, rack, and cluster levels
Drive alignment across GPU, CPU, firmware, BIOS/BMC, and system teams to ensure readiness for scale testing and customer workloads
Track platform issues and debug dependencies; ensure risks are clearly documented, owned, and mitigated
Own program planning and execution for multi-node and multi-rack scale testing, including test strategy, scheduling, coverage tracking, and readiness gates
Lead end-to-end delivery of rack-level AI solutions, including compute trays, switch trays, cabling, power, cooling, and management infrastructure

Full-time

Principal Technical Program Manager - AI Frameworks

Microsoft is laying the foundation for the next generation of cloud and AI platf...

Location

United States, Redmond

Salary:

163000.00 - 296400.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree AND 8+ years experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
6+ years of experience managing cross-functional and/or cross-team projects
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Job Responsibility

Execution orchestration: Drive multi-team execution across numerous infrastructure components, services, and systems. Maintain the single source of truth for execution health, delivery status, and alignment to product intent. Ensure schedule integrity and build mechanisms to detect slippage or divergence early
Dependency management: Identify, track, and manage dependencies across org boundaries, and write white papers. Build and maintain critical-path plans that reveal coupling, blockers, and downstream risks
Operations: Establish and run cross-team rhythms of business (RoB), including reviews, readiness checkpoints, and launch orchestration
Risk identification and mitigation: Surface execution risks, capacity constraints, misalignment, and timeline threats before they impact delivery. Drive mitigation strategies without altering product priorities or increasing scope. Reveal implications and tradeoffs with clarity and objectivity
Governance and compliance: Ensure teams meet security, privacy, and compliance milestones. Track audit status and ensure risks are surfaced and addressed proactively
Resource utilization: Develop and maintain visibility into team utilization, workload distribution, and bottlenecks. Highlight when teams are overloaded, underutilized, or misaligned with business priorities
Accountability: Ensure engineering execution supports reliability, availability, and operational health requirements. Track execution debt and ensure teams have mechanisms to resolve or mitigate it

Full-time

Technical Program Manager - Infrastructure

At Microsoft AI, we are on a mission to train the world’s most capable AI fronti...

Location

United States, Mountain View

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree AND 6+ years experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
3+ years of experience managing cross-functional and/or cross-team projects
Deeply understand the design, deployment, and optimization of large-scale infrastructure for AI/ML workloads
Have experience collaborating with AI researchers, engineers, and infrastructure teams to deliver robust, scalable solutions
Thrive in a scrappy, 0->1, innovative environment, managing high-stakes, time-sensitive, large-scale programs
Take initiative and enjoy navigating complexity, driving progress across offices, teams, and time zones
Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies in infrastructure and platform engineering

Job Responsibility

Coordinate projects and programs related to AI/ML infrastructure (e.g. pre-training, post-training pipelines, inference & model serving stacks), including end-to-end planning, timelines, milestones, performance metrics, and resource needs
Collaborate with product teams, engineers, researchers, and external partners to identify gaps and drive timelines toward resolution and mitigation
Leverage data and analytics to identify opportunities for improvement, track progress, and measure the impact of quality and efficiency programs
Foster a culture of collaboration, continuous improvement, and growth
Own the status of key infrastructure projects, proactively identifying risks and proposing solutions to ensure timely delivery
Communicate program strategies, progress, and results to executive leadership and key stakeholders, advocating for quality and efficiency within the team
Advance the AI frontier responsibly
Embody Microsoft’s culture and values

Full-time

Technical Program Manager - AI Research

Meta is seeking a Technical Program Manager to support the Artificial Intelligen...

Location

France, Paris

Salary:

Not provided

Technical Program Manager - AI Research

Meta is seeking a Technical Program Manager to support the Artificial Intelligen...

Location

United Kingdom, London

Salary:

Not provided
