Technical Program Manager - Infrastructure

Location: United States, Mountain View
Salary: 139900.00 - 274800.00 USD / Year
Job Posted: April 20, 2026

Job Description

At Microsoft AI, we are on a mission to train the world’s most capable AI frontier models, pushing the boundaries of scale, performance, and product deployment. Our Infrastructure team is responsible for building and optimizing the platforms, systems, and tools that enable large-scale training, deployment, and serving of foundation models across Microsoft AI. In this role, you will help deliver world-class infrastructure for foundational AI models at Microsoft AI.

This role is part of Microsoft AI's Superintelligence Team (MAIST), a startup-like team inside Microsoft AI created to push the boundaries of AI toward Humanist Superintelligence: ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society by advancing science, education, and global well-being.

Job Responsibility

  • Coordinate projects and programs related to AI/ML infrastructure (e.g. pre-training, post-training pipelines, inference & model serving stacks), including end-to-end planning, timelines, milestones, performance metrics, and resource needs
  • Collaborate with product teams, engineers, researchers, and external partners to identify gaps and drive timelines toward resolution and mitigation
  • Leverage data and analytics to identify opportunities for improvement, track progress, and measure the impact of quality and efficiency programs
  • Foster a culture of collaboration, continuous improvement, and growth
  • Own the status of key infrastructure projects, proactively identifying risks and proposing solutions to ensure timely delivery
  • Communicate program strategies, progress, and results to executive leadership and key stakeholders, advocating for quality and efficiency within the team
  • Advance the AI frontier responsibly
  • Embody Microsoft’s culture and values

Requirements

  • Bachelor's Degree AND 6+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
  • 3+ years of experience managing cross-functional and/or cross-team projects
  • Deeply understand the design, deployment, and optimization of large-scale infrastructure for AI/ML workloads
  • Have experience collaborating with AI researchers, engineers, and infrastructure teams to deliver robust, scalable solutions
  • Thrive in a scrappy, 0->1, innovative environment, managing high-stakes, time-sensitive, large-scale programs
  • Take initiative and enjoy navigating complexity, driving progress across offices, teams, and time zones
  • Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies in infrastructure and platform engineering

Nice to have

  • Bachelor's Degree AND 10+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
  • 8+ years of experience managing cross-functional and/or cross-team projects
  • 1+ year(s) of experience reading and/or writing code (e.g., sample documentation, product demos)

Similar Jobs for Technical Program Manager - Infrastructure

8 matching positions

Technical Program Manager - Infrastructure and Platform

Drive infrastructure execution at Hadrian as our Technical Program Manager. You'...
Location: United States, Los Angeles
Salary: 200000.00 - 250000.00 USD / Year
Company: Hadrian Automation
Expiration Date: Until further notice
Requirements
  • Strong program management fundamentals (scheduling, dependencies, risk)
  • Excellent stakeholder management across engineering, operations, and executive levels
  • Clear written and verbal communication, especially for technical audiences
  • Experience managing complex technical programs with multiple dependencies
  • Ability to build consensus and drive decisions without direct authority
  • Experience with JIRA, Confluence, and agile methodologies
  • Understanding of capacity planning and resource optimization
  • Comfort with ambiguity and changing priorities
  • Proven track record delivering large-scale technical projects
  • Ability to translate between technical and business stakeholders
Job Responsibility
  • Execute against the infrastructure, IT and data team's quarterly roadmap and OKRs (owned by respective Directors)
  • Coordinate cross-functional initiatives across Infrastructure, IT, Data Engineering, and Security organizations
  • Plan and coordinate complex technical initiatives (cloud migrations, Kubernetes rollouts, edge deployments)
  • Facilitate cross-functional planning between Infrastructure, Software, Manufacturing, and Data teams
  • Build and maintain project tracking systems with clear visibility into status, risks, and blockers
  • Lead capacity planning for compute, storage, networking across multiple sites
  • Drive infrastructure, IT, and data cost optimization initiatives with Finance
  • Run vendor evaluation processes for critical infrastructure and IT purchases
  • Create and maintain architecture decision records (ADRs) and technical documentation
  • Facilitate RFC (Request for Comments) processes for major technical decisions
What we offer
  • Medical, dental, vision, and life insurance plans for employees
  • 401k
  • Relocation support may be provided for certain situations, based on business need
  • Flexible vacation policy
Employment type: Full-time

Senior Technical Program Manager – Infrastructure

As a Senior Technical Program Manager – Infrastructure, you’ll help scale and ev...
Location: United States, Chicago
Salary: 140000.00 - 200000.00 USD / Year
Company: Optiver
Expiration Date: Until further notice
Requirements
  • Hold a degree in computer science, electrical engineering, or a related STEM field
  • Have 5+ years of experience in engineering (IT infrastructure, hardware, or networks) and/or technical program management in a high-performance environment
  • Strong technical understanding of data centers, compute, storage, and network systems
  • Proven success leading large-scale, complex infrastructure or hardware delivery programs
  • Excellent communicator and structured problem-solver with a bias for action
  • Motivated by building and scaling the systems that power high-performance trading
  • Highly self-motivated and thrive in environments with minimal oversight
  • Demonstrated hands-on experience in successfully managing complex, technical and business projects through completion
  • Possess strong problem-solving skills with a pragmatic approach
  • Have excellent communication skills, with the ability to adapt to different audiences
Job Responsibility
  • Lead delivery of global infrastructure programs – from data center expansion and hardware lifecycle management to network and platform modernization
  • Partner with engineering and operations leadership to define scope, manage dependencies, and drive execution across teams
  • Build program frameworks to track progress, manage risk, and improve delivery predictability
  • Drive continuous improvement in infrastructure planning, capacity management, and process maturity
  • Ensure alignment between infrastructure, development, and trading stakeholders to support Optiver’s global growth
What we offer
  • 401(k) match up to 50%
  • Fully paid health insurance
  • 25 paid vacation days alongside market holidays
  • Extensive office perks, including breakfast, lunch and snacks, regular social events, clubs, sporting leagues and more
  • The opportunity to work alongside best-in-class professionals from over 40 different countries

Technical Program Manager, Infrastructure Security

The Infrastructure Security Engineering (ISE) organization is responsible for sa...
Location: United States, Bellevue
Salary: 168000.00 - 234000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, Information Security, or related field (or equivalent experience)
  • 5+ years of experience in technical program management in infrastructure or security engineering
  • Understanding of infrastructure security concepts (network security, cloud security, identity & access management, etc.)
  • Experience leading large-scale, cross-functional programs in a fast-paced environment
  • Demonstrated experience with communication, organizational, and stakeholder management skills
  • Demonstrated experience analyzing complex technical problems and driving solutions
Job Responsibility
  • Embody program leadership by leading cross-functional programs focused on infrastructure security, including planning, execution, and delivery of key security initiatives
  • Stakeholder management across engineering, product, legal, and compliance teams to define requirements, set priorities, and align on security goals
  • Lead program risk management and process improvements to ensure timely delivery of programs
  • Define and drive measurement and reporting to ensure communication and program transparency across leadership and stakeholders
  • Provide technical direction and guidance on security best practices, architecture reviews, and secure development lifecycle
What we offer
  • bonus
  • equity
  • benefits
Employment type: Full-time

Senior Technical Program Manager, Infrastructure

Glean is seeking a Senior Infrastructure Technical Program Manager (TPM) to lead...
Location: United States, Palo Alto
Salary: 198000.00 - 235500.00 USD / Year
Company: Glean
Expiration Date: Until further notice
Requirements
  • BS/MS in Computer Science, Engineering, or a related technical field
  • 8-10+ years of experience in technical program management, infrastructure, or SRE, with at least 3-5 years managing infra or platform-scale programs
  • Proven success delivering cross-functional infrastructure programs in B2B or enterprise environments where scalability, uptime, and performance are critical
  • Experience working closely with Infra, SRE, and ML/AI teams on distributed systems or data infrastructure
  • Strong understanding of cloud infrastructure (AWS, GCP, or Azure) including compute, networking, storage, and orchestration systems
  • Ability to structure complex multi-quarter infrastructure programs with clear milestones and measurable impact
  • Strong written and verbal communication and ability to manage through ambiguity, anticipate scaling challenges, and align teams across priorities
  • Builder mindset with focus on automation, reliability, and efficiency
Job Responsibility
  • Lead end-to-end infra programs spanning compute, networking, storage, orchestration, and AI workloads
  • Partner with Engineering to define standards for environment provisioning, deployment automation, and configuration governance
  • Develop and operationalize frameworks for runtime health, scaling, and disaster recovery
  • Drive consistency and automation across deployment orchestration systems
  • Establish clear metrics for reliability, performance, and cost efficiency
  • Coordinate cross-team delivery of high-impact programs such as data pipeline scalability, LLM infrastructure expansion, or infra observability improvements
  • Communicate program status and technical risks effectively to leadership and stakeholders
  • Continuously identify process or system bottlenecks, and drive automation to improve speed and reliability of infra operations
What we offer
  • Medical, Vision, and Dental coverage
  • generous time-off policy
  • opportunity to contribute to your 401k plan
  • home office improvement stipend
  • annual education and wellness stipends
  • vibrant company culture through regular events
  • healthy lunches daily
Employment type: Full-time

Infrastructure Hardware Technical Program Manager (Server And Network Systems)

As an Infrastructure Hardware Technical Program Manager (Server and Network Syst...
Location: United States; Canada, Sunnyvale; Toronto
Salary: Not provided
Company: Cerebras Systems
Expiration Date: Until further notice
Requirements
  • B.S. or M.S. in Computer Science, Electrical/Computer Engineering, or equivalent experience
  • 8+ years in Technical Program Management (or similar delivery leadership) for server, network, or infrastructure platforms from concept through production
  • Experience coordinating complex server and/or datacenter network programs across OEM/ODMs, switch vendors, and internal engineering teams
  • Working knowledge of server architecture (CPU/NUMA, memory bandwidth, PCIe, NIC and storage IO) and enough networking fundamentals (leaf-spine fabrics, switch platforms, high-performance interconnects) to run effective technical reviews
  • Familiarity with Linux server fleet management (provisioning, firmware/BIOS, drivers, field triage)
  • Strong multi-team program execution skills: integrated plans, risk management, dependency tracking, and executive-level communication
  • Ability to operate in ambiguity and keep parallel server and network workstreams aligned
Job Responsibility
  • Own end-to-end program execution for server systems and network equipment in Cerebras clusters, including new platforms, refreshes, and major component/config changes
  • Drive requirements gathering and convert inputs into executable plans with clear milestones, readiness gates, and cross-functional deliverables
  • Represent Cluster Architecture in executive reviews, OKR cycles, and leadership/customer forums as needed
  • Build and manage integrated schedules across vendors and internal teams, track dependencies, critical path, and risks
  • Manage OEM/ODM and switch/vendor engagements (RFI/RFP, samples, escalations, roadmap alignment)
  • Partner with Compute / Server Platform / Network Architects to turn architectural decisions into qualification plans, acceptance criteria, and rollout strategies
  • Lead qualification and release readiness (lab/staging validation, regression tracking, go/no-go decisions)
  • Own risk and change management into production, including versioning, rollout sequencing, and stakeholder communication
  • Ensure operational readiness with deployment and fleet teams and maintain alignment with rack/physical DC owners on power, cooling, space, and cabling constraints
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs
Employment type: Full-time

Technical Program Manager - Copilot Infrastructure Validation

We are on a mission to revolutionize how businesses harness the power of AI. By ...
Location: Ireland, Dublin
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree AND demonstrated years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
  • Sufficient years of experience managing cross-functional and/or cross-team projects
  • Technical Expertise: In-depth understanding of AI/ML frameworks, cloud platforms, and data pipelines. Ability to partner effectively with other product managers and engineering teams to deliver complex technical solutions
  • AI tools: Sufficient years of experience in one or more: Copilot Studio, Copilot/Graph connectors, Power Automate, UiPath
  • Workflow Expertise: Sufficient years of experience in hands-on workflow automation using Power Automate, UiPath, or other similar tools
  • Customer-Centric: Proven track record of engaging directly with customers to gather requirements and deliver solutions that exceed expectations, OR of engaging with customers to troubleshoot complex issues
  • Leadership: Demonstrated strategic thinking and the ability to influence
  • Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Business, or a related field
  • Passion for AI: Deep interest in AI and its potential to transform business workflows
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Agents integration and feedback:
  • Understand the business value and technical dependencies of pre-release agents and influence our early adopters to deploy these AI agents in their work life
  • Act as the bridge between customers and engineering, deeply understanding AI adoption challenges and feature needs. Evaluate and influence stakeholders to prioritize the right agent development model to exceed customer productivity demands
  • Engage with business users, and IT teams worldwide to gather insights and prioritize technical features
  • Technical excellence:
  • Drive the technical validation of Microsoft’s AI infrastructure like Copilot Connectors, MCPs and more, ensuring quality, scalability, security, and efficiency
  • Ability to understand and develop custom solutions and Agents using development tools such as Copilot Studio
  • Demonstrate deep expertise in cloud-based AI/ML services, containerized workloads
  • Translate complex technical concepts into actionable strategies for AI infrastructure evolution
  • Demonstrate modern engineering excellence through cutting-edge customer testing & validation methodologies
Employment type: Full-time

Infrastructure Hardware Technical Program Manager- Hardware Systems

Meta is seeking a Technical Program Manager (TPM) with experience in server and ...
Location: United States, Menlo Park
Salary: 168000.00 - 234000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • B.S. in Computer Science or a related technical discipline, or equivalent experience
  • 10+ years of experience in software engineering, systems engineering, hardware engineering, or technical product/program management experience
  • Experience delivering tech programs or products from inception to delivery
  • Knowledge of user needs, gathering requirements, and defining scope
  • Experience operating under your own initiative across multiple teams, demonstrated critical thinking, and thought leadership
  • Communication experience and experience working with technical management teams to develop systems, solutions, and products
  • Organizational, coordination and multi-tasking experience
  • Analytical and problem-solving experience with large-scale systems
  • Experience establishing work relationships across multidisciplinary teams and multiple partners in different time zones
Job Responsibility
  • Lead technical program management of next-generation hardware platform(s) for Meta Infrastructure in a matrix organization covering a range of areas (Data Center, Network, Hardware Systems, Infrastructure Engineering, Software Engineering, Capacity Management) and across multiple physical locations
  • Own overall program success, spanning the end-to-end development of the hardware product, from internal and external development work through successful ingestion into Meta’s infrastructure and support of production workloads at scale
  • Develop and manage programs including defining scope, requirements, development model, schedules, and deliverables with engineering teams, partners, and stakeholders
  • Influence broader roadmaps through product interception and market fit, competitive analysis, and feasibility studies
  • Provide hands-on program management during analysis, design, development, testing, implementation, and post implementation phases
  • Partner with Engineering counterparts across a range of specialties as well as other teams to define product roadmaps
  • Drive overall communication to leadership, stakeholders and core working teams in regular cadence
  • Drive internal process improvements across multiple teams and functions
  • Analyze infrastructure needs and produce hardware designs and prototypes to meet those needs
  • Manage and drive strategic vendor engagement and deliveries
What we offer
  • bonus
  • equity
  • benefits
Employment type: Full-time

Senior Technical Program Manager – AI Infrastructure, Site Operations

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. ...
Location: United States, Sunnyvale
Salary: Not provided
Company: Cerebras Systems
Expiration Date: Until further notice
Requirements
  • 8+ years in Technical Program Management, Infrastructure Ops, or Data Center Ops
  • Experience leading large, cross-functional infrastructure programs
  • Strong understanding of: data center power and cooling fundamentals; network and storage basics; hardware-centric platforms
  • Proven ability to define and operationalize metrics
  • Strong written and executive-level communication skills
Job Responsibility
  • Own end-to-end technical programs for data center and site operations
  • Act as single-threaded owner across: Hardware & Systems Engineering; AI Cloud Infrastructure & Operations; Network & Storage Engineering; facilities, power, cooling, and colo partners
  • Drive site readiness for Cerebras Wafer-Scale Engine systems
  • Partner on installation, commissioning, change management, and break/fix workflows
  • Lead incident reviews and postmortems; ensure corrective actions are closed
  • Define and own operational metrics and KPIs, including: Availability and reliability
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs