Glean is seeking a Senior Infrastructure Technical Program Manager (TPM) to lead large-scale, cross-functional initiatives that define, scale, and optimize our infrastructure platform. This role sits at the intersection of infrastructure engineering, reliability, cost efficiency, and AI systems, driving programs that ensure Glean’s platform remains performant, scalable, and resilient as we continue to grow.
Job Responsibilities:
Lead end-to-end infra programs spanning compute, networking, storage, orchestration, and AI workloads
Partner with Engineering to define standards for environment provisioning, deployment automation, and configuration governance
Develop and operationalize frameworks for runtime health, scaling, and disaster recovery
Drive consistency and automation across deployment orchestration systems
Establish clear metrics for reliability, performance, and cost efficiency
Coordinate cross-team delivery of high-impact programs such as data pipeline scalability, LLM infrastructure expansion, or infra observability improvements
Communicate program status and technical risks effectively to leadership and stakeholders
Continuously identify process or system bottlenecks, and drive automation to improve speed and reliability of infra operations
Requirements:
BS/MS in Computer Science, Engineering, or a related technical field
8-10+ years of experience in technical program management, infrastructure, or SRE, with at least 3-5 years managing infra or platform-scale programs
Proven success delivering cross-functional infrastructure programs in B2B or enterprise environments where scalability, uptime, and performance are critical
Experience working closely with Infra, SRE, and ML/AI teams on distributed systems or data infrastructure
Strong understanding of cloud infrastructure (AWS, GCP, or Azure) including compute, networking, storage, and orchestration systems
Ability to structure complex multi-quarter infrastructure programs with clear milestones and measurable impact
Strong written and verbal communication skills and the ability to manage through ambiguity, anticipate scaling challenges, and align teams across priorities
Builder mindset with a focus on automation, reliability, and efficiency
Nice to have:
Understanding of data pipelines, ML training workflows, and LLM runtime infrastructure