AI Capacity Planning Engineer

Meta

Location:
United States, Menlo Park

Contract Type:
Not provided

Salary:

184000.00 - 257000.00 USD / Year

Job Description:

Meta is seeking a Performance and Capacity Engineer to join the Capacity team and focus on AI strategy and planning projects. This person will work cross-functionally with a number of teams to ensure optimal operation and growth of our AI computing resources from both a cost and a technology perspective. The scale: tens of billions of user requests, hundreds of petabytes of data, and thousands of Gbps of network flow. Help build one of the largest AI training and inference services in the world!

Job Responsibility:

  • Own AI infrastructure capacity planning for Meta, including servers, data centers, and network
  • Design, implement and launch software systems to improve AI capacity planning efficiency and quality, partnering with software engineers
  • Contribute to end-to-end AI capacity planning processes, methodologies, and data to deliver executable and optimized plans
  • Manage and resolve critical escalations and exceptions in all areas of AI capacity planning
  • Build mathematical models to perform simulation and optimization studies of AI demand and supply projections, scenario planning, and feasibility analysis while balancing various constraints
  • Work cross-functionally to define problem statements, collect data, build analytical models and make recommendations to drive change and optimization at the most strategic levels
  • Partner across Infra (platform teams, operations, network planning, and data center planning) as well as with Product and Finance teams to find the best ways to scale our AI infrastructure
  • Effectively navigate complex tradeoffs and relationships to balance team, cross-functional partner and stakeholder, and Meta company priorities. Balance the need to “keep things running” with longer-term, high-impact projects

Requirements:

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 8+ years of experience in performance or software engineering, and/or in optimization-related data science or data engineering, or equivalent practical experience
  • 8+ years of experience in designing and implementing models and optimization algorithms
  • 4+ years of experience in coding/scripting languages such as Python, R, Java, C, C++, PHP
  • Experience working with distributed systems at scale
  • Experience in infrastructure operations and technical infrastructure knowledge
  • Experience working with cross-functional teams
  • Experience optimizing complex systems, working with large datasets, and driving business impact

Nice to have:

  • MS or PhD degree in Computer Science, Electrical Engineering, Operations Research or other technical field
  • Experience working with large scale AI/ML systems (GPU based)
  • Direct experience in capacity planning for a major private or public cloud
  • Practical experience and demonstrated success in performance or capacity engineering

What we offer:
  • bonus
  • equity
  • benefits

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for AI Capacity Planning Engineer

Director of Solutions Engineering

We’re looking for a Director of Solutions Engineering to lead and scale our pre-...
Crescendo
Location:
United States, Flexible
Salary:
Not provided
Expiration Date:
Until further notice
Requirements:
  • 10+ years in Solutions Engineering / Sales Engineering
  • 5+ in leadership roles
  • Proven success building and scaling teams in high-growth SaaS (ideally CX, AI, or infrastructure)
  • A track record of winning large, strategic deals through technical depth, creativity, and strong executive presence
  • Ability to operate with agility — from being hands-on in a demo to guiding C-suite strategy sessions
  • Strong cross-functional instincts: you know how to influence product, marketing, and sales to create leverage
  • A builder’s mindset: process design, playbook creation, and team development energizes you
Job Responsibility:
  • Build and scale Crescendo’s Solutions Engineering team, setting the bar for technical excellence and customer storytelling
  • Partner with Sales Leadership to craft winning deal strategies for key accounts and enterprise prospects
  • Design repeatable demos, frameworks, and proofs-of-concept that showcase Crescendo’s Managed AI + Superhuman differentiation
  • Design and oversee Solutions Engineer training, career level framework, and professional development
  • Own department metrics, monthly and quarterly business reviews, and capacity planning in a high-growth environment where every quarter will be bolder than the last
  • Own the playbooks by which we deliver demos, gather technical requirements, and translate customer feedback into clear product insights that inform roadmap decisions
  • Collaborate with technical stakeholders and other Sales Enablement stakeholders on translating technical documentation into customer-facing resources
  • Develop scalable processes and enablement for pre-sales engagements, ensuring consistency across the global team
  • Build trusted advisor relationships with prospects, executives, and technical stakeholders to drive deal velocity and expansion
What we offer:
  • Competitive compensation, equity, and benefits that reflect the value of top talent
  • Full-time

TPM / Capacity Planning Manager

Join Crusoe Energy as a Capacity Planning Manager, a pivotal role providing crit...
Crusoe
Location:
United States, San Francisco; Sunnyvale
Salary:
100000.00 - 150000.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Industrial Engineering, Mathematics, Computer Engineering, Computer Science, Operations Research, Data Science, or equivalent professional experience
  • 1+ years of experience in Capacity Planning or Demand Planning within a related field
  • Thrive in fast-paced environments, effectively collaborating with multidisciplinary teams
  • Exceptional problem-solving skills and a data-driven approach to decision-making
  • Excellent communication, presentation, and collaboration abilities to work effectively with internal and external stakeholders
Job Responsibility:
  • Scale Infrastructure Resources: Own and manage services and plans to efficiently scale compute, network, and storage infrastructure
  • Drive Executive Decision-Making: Develop and analyze business and technical data and scenarios to inform high-level executive decisions regarding infrastructure and Crusoe products
  • Execute Capacity Planning: Implement end-to-end capacity planning processes, methodologies, and data to deliver optimized and executable capacity plans
  • Proactively Address Issues: Identify and build solutions for capacity-related challenges, ensuring timely resolution
  • Forecast and Manage Deployment: Understand demand sources and trends to generate accurate forecasts, analyze changes, and develop strategic deployment plans
  • Partner for Optimal Scaling: Collaborate with product and Go-To-Market/Sales teams to align on demand signals and requirements, ensuring optimal infrastructure scaling and service placement
  • Optimize Across Engineering: Partner across the engineering landscape to optimize at the intersection of hardware, infrastructure, and software, working closely with service owners, SRE, and hardware teams
  • Balance Cost and Product: Work with the Finance team to balance cost efficiency with technical and product considerations
  • Drive Strategic Change: Define problem statements, collect data, build analytical models, and recommend changes to drive strategic optimization
  • Improve Planning Efficiency: Define and implement improvements to planning efficiency and quality, partnering with software engineers for development
What we offer:
  • Industry competitive pay
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Full-time

Head of AI & Engineering

The Director of Engineering & AI will guide teams across Software Engineering, C...
Daley and Associates
Location:
United States, Boston
Salary:
250000.00 - 350000.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science
  • 10+ years across software, data, and cloud environments (AWS preferred)
  • 8+ years in engineering leadership roles
  • 2+ years working with modern AI / LLM-powered solutions
  • Strong background in cloud architecture and full-stack development
  • Proven leadership across teams, vendors, and complex initiatives
  • Excellent communication skills with experience presenting to executives
  • Experience with enterprise data strategy and platform development
Job Responsibility:
  • Set the technical vision and articulate the company’s engineering and AI roadmap
  • Direct software engineering, cloud infrastructure, data, and AI teams while advancing overall technical maturity
  • Enhance data integrity, availability, and platform stability across systems
  • Identify, prioritize, and deliver technology initiatives with meaningful business impact
  • Embed AI-powered solutions and automation into core operational processes
  • Own and manage partnerships with cloud providers and technology vendors
  • Maintain secure, compliant, and scalable technology environments
  • Lead engineering budgets, capacity planning, and resource allocation
  • Full-time

Software Engineer, Site Reliability

As a Site Reliability Engineer (SRE) at Fireworks AI, you will play a critical r...
Fireworks AI
Location:
United States, San Mateo
Salary:
Not provided
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems
  • Deep expertise in SRE principles and practices, including SLOs, SLIs, operational automation, incident management, and post-mortems
  • Extensive hands-on experience with public cloud platforms (AWS, GCP, Azure), including compute, networking, storage, and database services
  • Strong experience with containerization technologies (Docker) and orchestration platforms (Kubernetes)
  • Proficiency in designing and implementing robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, ELK stack, and distributed tracing
  • Solid programming/scripting skills in at least one language (e.g., Python, Go) for automation and tool development
  • In-depth knowledge of Linux operating systems, networking fundamentals, and system debugging
  • Proven ability to troubleshoot complex issues across the entire stack
  • Excellent communication, collaboration, and problem-solving skills
Job Responsibility:
  • Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance. Focus on fault tolerance, disaster recovery, identifying and removing scaling bottlenecks, and performance optimization across our multi-cloud infrastructure
  • Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues. Drive post-mortems to identify root causes and implement preventative measures to improve system reliability
  • Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance
  • Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency. Develop tools and scripts to streamline deployments, scaling, and system management
  • Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization
  • Reliability Best Practices: Collaborate with software engineers to embed reliability principles (e.g., SLOs, SLIs, error budgets) into the development lifecycle, promoting a culture of operational excellence
  • On-call Rotation: Participate in a periodic on-call rotation to support our production environment and respond to critical alerts
  • Full-time

Principal Consultant A2- Data & AI

The Principal Consultant is a senior leader responsible for the successful techn...
Microsoft Corporation
Location:
India, Hyderabad
Salary:
Not provided
Expiration Date:
Until further notice
Requirements:
  • 20+ years of experience in software/solution engineering, with at least 3–5 years in delivery leadership roles
  • Proven experience in leading delivery of complex, multi-disciplinary projects
  • Strong understanding of modern delivery methodologies (Agile, Scrum, DevOps, etc.)
  • Excellent communication, stakeholder management, problem-solving, and team leadership skills
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • Relevant certifications are a plus
  • Areas of Expertise: Enterprise Data Architecture & Modern Platforms
  • Data Engineering & Large Scale Data Processing
  • Database Platforms, Performance & Capacity Engineering
  • Real-Time Analytics & Operational Intelligence
Job Responsibility:
  • AI-First Delivery Leadership: Embed AI-first principles into delivery workflows
  • Lead end-to-end delivery of complex projects
  • Drive engineering excellence through reusable components, accelerators, and scalable architecture
  • Oversee technical execution across multiple projects
  • Collaborate with clients and internal stakeholders to define strategies, delivery plans, milestones, and risk mitigation approaches
  • Act as a technical point of contact for clients
  • Ensure delivery models are optimized for modern AI-native execution
  • Ability to step into at‑risk projects, quickly assess issues, and establish a credible path to recovery or exit
  • Engineering Excellence: Champion high-quality engineering practices across all delivery engagements
  • Ensure adherence to coding standards, architectural integrity, and performance benchmarks
  • Full-time

AI Supply Program Development Manager - Data Center Construction

This role is critical in supporting Infra Data Centers (IDC) capacity planning e...
Meta
Location:
United States, Menlo Park
Salary:
170000.00 - 238000.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in a directly related field, or equivalent practical experience
  • 10+ years direct professional experience in project planning and construction management
  • Demonstrated understanding of the design and deployment of technical electrical, mechanical, and connectivity systems
  • Experience in commercial construction management (pre-construction, contracting, scheduling, estimating, cost management)
  • Knowledge of how to build, update, and apply the information found in project schedules including identification of the critical path
  • Coordination skills to lead a team of broad backgrounds and experience towards a single project outcome
  • Demonstrated communication and reporting skills
Job Responsibility:
  • Develop, assess, and accurately report on each planning project's cost, schedule, and risk for quarterly, biannual, and monthly submissions
  • Set strategy in partnership with Data Center and Network teams, Finance and Capacity teams to determine future connectivity and retrofit work
  • Providing feedback and pertinent information to cross-functional partners and peers to enable accurate delivery of a coordinated AI Supply Plan
  • Driving and holding upstream teams accountable for decision making and deliverables
  • Coordinating with internal design and construction partners to deliver a comprehensive plan that represents cost/schedule/risk accurately
  • Communicating and reporting to leadership on status of future work planning
  • Coordinate and communicate the plan across internal teams, enabling the execution teams to move forward with the approved plan
  • Creating and maintaining CapEx budgets for projects in the planning space
  • Creating and maintaining P6 schedules for projects in the planning space, including end to end schedule reporting
  • Creating a strategy and framework for risk reporting and communication as part of the Capacity Plan
What we offer:
  • bonus
  • equity
  • benefits
  • Full-time

Principal Consultant - Data & AI

The Principal Consultant is a senior leader responsible for the successful techn...
Microsoft Corporation
Location:
India, Hyderabad
Salary:
Not provided
Expiration Date:
Until further notice
Requirements:
  • 20+ years of experience in software/solution engineering, with at least 3–5 years in delivery leadership roles
  • Proven experience in leading delivery of complex, multi-disciplinary projects
  • Strong understanding of modern delivery methodologies (Agile, Scrum, DevOps, etc.)
  • Excellent communication, stakeholder management, problem-solving, and team leadership skills
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • Relevant certifications are a plus
  • Enterprise Data Architecture & Modern Platforms expertise
  • Data Engineering & Large Scale Data Processing expertise
  • Database Platforms, Performance & Capacity Engineering expertise
  • Real-Time Analytics & Operational Intelligence expertise
Job Responsibility:
  • AI-First Delivery Leadership: Embed AI-first principles into delivery workflows, leveraging automation and intelligent orchestration where applicable
  • Lead end-to-end delivery of complex projects, ensuring solutions are scalable, robust, and aligned with client business outcomes
  • Drive engineering excellence through reusable components, accelerators, and scalable architecture
  • Oversee technical execution across multiple projects, ensuring adherence to best practices, quality standards, and compliance requirements
  • Collaborate with clients and internal stakeholders to define strategies, delivery plans, milestones, and risk mitigation approaches
  • Act as a technical point of contact for clients, translating business requirements into scalable technical solutions
  • Ensure delivery models are optimized for modern AI-native execution, including integration of automation and intelligent processes
  • Ability to step into at‑risk projects, quickly assess issues, and establish a credible path to recovery or exit
  • Engineering Excellence: Champion high-quality engineering practices across all delivery engagements
  • Ensure adherence to coding standards, architectural integrity, and performance benchmarks
  • Full-time

Director of Product Management

As the Director of Product Management, you will be at the forefront of transform...
T-Mobile
Location:
United States, Bellevue
Salary:
191300.00 - 258800.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, IT, or equivalent experience (MBA preferred)
  • 7-10+ years of relevant product management experience, with a focus on portfolio management, governance, and process improvement
  • Demonstrated experience leading large-scale products or initiatives against ambitious deadlines
  • Experience transforming large product portfolios, particularly in an Agile software environment
  • Strong understanding of portfolio management, resource allocation, and capacity planning processes
  • Foundational understanding of AI and its application to optimize workflow and process delivery
  • Product mindset with the ability to identify hypotheses, design tests, and drive outcomes
  • Excellent communication and leadership skills, with the ability to influence and drive change across teams
  • Ability to collaborate effectively with senior leadership and cross-functional teams to align priorities and ensure successful execution
  • At least 18 years of age
Job Responsibility:
  • Drive Process and Governance Transformation
  • Lead Product Board across partner teams
  • Enhance Product Lifecycle Management
  • Ensure High-Impact Prioritization
  • Leadership and Team Collaboration
  • Performance Monitoring and Reporting
What we offer:
  • Medical, dental and vision insurance
  • Flexible spending account
  • 401(k)
  • Employee stock grants
  • Employee stock purchase plan
  • Paid time off
  • Up to 12 paid holidays
  • Paid parental and family leave
  • Family building benefits
  • Back-up care
  • Full-time