Meta is seeking a forward-thinking, experienced individual to join the Data Center Fleet Operations team. The Fleet Operations Manager is accountable for managing and leading a geographically dispersed team, delivering on SLAs and KPIs related to production server hardware, resolution of systemic technical issues, and repairs throughout the assigned geographic region of data centers. We are looking for someone who can effectively prioritize and adapt to shifting priorities in a dynamic operational environment. The ideal candidate is an IT professional with strong leadership skills and experience in Server Hardware, Project Management, Quality Management, Data Analytics, Networks, OS repair, Linux and Automation, ideally in a data center environment, and has an extensive understanding of managing servers in a large-scale distributed environment.
Job Responsibilities
Build and lead a geographically dispersed, high-performing data center operations team, developing both the technical capabilities and leadership qualities of engineers
Establish and manage a Data Center Operations Team accountable for the maintenance and operation of server hardware and supporting infrastructure at scale
Become a technical expert in Meta's infrastructure, including platforms, tools, systems, architecture, workflows, and performance
Provide strategic direction, guidance, and support for site and fleet-level operations
Analyze and drive continuous improvement in the engineering and operational performance of our data centers
Employ data analytics to identify inefficiencies, opportunities, exceptions, and correlations in a complex, highly interconnected, technical environment
Collaborate with cross-functional partner teams to ensure fleet health and maintain targeted capacity levels, resulting in optimized operations, minimized downtime, and seamless scalability
Evolve and optimize processes in a globally consistent way to allow Meta to scale and grow effectively
Support and mentor engineers in their day-to-day work, as well as in finding opportunities to develop and grow based on their areas of strength and interest
Create and drive a culture of ownership, innovation, collaboration, accountability, continuous improvement, and safety
Conduct performance management for a technical engineering team, providing clear expectations and goals
Assume the role of incident manager during large-scale, site-wide, and region-wide production-impacting events
Support and contribute thought leadership to the development and implementation of business practices, processes, and automated tooling
Develop deep knowledge and ownership of a hyper-scale computing fleet through the use of data analysis to identify trends and systemic issues and opportunities
Requirements
BS, BA, or BEng in a technical field or commensurate experience
Ability to travel up to 30% is required
Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation, including bringing in additional expertise as needed
5+ years of experience managing teams of technical resources, including people and performance management responsibilities
Understanding of data center infrastructure and/or operations, including power, cooling, and/or network systems; structured cabling; and management of projects, incidents, and vendors
Experience using data and metrics to drive decision-making
Ability to influence effectively, working on cross-functional teams to advance the needs of the company and adapting teams to meet these needs
10+ years of engineering or operations experience, preferably in a mature engineering or operations environment, working with cross-functional teams
Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
Nice to have
Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
Six Sigma knowledge/certification
Experience leading technical resources using Linux or an equivalent OS to support hardware systems in a complex IT environment
Experience with large-scale AI implementations and the use of AI to drive automation
Experience in large-scale data center hardware deployments and building scalable infrastructure
Knowledge of the interdependencies of data center functions and technologies, including electrical, cooling, structured cabling, security, and network