Meta is seeking a forward-thinking, experienced individual to join the Data Center Operations Team. The person should enjoy working in a fast-paced environment where adaptability and flexibility are key to success. We seek an IT professional with management and leadership experience and advanced hands-on technical skills in Server Hardware, Project Management, Quality Management, Data Analytics, Networks, OS repair, Linux and Automation (ideally in a data center environment). The Data Center Server Engineering Manager is responsible for managing and maintaining server production, including uptime, utilization, systemic technical issues and repairs throughout the Data Center.
Job Responsibilities:
Manage a Data Center Operations Team accountable for the maintenance and operation of server hardware and supporting infrastructure at scale
Be accountable for the health of server capacity delivering Meta's products and services from the data center site, and ensure operational delivery through collaboration and partnership with peer organizations
Work with peer organizations and regional teams that affect and deliver services to data center operations, such as network operations, project management, facilities/maintenance management, logistics, hardware design, automated tooling and supply chain operations, to maintain data center uptime and enable ongoing business growth
Mentor and develop engineers and technicians so that they can run daily operations with minimal supervision
Lead a high-quality data center operations team, with a broad range of experiences, perspectives, and backgrounds, developing both the technical and leadership qualities of engineers and technicians
Collaborate with other Production Operations Managers in data center sites around the globe to evolve and optimize processes and approaches in a globally consistent way, allowing Meta to scale and grow effectively
Create and drive a work environment of ownership, innovation, collaboration, accountability, and safety. Support and contribute thought leadership to the development and implementation of business practices, processes and automated tooling that enable the growth and ongoing management of our global data center IT footprint
Manage server upgrades, integration, automated OS provisioning process, rebuilds and other projects as required. Understand and debug network, hardware, and Linux OS related issues
Identify and participate in the creation of documentation for the global DC knowledge base. Implement process improvements and inform best practices in data center operations
Predict data center growth and scaling issues before they occur, and implement solutions
Drive specifications for tooling and automation that facilitate deployment, monitoring, automated remediation and decommissioning of server hardware at scale
Maintain knowledge and ownership of a hyper-scale computing fleet, using data trending and analysis to identify trends and systemic issues and reporting findings globally
Requirements:
BS or BA in technical field or commensurate experience
10+ years of experience in high-availability technology environments working with cross-functional teams
4+ years of experience managing teams of technical resources, including people and performance management responsibilities
Knowledge of Linux and hardware systems support in an Internet operations environment
Familiarity with Python, SQL and/or shell scripting
Solid knowledge of enterprise level infrastructure
Understanding of out-of-band/lights-out server communication methods, such as IPMI and serial console
Proven time and project management skills
Depth and breadth of knowledge in managing servers in a large-scale distributed environment, which is a core competency for this role
Nice to have:
4+ years of experience in large-scale data center hardware deployments and building scalable infrastructure