Meta's data center network is at the heart of connecting servers and GPUs. Demand for data center network capacity is growing rapidly, requiring the development of new data center network products. As the speed of capacity deployment increases, so does the demand for that capacity to be reliable. Network interruptions inside Meta's data center network have an outsized negative impact on Meta's goals. In this role, you'll lead and support the team responsible for the reliability and operations of Meta's production data center network. Working with your team, you'll drive reliability initiatives to ensure a stable data center network for both our AI initiatives and all of our products. You'll focus on streamlining operations through automation and software development while preparing your team to support new network topologies and platforms. This role will place you in the critical path for enabling Meta's business objectives and give you the opportunity to experience unprecedented network scale.
Job Responsibilities:
Support and lead engineers working on Meta's production data center network, focusing on challenges related to the reliability, scalability, and efficiency of both operations and the network itself
Understand and contribute to technical architectures, tooling needs, automation plans, and network platform launch plans, and create comprehensive plans for prioritizing technical and resourcing challenges
Actively drive and participate in the handling of incidents on Meta's data center network
Work with your team to develop automation software to streamline network operations
Develop lasting partnerships with organizational leaders across Meta's network and Infrastructure teams
Empower engineers to develop their careers, matching their strengths with projects tailored to their skill levels, long-term skill development, personalities, and work styles
Help build and enrich a collaborative work environment composed of people with a broad range of experiences, perspectives, approaches, and backgrounds
Assess employee performance on an ongoing basis, address underperformance, and recognize and promote strong performance
Work closely with dedicated recruiting staff to expand the team, including interviewing candidates, participating in conferences/events, and onboarding new employees
Balance the need to “keep things running” with allocating time to long-term, high-impact projects
Requirements:
4+ years of direct management experience in a technology role
BS or MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience
Experience supporting network devices in a production setting
Experience with building teams and/or organizations, including hiring and managing performance
Communication and cross-functional collaboration experience
Nice to have:
Experience developing software to support network operations
In-depth knowledge of common data center network topologies and routing protocols (e.g., BGP)
Familiarity with common network switch hardware architecture