Meta is seeking a Production Systems & Software Engineering Manager to join our Data Center Site Operations (SiteOps) team. This role leads the Systems & Software Engineering team which drives the integration, performance, and alignment of tooling, automation, break/fix triage, and related workflows critical to Site Operations.
Job Responsibilities:
Develop and collaboratively own the roadmap for all tooling, automation, processes and workflows for compute, storage and accelerator delivery from Infra into mass production (MP) deployments. Serve as the central point of contact representing these functions across SiteOps
Develop and collaboratively own the processes and workflows required to support Global Operations in maintaining a high SLA for our compute, storage and accelerator platforms
Build relationships and collaborate with engineering and cross-functional teams across the company. Actively solicit feedback from teams and use it to improve operational effectiveness as infrastructure scales
Lead the team in identifying and root-causing systemic issues in the fleet and driving their resolution. Maximize server fleet uptime and utilization by using data to understand hardware failure conditions and their root causes
Provide people management, mentorship, coaching, and career development to build an environment fostering commitment to impact
Support leadership meetings and facilitate alignment on key issues and opportunities
Provide timely alerts and data to enable cross-functional teams to develop the requisite corrective actions and forward-looking implementations
Collaborate with stakeholders, functional owners and subject matter experts to interpret and articulate business and operations needs
Up to 30% travel is required
Requirements:
BS or BA in a technical field, or commensurate experience
10+ years of experience managing teams in software design, workflows, and validation, working with cross-functional teams to deliver products to production
Experience working across a global organization and building partnerships with cross-functional teams inside and outside of the organization
Demonstrated success in developing and executing a strategic roadmap that supports organizational scaling
Experience in processing and analyzing large sets of data
Demonstrated knowledge of server and storage platforms, principles, technologies, protocols, and standards
Experience managing multiple concurrent projects and managing tight timelines
Nice to have:
Large-scale data center environment experience, including tooling and automation deployments
Experience developing and deploying data center systems and workflows