Site Reliability Engineering Manager Job at Wikimedia Foundation

Job Description

The Wikimedia Foundation is looking for an Engineering Manager to join our SRE team, reporting to the Director of Site Reliability Engineering. As Engineering Manager, you will be responsible for supporting the engineers developing our infrastructure and supporting the services that depend on it, used by hundreds of millions of people around the world.

Job Responsibilities

Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization
Providing guidance, mentorship, and support to ensure the team's effectiveness and growth
Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path
Recruiting, hiring, and helping onboard new team members
Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members
Coordinating and communicating with other members of the Wikimedia product & engineering teams on relevant projects, executing complex projects, and contributing to the organizational strategy
Continuously developing the roadmap of the team in alignment with other SRE and Product & Technology teams, and helping to draft and execute the team’s annual and quarterly plans
Project managing new and existing initiatives
Leading the definition, refinement, and execution of the processes through which the team manages and performs work
Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure
Participating in a 24/7 on-call rotation to handle escalations and support teams in resolving issues
Facilitating the definition and establishment of Service Level Indicators and Objectives with service owners and stakeholders

Requirements

Prior experience managing teams
Prior hands-on experience with software or reliability engineering (within the last 3 years preferred)
Ability to analyze complex systems, troubleshoot issues, and devise effective solutions under pressure
Proficiency in project management methodologies to effectively plan, execute, and track new and existing initiatives
Strong understanding of cloud computing, networking, Linux systems administration, containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible) to be able to provide technical support to the team
Aptitude for automation and streamlining of tasks
Ability to communicate effectively in both spoken and written English
Ability to work independently and as an effective part of a globally distributed team
Ability to travel several times a year for occasional in-person meetings
B.S. or M.S. in Computer Science or the equivalent in related work experience

Nice to have

Experience working in a distributed, largely remote environment
Experience contributing to open source projects
