Data Center Production Operations Engineer Job at Meta (Temple)

Data Center Production Operations Engineer

Meta is seeking a forward thinking experienced engineer to join the Production O...

Location

United States, New Albany

Salary:

53.37 - 76.44 USD / Hour

Meta

Expiration Date

Until further notice

Requirements

BS, BA or BEng in technical field or commensurate experience
7+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
Expert in Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues
Hands-on experience and knowledge of server hardware and components, including storage
Expert knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
Experience managing multiple technical issues concurrently driving to the root cause
Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation. Brings peers, partners and other resources into the project where additional expertise is needed, and to provide growth and learning opportunities for others
Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
Deep technical knowledge of technologies such as HTTP, DNS, RAID, and DHCP
Experience in providing technical guidance to external vendors

Job Responsibility

Support platform health by successfully resolving and closing complex tickets, while addressing the overall issue (i.e. addressing root cause) including, but not limited to, remote troubleshooting and physical inspection of services in data halls
Perform deep dives and root cause analysis of complex technical issues within the data center, ranging from automated tooling to hardware failures and network issues
Facilitate collaboration with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation
Lead the introduction of new platforms and hardware to the site and geographical area, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production
Use tools and data analysis effectively to identify issues that are larger in scope and which impact one or multiple Data Centers. Take actions to communicate with all stakeholders appropriately and manage or escalate as needed
Drive corrective actions of complex hardware issues, work with internal teams and vendors, provide an ownership stake, and influence future design changes to ensure ease of serviceability
Solve complex and systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution
Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs
Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements

What we offer

bonus
equity
benefits

Fulltime

Data Center Production Operations Engineer

Meta is seeking a Production Operations Engineer looking to apply their technica...

Location

United States, Mesa, AZ +9 locations

Salary:

34.13 - 49.52 USD / Hour

Meta

Expiration Date

Until further notice

Requirements

Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
Currently has, or is in the process of obtaining, a Bachelor's or Master's degree in technical field, or equivalent experience/certification
Knowledge of Linux and server hardware support
Working knowledge and experience in at least one of the following core areas: Networking, Programming/Scripting, Hardware, or OS repair
Solid communication skills are a requirement for this role

Job Responsibility

Work within Meta's ticketing system
Serve as the first point of contact for break-fix technicians
Responsible for assisting with projects (retrofits, new process details, etc.) and repairs throughout the data center
Understand and debug hardware and Linux OS related issues
Identify and help create documentation for the global data center knowledge base
Assist with process improvements and best practices in data center operations
Participate in on-call rotation (once a month on call for a week, after hours, first point of contact)

What we offer

bonus
equity
benefits

Data Center Production Operations Engineer

Meta is seeking a forward thinking experienced engineer to join the Production O...

Location

Singapore

Salary:

Not provided

Meta

Expiration Date

Until further notice

Requirements

BS, BA or BEng in technical field or commensurate experience
7+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
Expert in Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues
Hands-on experience and knowledge of server hardware and components, including storage
Experience with the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
Experience managing multiple technical issues concurrently driving to the root cause
Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation. Brings peers, partners and other resources into the project where additional expertise is needed, and to provide growth and learning opportunities for others
Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
Extensive technical knowledge of technologies such as HTTP, DNS, RAID, and DHCP
Experience in providing technical guidance to external vendors

Job Responsibility

Support platform health by successfully resolving and closing complex tickets, while addressing the overall issue (i.e. addressing root cause) including, but not limited to, remote troubleshooting and physical inspection of services in data halls
Perform in-depth exploration and root cause analysis of complex technical issues within the data center, ranging from automated tooling to hardware failures and network issues
Facilitate collaboration with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation
Lead the introduction of new platforms and hardware to the site and geographical area, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production
Use tools and data analysis effectively to identify issues that are larger in scope and which impact one or multiple Data Centers. Take actions to communicate with all stakeholders appropriately and manage or escalate as needed
Drive corrective actions of complex hardware issues, work with internal teams and vendors, provide an ownership stake, and influence future design changes to ensure ease of serviceability
Solve complex and systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution
Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs
Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements

SiteOps Data Center Production Operations Engineer

Meta is seeking a forward thinking experienced engineer to join the Production O...

Location

United States, Los Lunas

Salary:

40.38 - 62.50 USD / Hour

Meta

Expiration Date

Until further notice

Requirements

BS, BA or BEng in technical field or commensurate experience
5+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
Intermediate-level understanding of Linux (or equivalent OS) in a complex IT environment with the capacity to triage, debug, and troubleshoot server issues
Hands-on experience and knowledge of server hardware and components, including storage
Intermediate-level knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
Experience managing technical issues and driving to the root cause
Experience participating in technical projects related to areas such as process improvement, technology, and/or automation
Capacity to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
Intermediate-level knowledge of technologies such as HTTP, DNS, RAID, and DHCP
Experience in providing technical guidance to external vendors

Job Responsibility

Support platform health by successfully resolving and closing tickets, while addressing the overall issue (i.e. addressing root cause) including, but not limited to, remote troubleshooting and physical inspection of services in data halls
Participate in root cause analysis of highly technical issues within the data center, ranging from automated tooling to hardware failures and network issues
Collaborate with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation
Point of contact for the introduction of new platforms and hardware to the site, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production
Use tools and data analysis effectively to identify issues. Take actions to communicate with all stakeholders appropriately and manage or escalate as needed
Identify corrective actions of hardware issues, work with internal teams and vendors, and influence future design changes to ensure ease of serviceability
Solve systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution
Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs
Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements

What we offer

bonus
equity
benefits

Fulltime

Senior Data Center Operations Engineer

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...

Location

United States, Vernon

Salary:

128000.00 - 170000.00 USD / Year

Lambda

Expiration Date

Until further notice

Requirements

Strong experience with critical infrastructure systems supporting data centers (power distribution, air flow management, environmental monitoring, capacity planning, DCIM software, structured cabling, cable management)
Familiar with carrier DIA circuit tests and turn-ups, understanding LOAs, and fiber testing and troubleshooting
Solid understanding of cable, fiber, and optics and their different use cases
Solid understanding of single-phase and three-phase power theory, including PDU balancing
Base level network fundamentals (CCNA preferred but not required)
Knowledge of cold aisle and hot aisle containment
Solid understanding of server hardware and boot process (PXE, DHCP, & TFTP)
Work with product management, support, and other teams to align operational capabilities with company goals
Translating business priorities into technical and operational requirements
Supporting cross-functional projects where infrastructure plays a critical role

Job Responsibility

Ensure new server, storage and network infrastructure is properly racked, labeled, cabled, and configured
Troubleshoot hardware and software issues in some of the world’s most advanced GPU and Networking systems
Document and update data center layout and network topology in DCIM software
Work with supply chain & manufacturing teams to ensure timely deployment of systems and project plans for large-scale deployments
Manage a parts depot inventory and track equipment through the delivery-store-stage-deploy-handoff process in each of our data centers
Partner with HW Support teams to ensure that data center hardware incidents with higher-level troubleshooting challenges are resolved and reported on, and that solutions are disseminated to the large operations organization
Work with the RMA team to ensure faulty parts are returned and replacements are ordered
Follow installation standards and documentation for placement, labeling, and cabling to drive consistency and discoverability across all data centers
Improve installation standards, MOPs, and runbooks
Act as a technical escalation point for DC infrastructure issues

What we offer

Generous cash & equity compensation
Health, dental, and vision coverage for you and your dependents
Wellness and commuter stipends for select roles
401(k) plan with 2% company match (USA employees)
Flexible paid time off plan

Fulltime

Data Center Operations Engineer

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...

Location

United States, Vernon

Salary:

109000.00 - 145000.00 USD / Year

Lambda

Expiration Date

Until further notice

Requirements

Strong experience with critical infrastructure systems supporting data centers (power distribution, air flow management, environmental monitoring, capacity planning, DCIM software, structured cabling, cable management)
Familiar with carrier DIA circuit tests and turn-ups, understanding LOAs, and fiber testing and troubleshooting
Solid understanding of cable, fiber, and optics and their different use cases
Solid understanding of single-phase and three-phase power theory, including PDU balancing
Base level network fundamentals (CCNA preferred but not required)
Knowledge of cold aisle and hot aisle containment
Solid understanding of server hardware and boot process (PXE, DHCP, & TFTP)
Work with product management, support, and other teams to align operational capabilities with company goals
Translating business priorities into technical and operational requirements
Supporting cross-functional projects where infrastructure plays a critical role

Job Responsibility

Ensure new server, storage and network infrastructure is properly racked, labeled, cabled, and configured
Troubleshoot hardware and software issues in some of the world’s most advanced GPU and Networking systems
Document and update data center layout and network topology in DCIM software
Work with supply chain & manufacturing teams to ensure timely deployment of systems and project plans for large-scale deployments
Manage a parts depot inventory and track equipment through the delivery-store-stage-deploy-handoff process in each of our data centers
Partner with HW Support teams to ensure that data center hardware incidents with higher-level troubleshooting challenges are resolved and reported on, and that solutions are disseminated to the large operations organization
Work with the RMA team to ensure faulty parts are returned and replacements are ordered
Follow installation standards and documentation for placement, labeling, and cabling to drive consistency and discoverability across all data centers
Improve installation standards, MOPs, and runbooks
Act as a technical escalation point for DC infrastructure issues

What we offer

Generous cash & equity compensation
Health, dental, and vision coverage for you and your dependents
Wellness and commuter stipends for select roles
401(k) plan with 2% company match (USA employees)
Flexible paid time off plan

Fulltime

Data Center Production Operations Manager

Meta is seeking a forward thinking experienced individual to join the Data Cente...

Location

United States, Houston

Salary:

135000.00 - 191000.00 USD / Year

Meta

Expiration Date

Until further notice

Requirements

BS or BA in technical field or commensurate experience
10+ years experience in high availability technology environments working with cross functional teams
4+ years experience managing teams of technical resources including people and performance management responsibilities
Knowledge of Linux and hardware systems support in an Internet operations environment
Familiarity with Python, SQL, and/or shell scripting
Solid knowledge of enterprise level infrastructure
Understanding of out-of-band/lights-out server communication methods, such as IPMI and serial console
Proven time and project management skills
Depth and breadth of knowledge in managing servers in a large-scale distributed environment is a core competency for this role

Job Responsibility

Managing a Data Center Operations Team accountable for the maintenance and operation of server hardware and supporting infrastructure at scale
Accountable for the health of server capacity delivering Meta's products and services from the data center site, and for ensuring operational delivery through collaboration and partnership with peer organizations
Work with peer organizations and regional teams that affect and deliver services to data center operations such as network operations, project management, facilities/maintenance management, logistics, hardware design, automated tooling and supply chain operations in order to successfully maintain data center uptime to enable ongoing business growth
Mentoring and developing engineers and technicians such that they can run daily operations with minimal supervision
Lead a high-quality data center operations team, with a broad range of experiences, perspectives, and backgrounds, developing both the technical and leadership qualities of engineers and technicians
Collaborating with other Production Operations Managers in data center sites around the globe to evolve and optimize processes and approaches in a globally consistent way to allow Meta to scale and grow effectively
Creating and driving a work environment of ownership, innovation, collaboration, accountability, and safety. Support and contribute thought leadership to the development and implementation of business practices, process and automated tooling which enables the growth and ongoing management of our global data center IT footprint
Manage server upgrades, integration, automated OS provisioning process, rebuilds and other projects as required. Understand and debug network, hardware, and Linux OS related issues
Identify and participate in the creation of documentation for the global DC knowledge base. Implement process improvements and inform best practices in data center operations
Predicting data center growth and scaling issues before they occur and implementing solutions

What we offer

bonus
equity
benefits

Fulltime

IS Data Center Operations Engineer

Bridging Information Technology (IT) and the Mechanical, Electrical, and Plumbin...

Location

United States, New Albany

Salary:

91731.00 - 114948.00 USD / Year

Amgen

Expiration Date

Until further notice

Requirements

Master’s degree, or
Bachelor’s degree and 2 years of data center operations experience, or
Associate’s degree and 6 years of data center operations experience, or
High school diploma / GED and 8 years of data center operations experience
Hands-on experience with rack/stack, structured cabling, and IT hardware installation
Familiarity with Dell PowerEdge, Nutanix, NetApp, and Cisco platforms
Ability to interpret electrical and mechanical drawings (awareness-level competency)
Experience using monitoring, alerting, or automation systems (AI-enabled platforms preferred)
Solid understanding of IT operations concepts including hardware lifecycle management and disaster recovery
Ability to read and update documentation, diagrams, and cable records

Job Responsibility

Serve as the liaison between IT teams and facilities staff, ensuring flawless communication
Interpret electrical one-line diagrams, distribution drawings, and cooling schematics to support incident response and planning
Install, rack, cable, and support enterprise IT systems including Dell PowerEdge, Nutanix, NetApp, and Cisco technologies
Support day-to-day moves, adds, and changes (MACs) in building IDF and VDER environments
Perform fiber and copper patch cabling in data centers, IDFs, and VDER closets
Trace and troubleshoot cabling issues to restore connectivity
Monitor infrastructure, proactively detect issues, and escalate them urgently to the appropriate teams
Apply AI-enabled monitoring and automation platforms to enhance data center operations
Maintain documentation of infrastructure layouts, procedures, and operational standards
Participate in capacity planning, disaster recovery drills, and continuous improvement initiatives

What we offer

A comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts
A discretionary annual bonus program, or for field sales representatives, a sales-based incentive plan
Stock-based long-term incentives
Award-winning time-off plans
Flexible work models, including remote and hybrid work arrangements, where possible

Fulltime
