Data Center Production Operations Engineer

United States, Temple · Employment contract · 70990.00 - 103002.00 USD / Year · Job Posted June 15, 2026

Job Description

Meta is seeking a Data Center Production Operations Engineer to support the reliability, efficiency, and scalability of our global data center infrastructure. In this role, you will perform hands-on server hardware operations, including deployment, maintenance, troubleshooting, and decommissioning of production server fleets that power Meta's family of apps and services. You will work within established operational procedures to ensure data center systems meet performance and availability standards, collaborating closely with infrastructure engineering, facilities, and supply chain teams to keep production environments running at scale.

Job Responsibility

  • Work within Meta's ticketing system
  • Serve as the first point of contact for break-fix technicians
  • Responsible for assisting with projects (retrofits, new process details, etc.) and repairs throughout the data center
  • Understand and debug hardware and Linux OS related issues
  • Identify and help create documentation for the global data center knowledge base
  • Assist with process improvements and best practices in data center operations
  • Participate in the on-call rotation (on call for one week per month, after hours, as first point of contact)

Requirements

  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • Currently has, or is in the process of obtaining, a Bachelor's or Master's degree in a technical field, or equivalent experience/certification
  • Knowledge of Linux and server hardware support
  • Working knowledge and experience in at least one of the following core areas: Networking, Programming/Scripting, Hardware, or OS repair
  • Solid communication skills are a requirement for this role

Nice to have

  • Experience modifying and developing in Python, SQL, and/or shell scripting
  • Working conceptual knowledge of technologies such as HTTP, DNS, RAID, and DHCP

What we offer

  • bonus
  • equity
  • benefits

Similar Jobs for Data Center Production Operations Engineer

8 matching positions

Data Center Production Operations Engineer

Meta is seeking a forward thinking experienced engineer to join the Production O...
Location: United States, New Albany
Salary: 53.37 - 76.44 USD / Hour
Meta
Expiration Date: Until further notice
Requirements
  • BS, BA or BEng in technical field or commensurate experience
  • 7+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
  • Expert in Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues
  • Hands-on experience and knowledge of server hardware and components, including storage
  • Expert knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
  • Experience managing multiple technical issues concurrently driving to the root cause
  • Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation. Brings peers, partners and other resources into the project where additional expertise is needed, and to provide growth and learning opportunities for others
  • Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
  • Deep technical knowledge of technologies such as HTTP, DNS, RAID, and DHCP
  • Experience in providing technical guidance to external vendors
Job Responsibility
  • Support platform health by successfully resolving and closing complex tickets, while addressing the overall issue (i.e. addressing root cause) including, but not limited to, remote troubleshooting and physical inspection of services in data halls
  • Perform deep dives and root cause analysis of complex technical issues within the data center, ranging from automated tooling to hardware failures and network issues
  • Facilitate collaboration with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation
  • Lead the introduction of new platforms and hardware to the site and geographical area, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production
  • Use tools and data analysis effectively to identify issues that are larger in scope and which impact one or multiple Data Centers. Take actions to communicate with all stakeholders appropriately and manage or escalate as needed
  • Drive corrective actions of complex hardware issues, work with internal teams and vendors, provide an ownership stake, and influence future design changes to ensure ease of serviceability
  • Solve complex and systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution
  • Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs
  • Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

Data Center Production Operations Engineer

Meta is seeking a Production Operations Engineer looking to apply their technica...
Location: United States, Mesa, AZ +9 locations
Salary: 34.13 - 49.52 USD / Hour
Meta
Expiration Date: Until further notice
Requirements
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • Currently has, or is in the process of obtaining, a Bachelor's or Master's degree in a technical field, or equivalent experience/certification
  • Knowledge of Linux and server hardware support
  • Working knowledge and experience in at least one of the following core areas: Networking, Programming/Scripting, Hardware, or OS repair
  • Solid communication skills are a requirement for this role
Job Responsibility
  • Work within Meta's ticketing system
  • Serve as the first point of contact for break-fix technicians
  • Responsible for assisting with projects (retrofits, new process details, etc.) and repairs throughout the data center
  • Understand and debug hardware and Linux OS related issues
  • Identify and help create documentation for the global data center knowledge base
  • Assist with process improvements and best practices in data center operations
  • Participate in the on-call rotation (on call for one week per month, after hours, as first point of contact)
What we offer
  • bonus
  • equity
  • benefits

Data Center Production Operations Engineer

Meta is seeking a forward thinking experienced engineer to join the Production O...
Location: Singapore
Salary: Not provided
Meta
Expiration Date: Until further notice
Requirements
  • BS, BA or BEng in technical field or commensurate experience
  • 7+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
  • Expert in Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues
  • Hands-on experience and knowledge of server hardware and components, including storage
  • Experience of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
  • Experience managing multiple technical issues concurrently driving to the root cause
  • Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation. Brings peers, partners and other resources into the project where additional expertise is needed, and to provide growth and learning opportunities for others
  • Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
  • Extensive technical knowledge of technologies such as HTTP, DNS, RAID, and DHCP
  • Experience in providing technical guidance to external vendors
Job Responsibility
  • Support platform health by successfully resolving and closing complex tickets, while addressing the overall issue (i.e. addressing root cause) including, but not limited to, remote troubleshooting and physical inspection of services in data halls
  • Perform in-depth exploration and root cause analysis of complex technical issues within the data center, ranging from automated tooling to hardware failures and network issues
  • Facilitate collaboration with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation
  • Lead the introduction of new platforms and hardware to the site and geographical area, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production
  • Use tools and data analysis effectively to identify issues that are larger in scope and which impact one or multiple Data Centers. Take actions to communicate with all stakeholders appropriately and manage or escalate as needed
  • Drive corrective actions of complex hardware issues, work with internal teams and vendors, provide an ownership stake, and influence future design changes to ensure ease of serviceability
  • Solve complex and systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution
  • Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs
  • Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements

SiteOps Data Center Production Operations Engineer

Meta is seeking a forward thinking experienced engineer to join the Production O...
Location: United States, Los Lunas
Salary: 40.38 - 62.50 USD / Hour
Meta
Expiration Date: Until further notice
Requirements
  • BS, BA or BEng in technical field or commensurate experience
  • 5+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
  • Intermediate-level understanding in Linux (or equivalent OS) in a complex IT environment with the capacity to triage, debug, and troubleshoot server issues
  • Hands-on experience and knowledge of server hardware and components, including storage
  • Intermediate-level knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
  • Experience managing technical issues and driving to the root cause
  • Experience participating in technical projects related to areas such as process improvement, technology, and/or automation
  • Capacity to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
  • Intermediate-level knowledge of technologies such as HTTP, DNS, RAID, and DHCP
  • Experience in providing technical guidance to external vendors
Job Responsibility
  • Support platform health by successfully resolving and closing tickets, while addressing the overall issue (i.e. addressing root cause) including, but not limited to, remote troubleshooting and physical inspection of services in data halls
  • Participate in root cause analysis of highly technical issues within the data center, ranging from automated tooling to hardware failures and network issues
  • Collaborate with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation
  • Point of contact for the introduction of new platforms and hardware to the site, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production
  • Use tools and data analysis effectively to identify issues. Take actions to communicate with all stakeholders appropriately and manage or escalate as needed
  • Identify corrective actions of hardware issues, work with internal teams and vendors, and influence future design changes to ensure ease of serviceability
  • Solve systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution
  • Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs
  • Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

Senior Data Center Operations Engineer

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...
Location: United States, Vernon
Salary: 128000.00 - 170000.00 USD / Year
Lambda
Expiration Date: Until further notice
Requirements
  • Strong experience with critical infrastructure systems supporting data centers (power distribution, air flow management, environmental monitoring, capacity planning, DCIM software, structured cabling, cable management)
  • Familiar with carrier DIA circuit test and turn-ups, understanding LOAs, and fiber testing and troubleshooting
  • Solid understanding of cable, fiber, and optics and their different use cases
  • Solid understanding of single and three phase power theories including PDU balancing
  • Base level network fundamentals (CCNA preferred but not required)
  • Knowledge of cold aisle and hot aisle containment
  • Solid understanding of server hardware and boot process (PXE, DHCP, & TFTP)
  • Work with product management, support, and other teams to align operational capabilities with company goals
  • Translating business priorities into technical and operational requirements
  • Supporting cross-functional projects where infrastructure plays a critical role
Job Responsibility
  • Ensure new server, storage and network infrastructure is properly racked, labeled, cabled, and configured
  • Troubleshoot hardware and software issues in some of the world’s most advanced GPU and Networking systems
  • Document and update data center layout and network topology in DCIM software
  • Work with supply chain & manufacturing teams to ensure timely deployment of systems and project plans for large-scale deployments
  • Manage a parts depot inventory and track equipment through the delivery-store-stage-deploy-handoff process in each of our data centers
  • Partner with HW Support teams to ensure data center hardware incidents with higher level troubleshooting challenges are resolved, reported on and solutions are disseminated to the large operations organization
  • Work with the RMA team to ensure faulty parts are returned and replacements are ordered
  • Follow installation standards and documentation for placement, labeling, and cabling to drive consistency and discoverability across all data centers
  • Improve installation standards, MOPs, and runbooks
  • Act as a technical escalation point for DC infrastructure issues
What we offer
  • Generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan
  • Fulltime

Data Center Operations Engineer

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...
Location: United States, Vernon
Salary: 109000.00 - 145000.00 USD / Year
Lambda
Expiration Date: Until further notice
Requirements
  • Strong experience with critical infrastructure systems supporting data centers (power distribution, air flow management, environmental monitoring, capacity planning, DCIM software, structured cabling, cable management)
  • Familiar with carrier DIA circuit test and turn-ups, understanding LOAs, and fiber testing and troubleshooting
  • Solid understanding of cable, fiber, and optics and their different use cases
  • Solid understanding of single and three phase power theories including PDU balancing
  • Base level network fundamentals (CCNA preferred but not required)
  • Knowledge of cold aisle and hot aisle containment
  • Solid understanding of server hardware and boot process (PXE, DHCP, & TFTP)
  • Work with product management, support, and other teams to align operational capabilities with company goals
  • Translating business priorities into technical and operational requirements
  • Supporting cross-functional projects where infrastructure plays a critical role
Job Responsibility
  • Ensure new server, storage and network infrastructure is properly racked, labeled, cabled, and configured
  • Troubleshoot hardware and software issues in some of the world’s most advanced GPU and Networking systems
  • Document and update data center layout and network topology in DCIM software
  • Work with supply chain & manufacturing teams to ensure timely deployment of systems and project plans for large-scale deployments
  • Manage a parts depot inventory and track equipment through the delivery-store-stage-deploy-handoff process in each of our data centers
  • Partner with HW Support teams to ensure data center hardware incidents with higher level troubleshooting challenges are resolved, reported on and solutions are disseminated to the large operations organization
  • Work with the RMA team to ensure faulty parts are returned and replacements are ordered
  • Follow installation standards and documentation for placement, labeling, and cabling to drive consistency and discoverability across all data centers
  • Improve installation standards, MOPs, and runbooks
  • Act as a technical escalation point for DC infrastructure issues
What we offer
  • Generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan
  • Fulltime

Data Center Production Operations Manager

Meta is seeking a forward thinking experienced individual to join the Data Cente...
Location: United States, Houston
Salary: 135000.00 - 191000.00 USD / Year
Meta
Expiration Date: Until further notice
Requirements
  • BS or BA in technical field or commensurate experience
  • 10+ years experience in high availability technology environments working with cross functional teams
  • 4+ years experience managing teams of technical resources including people and performance management responsibilities
  • Knowledge of Linux and hardware systems support in an Internet operations environment
  • Familiarity with Python, SQL, and/or shell scripting
  • Solid knowledge of enterprise level infrastructure
  • Understanding of out-of-band/lights-out server communication methods, such as IPMI and serial console
  • Proven time and project management skills
  • Depth and breadth of knowledge in managing servers in a large-scale distributed environment is a core competency for this role
Job Responsibility
  • Managing a Data Center Operations Team accountable for the maintenance and operation of server hardware and supporting infrastructure at scale
  • Accountable for the health of server capacity delivering Meta's products and services from the data center site, and for ensuring operational delivery through collaboration and partnership with peer organizations
  • Work with peer organizations and regional teams that affect and deliver services to data center operations such as network operations, project management, facilities/maintenance management, logistics, hardware design, automated tooling and supply chain operations in order to successfully maintain data center uptime to enable ongoing business growth
  • Mentoring and developing engineers and technicians such that they can run daily operations with minimal supervision
  • Lead a high-quality data center operations team, with a broad range of experiences, perspectives, and backgrounds, developing both the technical and leadership qualities of engineers and technicians
  • Collaborating with other Production Operations Managers in data center sites around the globe to evolve and optimize processes and approaches in a globally consistent way to allow Meta to scale and grow effectively
  • Creating and driving a work environment of ownership, innovation, collaboration, accountability, and safety. Support and contribute thought leadership to the development and implementation of business practices, process and automated tooling which enables the growth and ongoing management of our global data center IT footprint
  • Manage server upgrades, integration, automated OS provisioning process, rebuilds and other projects as required. Understand and debug network, hardware, and Linux OS related issues
  • Identify and participate in the creation of documentation for the global DC knowledge base. Implement process improvements and inform best practices in data center operations
  • Predict data center growth and scaling issues before they occur, and implement solutions
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

IS Data Center Operations Engineer

Bridging Information Technology (IT) and the Mechanical, Electrical, and Plumbin...
Location: United States, New Albany
Salary: 91731.00 - 114948.00 USD / Year
Amgen
Expiration Date: Until further notice
Requirements
  • Master’s degree
  • Bachelor’s degree and 2 years of data center operations experience
  • Associate’s degree and 6 years of data center operations experience
  • High school diploma / GED and 8 years of data center operations experience
  • Hands-on experience with rack/stack, structured cabling, and IT hardware installation
  • Familiarity with Dell PowerEdge, Nutanix, NetApp, and Cisco platforms
  • Ability to interpret electrical and mechanical drawings (awareness-level competency)
  • Experience using monitoring, alerting, or automation systems (AI-enabled platforms preferred)
  • Solid understanding of IT operations concepts including hardware lifecycle management and disaster recovery
  • Ability to read and update documentation, diagrams, and cable records
Job Responsibility
  • Serve as the liaison between IT teams and facilities staff, ensuring flawless communication
  • Interpret electrical one-line diagrams, distribution drawings, and cooling schematics to support incident response and planning
  • Install, rack, cable, and support enterprise IT systems including Dell PowerEdge, Nutanix, NetApp, and Cisco technologies
  • Support day-to-day moves, adds, and changes (MACs) in building IDF and VDER environments
  • Perform fiber and copper patch cabling in data centers, IDFs, and VDER closets
  • Trace and troubleshoot cabling issues to restore connectivity
  • Monitor infrastructure, proactively detect issues, and escalate them with urgency to the appropriate teams
  • Apply AI-enabled monitoring and automation platforms to enhance data center operations
  • Maintain documentation of infrastructure layouts, procedures, and operational standards
  • Participate in capacity planning, disaster recovery drills, and continuous improvement initiatives
What we offer
  • A comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts
  • A discretionary annual bonus program, or for field sales representatives, a sales-based incentive plan
  • Stock-based long-term incentives
  • Award-winning time-off plans
  • Flexible work models, including remote and hybrid work arrangements, where possible
  • Fulltime