As a Senior Critical Environment Technician (CET) - Controls SME in Microsoft’s Cloud Operations & Innovation (CO+I) team, you will maintain the critical infrastructure that keeps our datacenters up and running. This could include coordinating with suppliers/vendors, working closely with management to address operational, risk, and safety situations, mentoring other CE Technicians, having a hands-on understanding of how critical environment equipment works, performing various types of maintenance, responding to onsite incidents while coordinating with other critical facilities professionals, and using telemetry and other platforms to monitor equipment performance and operations.
Job Responsibilities:
Understands, follows, and ensures that safety and security requirements (e.g., job hazard assessments [JHAs], toolbox talks) and business processes and procedures are met, to properly perform work in a safe, quality, and reliable manner in accordance with applicable Authority Having Jurisdiction (AHJ) regulations and Microsoft requirements
Processes method statement of work (MSOW) documents
Coordinates activities and associated schedules with contractors
Performs inspections of equipment in a facility
Participates in testing and commissioning activities
Advises engineering partners or project management colleagues on project scope, process, or execution methodology
Presents for review and approval MSOW in their area of responsibility
Prepares and submits highly complex reports as assigned following preexisting scripts and templates, or using ad hoc methods required to support trending and analysis (e.g., Root Cause Analysis [RCA] reports) and may review prior reports delivered by less experienced team members
Develops methods of operating procedure (MOPs), standard operating procedures (SOPs), and/or digital methods of operating procedures (DMOPs) for highly complex and/or interdependent equipment and disciplines to ensure safe and reliable execution
Reviews completed work using approved tools and procedural templates from less experienced technicians for accuracy and completeness
Completes mandatory, technical, and procedural training assignments, and coaches less experienced technicians on them
Analyzes findings from reports and documents observations
Performs various types of maintenance (e.g., planned, predictive, corrective) and repairs for multiple disciplines and multiple equipment types of increasing complexity with no supervision, while serving as a subject matter expert for one discipline - in consideration of Task Hazard Analysis (THA), Method Statement of Work (MSOW), or varying permit requirements
Incorporates security governance frameworks into maintenance practices and drives continuous security improvements
Communicates and/or escalates maintenance activities per established process and procedure
Prioritizes maintenance activities as required and/or appropriate
Documents tasks or issues during maintenance activities within appropriate systems per process and procedure as needed
Provides consultation to colleagues on maintenance and repairs through deep understanding of equipment, systems and their interrelations
Follows recommended maintenance schedules
Oversees everyday, complex, large-scale tasks for a single discipline or equipment across disciplines
Ensures follow up action items are addressed in a timely manner
Masters the maintenance of all systems and equipment in a safe and professional manner and understands levels of risk (LORs) associated with varying types of maintenance across all disciplines
Plans, coordinates, and presents maintenance items for review and approval in their area of responsibility
Serves as an expert on applying security principles to ensure all systems are protected against unauthorized access during maintenance
Acts as a subject matter expert, performing troubleshooting independently for multiple equipment, systems, subsystems, and component types
Documents issues found in troubleshooting process within appropriate systems per process and procedure as needed
Ensures equipment and system settings are consistent with established parameters and designs
Determines when troubleshooting efforts are deemed adequate and communicates or escalates to suppliers, engineers, or more experienced colleagues as needed
Has a hands-on understanding of how equipment in all disciplines works and how to troubleshoot to subsystem level
Provides consultation to less experienced colleagues with troubleshooting systems and problems
Oversees less experienced colleagues in, or directly performs, troubleshooting of systems and investigates root causes
Ensures that security incidents identified during troubleshooting are addressed promptly and escalated as necessary
Provides necessary escort to third-party contractors, sub-contractors, vendors, and service providers on site based on all procedure levels of risk (LOR), enforcing security requirements for all third-party access and operations
Takes part in getting third-party work underway (e.g., making sure systems are properly energized/deenergized), ensuring the work is started and completed in a safe manner in accordance with standard practices, procedures, and Authority Having Jurisdiction (AHJ) regulations
Ensures work performed by suppliers/vendors is performed to scope, all documentation is performed correctly, and escalates as appropriate
Recognizes circumstances when to stop supplier/vendor work to address potential and/or identified concerns
Coordinates across all LOR applicable to preventative and/or corrective maintenance
Identifies and recommends procedure corrections if/when errors are detected or when appropriate
Coordinates and schedules supplier/vendor on-site activities
Coordinates with vendor to schedule maintenance and determines availability of equipment/parts, as directed
Resolves or escalates observed vendor quality issues
May review and approve vendor supplier field service reports, invoices, and work orders
Serves as an expert in the inspection and supervision of critical environment-related facility equipment (e.g., controls, heating, ventilation, and air conditioning [HVAC], mechanical systems), building, and grounds for unsafe or abnormal conditions
Understands critical system alarms for multiple disciplines of equipment and their meanings, and engages with appropriate escalation processes or procedures
Recognizes circumstances where execution would be considered safe to proceed
Performs various inspections and validations of equipment performance
Monitors the performance from central monitoring locations (e.g., Facility Operations Centers) of maintenance and operations of equipment (e.g., electrical, mechanical, fire/life safety) and understands risks or impacts to other subsystems across the data center
Escalates per applicable policies and standards
Utilizes telemetry, control systems, and other platforms to monitor site status, analyze past and current events, as well as other processes, and can identify all alarms
Uses technical expertise, prior experience, and device analytics to recognize trends with equipment behavior and checks potential issues as they arise
Advises less experienced colleagues on issues found while monitoring applicable CE systems
Performs all monitoring equipment repair, replacement, and maintenance work, which meets or exceeds Microsoft Service Level Agreement (SLA) requirements
Uses data trends to develop or produce predictive analyses of equipment performance
Safely and quickly responds to and leads an onsite incident response team for all abnormal conditions that impact operations, and coordinates with other critical facilities professionals to perform corrective repairs, without supervision
Gathers necessary information and creates incident timelines/data, root-cause analyses, and/or action items following an abnormal condition as required
Identifies and contacts/engages appropriate parties and security points of contact to mitigate incidents as they occur
Develops new or follows preexisting emergency operating procedures (EOPs), methods of procedure (MOPs), standard operating procedures (SOPs), and digital methods of operating procedures (DMOPs) in relation to incidents
Directly provides and/or leads and coordinates emergency monitoring response plans for irregular or malfunctioning conditions
Serves as technical expert in ensuring emergency operating procedures (EOPs) are consistent with proper incident response
Works on complex, advanced tasks (e.g., stabilization, resolution, recovery) independently
Serves as a subject matter expert in critical environments-related systems within the data center, advises less experienced colleagues on such topics, and provides oversight and training/mentorship to team members on tasks regarding these subsystems (e.g., electrical, mechanical, controls, generators)
Serves as a resource for less experienced team members to incorporate security-first principles into daily tasks, ensuring all equipment operations prioritize identifying potential security threats and mitigating them immediately
Demonstrates an understanding of and operates equipment and systems across all disciplines (e.g., electrical, mechanical, controls) with knowledge of the interactions between them and overall operation of a data center
Operates all systems and equipment in a safe and professional manner in alignment with Microsoft standards
Utilizes internal computerized maintenance management system (CMMS) to track all equipment assets and to complete work order requests for maintenance work in accordance with all Microsoft security policies
Tracks hours for performed tasks within applicable task management systems
Tracks utilization and time tracking results for team members, within applicable task management systems, as needed
Guides and coaches team in CMMS usage best practices
Adds required data and documents, logs changes, and maintains procedures related to building management systems and reports
Properly signals spare equipment and parts utilization within maintenance work orders
Requirements:
High School Diploma, GED, or equivalent
3+ years mission critical services work/applied learning experience (e.g., high availability assembly/manufacturing/critical infrastructure environments such as data centers, oil and gas refineries, hospitals, pharmaceutical, manufacturing, or related fields) OR equivalent experience
1+ year(s) experience in a specialized area (e.g., mechanical field, electrical field, controls field) or related field
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
Ability to meet Microsoft Cloud background check upon hire/transfer and every two years thereafter
Nice to have:
Associate's Degree or technical trade certification (e.g., military, trade school), or higher-equivalent education AND 4+ years mission-critical services experience (e.g., high-availability assembly/manufacturing/critical infrastructure environments such as data centers, oil and gas refineries, hospitals, pharmaceutical, manufacturing, or related fields) OR High School Diploma, GED, or equivalent AND 5+ years mission critical services experience (e.g., high-availability assembly/manufacturing/critical infrastructure environments such as data centers, oil and gas refineries, hospitals, pharmaceutical, manufacturing, or related fields) OR equivalent experience
3+ years of experience with knowledge and understanding of the critical electrical and mechanical systems of a facility (e.g., generators, Uninterruptible Power Supply [UPS] systems, static transfer systems, chillers, air handlers, controls), their interdependencies, and how they perform to support critical IT load
Experience leading or coordinating controls-related programs/projects within critical environments (BMS, EPMS, PLCs, automation systems)
Ability to manage multi-vendor controls scopes, schedules, and dependencies across construction, commissioning, and live operations
Strong understanding of change management, configuration control, and documentation standards for controls systems
Proven ability to translate operational requirements into executable controls roadmaps and milestones