Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. Our cloud datacenters run 24x7 and depend on reliable electrical and mechanical telemetry to operate safely, efficiently, and at scale. As a Senior Critical Environment Telemetry Engineer, you will own end-to-end delivery and lifecycle health of telemetry pipelines from Critical Environment (CE) systems, such as power and cooling, into Microsoft’s telemetry platforms. This role is ideal for an experienced industrial controls / automation engineer who can troubleshoot down to packet/register-level details and partner across engineering, operations, integrators, and vendors to deliver resilient, high-quality signals for mission-critical operations.
Job Responsibilities:
Deliver telemetry onboarding and operations at scale: Configure, validate, and maintain high-availability telemetry from CE systems (electrical + mechanical) using industrial control and monitoring systems (e.g., SCADA, EPMS, BMS/BAS).
Deep troubleshooting and root cause analysis: Diagnose issues across control networks and telemetry stacks (field devices → gateways/connectors → SCADA/servers → cloud ingestion), including protocol-level troubleshooting (e.g., Modbus/BACnet register mapping, comms reliability, polling performance, device addressing).
Commissioning, integration, and quality: Support commissioning activities and integrations for new/retrofit sites; ensure telemetry meets defined quality standards (accuracy, completeness, timeliness, stability) before release to production stakeholders.
Operate with live-site accountability: Partner with datacenter operations and engineering teams to resolve incidents, drive corrective actions, and prevent recurrence; participate in an on-call rotation as needed to support mission-critical environments.
Partner and lead cross-functionally: Collaborate with internal teams (operations, engineering, platform/software, security), system integrators, and equipment vendors to deliver telemetry solutions and resolve systemic issues.
Improve standards and automation: Contribute to standard telemetry architectures, repeatable onboarding playbooks, and automated validation/troubleshooting approaches; identify gaps and propose roadmap improvements.
Documentation and enablement: Create and maintain clear technical documentation (site onboarding guides, point mapping standards, troubleshooting runbooks) and train engineers/operators on telemetry systems and best practices.
Security and reliability mindset: Apply strong operational rigor for OT systems (change control, access management, resilient designs, safe rollout practices) to protect uptime and reduce risk.
Requirements:
Bachelor's Degree in Mechanical Engineering, Electrical Engineering, Controls/Automation Engineering, Industrial Engineering, or a related field AND 3+ years of technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls OR equivalent experience.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Nice to have:
Data center controls domain depth: Prior experience in data center controls, including BMS and EPMS lifecycle support, troubleshooting, and project delivery.
Controls engineering tooling: Experience with PLC/HMI/SCADA development or modification (logic, graphics, alarming, historians), including commissioning practices such as FAT/SAT and structured validation.
OT networking & reliability: Understanding of OT networking fundamentals (segmentation, redundancy, routing basics), and the ability to troubleshoot field network issues that impact telemetry stability and latency.
Scripting and data analysis: Ability to use tools like Python/PowerShell, SQL, or equivalent to automate validation, analyze large telemetry datasets, and speed root-cause investigation.
Operational excellence: Experience building repeatable standards, reducing incident rates, improving telemetry quality KPIs, and contributing to continuous improvement in mission-critical operations.
Safety and compliance familiarity: Exposure to industrial safety/compliance practices and standards relevant to controls environments (e.g., change control rigor, audit readiness, secure configuration practices).
Broader industry controls experience: Background in utilities/energy & renewables, oil & gas pipelines, manufacturing automation, chemical/pharma processing, aerospace/automotive test systems, or robotics, especially where uptime, safety, and data integrity are critical.