Join us to build the future of AI-powered observability for one of the world’s largest hyperscale data platforms supporting Office 365 services. You’ll work on mission-critical systems that process petabytes of data daily, enabling real-time diagnostics and predictive insights for hundreds of millions of users worldwide. In this role, you’ll develop SDKs and frameworks to create a unified Signal Fabric with consistent, self-registering, semantically rich telemetry across languages; transform traditional dashboards into AI-first observability streams; and drive automation through feedback loops that continuously improve signal quality, coverage, and relevance. If you’re passionate about distributed systems, cloud-scale engineering, and building AI-driven solutions that shape the future of service reliability and user experience, this is your opportunity to make a global impact.
Job Responsibilities:
Design and build Signal Fabric: Develop SDKs and frameworks that ensure consistent, self-registering, semantically rich telemetry across multiple languages
Enable AI-first observability: Transform traditional dashboards and manual triage into machine-readable, self-improving telemetry streams optimized for AI systems
Scale hyperscale data ingestion: Work on systems that process petabytes of mission-critical data daily, powering real-time diagnostics and predictive insights for Office 365 services
Drive automation and intelligence: Implement feedback loops to continuously improve signal quality, coverage, and relevance, reducing iteration cost and accelerating insight cycles
Collaborate globally: Partner with engineering, product, and data science teams to deliver actionable insights that improve service quality, user experience, and feature usage
Innovate for impact: Prepare observability for the AI era by building solutions that empower hundreds of millions of users and influence Microsoft’s cloud reliability strategy
Operate and maintain the live service day to day and ensure quality of service
Requirements:
Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Solid coding skills (no specific language required, provided the candidate can demonstrate fast learning)
Solid Computer Science fundamentals
Solid problem analysis and solving skills
Solid communication skills (good written English, average or better spoken English)
Passion for solving hard problems and for AI-driven automation
Passion for learning new skills and knowledge
Nice to have:
Knowledge of and experience with distributed systems and large-scale big data platform technologies is a plus
Knowledge of and experience with performance tuning is a plus
Master's Degree in Computer Science or related technical field with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 1+ year(s) technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience