Microsoft 365 (M365) Intelligent Conversation and Communications Cloud is the platform that powers billions of real-time customer conversations across Teams, Skype, Dynamics, and third-party solutions (through Azure Communication Services). The platform enables reliable, high-quality messaging, meeting, and audio/video calling services that work every time, from anywhere, seamlessly across all customer touchpoints. Conversations on our platforms are made more intelligent in real time, empowering best-in-class productivity tools for the modern workplace where every call, meeting, or chat will make the next one better.
The IC3 AI Ops team is a newly formed engineering group within Microsoft’s Intelligent Communications and Collaboration (IC3) organization, focused on transforming operational reliability through AI-driven automation and data intelligence. We are looking for a Senior Software Engineer. Our mission is to automate incident management, reduce KTLO (Keep The Lights On) work, and improve customer reliability across IC3 and carrier ecosystems. We are building intelligent systems that proactively detect anomalies, accelerate root cause analysis (RCA), and streamline mitigation workflows.
Key focus areas include:
AI-powered anomaly detection and RCA automation.
Copilot agents for operational triage and decision support.
Centralized SRE workflows for incident handling.
Data-driven insights to reduce Time to Mitigate (TTM) and Customer Reliability Incidents (CRI).
Integration with carrier operations for global service health and regulatory compliance.
This initiative is a cornerstone of IC3’s Quality Excellence Initiative (QEI) and aligns with Microsoft’s broader goals to deliver resilient, customer-obsessed services at scale.
Job Responsibilities:
Design and develop large-scale distributed services using modern engineering practices.
Architect systems with well-defined interfaces and leverage telemetry data for decision-making. Ensure services are modular, secure, reliable, diagnosable, monitored, and reusable.
Improve test coverage, implement integration tests, and resolve problem areas.
Collaborate with cross-functional teams to co-develop scalable, impactful solutions.
Build reusable engineering tools that boost service health, reduce operational overhead, and empower teams with actionable insights.
Enhance observability across business-critical services to accelerate detection and diagnosis of issues.
Strengthen on-call effectiveness by modernizing incident response workflows and leveraging intelligent systems.
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Nice to have:
Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Experience working in large-scale enterprise environments.
Experience building or operating observability platforms (monitoring, logging, tracing) and applying AI/ML to anomaly detection or root cause analysis.
Passion for building reliable and performant systems.
Demonstrated ability to design and implement automated solutions that reduce manual effort.
Familiarity with cloud platforms (Azure/AWS/GCP) and microservices architectures.
Knowledge of AI/ML concepts and practical experience integrating AI-driven features into engineering workflows.
Good understanding of distributed systems.