The M365 Core Foundation team is at the heart of Microsoft 365’s infrastructure, powering mission-critical services that serve billions of users. We are seeking talented engineers to help design and implement an intelligent, ultra-large-scale load balancing system. This system will use machine learning, reinforcement learning, and LLMs to analyze real-time system status data and make load balancing decisions that keep shard placement and system resource allocation as efficient and precise as possible.
Job Responsibilities:
Build a global optimization engine that generates placement plans (blueprints) that avoid hotspots, based on resource constraints, move cost, and availability goals
Design and implement centralized algorithms that maximize utilization of system throughput resources and mitigate hotspots quickly
Collaborate with distributed LB execution layers to coordinate plan rollout and shard migration
Partner with the Performance, Capacity, Search, and Copilot teams to align placement strategies with performance and SLA targets
Contribute to observability and debugging tools to ensure transparency and traceability of placement decisions
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
3+ years of software development experience, including 2+ years in distributed systems or infrastructure
Solid coding skills in C#, C++, or Python
Experience with large-scale scheduling, optimization, or resource management systems
Familiarity with cloud platforms (Azure preferred) and telemetry pipelines
Experience applying ML models or leveraging LLMs to optimize decision-making is preferred
Excellent problem-solving and collaboration skills
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Nice to have:
Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Experience with telemetry systems and QoS metric analysis
Experience applying ML, RL, and LLMs to optimize decision-making