Online advertising is one of the fastest‑growing businesses on the Internet. Microsoft Ads powers large‑scale deep learning workloads across Search, Recommendations, Click Prediction, and Relevance. Deep learning sits at the core of how Ads drives business performance and delivers high‑quality user experiences. We are building a unified, high‑performance inference platform to serve Ads deep learning models at extreme scale. This platform serves billions of requests daily, with strict requirements on latency, throughput, reliability, and cost. We are seeking a Principal Software Engineer with solid expertise in high‑performance C++ systems and large‑scale distributed serving, preferably with experience in GPU inference and acceleration technologies. You will be a senior technical leader driving the architecture, performance, and reliability of the next‑generation serving stack for Ads.
Job Responsibilities:
Design and build a unified inference platform for Ads, ensuring scalability, reliability, and efficiency
Optimize model inference via batching, quantization, scheduling, memory management, runtime optimization, and other performance improvements
Develop, optimize, and maintain performance‑critical components for high‑throughput, low‑latency production inference, including GPU‑accelerated paths when applicable
Collaborate with algorithm/model teams to co‑design serving‑aware model architectures and optimizations
Profile and improve end‑to‑end system performance: concurrency, memory footprint, throughput, and latency
Provide senior technical leadership across teams, elevate engineering best practices, and influence long‑term technical strategy
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
6+ years' experience building high‑performance, large‑scale distributed systems or ML infrastructure
Experience building and optimizing performance‑critical production systems
Experience working in Ads, Search, Recommendation systems, or other large‑scale online serving systems
Nice to have:
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Experience with GPU inference runtimes such as TensorRT, ONNX Runtime, Triton, TRT‑LLM, or vLLM
Expertise in CUDA kernel development and GPU performance engineering