Modern advertising platforms run on always-on, real-time data: streaming events, feature computation, near-real-time aggregations, and low-latency serving that power ML models operating at massive scale under strict freshness, cost, and reliability requirements. Microsoft Ads builds and operates large-scale, latency-sensitive systems that serve billions of requests.

We are looking for a Principal Software Engineer who is hands-on with production coding and system design to build the real-time data pipelines and feature/embedding materialization systems that feed online stores and caches and integrate tightly with ML inference serving.

This role is ideal for engineers who enjoy building robust streaming and ETL systems (correctness, idempotency, backfills, late data), owning SLOs with strong observability and operational maturity, and optimizing end-to-end performance and cost across compute, storage, and serving integrations. The primary success metrics are freshness, correctness, latency, reliability, and cost in production.
Job Responsibilities:
Design and implement real-time streaming ETL / feature pipelines (e.g., Flink or Spark Structured Streaming) that meet strict freshness and correctness constraints; an illustrative sketch follows this list
Build and operate reliable messaging and ingestion with Kafka/Pulsar (partitioning strategy, retries, ordering guarantees, DLQs, backpressure handling)
Own data contracts between producers, pipelines, and consumers: schema evolution, versioning, compatibility, validation, and safe rollout
Define and meet SLOs using OpenTelemetry/Prometheus/Grafana for metrics, tracing, dashboards, alerting, and incident response readiness
Integrate pipelines with online stores/caches and ML consumers (feature stores, embedding pipelines, LLM API calls, online/offline consistency patterns)
Partner with applied scientists on feature/embedding definitions, validation, and end-to-end quality measurement
Optimize end-to-end performance and efficiency: CPU/memory/I/O, serialization, caching, network overhead, concurrency, and pipeline compute cost
Contribute to serving/inference integrations where needed (e.g., Triton/ONNX Runtime/TensorRT) including batching and latency/cost tradeoffs
Ship safely with CI/CD, automated testing (unit/integration/data quality), and operational playbooks/runbooks
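To give a concrete flavor of the streaming work described above, here is a minimal, hypothetical sketch of a Spark Structured Streaming job that reads click events from Kafka, handles late and duplicated data with a watermark, and maintains fresh per-ad aggregates. The broker address, topic name, schema, checkpoint path, and console sink are illustrative assumptions only, not a description of the actual Microsoft Ads pipelines.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("ad-click-aggregator").getOrCreate()

# Hypothetical event schema, for illustration only.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("ad_id", StringType()),
    StructField("event_time", TimestampType()),
])

# Ingest from Kafka (placeholder broker and topic).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "ad-click-events")
    .load()
)

events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

aggregates = (
    events
    .withWatermark("event_time", "10 minutes")             # bound how late events may arrive
    .dropDuplicates(["event_id", "event_time"])            # tolerate redelivered messages; including the watermark column lets dedup state expire
    .groupBy(F.window("event_time", "1 minute"), "ad_id")  # per-ad, per-minute click counts
    .count()
)

query = (
    aggregates.writeStream
    .outputMode("update")
    .option("checkpointLocation", "/tmp/checkpoints/ad-clicks")  # checkpointing enables restart/recovery
    .format("console")                                           # stand-in for an online store or cache sink
    .start()
)
query.awaitTermination()

In a real deployment the console sink would be replaced by a writer to the online store or cache, and backfills would reuse the same transformation logic in batch mode.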
Requirements:
Bachelor’s or Master’s degree in Computer Science, Electrical/Computer Engineering, or a related field, with 8+ years of related experience
Strong programming skills in C++, C#, or Python (at least one required)
Hands-on experience in one or more of the following: building and operating streaming data pipelines in production (Flink or Spark Structured Streaming); distributed systems engineering with strong reliability and operational rigor; messaging systems such as Kafka/Pulsar
Experience operating services with Kubernetes/containers and production readiness practices (deployments, scaling, rollbacks)
Experience with observability stacks such as OpenTelemetry, Prometheus, and Grafana; see the metrics sketch after this list
Ability to debug complex production issues using logs/metrics/traces and performance profiling
Strong communication and collaboration skills, with experience working across engineering, applied science/ML, and product/business stakeholders
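As a small illustration of the observability expectations above, the sketch below uses the Prometheus Python client (prometheus_client) to expose freshness, throughput, and latency metrics from a pipeline worker so that Grafana dashboards and alerts can consume them. The metric names and the processing loop are hypothetical placeholders, assumed here purely for illustration.

import random
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

EVENTS_PROCESSED = Counter("pipeline_events_processed_total", "Events processed")
PROCESSING_LATENCY = Histogram(
    "pipeline_processing_latency_seconds",
    "Per-event processing latency",
    buckets=(0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0),
)
FRESHNESS_LAG = Gauge(
    "pipeline_freshness_lag_seconds",
    "Age of the newest event applied to the online store",
)

def process_event(event_timestamp: float) -> None:
    """Placeholder for the real transformation and online-store write."""
    with PROCESSING_LATENCY.time():              # record elapsed time into the histogram
        time.sleep(random.uniform(0.001, 0.01))  # simulated work
    EVENTS_PROCESSED.inc()
    FRESHNESS_LAG.set(time.time() - event_timestamp)  # freshness signal for SLO alerting

if __name__ == "__main__":
    start_http_server(9100)                      # expose /metrics for Prometheus to scrape
    while True:
        # Simulated events arriving with 0.1-2.0 s of upstream delay.
        process_event(event_timestamp=time.time() - random.uniform(0.1, 2.0))

An alert on pipeline_freshness_lag_seconds exceeding its SLO threshold, plus latency percentiles derived from the histogram, would typically be the starting point for the dashboards and incident-response readiness mentioned above.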
Nice to have:
Experience with feature stores, embedding pipelines, and online/offline consistency (freshness guarantees, correctness validation)
Experience with data lakehouse/table formats and optimizations such as partitioning, compaction, and incremental processing
Experience with GPU inference serving (Triton, ONNX Runtime/TensorRT) and performance techniques (batching, request shaping, tail-latency reduction)
Understanding of pipeline correctness patterns: idempotency, deduplication, watermarking, late-data handling, and exactly-once vs. at-least-once tradeoffs
Background in cost/performance modeling, capacity planning, and reliability improvements for high-scale data platforms
Experience in Ads/search/recommendations or other high-scale systems where freshness, latency, and cost are jointly optimized