Software Engineer, Experimentation Platform - CoreAI Job at Microsoft Corporation (Redmond)

Software Engineer, Experimentation Platform - CoreAI

CoreAI sits at the center of Microsoft’s mission to redefine how software is bui...

Location

United States, Redmond

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Experience working with cloud platforms (Azure, AWS, GCP), building and maintaining distributed systems including deployment, monitoring and troubleshooting of production workloads
Experience using observability tools (logging, metrics, tracing) to diagnose service issues and improve system reliability
Familiarity with AI-assisted development workflows or responsible use of AI coding tools

Job Responsibility

Design, implement, and maintain clean, reliable, testable code using best practices and responsible AI-assisted development while escalating blockers early
Use AI tools responsibly across the SDLC, reviewing and validating AI-generated changes to ensure correctness and maintainability
Work with partner engineering teams, PMs, and experts (privacy, security, SRE) to understand requirements, apply customer feedback/telemetry, and deliver scalable, reliable, user‑centric features
Build extensible, maintainable services and features with strong diagnosability, reliability, and production-readiness
Participate in on-call rotations, troubleshoot live-site issues using least-privileged access, and improve TSGs, telemetry, and fixes that reduce future incidents
Contribute to engineering and operational excellence through automation, tooling, documentation, and process improvements

Fulltime

Principal/Senior Software Engineer, Experimentation Platform - CoreAI

CoreAI sits at the center of Microsoft’s mission to redefine how software is bui...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Independently leverage AI tools and practices across the software development lifecycle (SDLC), taking responsibility for AI-generated assets and coaching team members to adopt responsible AI-assisted development practices
Lead by example to produce extensible, maintainable, well-tested, secure, and performant code; apply metrics to drive code quality and stability, and continuously improve code performance, testability, and cost-effectiveness across the team
Own and drive the architecture and design of product components, creating design specifications, and ensuring system architecture meets performance, scalability, resiliency, and disaster recovery requirements with minimal technical oversight
Collaborate with partner teams, PMs, and subject matter experts (privacy, security, SRE) to determine customer requirements, incorporate feedback, and deliver scalable, reliable features with proper end-to-end testing
Drive engineering excellence through automation, tooling improvements, security best practices, and deployment infrastructure
Maintain operations of live site services on a rotational on-call basis, implement solutions to complex live-site issues, conduct and present incident postmortems, and proactively improve troubleshooting guides, telemetry, and monitoring to reduce incident volume

Fulltime

Principal Software Engineer, Experimentation Platform - CoreAI

CoreAI sits at the center of Microsoft’s mission to redefine how software is bui...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Champion and improve AI tools and practices across the software development lifecycle (SDLC), incorporating appropriate controls over AI-generated assets
Lead by example across teams to produce extensible, maintainable, well-tested, secure, and performant code; identify and establish coding best practices, create and apply metrics to drive code quality and stability, and mentor engineers to continuously raise the engineering bar
Own and lead the architecture of complex product solutions, driving design discussions, evaluating new technologies to solve problems, and ensuring system architecture meets performance, scalability, resiliency and disaster recovery requirements
Lead cross-team collaboration to identify dependencies, negotiate delivery schedules, drive alignment across partner teams, and ensure proper end-to-end testing, live-site coverage, scalability and performance before going live
Drive engineering excellence across products; lead efforts targeting zero-touch deployment, production reliability, and security hardening for both protections and detections
Hold accountability as a designated responsible individual (DRI) across products and solutions, mentor engineers on live-site operations, lead incident retrospectives that drive systemic

Fulltime

Principal Software Engineer, CoreAI

The CoreAI AI Platform team is seeking a Principal Software Engineer in Redmond,...

Location

United States, Redmond

Salary:

142800.00 - 304200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience.

Job Responsibility

Lead the architecture and implementation of large-scale platform services that support complex engineering and AI workflows in distributed cloud environments
Build internal tooling and automation that improve productivity for engineers and researchers across experimentation, deployment, and operational workflows
Design platform capabilities that make data easier to discover, access, and use in secure, governed, and auditable ways
Drive operational excellence through improvements in reliability, observability, deployment safety, and incident readiness
Partner across teams to resolve cross-cutting technical problems and align architecture, engineering standards, and long-term investments
Mentor engineers, contribute to technical reviews, and help raise the engineering bar across the organization

Fulltime

Senior Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...

Location

United States, Redmond

Salary:

119800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
Strong collaboration and communication skills, with the ability to work across organizational boundaries.

Job Responsibility

Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement.
Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.

What we offer

Benefits and other compensation

Fulltime

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...

Location

United States, Redmond

Salary:

139900.00 - 331200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
Strong collaboration and communication skills, with the ability to work across organizational boundaries

Job Responsibility

Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement
Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability

Fulltime

Principal Software Engineer - Growth (CoreAI)

We’re building AI‑first growth and experimentation systems that scale across Mic...

Location

United States, Mountain View

Salary:

163000.00 - 296400.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Own growth through engineering excellence and experimentation — at a systems level
Architect and build paved paths for online experimentation: standardized metrics, guardrails, analysis workflows, and rollout automation that improve reliability and decision quality across teams
Lead multi‑workstream initiatives that span teams/products (e.g., unified growth measurement, cross‑surface funnels, experimentation quality improvements)
Build and evolve core capabilities: telemetry foundations, experiment assignment/targeting, feature flighting, and risk controls (kill‑switches, guardrails, progressive delivery)
Partner with Product, Data Science, Design, and Research to turn ambiguous goals into shippable, measurable systems
Stay close to the work: write production code, review designs/PRs, and coach others through architecture and implementation tradeoffs

Fulltime

Senior Software Engineer - GitHub Copilot - CoreAI

Do you want to help shape the future of AI-assisted software development for mil...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Design, implement, and maintain high-quality desktop client and plugin features for GitHub Copilot in third-party IDEs, including JetBrains IDEs, Xcode, and Eclipse
Build AI-powered developer experiences such as code suggestions, chat, contextual assistance, and agentic workflows in IDE environments
Drive engineering excellence in client architecture, performance, startup, responsiveness, diagnostics, reliability, and maintainability
Partner with teams across time zones, including the China engineering team, to plan, develop, and ship end-to-end product experiences
Collaborate with shared platform and service teams to integrate capabilities such as telemetry, experimentation, model support, and context/tool orchestration
Ensure that our products and engineering systems meet security, privacy, compliance, and enterprise readiness requirements
Investigate and resolve customer and partner issues with urgency, using telemetry, debugging, and root cause analysis to improve product quality at scale
Contribute technical leadership through design reviews, code reviews, mentoring, and adoption of engineering best practices

Fulltime
