Software Engineer, Experimentation Platform - CoreAI

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:

100600.00 - 199000.00 USD / Year

Job Description:

CoreAI sits at the center of Microsoft’s mission to redefine how software is built and experienced, providing the foundational platforms, services, and developer experiences that power the next generation of AI-driven applications. As part of CoreAI, the Experimentation Platform (ExP) enables trustworthy, high-scale online experimentation that accelerates product learning and drives progress across Microsoft’s AI ecosystem.

You will play a key role in helping teams ship better AI experiences faster by providing the experimentation capabilities needed to evaluate, refine, and safely deploy new innovations. In this role, you will help strengthen one of the highest-scale experimentation platforms - critical infrastructure that enables rapid iteration in AI systems and product features. You will contribute to services that empower engineers and scientists across the company to measure impact, validate hypotheses, and advance state-of-the-art AI capabilities through rigorous experimentation.

This is a unique opportunity to build systems at scale while deepening your expertise in distributed systems, service reliability, and experimentation methodologies. You will thrive in this role if you enjoy solving complex distributed systems challenges, learning experimentation fundamentals, and building reliable infrastructure that accelerates Microsoft’s progress in AI.

Job Responsibility:

  • Design, implement, and maintain clean, reliable, testable code using best practices and responsible AI-assisted development while escalating blockers early
  • Use AI tools responsibly across the SDLC, reviewing and validating AI-generated changes to ensure correctness and maintainability
  • Work with partner engineering teams, PMs, and experts (privacy, security, SRE) to understand requirements, apply customer feedback/telemetry, and deliver scalable, reliable, user-centric features
  • Build extensible, maintainable services and features with strong diagnosability, reliability, and production-readiness
  • Participate in on-call rotations, troubleshoot live-site issues using least-privileged access, and improve TSGs, telemetry, and fixes that reduce future incidents
  • Contribute to engineering and operational excellence through automation, tooling, documentation, and process improvements

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Nice to have:

  • Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience working with cloud platforms (Azure, AWS, GCP), building and maintaining distributed systems including deployment, monitoring and troubleshooting of production workloads
  • Experience using observability tools (logging, metrics, tracing) to diagnose service issues and improve system reliability
  • Experience with experimentation platforms, A/B testing at scale, and statistical methodologies for measuring product impact and driving data-informed ship decisions
  • Familiarity with AI-assisted development workflows or responsible use of AI coding tools

Additional Information:

Job Posted:
March 19, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Software Engineer, Experimentation Platform - CoreAI

Principal/Senior Software Engineer, Experimentation Platform - CoreAI

CoreAI sits at the center of Microsoft’s mission to redefine how software is bui...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Independently leverage AI tools and practices across the software development lifecycle (SDLC), taking responsibility for AI-generated assets and coaching team members to adopt responsible AI-assisted development practices
  • Lead by example to produce extensible, maintainable, well-tested, secure, and performant code; apply metrics to drive code quality and stability, and continuously improve code performance, testability, and cost-effectiveness across the team
  • Own and drive the architecture and design of product components, creating design specifications, and ensuring system architecture meets performance, scalability, resiliency, and disaster recovery requirements with minimal technical oversight
  • Collaborate with partner teams, PMs, and subject matter experts (privacy, security, SRE) to determine customer requirements, incorporate feedback, and deliver scalable, reliable features with proper end-to-end testing
  • Drive engineering excellence through automation, tooling improvements, security best practices, and deployment infrastructure
  • Maintain operations of live site services on a rotational on-call basis, implement solutions to complex live-site issues, conduct and present incident postmortems, and proactively improve troubleshooting guides, telemetry, and monitoring to reduce incident volume
Employment Type:
Fulltime

Principal Software Engineer, Experimentation Platform - CoreAI

CoreAI sits at the center of Microsoft’s mission to redefine how software is bui...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Champion and improve AI tools and practices across the software development lifecycle (SDLC), incorporating appropriate controls over AI-generated assets
  • Lead by example across teams to produce extensible, maintainable, well-tested, secure, and performant code; identify and establish coding best practices, create and apply metrics to drive code quality and stability, and mentor engineers to continuously raise the engineering bar
  • Own and lead the architecture of complex product solutions, driving design discussions, evaluating new technologies to solve problems, and ensuring system architecture meets performance, scalability, resiliency and disaster recovery requirements
  • Lead cross-team collaboration to identify dependencies, negotiate delivery schedules, drive alignment across partner teams, and ensure proper end-to-end testing, live-site coverage, scalability and performance before going live
  • Drive engineering excellence across products; lead efforts targeting zero-touch deployment, production reliability, and security hardening for both protections and detections
  • Hold accountability as a designated responsible individual (DRI) across products and solutions, mentor engineers on live-site operations, lead incident retrospectives that drive systemic
Employment Type:
Fulltime

Principal Software Engineer - Growth (CoreAI)

We’re building AI‑first growth and experimentation systems that scale across Mic...
Location:
United States, Mountain View
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Own growth through engineering excellence and experimentation — at a systems level
  • Architect and build paved paths for online experimentation: standardized metrics, guardrails, analysis workflows, and rollout automation that improve reliability and decision quality across teams
  • Lead multi‑workstream initiatives that span teams/products (e.g., unified growth measurement, cross‑surface funnels, experimentation quality improvements)
  • Build and evolve core capabilities: telemetry foundations, experiment assignment/targeting, feature flighting, and risk controls (kill‑switches, guardrails, progressive delivery)
  • Partner with Product, Data Science, Design, and Research to turn ambiguous goals into shippable, measurable systems
  • Stay close to the work: write production code, review designs/PRs, and coach others through architecture and implementation tradeoffs
Employment Type:
Fulltime

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location:
United States, Redmond
Salary:
139900.00 - 331200.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries
Job Responsibility:
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability
Employment Type:
Fulltime

Senior Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location:
United States, Redmond
Salary:
119800.00 - 304200.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries.
Job Responsibility:
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement.
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
What we offer:
  • Benefits and other compensation
Employment Type:
Fulltime

Principal Product Manager - DevOps AI - CoreAI

Microsoft’s mission is to empower every person and every organization to achieve...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • 10+ years of product management experience building and shipping platform, infrastructure, or developer tooling products
  • Proven experience working across multiple teams and systems to deliver outcomes in complex, highly matrixed organizations
  • Ability to define and drive ambiguous problem spaces, turning strategy and research into concrete, actionable product investments
  • Experience with AI-assisted developer workflows, automation, or intelligent systems applied to software engineering
  • Understanding of developer workflows and DevOps systems, including build, test, release, and operational feedback loops
  • Excellent communication and stakeholder management skills, with the ability to align diverse partners around shared goals, tradeoffs, and success metrics
  • Familiarity with enterprise requirements around security, compliance, privacy, and reliability, especially in large-scale engineering environments
Job Responsibility:
  • Drive cross-cutting platform investments across the GitHub / Azure DevOps / 1ES ecosystem, with a focus on AI-assisted developer productivity across the full engineering lifecycle (inner loop, CI/CD, operations, governance)
  • Identify high-leverage opportunities where AI can meaningfully reduce friction and toil for developers while improving quality, reliability, and consistency of outcomes
  • Define clear problem statements, product bets, and success metrics that balance speed of iteration with trust, safety, and operational requirements
  • Operate in a startup mode, moving quickly from hypothesis to MVP to scaled rollout through rapid experimentation and iteration
  • Partner closely with engineering, security, compliance, privacy, and AI platform teams to ensure solutions are production-ready and scalable, not experimental or one-off
  • Drive execution across multiple systems and teams in a highly matrixed environment, influencing roadmaps and priorities without direct authority
  • Ensure AI-assisted workflows are designed end-to-end, with clear ownership, feedback loops, and failure modes, not as isolated point solutions
  • Use data, developer feedback, and operational signals to evaluate impact and continuously improve platform investments
  • Act as a thought partner to engineering and leadership teams on how AI should be applied responsibly within Microsoft’s engineering system
Employment Type:
Fulltime

Senior Software Engineer - GitHub Copilot - CoreAI

Do you want to help shape the future of AI-assisted software development for mil...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Design, implement, and maintain high-quality desktop client and plugin features for GitHub Copilot in third-party IDEs, including JetBrains IDEs, Xcode, and Eclipse
  • Build AI-powered developer experiences such as code suggestions, chat, contextual assistance, and agentic workflows in IDE environments
  • Drive engineering excellence in client architecture, performance, startup, responsiveness, diagnostics, reliability, and maintainability
  • Partner with teams across time zones, including the China engineering team, to plan, develop, and ship end-to-end product experiences
  • Collaborate with shared platform and service teams to integrate capabilities such as telemetry, experimentation, model support, and context/tool orchestration
  • Ensure that our products and engineering systems meet security, privacy, compliance, and enterprise readiness requirements
  • Investigate and resolve customer and partner issues with urgency, using telemetry, debugging, and root cause analysis to improve product quality at scale
  • Contribute technical leadership through design reviews, code reviews, mentoring, and adoption of engineering best practices
Employment Type:
Fulltime