Principal Software Engineer, CoreAI

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:

139900.00 - 274800.00 USD / Year

Job Description:

The CoreAI GPU Infrastructure team builds the foundational accelerated compute platforms that power large-scale AI training and inference across Azure. Our mission is to deliver secure, reliable, and highly efficient GPU infrastructure that enables multi-tenant AI systems at global scale while maximizing utilization, performance, and developer productivity. This role sits at the intersection of cloud infrastructure, systems software, virtualization, and container platforms, working closely with CoreAI, Azure Infrastructure, OS, Networking, and Hardware teams to deliver end-to-end platform capabilities.

Job Responsibility:

  • Design and build GPU-accelerated infrastructure for training and inference workloads, spanning bare metal, virtual machines, and containerized environments
  • Develop systems for GPU device management, scheduling, isolation, and sharing (e.g., partial GPU allocation, multi-tenant usage)
  • Build and operate advanced orchestration and resource governance scenarios using platforms such as AKS, Dynamic Resource Allocation (DRA), and related Kubernetes ecosystem capabilities to enable fair sharing, isolation, and efficient utilization of accelerated resources
  • Build and evolve virtualization and container stacks to support modern AI workloads, including secure and confidential compute scenarios
  • Optimize performance, reliability, and utilization across large GPU fleets, including scale-up and scale-out configurations
  • Partner with networking and storage teams to enable high-performance interconnects (e.g., RDMA/InfiniBand class networking) for distributed workloads
  • Drive end-to-end platform features from design through production, including observability, diagnostics, and operational excellence
  • Influence platform architecture and technical direction across teams through design reviews and technical leadership

Requirements:

  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience
  • Proven ability to design and operate large-scale, production infrastructure with high reliability and performance requirements
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers and driving cross-team architectural alignment
  • Hands-on experience with virtualization and/or container platforms (e.g., VMs, Kubernetes, container runtimes)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries

Nice to have:

  • Familiarity with distributed training and inference stacks (e.g., NCCL-style collectives, model/data parallelism)
  • Experience in building or operating multi-tenant AI platforms in cloud environments
  • Familiarity with high-performance networking and low-latency communication stacks
  • Familiarity with GPU-accelerated computing (e.g., CUDA, GPU drivers, device plugins, or runtime integration)
  • Familiarity with GPU virtualization, passthrough, or partitioning technologies
  • Knowledge of confidential computing, trusted execution environments, or hardware-backed isolation

Additional Information:

Job Posted:
March 25, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Principal Software Engineer, CoreAI

Principal Software Engineer, Experimentation Platform - CoreAI

CoreAI sits at the center of Microsoft’s mission to redefine how software is bui...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Champion and improve AI tools and practices across the software development lifecycle (SDLC), incorporating appropriate controls over AI-generated assets
  • Lead by example across teams to produce extensible, maintainable, well-tested, secure, and performant code
  • Identify and establish coding best practices, create and apply metrics to drive code quality and stability, and mentor engineers to continuously raise the engineering bar
  • Own and lead the architecture of complex product solutions, driving design discussions, evaluating new technologies to solve problems, and ensuring system architecture meets performance, scalability, resiliency and disaster recovery requirements
  • Lead cross-team collaboration to identify dependencies, negotiate delivery schedules, drive alignment across partner teams, and ensure proper end-to-end testing, live-site coverage, scalability and performance before going live
  • Drive engineering excellence across products
  • Lead efforts targeting zero-touch deployment, production reliability, and security hardening for both protections and detections
  • Hold accountability as a designated responsible individual (DRI) across products and solutions, mentor engineers on live-site operations, lead incident retrospectives that drive systemic
Employment Type:
Fulltime

Principal Software Engineer, CoreAI

The GenAI Infrastructure and Solutions team is building large-scale GenAI traini...
Location:
United States, Redmond
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field and 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience.
  • 6+ years designing, developing, and shipping high quality software.
  • 3+ years of experience with distributed systems and cloud-based infrastructure.
  • 2+ years of experience with containerization tools (e.g., Docker, Kubernetes).
  • 2+ years of experience with DevOps practices (CI/CD, automated testing, deployment, etc.).
  • Passionate and self-motivated, with a strong ability to self-learn, enter new domains, and manage through uncertainty in an innovative team environment.
  • Familiarity with virtualization technology.
  • Familiarity with production ML systems and concepts like model serving, caching, batching, and monitoring.
Job Responsibility:
  • Lead the collaboration with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal, and code-specific models.
  • Design, build and improve services with high scalability and reliability.
  • Design and implement services that serve production traffic and fulfill security and privacy requirements.
  • Lead the efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
  • Contribute to the deployment and monitoring of services in production environments.
Employment Type:
Fulltime

Principal Software Engineer, CoreAI

Joining the CoreAI organization at Microsoft means becoming part of the team tha...
Location:
United States, Multiple Locations
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years
Job Responsibility:
  • Collaborate with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal, and code-specific models.
  • Design, build and improve services with high scalability and reliability.
  • Design and implement services that serve production traffic and fulfill security and privacy requirements.
  • Participate in efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
  • Contribute to the deployment and monitoring of services in production environments.
Employment Type:
Fulltime

Principal Software Engineer (CoreAI)

Core AI is at the forefront of Microsoft’s mission to redefine how software is b...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Strong advocate for AI-first development with hands-on experience leveraging AI tools to accelerate engineering productivity
  • 6+ years technical engineering experience designing and delivering highly available, large-scale cloud services and distributed systems
  • 1+ years of technical engineering experience with machine learning or Artificial Intelligence (AI) systems
Job Responsibility:
  • Define technical vision and drive strategic direction for Microsoft Foundry's enterprise platform
  • Design and develop platform services including agent tooling frameworks, authentication flows, and control plane APIs
  • Deliver unified APIs and advanced security features (private networking, data encryption, cache resiliency)
  • Lead cross-org collaboration with partner teams across Azure, M365, and Security to deliver end-to-end solutions
  • Champion AI-first development practices and serve as a role model for leveraging AI to accelerate innovation and productivity
  • Drive livesite excellence through infrastructure reliability and operational excellence
Employment Type:
Fulltime

Principal Software Engineer - CoreAI

The CoreAI organization at Microsoft builds the end-to-end Azure AI stack/PaaS a...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility:
  • Lead architecture design, testing, and security compliance for products
  • Create design documents, oversee team efforts, and ensure test coverage, automation, and quality assurance
  • Address system dependencies and enable cross-team collaboration
  • Collaborate with stakeholders to identify user requirements, incorporate continuous feedback, and define critical metrics for product improvement and customer value
  • Mentor others in producing high-quality, maintainable code
  • Optimize, debug, and establish best practices
  • Conduct code reviews to ensure adherence to standards and resolve issues proactively using telemetry and diagnostics
  • Drive project planning, experimentation, and solution deployment
  • Optimize implementations to meet business objectives and ensure safe deployments while considering broader system impacts
  • Manage live service operations, resolve complex incidents, and create playbooks for issue resolution
Employment Type:
Fulltime

Principal Software Engineer - Growth (CoreAI)

We’re building AI‑first growth and experimentation systems that scale across Mic...
Location:
United States, Mountain View
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Own growth through engineering excellence and experimentation — at a systems level
  • Architect and build paved paths for online experimentation: standardized metrics, guardrails, analysis workflows, and rollout automation that improve reliability and decision quality across teams
  • Lead multi‑workstream initiatives that span teams/products (e.g., unified growth measurement, cross‑surface funnels, experimentation quality improvements)
  • Build and evolve core capabilities: telemetry foundations, experiment assignment/targeting, feature flighting, and risk controls (kill‑switches, guardrails, progressive delivery)
  • Partner with Product, Data Science, Design, and Research to turn ambiguous goals into shippable, measurable systems
  • Stay close to the work: write production code, review designs/PRs, and coach others through architecture and implementation tradeoffs
Employment Type:
Fulltime

Principal Software Engineer, CoreAI

Join Microsoft’s AI Core team building high performance runtime systems that ser...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience in systems programming with strong expertise in C++
  • Proven experience building, deploying, and operating scalable cloud services
  • Strong debugging skills and experience using performance profiling and diagnostic tools
  • Hands-on experience with distributed systems, Kubernetes, and containerized workloads
  • Experience with large-scale LLM inferencing infrastructure, including CUDA
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility:
  • Design and implement high-performance microservices and runtime components in C++
  • Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
  • Debug and resolve complex production issues related to performance, scaling, and service reliability
  • Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
  • Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
  • Drive systems-level innovations for real-time and batch inferencing efficiency
  • Participate in code reviews and provide technical mentorship to senior and peer engineers
Employment Type:
Fulltime

Principal Software Architect - CoreAI

Are you passionate about making Artificial Intelligence (AI) systems security, t...
Location:
United States, Multiple Locations
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in computer science, or related technical discipline or equivalent experience AND 8+ years of technical engineering experience with coding in languages including, but not limited to, C++, C#, Go, Java, or Python
  • 6+ years technical engineering experience designing and delivering highly available, large-scale cloud services and distributed systems
  • Experience designing AI powered products and services
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Architect, design, and develop large-scale distributed cloud services and solutions with a focus on high availability, scalability, robustness, and observability
  • Lead project development across the organization and work with subject matter experts and stakeholders to drive development and release plans
  • Evaluate alternative architectures and technologies that best fit the business requirements and service KPIs
  • Take end-to-end responsibility for the development lifecycle and production readiness of the services you build and drive the team’s DevOps culture
  • Drive and uphold the best practices of modern software engineering through code and design reviews, and make effective service decisions based on data and telemetry
  • Understand Microsoft businesses and collaborate with stakeholders towards cohesive, end-to-end experiences for Microsoft customers
  • Embrace a growth mindset and stay up to date with the current and state-of-the-art technologies to improve customer experience and better serve the product’s business needs
Employment Type:
Fulltime