Principal Software Engineer, CoreAI Job at Microsoft Corporation (Redmond)

Principal Software Engineer, CoreAI

Core AI is at the forefront of Microsoft’s mission to redefine how software is b...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years

Job Responsibility

Shape the Product Vision: Define and influence the product roadmap by aligning technical strategy with business goals and customer needs
Drive Strategic Clarity: Leverage data-driven insights and competitive intelligence to inform product direction, identify opportunities, and guide decision-making
Architect for Scale and Sustainability: Design and evolve durable, scalable system architectures that balance long-term maintainability with short-term delivery needs, making thoughtful engineering trade-offs
Foster Engineering Alignment: Work with engineering teams and partner organizations to drive clarity, alignment, and shared ownership of technical direction
Deliver Cohesive End-to-End Experiences: Collaborate closely with partner teams—including experience, SDK, and platform groups—to ensure seamless integration and delivery of features across the stack
Build Foundational Capabilities: Contribute to and lead the development of core platform components and reusable building blocks that accelerate team velocity and product innovation
Champion Customer-Centric Development: Engage directly with customers and product teams to capture feedback, understand demand signals, and refine product messaging—ensuring the voice of the customer shapes product evolution
Lead Live Site Excellence: Drive operational excellence in managing and operating large-scale distributed systems with a high bar for service-level agreements (SLAs). Lead root cause analyses (RCAs) for key live site incidents and outages, identify systemic improvements, and set high standards for reliability and performance

Full-time

Principal Software Engineer, CoreAI

The CoreAI GPU Infrastructure team builds the foundational accelerated compute p...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience
Proven ability to design and operate large-scale, production infrastructure with high reliability and performance requirements
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
Demonstrated technical leadership, including mentoring engineers and driving cross-team architectural alignment
Hands-on experience with virtualization and/or container platforms (e.g., VMs, Kubernetes, container runtimes)
Strong collaboration and communication skills, with the ability to work across organizational boundaries

Job Responsibility

Design and build GPU-accelerated infrastructure for training and inference workloads, spanning bare metal, virtual machines, and containerized environments
Develop systems for GPU device management, scheduling, isolation, and sharing (e.g., partial GPU allocation, multi-tenant usage)
Build and operate advanced orchestration and resource governance scenarios using platforms such as AKS, Dynamic Resource Allocation (DRA), and related Kubernetes ecosystem capabilities to enable fair sharing, isolation, and efficient utilization of accelerated resources
Build and evolve virtualization and container stacks to support modern AI workloads, including secure and confidential compute scenarios
Optimize performance, reliability, and utilization across large GPU fleets, including scale-up and scale-out configurations
Partner with networking and storage teams to enable high-performance interconnects (e.g., RDMA/InfiniBand-class networking) for distributed workloads
Drive end-to-end platform features from design through production, including observability, diagnostics, and operational excellence
Influence platform architecture and technical direction across teams through design reviews and technical leadership

Full-time

Principal Software Engineer, CoreAI

Joining the CoreAI organization at Microsoft means becoming part of the team tha...

Location

United States, Multiple Locations

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: the Microsoft Cloud Background Check, which this position must pass upon hire/transfer and every two years thereafter.

Job Responsibility

Collaborate with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal, and code-specific models.
Design, build and improve services with high scalability and reliability.
Design and implement services that serve production traffic and meet security and privacy requirements.
Participate in efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
Contribute to the deployment and monitoring of services in production environments.

Full-time

Principal Software Engineer, CoreAI

The GenAI Infrastructure and Solutions team is building large-scale GenAI traini...

Location

United States, Redmond

Salary:

163000.00 - 296400.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience.
6+ years designing, developing, and shipping high quality software.
3+ years of experience with distributed systems and cloud-based infrastructure.
2+ years of experience with containerization tools (e.g., Docker, Kubernetes).
2+ years of experience with DevOps practices (CI/CD, automated testing, deployment, etc.).
Passionate and self-motivated, with a strong ability to learn independently, enter new domains, and manage through uncertainty in an innovative team environment.
Familiarity with virtualization technology.
Familiarity with production ML systems and concepts like model serving, caching, batching, and monitoring.

Job Responsibility

Lead the collaboration with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal, and code-specific models.
Design, build and improve services with high scalability and reliability.
Design and implement services that serve production traffic and meet security and privacy requirements.
Lead the efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
Contribute to the deployment and monitoring of services in production environments.

Full-time

Principal Software Engineer, CoreAI

Join Microsoft’s AI Core team building high performance runtime systems that ser...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

6+ years of experience in systems programming with strong expertise in C++
Proven experience building, deploying, and operating scalable cloud services
Strong debugging skills and experience using performance profiling and diagnostic tools
Hands-on experience with distributed systems, Kubernetes, and containerized workloads
Experience with large-scale LLM inferencing infrastructure, including CUDA
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: the Microsoft Cloud Background Check, which this position must pass upon hire/transfer and every two years thereafter.

Job Responsibility

Design and implement high performance microservices and runtime components in C++
Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
Debug and resolve complex production issues related to performance, scaling, and service reliability
Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
Drive systems-level innovations for real-time and batch inferencing efficiency
Participate in code reviews and provide technical mentorship to senior and peer engineers

Full-time

Principal Software Engineer - CoreAI

At CoreAI, we empower developers and organizations to shape the future with Arti...

Location

United States, Redmond

Salary:

142800.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements, including the Microsoft Cloud Background Check
5+ years of experience leading software engineering and analytics projects that delivered measurable product and growth wins
Deep experience architecting and operating large-scale data pipelines in cloud environments, preferably Azure
Ability to write clean, working code using core algorithms, data structures, and analytics-oriented problem-solving
Understanding of data governance, privacy, lineage, and security best practices, especially within highly regulated or enterprise environments
Excellent communication skills to convey complex technical concepts to both technical and non-technical audiences
Experience using AI tools in software engineering, data science, and analytics workflows
Experience both prototyping and deploying data products

Job Responsibility

Lead by example and mentor others to produce extensible and maintainable code used across the company
Leverage deep subject-matter expertise in cross-product features, working with appropriate stakeholders to lead project plans, release plans, and work items across multiple products
Own and define end-to-end data and analytics architecture for CoreAI and Foundry platforms, setting long-term technical direction for scalable, reliable, and cost-effective analytics supporting AI workloads
Design, build, and optimize large-scale, robust data pipelines and architectures that support CoreAI's analytics initiatives
Data Governance & Trust: Follow best practices for data quality, lineage, security, and compliance
Collaborate with stakeholders to define trustworthy data sets and implement rigorous data validation protocols, ensuring CoreAI's analytics are both accurate and auditable
Analytics Enablement: Partner with data scientists, analysts, and business leaders to translate business needs into technical solutions
Enable self-service analytics and empower teams by building data models, semantic layers, and tools that streamline access to trusted information
Cross-Functional Collaboration: Work closely with product managers, software engineers, AI researchers, and business stakeholders to align data solutions with business goals
Contribute actively to the infrastructure and culture needed to scale the quantity and quality of data insights across CoreAI

Full-time

Principal Software Engineer, CoreAI FIT Agentic Systems

Joining the CoreAI organization at Microsoft means becoming part of the team tha...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
Experience in distributed computing and architecture, and/or developing and operating high scale, reliable online services

Job Responsibility

Engage directly with key partners to understand and implement complex inferencing and agentic capabilities for Microsoft Copilot and other Microsoft products and Azure services
Design and implement an API orchestration layer that leverages OpenAI models, tools, and capabilities
Work on cutting-edge agentic platforms to automate and solve real-world problems with the latest reasoning AI models
Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference at optimal cost
Anticipate, identify, assess, track, and mitigate project risks and issues in a fast-paced, startup-like environment
Build constructive and effective relationships and solve problems collaboratively
Support production inference SLAs for core AI scenarios on one of the largest GPU fleets in the world

Full-time

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...

Location

United States, Redmond

Salary:

139900.00 - 331200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
Strong collaboration and communication skills, with the ability to work across organizational boundaries

Job Responsibility

Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA-capable fabrics such as InfiniBand and RoCE) for distributed inference, without owning low-level kernel/driver enablement
Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability

Full-time
