Principal Software Engineer, CoreAI Workload Engines

United States, Redmond Employment contract 139900.00 - 331200.00 USD / Year · Job Posted April 23, 2026

Job Description

The CoreAI Workloads team builds the foundational inference engines and APIs that power large-scale AI inference across Azure - from cutting-edge startups to Fortune 500 enterprises and Microsoft Copilots and agents. Our mission is to deliver secure, reliable, and highly efficient GPU inference that enables multi-tenant AI systems at global scale while maximizing utilization, performance, and developer productivity. We own inference serving and performance of OpenAI and other state-of-the-art large language models (LLMs) and work directly with OpenAI, serving some of the largest workloads on the planet with trillions of inferences per day. Our converged AI fabric and engines deliver inference capabilities for all LLMs in the Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more.

Job Responsibility

  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability
  • Work across multiple layers of the AI software stack (abstractions, programming models, engine runtimes, libraries, and APIs) to enable large-scale model inference
  • Benchmark OpenAI and other LLMs for performance across Azure OpenAI Service workload tiers and segments, and translate results into production improvements
  • Debug, profile, and optimize production inference performance across the stack (abstractions, runtime, scheduling, and serving pipelines) to improve latency, throughput, and utilization
  • Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint
  • Collaborate across engineering teams to deliver scalable, production-ready serving efficiency and availability improvements, using experimentation results to guide prioritization and rollout
  • Build durable engine interfaces that enable fast experimentation and safe shipping of new strategies for class of service (QoS), replica load balancing, KV management (including offload/retrieval), quantization, and sampling (e.g., multi-token prediction and constrained sampling)

Requirements

  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries

Nice to have

  • Experience optimizing LLM inference in practice (e.g., PyTorch inference, serving runtimes, model execution, or inference orchestration) in production environments
  • Familiarity with high-performance networking and low-latency communication stacks
  • Familiarity with GPU-accelerated inference stacks (e.g., CUDA at the application/runtime level, device plugins, or runtime integration)
  • Experience building or using experimentation systems (A/B, canarying, tiered rollout), including metric definition and comparability for performance and reliability
  • Familiarity with distributed inference stacks (e.g., NCCL-style collectives, model/tensor parallelism) and performance tradeoffs in large-scale serving

Similar Jobs for Principal Software Engineer, CoreAI Workload Engines

8 matching positions

Principal Software Engineer - CoreAI

At CoreAI, we empower developers and organizations to shape the future with Arti...
Location: United States, Redmond
Salary: 142800.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 5+ years of experience leading software engineering and analytics projects that delivered measurable product and growth wins
  • Deep experience architecting and operating large-scale data pipelines in a cloud environment, preferably Azure
  • Ability to write clean, working code using core algorithms, data structures, and analytics-oriented problem-solving
  • Understanding of data governance, privacy, lineage, and security best practices, especially within highly regulated or enterprise environments
  • Excellent communication skills to convey complex technical concepts to both technical and non-technical audiences
  • Experience using AI tools in software engineering, data science, and analytics workflows
  • Experience both prototyping and deploying data products
Job Responsibility
  • Leads by example and mentors others to produce extensible and maintainable code used across the company
  • Leverages deep subject-matter expertise of cross-product features with appropriate stakeholders to lead multiple products' project plans, release plans, and work items
  • Own and define end-to-end data and analytics architecture for CoreAI and Foundry platforms, setting long-term technical direction for scalable, reliable, and cost-effective analytics supporting AI workloads
  • Design, build, and optimize large-scale, robust data pipelines and architectures that support CoreAI's analytics initiatives
  • Data Governance & Trust: follow best practices for data quality, lineage, security, and compliance
  • Collaborate with stakeholders to define trustworthy data sets and implement rigorous data validation protocols, ensuring CoreAI's analytics are both accurate and auditable
  • Analytics Enablement: Partner with data scientists, analysts, and business leaders to translate business needs into technical solutions
  • Enable self-service analytics and empower teams by building data models, semantic layers, and tools that streamline access to trusted information
  • Cross-Functional Collaboration: Work closely with product managers, software engineers, AI researchers, and business stakeholders to align data solutions with business goals
  • Contribute actively to the infrastructure and culture needed to scale quantity and quality of data insights across CoreAI
  • Fulltime

Principal Software Engineer, CoreAI

The CoreAI GPU Infrastructure team builds the foundational accelerated compute p...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience
  • Proven ability to design and operate large-scale, production infrastructure with high reliability and performance requirements
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers and driving cross-team architectural alignment
  • Hands-on experience with virtualization and/or container platforms (e.g., VMs, Kubernetes, container runtimes)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries
Job Responsibility
  • Design and build GPU accelerated infrastructure for training and inference workloads, spanning bare metal, virtual machines, and containerized environments
  • Develop systems for GPU device management, scheduling, isolation, and sharing (e.g., partial GPU allocation, multi-tenant usage)
  • Build and operate advanced orchestration and resource governance scenarios using platforms such as AKS, Dynamic Resource Allocation (DRA), and related Kubernetes ecosystem capabilities to enable fair sharing, isolation, and efficient utilization of accelerated resources
  • Build and evolve virtualization and container stacks to support modern AI workloads, including secure and confidential compute scenarios
  • Optimize performance, reliability, and utilization across large GPU fleets, including scale-up and scale-out configurations
  • Partner with networking and storage teams to enable high-performance interconnects (e.g., RDMA/InfiniBand class networking) for distributed workloads
  • Drive end-to-end platform features from design through production, including observability, diagnostics, and operational excellence
  • Influence platform architecture and technical direction across teams through design reviews and technical leadership
  • Fulltime

Principal Software Engineer, CoreAI

Join Microsoft’s AI Core team building high performance runtime systems that ser...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in systems programming with strong expertise in C++
  • Proven experience building, deploying, and operating scalable cloud services
  • Strong debugging skills and experience using performance profiling and diagnostic tools
  • Hands-on experience with distributed systems, Kubernetes, and containerized workloads
  • Experience with large-scale LLM inferencing infrastructure, including CUDA
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Design and implement high performance microservices and runtime components in C++
  • Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
  • Debug and resolve complex production issues related to performance, scaling, and service reliability
  • Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
  • Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
  • Drive systems-level innovations for real-time and batch inferencing efficiency
  • Participate in code reviews and provide technical mentorship to senior and peer engineers
  • Fulltime

Principal Product Manager/Architect - Foundry Inference Platform (CoreAI)

We are seeking a Principal Product Manager/Architect to define and guide the tec...
Location: United States, Redmond
Salary: 163000.00 - 296400.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree AND 10+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility
  • 1. Product Reliability: Own the product direction for Microsoft Foundry inference, with a primary mandate to make the platform the most reliable enterprise inferencing service available. This includes defining architectural standards for global serving, multi-region resiliency, automated failover, and platform-managed disaster recovery
  • Drive architectural alignment across global routing, capacity pooling, observability, and control plane abstractions to ensure consistent availability, predictable recovery behavior, and simplified customer operations at scale
  • Partner with engineering, infrastructure, and security leaders to ensure reliability targets, SLAs, SLOs and recovery objectives are designed into the platform by default
  • 2. GPU Fleet Efficiency & Capacity: Set the product direction for GPU fleet efficiency and capacity management, guiding platform-level design decisions that maximize utilization, minimize fragmentation, and accelerate time-to-monetization of new hardware and models
  • This includes shaping the architecture for global capacity pooling, intelligent scheduling, fungibility across workloads, automated demand forecasting, and software-defined allocation
  • The Product Manager/Architect is expected to influence architectural investments across inference utilization, model serving, and hardware/system performance
  • 3. Strategic Customer & Innovation Engagement: Act as a senior technical advisor and architect for Foundry’s most innovative and strategic customers
  • Engage directly with customers on deep technical challenges, including large-scale model migrations, reliability-sensitive production deployments, and advanced serving architectures
  • Support competitive and strategic initiatives by articulating Foundry’s architectural advantages, turning bespoke requests into scalable features
  • 4. Cross-Company Technical Leadership: Serve as a unifying architectural voice across product management, engineering, infrastructure, and partner teams
  • Fulltime

Principal Product Manager

At Microsoft, we are building the world’s most trusted and developer‑centric AI ...
Location: India, Bangalore
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in engineering, computer science, or a related technical field
  • Significant experience (typically 8–12+ years) in product management or software engineering with substantial product ownership, including experience working on platform or infrastructure products
  • Demonstrated ability to operate effectively in large, ambiguous, multi‑team environments with shared ownership and complex dependencies
  • Strong technical depth in cloud platforms, distributed systems, or AI/ML infrastructure, with the ability to engage credibly with senior engineers and architects
  • Proven track record of influencing strategy, driving alignment, and delivering outcomes through collaboration rather than direct authority
  • Strong analytical and systems‑thinking skills, with experience making high‑quality decisions in fast‑moving, evolving problem spaces
Job Responsibility
  • Act as a senior contributor to platform strategy for Azure AI Foundry and Azure ML, helping shape multi-year investments across model training, customization, deployment, and lifecycle management
  • Drive alignment and progress across federated, cross-organizational initiatives, working with peer Principal PMs and multiple engineering teams on shared platform outcomes
  • Contribute to the definition and evolution of high-leverage platform abstractions (APIs, SDKs, workflows) that enable scalable adoption of GenAI and custom code training workloads
  • Partner closely with senior engineering leaders to influence architectural direction, surface trade-offs, and ensure platform capabilities meet scale, reliability, and security expectations
  • Engage with strategic customers and internal stakeholders to gather insights, validate requirements, and translate learnings into durable, reusable platform capabilities
  • Use data, metrics, and experimentation to evaluate impact and inform product decisions across shared ownership areas
  • Serve as a thought leader and mentor within CoreAI, elevating product craft, platform thinking, and responsible AI practices across the organization
  • Fulltime

Senior Manager, Claims Workforce Management

We’re building a world of health around every individual — shaping a more connec...
Location: United States
Salary: 67900.00 USD / Year
Company: CVS Health
Expiration Date: July 12, 2026
Requirements
  • Minimum 7 years of experience in workforce management, operational analytics, or related roles within healthcare, insurance, or complex operational environments
  • Demonstrated expertise in forecasting, capacity planning, and workforce modeling
  • Strong analytical and problem-solving skills with the ability to translate data into executive-level insights and recommendations
  • Proven ability to operate independently with a high level of ownership and accountability
  • Strong communication skills and experience influencing leaders without direct authority
  • Advanced proficiency with workforce management tools, reporting platforms, and data analysis techniques
  • Experience partnering with senior leadership on staffing strategy and operational planning
  • Experience designing or evolving workforce management operating models
  • Strong business acumen with the ability to balance service, quality, cost, and compliance considerations
  • Bachelor’s degree preferred or equivalent combination of relevant experience, training, and professional development
Job Responsibility
  • Provides strategic ownership of Claims workforce management by leveraging deep analytical expertise to forecast demand, develop capacity and staffing models, and optimize workforce utilization
  • Serves as the primary subject matter expert for Claims Workforce Management (WFM), partnering closely with senior leadership and cross‑functional stakeholders to support operational decision-making, performance outcomes, and scalability
  • Leads workforce planning initiatives, drives process improvements, and delivers actionable insights to ensure claims operations are staffed efficiently, consistently, and in alignment with business objectives
  • Owns end-to-end workforce management for Claims operations, including forecasting, capacity planning, staffing models, and resource optimization across multiple work areas
  • Analyzes and interprets complex operational, volume, and productivity data to develop actionable workforce strategies that support claims performance, service levels, and financial targets
  • Develops demand forecasts and staffing models using historical data, trend analysis, and scenario modeling; provides insights and recommendations to senior leadership
  • Serves as the primary owner of staffing assumptions, workforce modeling, and capacity planning for Claims, ensuring alignment with operational strategy and business priorities
  • Leverages workforce management tools, statistical models, and analytics to evaluate demand variability, staffing risk, and operational scenarios
  • Partners closely with Claims leadership, Finance, HR, and Operational Excellence teams to align workforce strategies with hiring plans, training timelines, and productivity assumptions
What we offer
  • medical, dental, and vision coverage
  • paid time off
  • retirement savings options
  • wellness programs
  • bonus, commission or short-term incentive program
  • equity award program
  • Fulltime

Representative, Specialty Pickup Retail Recovery

The role of a SPAR Recovery representative is to assist with reconciling outstan...
Location: United States, Monroeville
Salary: 17.00 - 28.46 USD / Hour
Company: CVS Health
Expiration Date: July 31, 2026
Requirements
  • At least one year of experience in customer service or healthcare services, with phone experience
  • Six months of computer experience on a Windows-based system
  • Excellent communication, organizational and interpersonal skills
  • Attention to detail and accuracy
  • Ability to lift up to 50 pounds
  • PA Pharmacy Technician State License
  • High school diploma or equivalent
Job Responsibility
  • Assist with reconciling outstanding SPAR orders at retail by ensuring that they are either returned to CVS specialty or confirming that they have been delivered successfully to the patient
  • Making outbound/taking inbound calls to/from retail stores and/or patients to initiate the timely return of medications which are no longer needed by patients, or those medications which have not been picked up
  • Working with retail colleagues to locate and check in High Touch/ High-Cost orders
  • Printing, packing, and shipping supply requests to retail stores, patients, and prescriber offices
What we offer
  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
  • Bonus, commission or short-term incentive program
  • Fulltime

Embedded Software Engineer - Electrification

Our Mission: At General Motors, our product teams are redefining mobility. Throu...
Location: United States, Milford
Salary: Not provided
Company: General Motors
Expiration Date: Until further notice
Requirements
  • BS in Computer Science, Computer Engineering, Electrical Engineering or other applicable Engineering focuses
  • 2+ years of Embedded C software development experience
  • Experience developing, reading, and debugging source code in C, C++, Python
  • Understanding of one or more of the following: Batteries, Inverters, Supervisory Controls, or Electric Motors
  • Strong tools background in MATLAB/Simulink, DOORS, Git/Jira, and related GM controls toolchains is expected
  • Unit testing, SIL, HIL, bench, and vehicle testing
Job Responsibility
  • Create software for battery management, inverter, and electric motor system functions and perform integration and verification testing with minimal direction of lead engineers
  • Deliver scalable and modular software across all customers to enable a single software stream delivery
  • Document requirements for design solutions and link them to test cases that can demonstrate software functionality
  • Utilize automated test tools in build environments, benches, and products, to verify functionality of the feature
  • Analyze software defects; determine root cause, create software solution, test and verify closure
  • Performs design and analysis on changes
  • Diagnose, debug and solve issues related to battery, inverter, and electric motor hardware and software
  • Work with teams from multiple groups to meet project milestones
  • Develop test cases and write comprehensive test plans to assess software products at different system levels
What we offer
  • competitive compensation
  • growth opportunities
  • Bonus Potential
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • Fulltime