Principal Software Engineer, CoreAI Workload Engines

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Employment contract

Salary:

139900.00 - 331200.00 USD / Year

Job Description:

The CoreAI Workloads team builds the foundational inference engines and APIs that power large-scale AI inference across Azure, from cutting-edge startups to Fortune 500 enterprises and Microsoft Copilots and agents. Our mission is to deliver secure, reliable, and highly efficient GPU inference that enables multi-tenant AI systems at global scale while maximizing utilization, performance, and developer productivity. We own inference serving and performance for OpenAI and other state-of-the-art large language models (LLMs), and we work directly with OpenAI, serving some of the largest workloads on the planet with trillions of inferences per day. Our converged AI fabric and engines deliver inference capabilities for all LLMs in the Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more.

Job Responsibility:

  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA fabrics such as InfiniBand or RoCE) for distributed inference, without owning low-level kernel/driver enablement
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability
  • Work across multiple layers of the AI software stack (abstractions, programming models, engine runtimes, libraries, and APIs) to enable large-scale model inference
  • Benchmark OpenAI and other LLMs for performance across Azure OpenAI Service workload tiers and segments, and translate results into production improvements
  • Debug, profile, and optimize production inference performance across the stack (abstractions, runtime, scheduling, and serving pipelines) to improve latency, throughput, and utilization
  • Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint
  • Collaborate across engineering teams to deliver scalable, production-ready serving efficiency and availability improvements, using experimentation results to guide prioritization and rollout
  • Build durable engine interfaces that enable fast experimentation and safe shipping of new strategies for class of service (QoS), replica load balancing, KV management (including offload/retrieval), quantization, and sampling (e.g., multi-token prediction and constrained sampling)

Requirements:

  • Bachelor's Degree in Computer Science or a related technical field and 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; or equivalent experience
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries

Nice to have:

  • Experience optimizing LLM inference in practice (e.g., PyTorch inference, serving runtimes, model execution, or inference orchestration) in production environments
  • Familiarity with high-performance networking and low-latency communication stacks
  • Familiarity with GPU-accelerated inference stacks (e.g., CUDA at the application/runtime level, device plugins, or runtime integration)
  • Experience building or using experimentation systems (A/B, canarying, tiered rollout), including metric definition and comparability for performance and reliability
  • Familiarity with distributed inference stacks (e.g., NCCL-style collectives, model/tensor parallelism) and performance tradeoffs in large-scale serving

Additional Information:

Job Posted:
April 23, 2026

Employment Type:
Full-time
Work Type:
Hybrid work