Senior Software Engineer, CoreAI Workload Engines

United States, Redmond Employment contract 119800.00 USD / Year · Job Posted April 23, 2026

Job Description

The CoreAI Workloads team builds the foundational inference engines and APIs that power large-scale AI inference across Azure, from cutting-edge startups to Fortune 500 enterprises and Microsoft Copilots and agents. Our mission is to deliver secure, reliable, and highly efficient GPU inference that enables multitenant AI systems at global scale while maximizing utilization, performance, and developer productivity. We own inference serving and performance for OpenAI and other state-of-the-art large language models (LLMs) and work directly with OpenAI, serving some of the largest workloads on the planet with trillions of inferences per day. Our converged AI fabric and engines deliver inference capabilities for all LLMs in the Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more.

Job Responsibility

  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA-capable fabrics such as InfiniBand or RoCE) for distributed inference, without owning low-level kernel/driver enablement.
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
  • Work across multiple layers of the AI software stack (abstractions, programming models, engine runtimes, libraries, and APIs) to enable large-scale model inference.
  • Benchmark OpenAI and other LLMs for performance across Azure OpenAI Service workload tiers and segments, and translate results into production improvements.
  • Debug, profile, and optimize production inference performance across the stack (abstractions, runtime, scheduling, and serving pipelines) to improve latency, throughput, and utilization.
  • Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint.
  • Collaborate across engineering teams to deliver scalable, production-ready serving efficiency and availability improvements, using experimentation results to guide prioritization and rollout.
  • Build durable engine interfaces that enable fast experimentation and safe shipping of new strategies for quality of service (QoS) classes, replica load balancing, KV management (including offload/retrieval), quantization, and sampling (e.g., multi-token prediction and constrained sampling).

Requirements

  • Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries.

Nice to have

  • Experience optimizing LLM inference in practice (e.g., PyTorch inference, serving runtimes, model execution, or inference orchestration) in production environments.
  • Familiarity with high-performance networking and low-latency communication stacks.
  • Familiarity with GPU-accelerated inference stacks (e.g., CUDA at the application/runtime level, device plugins, or runtime integration).
  • Experience building or using experimentation systems (A/B, canarying, tiered rollout), including metric definition and comparability for performance and reliability.
  • Familiarity with distributed inference stacks (e.g., NCCL-style collectives, model/tensor parallelism) and performance tradeoffs in large-scale serving.

What we offer

  • Benefits and other compensation

Similar Jobs for Senior Software Engineer, CoreAI Workload Engines

8 matching positions

Senior Software Engineer - CoreAI

Are you passionate about building high-performance, low-latency systems that pow...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years of experience with cloud platforms such as Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP)
  • 1+ years of proficiency with AI-assisted development tools (e.g., GitHub Copilot, IntelliCode)
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Own and operate highly scalable, reliable, and low-latency distributed systems that power mission-critical workloads
  • Design and implement features that enable configuration management, monitoring, analytics, and observability for modern cloud applications
  • Drive integration with other Azure services to deliver seamless customer experiences
  • Write high-quality, well-tested code and own the DevOps lifecycle, including monitoring, alerting, and incident response
  • Integrate AI-assisted development tools to improve engineering productivity and code quality
  • Contribute to AI-enhanced features using technologies such as Large Language Models (LLMs), Model Context Protocol (MCP) servers, and Retrieval-Augmented Generation (RAG)
  • Mentor others, fostering technical growth, promoting engineering best practices, and leading initiatives to enhance team capabilities and collaboration
  • MS Culture & Values: Embody our Culture and Values
  • Full-time

Principal Software Engineer, CoreAI

Join Microsoft’s AI Core team building high performance runtime systems that ser...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in systems programming with strong expertise in C++
  • Proven experience building, deploying, and operating scalable cloud services
  • Strong debugging skills and experience using performance profiling and diagnostic tools
  • Hands-on experience with distributed systems, Kubernetes, and containerized workloads
  • Experience with large-scale LLM inferencing infrastructure, including CUDA
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Design and implement high-performance microservices and runtime components in C++
  • Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
  • Debug and resolve complex production issues related to performance, scaling, and service reliability
  • Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
  • Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
  • Drive systems-level innovations for real-time and batch inferencing efficiency
  • Participate in code reviews and provide technical mentorship to senior and peer engineers
  • Full-time

Senior Product Manager - CoreAI

The Azure Managed Redis Product Management team defines the vision, strategy, an...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree AND 5+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Technical expertise in distributed systems or databases – e.g. experience building or managing services in cloud infrastructure, high-scale web/enterprise software, or data storage systems
  • Proven success in driving product strategy and execution
  • Collaboration & leadership skills
  • Customer focus and data-driven mindset
  • Proven coding/vibe-coding experience to quickly build proofs of concept or demos that accelerate time to market
  • Domain experience with Redis (open-source or commercial), Azure Cache for Redis, or similar caching and NoSQL technologies
  • Experience with AI/ML workloads or data platforms
Job Responsibility
  • Articulate and drive the vision for Azure Managed Redis – focusing on performance, scalability, and seamless AI integration
  • Bet boldly on AI patterns (RAG, agent memory, vector indexing, semantic caching) and convert signals into roadmap decisions that deliver measurable value
  • Drive the roadmap end-to-end: prioritize AI-first features while holding the line on reliability, compliance, and security
  • Lead GTM through influence: shape AI-centric positioning and messaging, and unblock execution across Product Marketing and Engineering
  • Co-create with Engineering (Microsoft & Redis Inc.)
  • Build with customers and partners: pressure-test AI and caching scenarios, validate fit, and translate feedback into crisp specs
  • Evangelize with proof—not just talk: demos, field enablement, and stories that make AI performance gains undeniable
  • Ensure the delivery and operation of features on the end-to-end roadmap for Azure Managed Redis – from inception to worldwide launch
  • Act as the bridge among Azure engineering, the Redis community and Redis Inc., and adjacent product groups
  • Full-time

Principal Product Manager/Architect - Foundry Inference Platform (CoreAI)

We are seeking a Principal Product Manager/Architect to define and guide the tec...
Location: United States, Redmond
Salary: 163000.00 - 296400.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree AND 10+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility
  • 1. Product Reliability: Own the product direction for Microsoft Foundry inference, with a primary mandate to make the platform the most reliable enterprise inferencing service available. This includes defining architectural standards for global serving, multi-region resiliency, automated failover, and platform-managed disaster recovery
  • Drive architectural alignment across global routing, capacity pooling, observability, and control plane abstractions to ensure consistent availability, predictable recovery behavior, and simplified customer operations at scale
  • Partner with engineering, infrastructure, and security leaders to ensure reliability targets, SLAs, SLOs and recovery objectives are designed into the platform by default
  • 2. GPU Fleet Efficiency & Capacity: Set the product direction for GPU fleet efficiency and capacity management, guiding platform-level design decisions that maximize utilization, minimize fragmentation, and accelerate time-to-monetization of new hardware and models
  • This includes shaping the architecture for global capacity pooling, intelligent scheduling, fungibility across workloads, automated demand forecasting, and software-defined allocation
  • The Product Manager/Architect is expected to influence architectural investments across inference utilization, model serving, and hardware/system performance
  • 3. Strategic Customer & Innovation Engagement: Act as a senior technical advisor and architect for Foundry’s most innovative and strategic customers
  • Engage directly with customers on deep technical challenges, including large-scale model migrations, reliability-sensitive production deployments, and advanced serving architectures
  • Support competitive and strategic initiatives by articulating Foundry’s architectural advantages, turning bespoke requests into scalable features
  • 4. Cross-Company Technical Leadership: Serve as a unifying architectural voice across product management, engineering, infrastructure, and partner teams
  • Full-time

Principal Product Manager

At Microsoft, we are building the world’s most trusted and developer‑centric AI ...
Location: India, Bangalore
Salary: Not provided
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in engineering, computer science, or a related technical field
  • Significant experience (typically 8–12+ years) in product management or software engineering with substantial product ownership, including experience working on platform or infrastructure products
  • Demonstrated ability to operate effectively in large, ambiguous, multi‑team environments with shared ownership and complex dependencies
  • Strong technical depth in cloud platforms, distributed systems, or AI/ML infrastructure, with the ability to engage credibly with senior engineers and architects
  • Proven track record of influencing strategy, driving alignment, and delivering outcomes through collaboration rather than direct authority
  • Strong analytical and systems‑thinking skills, with experience making high‑quality decisions in fast‑moving, evolving problem spaces
Job Responsibility
  • Act as a senior contributor to platform strategy for Azure AI Foundry and Azure ML, helping shape multi-year investments across model training, customization, deployment, and lifecycle management
  • Drive alignment and progress across federated, cross-organizational initiatives, working with peer Principal PMs and multiple engineering teams on shared platform outcomes
  • Contribute to the definition and evolution of high-leverage platform abstractions (APIs, SDKs, workflows) that enable scalable adoption of GenAI and custom code training workloads
  • Partner closely with senior engineering leaders to influence architectural direction, surface trade-offs, and ensure platform capabilities meet scale, reliability, and security expectations
  • Engage with strategic customers and internal stakeholders to gather insights, validate requirements, and translate learnings into durable, reusable platform capabilities
  • Use data, metrics, and experimentation to evaluate impact and inform product decisions across shared ownership areas
  • Serve as a thought leader and mentor within CoreAI, elevating product craft, platform thinking, and responsible AI practices across the organization
  • Full-time

Vascular Technician

We have an exciting opportunity for a Vascular Technician to join our Imaging te...
Location: United Kingdom, Hemel Hempstead
Salary: Not provided
Company: 360 Resourcing Solutions (jobs.360resourcing.co.uk)
Expiration Date: June 23, 2026
Requirements
  • Degree or equivalent qualification in Vascular Science, Ultrasound, or a related discipline, with relevant professional registration (e.g. HCPC) or eligibility for registration, ideally with SVT or equivalent vascular accreditation
  • Proven experience performing and reporting non‑invasive vascular investigations, with a strong understanding of vascular pathology and diagnostic techniques, plus experience working across both NHS and private healthcare settings and contributing to service development or audit activity
Job Responsibility
  • To deliver a high‑quality, patient‑centred vascular diagnostic service to both NHS and private patients within a private hospital setting
  • Independently perform, report, and support a full range of non‑invasive vascular investigations
  • Ensure compliance with national standards, clinical governance, and hospital policies
What we offer
  • Payment in lieu of holiday
  • Contributory pension scheme
  • Employee Assistance
  • One Stop Healthcare Discount
  • Discounted Cinema Tickets
  • Free on-site parking

Sales Assistant - 16 Hours

This is the face of our business and key to our success. We are looking for some...
Location: United Kingdom, Market Drayton
Salary: Not provided
Company: 360 Resourcing Solutions (jobs.360resourcing.co.uk)
Expiration Date: Until further notice
Requirements
  • Desire to offer exceptional customer service
  • Great communication skills
  • Able to establish and maintain positive working relationships with others
  • Be flexible and adaptable in line with the needs of the store
  • Ability to prioritise tasks
  • Strong team player
Job Responsibility
  • Acknowledge customers in a friendly and welcoming way
  • Respond to queues and queries quickly and efficiently
  • Attentive to customer needs
  • Always be courteous, cheerful and respectful
  • Recommend products and promotions
  • Carry out freshness checks
  • Ensure till accuracy
  • Perform stock replenishment and merchandising in accordance with company guidelines
  • Part-time

Registered Nurse

Located in Newton Mearns, Mearns House combines luxurious settings with high-qua...
Location: United Kingdom, Newton Mearns
Salary: 20.94 - 21.56 GBP / Hour
Company: 360 Resourcing Solutions (jobs.360resourcing.co.uk)
Expiration Date: Until further notice
Requirements
  • Valid NMC registration
  • Excellent knowledge of the most up-to-date clinical practices and regulatory frameworks
  • Professional, compassionate and motivated
  • Organised, flexible in approach and have great communication skills
Job Responsibility
  • Uphold high nursing care standards by leading the team and ensuring the shift runs smoothly, creating a safe and supportive environment in line with NMC codes of practice
  • Deliver hands-on clinical care, leadership and support to provide the highest standards of person-centred care, ensuring our residents live their best lives
  • Advocate and deliver a person-centred approach to care for our residents and their families
  • Develop and monitor comprehensive, tailored care plans and detailed risk assessments adhering to regulatory frameworks
What we offer
  • Nurse Development programme
  • Full-time wage with 3/4 days off each week
  • Overtime rates