Senior Software Engineer, CoreAI Workload Engines

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Employment contract

Salary:

119800.00 - 304200.00 USD / Year

Job Description:

The CoreAI Workloads team builds the foundational inference engines and APIs that power large-scale AI inference across Azure, serving customers from cutting-edge startups to Fortune 500 enterprises as well as Microsoft Copilots and agents. Our mission is to deliver secure, reliable, and highly efficient GPU inference that enables multi-tenant AI systems at global scale while maximizing utilization, performance, and developer productivity. We own inference serving and performance for OpenAI and other state-of-the-art large language models (LLMs), and we work directly with OpenAI to serve some of the largest workloads on the planet, with trillions of inferences per day. Our converged AI fabric and engines deliver inference capabilities for all LLMs in the Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more.

Job Responsibility:

  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA-capable fabrics such as InfiniBand or RoCE) for distributed inference, without owning low-level kernel/driver enablement.
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
  • Work across multiple layers of the AI software stack (abstractions, programming models, engine runtimes, libraries, and APIs) to enable large-scale model inference.
  • Benchmark OpenAI and other LLMs for performance across Azure OpenAI Service workload tiers and segments, and translate results into production improvements.
  • Debug, profile, and optimize production inference performance across the stack (abstractions, runtime, scheduling, and serving pipelines) to improve latency, throughput, and utilization.
  • Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint.
  • Collaborate across engineering teams to deliver scalable, production-ready serving efficiency and availability improvements, using experimentation results to guide prioritization and rollout.
  • Build durable engine interfaces that enable fast experimentation and safe shipping of new strategies for quality of service (QoS), replica load balancing, KV management (including offload/retrieval), quantization, and sampling (e.g., multi-token prediction and constrained sampling).

Requirements:

  • Bachelor's Degree in Computer Science or a related technical field and 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; or equivalent experience.
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
  • Strong problem-solving skills and the ability to debug complex, cross-layer system issues.
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries.

Nice to have:

  • Experience optimizing LLM inference in practice (e.g., PyTorch inference, serving runtimes, model execution, or inference orchestration) in production environments.
  • Familiarity with high-performance networking and low-latency communication stacks.
  • Familiarity with GPU-accelerated inference stacks (e.g., CUDA at the application/runtime level, device plugins, or runtime integration).
  • Experience building or using experimentation systems (A/B, canarying, tiered rollout), including metric definition and comparability for performance and reliability.
  • Familiarity with distributed inference stacks (e.g., NCCL-style collectives, model/tensor parallelism) and performance tradeoffs in large-scale serving.

What we offer:

  • Benefits and other compensation

Additional Information:

Job Posted:
April 23, 2026

Employment Type:
Full-time
Work Type:
Hybrid work