Principal Software Engineer, CoreAI

United States, Redmond · 142800.00 - 304200.00 USD / Year · Job Posted June 15, 2026

Job Description

The CoreAI AI Platform team is seeking a Principal Software Engineer in Redmond, WA to design and evolve cloud-scale platform services, developer and research tooling, and governed data infrastructure that enable safe, reliable, and efficient delivery of AI-powered product experiences.

Job Responsibility

  • Lead the architecture and implementation of large-scale platform services that support complex engineering and AI workflows in distributed cloud environments
  • Build internal tooling and automation that improve productivity for engineers and researchers across experimentation, deployment, and operational workflows
  • Design platform capabilities that make data easier to discover, access, and use in secure, governed, and auditable ways
  • Drive operational excellence through improvements in reliability, observability, deployment safety, and incident readiness
  • Partner across teams to resolve cross-cutting technical problems and align architecture, engineering standards, and long-term investments
  • Mentor engineers, contribute to technical reviews, and help raise the engineering bar across the organization

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience.

Nice to have

  • 6+ years of experience designing, building, and operating large-scale software systems
  • Strong expertise in distributed systems, cloud infrastructure, platform engineering, or data infrastructure
  • Demonstrated technical leadership across ambiguous problem spaces and across team boundaries
  • Experience improving production reliability, security, and operational efficiency for complex services
  • Experience building internal platforms, developer tooling, or workflow automation for engineering or research organizations
  • Experience with governed data platforms, access control, observability, or compliance-aware system design
  • Experience supporting AI or ML-adjacent workloads and the infrastructure around them
  • Strong mentoring and collaboration skills with a track record of raising engineering quality across teams

Similar Jobs for

Principal Software Engineer, CoreAI

8 matching positions

Principal Software Engineer, CoreAI

Core AI is at the forefront of Microsoft’s mission to redefine how software is b...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years
Job Responsibility
  • Shape the Product Vision: Define and influence the product roadmap by aligning technical strategy with business goals and customer needs
  • Drive Strategic Clarity: Leverage data-driven insights and competitive intelligence to inform product direction, identify opportunities, and guide decision-making
  • Architect for Scale and Sustainability: Design and evolve durable, scalable system architectures that balance long-term maintainability with short-term delivery needs, making thoughtful engineering trade-offs
  • Foster Engineering Alignment: Work with the engineering teams and partner organizations by driving clarity, alignment, and shared ownership of technical direction
  • Deliver Cohesive End-to-End Experiences: Collaborate closely with partner teams—including experience, SDK, and platform groups—to ensure seamless integration and delivery of features across the stack
  • Build Foundational Capabilities: Contribute to and lead the development of core platform components and reusable building blocks that accelerate team velocity and product innovation
  • Champion Customer-Centric Development: Engage directly with customers and product teams to capture feedback, understand demand signals, and refine product messaging—ensuring the voice of the customer shapes product evolution
  • Lead Live Site Excellence: Drive operational excellence in managing and operating large-scale distributed systems with a high bar for service-level agreements (SLAs). Lead root cause analyses (RCAs) for key live site incidents and outages, identify systemic improvements, and set high standards for reliability and performance
  • Fulltime

Principal Software Engineer, CoreAI

The CoreAI GPU Infrastructure team builds the foundational accelerated compute p...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience
  • Proven ability to design and operate large-scale, production infrastructure with high reliability and performance requirements
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers and driving cross-team architectural alignment
  • Hands-on experience with virtualization and/or container platforms (e.g., VMs, Kubernetes, container runtimes)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries
Job Responsibility
  • Design and build GPU accelerated infrastructure for training and inference workloads, spanning bare metal, virtual machines, and containerized environments
  • Develop systems for GPU device management, scheduling, isolation, and sharing (e.g., partial GPU allocation, multi-tenant usage)
  • Build and operate advanced orchestration and resource governance scenarios using platforms such as AKS, Dynamic Resource Allocation (DRA), and related Kubernetes ecosystem capabilities to enable fair sharing, isolation, and efficient utilization of accelerated resources
  • Build and evolve virtualization and container stacks to support modern AI workloads, including secure and confidential compute scenarios
  • Optimize performance, reliability, and utilization across large GPU fleets, including scale-up and scale-out configurations
  • Partner with networking and storage teams to enable high-performance interconnects (e.g., RDMA/InfiniBand class networking) for distributed workloads
  • Drive end-to-end platform features from design through production, including observability, diagnostics, and operational excellence
  • Influence platform architecture and technical direction across teams through design reviews and technical leadership
  • Fulltime

Principal Software Engineer, CoreAI

Joining the CoreAI organization at Microsoft means becoming part of the team tha...
Location: United States, Multiple Locations
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years
Job Responsibility
  • Collaborate with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal, and code-specific models.
  • Design, build and improve services with high scalability and reliability.
  • Design and implement the services to serve the prod traffic and fulfill the security and privacy requirements.
  • Participate in efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
  • Contribute to the deployment and monitoring of services in production environments.
  • Fulltime

Principal Software Engineer, CoreAI

The GenAI Infrastructure and Solutions team is building large-scale GenAI traini...
Location: United States, Redmond
Salary: 163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience.
  • 6+ years designing, developing, and shipping high quality software.
  • 3+ years of experience with distributed systems and cloud-based infrastructure.
  • 2+ years of experience with containerization tools (e.g., Docker, Kubernetes).
  • 2+ years of experience with DevOps practices (CI/CD, automated testing, deployment, etc.).
  • Passionate and self-motivated, with a strong ability to self-learn, enter new domains, and manage through uncertainty in an innovative team environment.
  • Familiarity with virtualization technology.
  • Familiarity with production ML systems and concepts like model serving, caching, batching, and monitoring.
Job Responsibility
  • Lead the collaboration with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal, and code-specific models.
  • Design, build and improve services with high scalability and reliability.
  • Design and implement the services to serve the prod traffic and fulfill the security and privacy requirements.
  • Lead the efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
  • Contribute to the deployment and monitoring of services in production environments.
  • Fulltime

Principal Software Engineer, CoreAI

Join Microsoft’s AI Core team building high performance runtime systems that ser...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in systems programming with strong expertise in C++
  • Proven experience building, deploying, and operating scalable cloud services
  • Strong debugging skills and experience using performance profiling and diagnostic tools
  • Hands-on experience with distributed systems, Kubernetes, and containerized workloads
  • Experience with large-scale LLM inferencing infrastructure, including CUDA
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Design and implement high-performance microservices and runtime components in C++
  • Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
  • Debug and resolve complex production issues related to performance, scaling, and service reliability
  • Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
  • Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
  • Drive systems-level innovations for real-time and batch inferencing efficiency
  • Participate in code reviews and provide technical mentorship to senior and peer engineers
  • Fulltime

Principal Software Engineer - CoreAI

At CoreAI, we empower developers and organizations to shape the future with Arti...
Location: United States, Redmond
Salary: 142800.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 5+ years of experience leading software engineering and analytics projects that delivered measurable product and growth wins
  • Deep experience architecting and operating large-scale data pipelines in cloud environments, preferably Azure
  • Ability to write clean, working code using core algorithms, data structures, and analytics-oriented problem-solving
  • Understanding of data governance, privacy, lineage, and security best practices, especially within highly regulated or enterprise environments
  • Excellent communication skills to convey complex technical concepts to both technical and non-technical audiences
  • Experience using AI tools in software engineering, data science, and analytics workflows
  • Experience both prototyping and deploying data products
Job Responsibility
  • Lead by example and mentor others to produce extensible and maintainable code used across the company
  • Leverage deep subject-matter expertise of cross-product features with appropriate stakeholders to lead multiple products' project plans, release plans, and work items
  • Own and define end-to-end data and analytics architecture for CoreAI and Foundry platforms, setting long-term technical direction for scalable, reliable, and cost-effective analytics supporting AI workloads
  • Design, build, and optimize large-scale, robust data pipelines and architectures that support CoreAI's analytics initiatives
  • Data Governance & Trust: follow best practices for data quality, lineage, security, and compliance
  • Collaborate with stakeholders to define trustworthy data sets and implement rigorous data validation protocols, ensuring CoreAI's analytics are both accurate and auditable
  • Analytics Enablement: Partner with data scientists, analysts, and business leaders to translate business needs into technical solutions
  • Enable self-service analytics and empower teams by building data models, semantic layers, and tools that streamline access to trusted information
  • Cross-Functional Collaboration: Work closely with product managers, software engineers, AI researchers, and business stakeholders to align data solutions with business goals
  • Contribute actively to the infrastructure and culture needed to scale quantity and quality of data insights across CoreAI
  • Fulltime

Principal Software Engineer, CoreAI FIT Agentic Systems

Joining the CoreAI organization at Microsoft means becoming part of the team tha...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
  • Experience in distributed computing and architecture, and/or developing and operating high scale, reliable online services
Job Responsibility
  • Engage directly with key partners to understand and implement complex inferencing and agentic capabilities for Microsoft Copilot and other Microsoft products and Azure services
  • Design and implement an API orchestration layer by leveraging OpenAI models, tools, and capabilities
  • Work on cutting-edge agentic platforms to automate and solve real-world problems with the latest reasoning AI models
  • Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference at optimal cost
  • Anticipate, identify, assess, track, and mitigate project risks and issues in a fast-paced, startup-like environment
  • Motivated to build constructive and effective relationships and solve problems collaboratively
  • Support production inference SLAs for core AI scenarios on one of the largest GPU fleets in the world
  • Fulltime

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location: United States, Redmond
Salary: 139900.00 - 331200.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries
Job Responsibility
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability
  • Fulltime