Principal Researcher

Generative AI is transforming how people create, collaborate, and communicate—re...

Location

United States, Redmond

Salary:

163000.00 - 296400.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Doctorate in relevant field AND 6+ years related research experience OR Master's Degree in relevant field AND 7+ years related research experience OR Bachelor's Degree in relevant field AND 9+ years related research experience OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Formulate, develop, and evaluate new algorithmic and system-level approaches for end-to-end AI serving, using analytical modeling and large-scale measurement to study token-level latency, tail latency (p95/p99), throughput-per-dollar, cold-start behavior, warm pool strategies, and capacity planning under multi-tenant SLOs and variable sequence lengths
Design and experimentally evaluate endpoint configuration and execution policies, including batching, routing, and scheduling strategies, tensor and pipeline parallelism, quantization and precision profiles, speculative decoding, and chunked or streaming generation, and drive the most promising approaches through robust rollout and validation into production
Perform hardware- and kernel-aware optimization by collaborating closely with model, kernel, compiler, and hardware teams to align serving algorithms with attention/KV innovations and accelerator capabilities
Build and benchmark experimental prototypes and large-scale measurements to validate research ideas and drive them toward production readiness
Produce clear technical documentation, design reviews, and operational playbooks
Publish research results, file patents, and, where appropriate, contribute to open-source systems and serving frameworks.

Fulltime

Principal Software Engineer - Performance Tooling

The Artificial Intelligence (AI) Frameworks team at Microsoft develops AI softwa...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. This includes passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python OR equivalent experience
4+ years’ practical experience working on high-performance applications and performance debugging and optimization on CPUs/GPUs
Experience in DNN/LLM inference and experience in one or more DL frameworks such as PyTorch, TensorFlow, or ONNX Runtime and familiarity with CUDA, ROCm, Triton
Technical background and solid foundation in software engineering principles, computer architecture, GPU architecture, hardware neural net acceleration
Experience in end-to-end performance analysis and optimization of state-of-the-art LLMs and HPC applications, including proficiency using GPU profiling tools
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
Ability to independently lead projects

Job Responsibility

Work across multiple layers of the AI software stack (abstractions, programming models, compilers, runtimes, libraries, and APIs) to enable large-scale model training and inference
Benchmark OpenAI and other LLMs for performance on Graphics Processing Units (GPUs) and Microsoft hardware
Debug, profile, and optimize performance for training/inference workloads on CPUs (Central Processing Units)/GPUs
Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint
Collaborate across teams of researchers and engineers to deliver scalable, production-ready AI performance improvements

Fulltime

Principal Software Engineering Manager - Substrate Efficiency

M365 Copilot inference is a high-impact engineering team advancing applied AI an...

Location

United States, Redmond

Salary:

142800.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Build and lead a high-performing engineering team focused on inference runtime efficiency and model execution performance
Define and drive strategy to improve throughput per GPU through runtime optimizations
Increase engineering agility, enabling faster experimentation, iteration, and rollout of performance improvements
Partner across M365 Core, AI Core, Azure, and Microsoft Research to co-design and productionize advanced inference optimizations
Establish metrics, telemetry, and experimentation frameworks to measure efficiency gains and guide investment decisions
Own live-site performance, reliability, and operational excellence for inference engines at scale
Drive alignment across partner teams on engine interfaces, performance goals, and optimization priorities.

Fulltime

Principal Software Engineering Manager - AI Frameworks

As a Principal Software Engineering Manager - AI Frameworks on the team, you wil...

Location

United States, Redmond

Salary:

139900.00 - 304200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Master’s Degree in Computer Science or related technical field AND 10+ years of software engineering experience, including 6+ years in engineering management, OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years of software engineering experience, including 6+ years in engineering management, or equivalent experience
Strong technical foundation in software engineering principles, computer architecture, GPU architecture, and hardware acceleration for neural networks, with the ability to guide teams working in these areas
Experience leading teams responsible for end-to-end performance analysis and optimization of LLMs, AI systems, or HPC workloads, including use of GPU profiling and performance analysis tools
Demonstrated ability to lead cross-team initiatives, align stakeholders, and translate research or platform capabilities into scalable, production-ready solutions
Proven people leadership skills, including hiring, coaching, performance management, and career development, with a track record of building high-performing, inclusive teams
Exposure to AI / ML infrastructure, including DNN or LLM training and/or inference systems, and experience with at least one modern deep learning framework (e.g., PyTorch, TensorFlow, ONNX Runtime)
Familiarity with GPU software stacks and acceleration technologies such as CUDA, ROCm, Triton, or equivalent, sufficient to guide technical direction and evaluate tradeoffs

Job Responsibility

Lead and develop a team of engineers working across multiple layers of the AI software stack to enable large-scale training and inference
Set technical vision and execution strategy for model performance benchmarking, optimization, and deployment across GPUs and Microsoft hardware
Drive performance outcomes by prioritizing and overseeing efforts to benchmark, profile, debug, and optimize training and inference workloads
Own performance health by establishing mechanisms to monitor regressions, measure impact, and continuously improve time-to-deploy and hardware efficiency
Partner cross-functionally with research, product, infrastructure, and hardware teams to deliver scalable, production-ready AI performance improvements
Balance short-term delivery and long-term investments, ensuring the team’s work aligns with organizational goals, platform roadmaps, and Azure capex objectives
Build a strong engineering culture through coaching, feedback, hiring, and career development, enabling the team to operate with increasing autonomy and impact

Fulltime

Staff / Principal Machine Learning Engineer, Serving

A year ago, reliably working agentic systems and sub-second multimodal inference...

Location

United States, Mountain View

Salary:

270000.00 - 500000.00 USD / Year

Inworld AI

Expiration Date

Until further notice

Requirements

Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM
Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs
Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections
Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups
Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems

Job Responsibility

We hand you unclear problems and expect you to make them clear
We value engineers who say 'I don't know yet' and then design the benchmark or prototype that finds out
We treat performance, latency, and reliability as first-class product features, not a box to check before launch
Impact comes before everything else, though we support sharing work and open-source contributions that move the field forward
Your work should be visible

What we offer

bonus
equity
benefits
relocation assistance

Fulltime

Staff / Principal Machine Learning Engineer, Serving

Inworld is a product-oriented research lab of top AI researchers and engineers, ...

Location

Switzerland

Salary:

Not provided

Inworld AI

Expiration Date

Until further notice

Requirements

Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM
Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs
Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections
Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups
Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems
Professional fluency in English (written and spoken) is required, as you will be collaborating daily with our US-based leadership and engineering teams

Fulltime

Principal Engineer, ASIC Development Engineering (Frontend Architect - AI Storage Solutions)

In this Frontend Architect position, you will develop AI Storage Solutions based...

Location

India, Bangalore

Salary:

Not provided

Sandisk

Expiration Date

Until further notice

Requirements

Bachelor's, Master's, or PhD in Computer/Electrical Engineering with 8+ years of hands-on architecture experience authoring specifications
Strong technical background architecting SoC and I/O subsystems involving PCIe and PCIe-DMA engines, or UCIe or CXL or UAL
Strong I/O subsystem microarchitecture skills and working technical knowledge of the PCIe/UCIe protocol specifications
Knowledge of I/O Subsystem and DMA interactions with internal embedded processor-subsystems (x86, RISC-V or ARM) and external host CPU
Good understanding of computer/graphics architecture, ML, LLM
Experience architecting GPU/TPU/xPU accelerator systems with an optimized high-bandwidth memory hierarchy and frontend architecture for multi-trillion-parameter LLM training/inference, including Dense and Mixture of Experts (MoE) models with multiple modalities (text, vision, speech)
Deep experience optimizing large-scale ML systems, GPU architectures
Proficiency in principles and methods of microarchitecture, software, and hardware relevant to performance engineering
Multi-disciplinary experience, including familiarity with Firmware and ASIC design
Expertise in CUDA programming, GPU memory hierarchies, and hardware-specific optimizations

Job Responsibility

Responsible for driving the SoC architecture, with a particular focus on I/O subsystems connected over UCIe, PCIe, UAL or CXL
Define I/O subsystem and PCIe DMA architectures, including their interactions with internal embedded processor-subsystems, Network on Chip, Memory controllers, and FPGA fabric
Create flexible and modular I/O subsystem architectures that can be deployed in either chiplet, monolithic or 3D form factors
Work with customers and cross-functional teams to scope SoC requirements, analyze PPA tradeoffs, and then define architectural requirements that meet the PPA and schedule targets
Define I/O subsystem and DMA hardware, software, and firmware interactions with embedded processing subsystems and SoC CPUs on the device side and Host CPUs
Author architecture specifications in clear and concise language. Guide and assist pre-silicon design/verification and post-silicon validation during the execution phase
Responsible for improving the AI/ML ASIC Architecture performance through hardware & software co-optimization, post-silicon performance analysis, and influencing the strategic product roadmap
LLM workload analysis and characterization of ASIC and competitive datacenter and AI solutions to identify opportunities for performance improvement in our products
Experience architecting one or more components of AI/ML accelerator ASICs such as HBM, PCIe/UCIe/CXL, NoC, DMA, Firmware Interactions, NAND, xPU, fabrics, etc.
Drive the AI Storage Solutions frontend system architecture with GPU/TPU/NPU/xPU to match or exceed the next-generation HBM bandwidth

Fulltime

Principal Software Engineer

Are you looking for an opportunity to work with the latest Azure offerings and p...

Location

India, Bangalore

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

10–12+ years of experience in software engineering, with significant experience building scalable backend or distributed systems
Strong programming expertise in one or more languages such as Python, Go, Java, or C#, with experience designing production-grade services and APIs
Experience building AI-powered applications, including integrating LLMs, implementing agent or Copilot workflows, and orchestrating multi-step AI interactions
Hands-on experience with LLM application frameworks and orchestration tools such as Semantic Kernel, LangChain, or similar agent frameworks
Familiarity with retrieval-augmented generation (RAG) architectures, vector databases, embeddings, and semantic search systems
Experience evaluating and improving model performance through prompt design, evaluation frameworks, fine-tuning, or feedback loops
Solid understanding of distributed systems concepts including scalability, reliability, observability, caching, and asynchronous processing
Experience deploying and operating AI workloads in cloud environments (preferably Azure), including containerized services and GPU-enabled infrastructure
Understanding of Responsible AI practices, including model governance, safety, privacy, and evaluation of AI behaviour in production systems
Ability to work across product, research, and engineering teams to translate product scenarios into scalable AI system architectures

Job Responsibility

Design, build, and operate scalable AI systems that power intelligent product experiences, including Copilot and agent-driven workflows
Architect and implement backend services that support multi-step AI interactions, including orchestration pipelines, context management, memory/state persistence, and tool execution
Integrate large language models (LLMs), APIs, and internal services to enable context-aware, human-in-the-loop experiences across customer scenarios
Build and maintain data and inference pipelines that support model training, fine-tuning, evaluation, and real-time inference across diverse data sources
Evaluate, benchmark, and tune AI/ML models (LLMs and traditional models) to meet product requirements for accuracy, latency, reliability, and safety
Implement robust retrieval, grounding, and knowledge integration mechanisms (e.g., RAG systems, semantic indexing, vector search) to power intelligent applications
Collaborate with product managers, software engineers, and researchers to translate product vision into production-ready AI capabilities and measurable outcomes
Ensure reliability, observability, and governance of AI systems, including monitoring model performance, data quality, and responsible AI practices
Build reusable platforms, APIs, and tools that enable teams to rapidly develop AI-powered features and self-service intelligent applications

Fulltime
