Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment. Advances in Artificial Intelligence (AI) increasingly depend on breakthroughs in systems and architecture, where hardware, models, and software must be co-designed to scale efficiently. This Research Internship offers the opportunity to explore next-generation AI systems through performance modeling, architectural analysis, and emerging inference mechanisms. Research Interns will investigate topics such as disaggregated inference, memory architecture, and interconnect technologies, with a specific focus on request scheduling and key-value (KV) caching optimizations. This role is ideal for students passionate about understanding AI systems end-to-end and shaping the architectural foundations of tomorrow’s intelligent datacenters.
Job Responsibilities:
Investigate and evaluate emerging disaggregated KV cache architectures
Implement a hierarchical storage architecture with multiple tiers:
- GPU memory: active working set of KV caches currently used by the model
- CPU DRAM: hot cache for recently used KV chunks, using pinned memory for efficient GPU–CPU transfers
- Local storage: large-scale local caching (NVMe, local disk)
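The tiered design above can be illustrated with a minimal sketch, assuming an LRU policy in the upper tiers and demotion of evicted chunks downward; real systems would hold tensors in pinned host memory and NVMe files, but plain dictionaries stand in for each tier here, and all names (`TieredKVCache`, capacities) are hypothetical:

```python
from collections import OrderedDict

class TieredKVCache:
    """Sketch of a three-tier KV cache: a small "GPU" tier for the active
    working set, a larger "CPU DRAM" hot tier for recently used chunks,
    and an unbounded "local storage" tier for large-scale caching."""

    def __init__(self, gpu_capacity=2, dram_capacity=2):
        self.gpu = OrderedDict()   # active working set (LRU order)
        self.dram = OrderedDict()  # hot cache for recently used chunks
        self.disk = {}             # large-scale local caching (NVMe/disk)
        self.gpu_capacity = gpu_capacity
        self.dram_capacity = dram_capacity

    def _evict(self, tier, capacity, lower):
        # Demote least-recently-used entries to the next tier down.
        while len(tier) > capacity:
            key, chunk = tier.popitem(last=False)
            lower[key] = chunk

    def put(self, key, chunk):
        self.gpu[key] = chunk
        self.gpu.move_to_end(key)
        self._evict(self.gpu, self.gpu_capacity, self.dram)
        self._evict(self.dram, self.dram_capacity, self.disk)

    def get(self, key):
        # Promote a hit back into the GPU tier, wherever it was found.
        for tier in (self.gpu, self.dram, self.disk):
            if key in tier:
                chunk = tier.pop(key)
                self.put(key, chunk)
                return chunk
        return None
```

The promotion-on-access in `get` keeps hot KV chunks in the fastest tier while cold chunks sink toward local storage.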
Build a peer-to-peer (P2P) KV cache sharing service architecture that enables direct, high-performance cache transfers between multiple LLM serving instances without requiring centralized cache servers
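The P2P sharing idea can be sketched as follows, assuming each serving instance keeps a local cache and, on a miss, queries its peers directly rather than a central cache server; the network transfer is modeled as a plain method call, and all names (`ServingInstance`, `lookup`) are hypothetical:

```python
class ServingInstance:
    """Sketch of peer-to-peer KV cache sharing between LLM serving
    instances: lookups try the local cache first, then fetch directly
    from peers, with no centralized cache server involved."""

    def __init__(self, name):
        self.name = name
        self.local_cache = {}
        self.peers = []

    def connect(self, other):
        # Symmetric peering: both instances learn about each other.
        self.peers.append(other)
        other.peers.append(self)

    def lookup(self, key):
        # 1) local hit, 2) direct peer-to-peer fetch, 3) miss (recompute).
        if key in self.local_cache:
            return self.local_cache[key], "local"
        for peer in self.peers:
            if key in peer.local_cache:
                chunk = peer.local_cache[key]
                self.local_cache[key] = chunk  # keep the transferred chunk
                return chunk, f"peer:{peer.name}"
        return None, "miss"
```

Because transferred chunks are cached locally, repeated requests avoid further peer traffic, mirroring how direct instance-to-instance transfer amortizes prefill work across the serving fleet.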
Requirements:
Currently enrolled in a PhD program in Computer Science, Electrical/Computer Engineering, or a related field
Research experience in areas such as computer architecture, AI/ML systems, performance modeling, distributed systems, or hardware–software co-design
Programming skills in Python and C/C++, with experience building prototypes, simulators, or performance analysis tools
Familiarity with modern AI workloads and/or deep learning frameworks (e.g., PyTorch)
Demonstrated ability to define and pursue original research directions in AI systems or architecture
Ability to collaborate effectively with researchers across disciplines and work in cross-group, cross-cultural environments
Strong communication and presentation skills for conveying complex technical insights
Ability to think creatively and approach system and architecture challenges with unconventional or innovative solutions
Experience with PyTorch, CUDA, Triton, or performance-simulation tools
Background in large-scale system design, AI inference bottleneck analysis, or modeling cost/performance tradeoffs
Understanding of accelerator, memory-system, or interconnect design principles