Microsoft’s Applied Sciences Group is seeking a visionary and hands-on Principal Applied Scientist to lead research and development in SLMs, VLMs, and multimodal AI across language, vision, and agent workloads. This role is ideal for candidates passionate about building real-world systems that unify visual and textual modalities to power next-generation user experiences across devices and platforms. As a senior member of the team, you will drive innovation across model architecture, training, and deployment, especially for scalable autoregressive models that handle language, structured text, and reasoning tasks. You will also play a key role in converting cutting-edge research into practical applications and experiences for users across the globe.
Job Responsibilities:
Design and prototype unified token-based architectures that treat text and/or image data as sequences for coherent text or multimodal generation
Work on SLMs for tasks such as planning, image captioning, visual question answering, and structured text generation
Build scalable training pipelines for large-scale text datasets
Optimize deep neural networks for deployment on Neural Processing Units (NPUs), GPUs, and cloud environments, maximizing efficiency and performance
Collaborate with cross-functional teams to integrate models into Microsoft products and services
Publish research in top-tier venues (NeurIPS, CVPR, ICCV, ICLR) and contribute to the scientific community
Mentor junior scientists and engineers, fostering a collaborative and innovative research environment
Requirements:
Doctorate in Computer Vision, Machine Learning, or a related field with demonstrable experience in applied research or product development
OR Master's degree in Computer Vision, Machine Learning, or a related field with demonstrable experience in applied research or product development
OR Bachelor's degree in Computer Vision, Machine Learning, or a related field with demonstrable experience in applied research or product development
Strong publication record in top-tier venues (CVPR, ICCV, ECCV, NeurIPS, ICLR, AAAI)
Advanced Python or C++ (especially C++11 and newer) experience
Advanced experience in deep learning and its toolkits, in particular PyTorch or TensorFlow
Ability to meet Microsoft, customer, and/or government security screening requirements
Demonstrated ability to translate research into real-world applications
Proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow, HuggingFace)
Hands-on experience with generative models, especially diffusion and transformer-based synthesis
Experience building and training multimodal autoregressive models
Nice to have:
Experience deploying models to production or on-device environments
Experience optimizing models for Neural Processing Units (NPUs) or other hardware accelerators
Knowledge of quantization, pruning, and efficient fine-tuning techniques
Experience with RLHF and proven ability to design, prototype, and implement training pipelines for planning and reasoning
Strong collaborative skills across cross-functional teams