As a Senior Data Scientist in the Multimedia Team, you will redefine how millions of users discover, consume, and create visual content. You will be at the heart of Bing Visual Search, Bing Image Creator, and our vast video indexing engine. Your mission is to build intelligent systems that understand the deep semantics of pixels and frames, enabling world-class image and video experiences that are fast, relevant, and inspiring.
Job Responsibilities:
Visual Intelligence Development: Build and deploy SOTA machine learning models for image classification, object detection, and video action recognition to power Bing's multimedia features
Multimodal & Generative AI: Lead the development of multimodal embeddings that align text and visual data, and leverage Generative AI (e.g., DALL-E, MAI models) to enhance content creation tools
Scale & Optimization: Design robust feature-engineering pipelines to process billions of images and videos, ensuring low-latency inference in production services
Strategic Leadership: Embody Microsoft’s values by Creating Clarity in complex AI problems and Generating Energy across cross-functional teams of engineers and PMs
Responsible AI: Ensure all visual models adhere to strict Security, Privacy, and GDPR standards, specifically focusing on content moderation and bias detection in multimedia
Requirements:
Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 5+ years related experience (e.g., statistics, predictive analytics, research)
OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 2+ years related experience (e.g., statistics, predictive analytics, research)
OR equivalent experience
Mastery of Python and deep learning frameworks such as PyTorch or TensorFlow
Proven track record in Computer Vision (CV) or Multimedia Understanding, including work with large-scale visual datasets
Experience building and deploying live production systems at scale
Nice to have:
PhD focused on Computer Vision, Video Analytics, or Multimodal Learning
Experience with big data tools like Spark/PySpark and Azure Machine Learning
Publications in top-tier venues such as CVPR, ICCV, or ACM Multimedia