Help build the world’s most advanced multimodal dataset at Microsoft AI. We are on a mission to create the largest and most advanced multimodal dataset in the world. This dataset, spanning all modalities from across the web and beyond, will power the training of the world’s most capable frontier AI models, pushing the boundaries of scale, performance, and product deployment. The AI Data Infra team at Microsoft AI builds the data infrastructure that helps MAI teams generate the biggest and best training dataset. Our work involves data pipelines, Spark, Ray, vector databases, and all other aspects of data infrastructure.
Job Responsibilities:
Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video)
Own and maintain critical data infrastructure, including Spark, Ray, vector databases, and others
Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models
Partner with the pretraining and post-training teams to improve our data recipe through rigorous, careful experimentation
Embody our culture and values
Requirements:
Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling, or data engineering
OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 8+ years experience in business analytics, data science, software development, data modeling, or data engineering
OR equivalent experience
Preferred Qualifications:
Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 12+ years experience in business analytics, data science, software development, data modeling, or data engineering
OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 15+ years experience in business analytics, data science, software development, data modeling, or data engineering
OR equivalent experience
4+ years experience with data governance, data compliance and/or data security
Passionate about the role of data in large-scale AI model training
Thrive in a highly collaborative, fast-paced environment
Have a high degree of expertise and pay close attention to detail
Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies
Effectively manage multiple responsibilities and adjust to shifting priorities