As Microsoft continues to push the boundaries of AI, we are on the lookout for passionate individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is bold and broad: to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. It is also inclusive: we aim to make AI accessible to all, whether consumers, businesses, or developers, so that everyone can realize its benefits.

We are looking for a Member of Technical Staff - Principal Data Infrastructure Engineer. This role is a dynamic blend of Platform Engineering, DevOps/SRE, and Big Data Infrastructure Engineering, focused on enabling large-scale data and ML pipelines and intelligent systems. If you have architected big data platforms from the ground up and are eager to apply that expertise to consumer AI, we want to hear from you.

You'll bring:
Deep technical expertise
A passion for automation and observability
Fluency in distributed systems
Creativity to design scalable solutions
And, just as importantly: empathy, collaboration, and a growth mindset
Job Responsibilities:
Architect and maintain scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications
Champion DevOps and SRE best practices—automated deployments, service monitoring, and incident response
Build a self-service big data platform that empowers data and platform engineers and researchers
Develop robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code tools (Bicep, Terraform, ARM)
Collaborate with Data Engineers, Data Scientists, AI Researchers, and Developers to deliver secure, seamless big data workflows
Lead technical design reviews and uphold a clean, secure, and well-documented codebase
Proactively identify and resolve bottlenecks in data pipelines and infrastructure
Optimize system performance across storage, compute, and analytics layers
Partner with Security teams to enhance system security (IAM, OAuth, Kerberos)
Embody and promote Microsoft’s values: Respect, Integrity, Accountability, and Inclusion
Requirements:
Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering
OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering
OR equivalent experience
4+ years in Big Data Infrastructure, DevOps, SRE, or Platform Engineering
3+ years of hands-on experience managing and scaling distributed systems—from bare-metal to cloud-native environments
2+ years deploying containerized applications using Kubernetes and Helm/Kustomize
Solid scripting and automation skills using Python, Bash, or PowerShell
Proven success in CI/CD pipeline management, release automation, and production troubleshooting
Experience working with Databricks for scalable data processing and analytics
Familiarity with security practices in infrastructure environments, including IAM, OAuth, and Kerberos administration
Proven experience with cloud-native infrastructure across Azure, AWS, or GCP
Hands-on expertise with modern data platforms like Databricks
Deep understanding of data storage and processing technologies:
Relational and NoSQL databases
Key-value stores
Spark compute engines
Distributed file systems (e.g., HDFS, ADLS Gen2)
Messaging systems (e.g., Event Hub, Kafka, RabbitMQ)
Experience with capacity planning and incident management for large-scale big data systems
Solid collaboration history with Data Engineers, Data Scientists, ML Engineers, Networking, and Security teams
Familiarity with modern web stacks: TypeScript, Node.js, React, and optionally PHP
Exposure to agentic workflows, deep learning, or AI frameworks
Practical experience integrating LLMs (e.g., GPT-based models) into daily workflows—automating documentation, code generation, reviews, and operational intelligence
Solid grasp of prompt engineering techniques to design, optimize, and evaluate interactions with LLMs
Demonstrated ability to troubleshoot and resolve complex performance and scalability issues across infrastructure layers
Excellent interpersonal and communication skills, with a strong passion for mentorship and continuous learning
Experience applying LLMs to DevOps workflows, enhancing incident response, and streamlining cross-functional collaboration is a strong advantage