Microsoft Cloud Operations + Innovation (CO+I) is the engine that powers Microsoft's cloud services, and our team is focused on delivering high-quality infrastructure to support cloud operations. As Microsoft's cloud business continues to mature, our infrastructure expansion accelerates, and data centers are central to this growth. To support this, the acquisition and development of our owned, designed, and constructed data center facilities will scale to meet the demands of our customers, while we continue to lease and acquire data center capacity at pace, particularly in high-growth markets, working closely with data center operators in each region and across the globe. We are seeking a skilled Data Engineer to join our CO+I Lease and Land Development Solution Delivery team. The ideal candidate will have a strong background in data management, algorithm design, and analytics to drive the development and optimization of our systems and tools. The role involves transforming raw data into valuable insights, designing efficient data architectures, and ensuring compliance with data governance standards.
Job Responsibilities:
Apply modification techniques to transform raw data into compatible formats for downstream systems
Utilize software and computing tools to ensure data quality and completeness
Implement code to extract and validate raw data from upstream sources, ensuring accuracy and reliability
Write efficient, readable, extensible code from scratch that spans multiple features and solutions
Develop technical expertise in proper modeling, coding, and debugging techniques, such as locating, isolating, and resolving errors and defects
Leverage technical proficiency in big-data software engineering concepts such as the Hadoop ecosystem, Apache Spark, continuous integration and continuous delivery (CI/CD), Docker, Delta Lake, MLflow, AML, and representational state transfer (REST) application programming interface (API) consumption and development
Acquire data necessary for successful completion of the project plan
Proactively detect changes and communicate them to senior leaders
Develop usable data sets for modeling purposes
Contribute to ethics and privacy policies related to collecting and preparing data by providing updates and suggestions around internal best practices
Contribute to data integrity and cleanliness conversations with customers
Adhere to data modeling and handling procedures to maintain compliance with laws and policies
Document data type, classifications, and lineage to ensure traceability and govern data accessibility
Perform root cause analysis to identify and resolve anomalies
Implement performance monitoring protocols and build visualizations to monitor data quality and pipeline health
Support and monitor data platforms to ensure optimal performance and compliance with service level agreements
Apply knowledge of machine learning, the branch of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed
Leverage knowledge of machine learning solutions (e.g., classification, regression, clustering, forecasting, NLP, image recognition) and individual algorithms (e.g., linear and logistic regression, k-means, gradient boosting, autoregressive integrated moving average [ARIMA], recurrent neural networks [RNN], long short-term memory [LSTM] networks) to identify the best approach to complete objectives
Understand modeling techniques (e.g., dimensionality reduction, cross-validation, regularization, encoding, ensembling, activation functions) and select the correct approach to prepare data, train and optimize the model, and evaluate the output for statistical and business significance
Understand the risks of data leakage, the bias/variance tradeoff, methodological limitations, etc.
Write all necessary scripts in the appropriate language: T-SQL, U-SQL, KQL, Python, R, etc.
Construct hypotheses, design controlled experiments, analyze results using statistical tests, and communicate findings to business stakeholders
Effectively communicate with diverse audiences on data quality issues and initiatives
Understand operational considerations of model deployment, such as performance, scalability, monitoring, maintenance, stability, and integration into engineering production systems
Develop operational models that run at scale through partnership with data engineering teams
Coach less experienced engineers on data analysis and modeling best practices
Develop a strong understanding of the Microsoft toolset in artificial intelligence (AI) and machine learning (ML) (e.g., Azure Machine Learning, Azure Cognitive Services, Azure Databricks)
Design and Implement Dashboards: Develop user-friendly dashboards for various applications, such as Supplier Spend Analytics, Supplier Scorecards, Incident and Service Level Agreement (SLA) Compliance Monitoring, Spares and Inventory Management, and other business-facing applications
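As a rough illustration of the extract, validate, and transform work described in the responsibilities above, the following is a minimal Python sketch; the record fields and validation rules are hypothetical, not part of any actual Microsoft system:

```python
from datetime import datetime

def transform_record(raw: dict) -> dict:
    """Validate a raw upstream record and reshape it for a downstream system.

    Field names and rules here are illustrative only.
    """
    # Validate required fields before transforming.
    for field in ("site_id", "capacity_mw", "lease_start"):
        if field not in raw:
            raise ValueError(f"missing required field: {field}")

    capacity = float(raw["capacity_mw"])
    if capacity <= 0:
        raise ValueError("capacity_mw must be positive")

    # Normalize the date into ISO 8601 so downstream joins are consistent.
    start = datetime.strptime(raw["lease_start"], "%m/%d/%Y").date()

    return {
        "site_id": raw["site_id"].strip().upper(),
        "capacity_mw": capacity,
        "lease_start": start.isoformat(),
    }

raw_rows = [
    {"site_id": " dc-001 ", "capacity_mw": "12.5", "lease_start": "03/01/2024"},
    {"site_id": "dc-002", "capacity_mw": "8", "lease_start": "11/15/2023"},
]
clean_rows = [transform_record(r) for r in raw_rows]
```

In a production pipeline, the same validate-then-normalize pattern would run inside an Azure Data Factory or Spark job rather than a standalone script.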
Requirements:
Bachelor’s degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years’ experience in business analytics, data science, data modeling, or data engineering work
OR Master’s degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 3+ years’ experience in business analytics, data science, data modeling, or data engineering work
Ability to meet Microsoft, customer and/or government security screening requirements
8+ years of experience in data engineering with coding and debugging skills in C#, Python, and/or SQL
Experience deploying solutions in Azure services and managing Azure subscriptions
Understanding of big data and experience writing queries with Kusto Query Language (KQL)
Experience extracting data via REST APIs
Strong analytical skills with a systematic and structured approach to software design
5+ years of experience in data science, analytics, or machine learning
4+ years of experience in developing solutions with Microsoft Power Platform, including Power BI, Fabric, Power Automate, and Microsoft Dataverse
3+ years of experience in building Data Pipelines using Azure Data Factory
1+ year of experience in developing solutions in Microsoft Fabric
4+ years of experience in writing SQL queries
Experience with cloud data technologies such as Azure Synapse, Azure Data Factory, SQL, and Azure Data Explorer
5+ years of experience with the Microsoft/Azure data stack, including ETL and data pipeline development with SQL and Fabric
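To give a flavor of the SQL query writing the requirements above call for, here is a small self-contained sketch using Python's built-in sqlite3 module; the table, column names, and data are hypothetical, and a real pipeline would target Azure SQL, Synapse, or Fabric instead:

```python
import sqlite3

# In-memory database standing in for a cloud data store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE supplier_spend (
        supplier TEXT,
        region   TEXT,
        amount   REAL
    )
""")
conn.executemany(
    "INSERT INTO supplier_spend VALUES (?, ?, ?)",
    [
        ("Contoso", "EMEA", 120.0),
        ("Contoso", "AMER", 80.0),
        ("Fabrikam", "EMEA", 250.0),
    ],
)

# Aggregate spend per supplier -- the kind of query a supplier spend
# analytics dashboard would sit on top of.
rows = conn.execute("""
    SELECT supplier, SUM(amount) AS total_spend
    FROM supplier_spend
    GROUP BY supplier
    ORDER BY total_spend DESC
""").fetchall()
```

The same GROUP BY aggregation translates directly to T-SQL or to a KQL `summarize` operator when the data lives in Azure Data Explorer.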