Site Reliability Engineer Job at Microsoft Corporation (Bangalore)

Job Description

Microsoft’s Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.

Within Azure Data, the databases team builds and maintains Microsoft's operational database systems. We store and manage data in a structured way to enable a multitude of applications across various industries. We are on a journey to enable developer-friendly, mission-critical, AI-enabled operational databases across relational, non-relational, and OSS offerings.

Reinventing the big data engine is happening NOW in the Azure Data Explorer (Kusto) team. The team started as a small incubation 10 years ago and has already made a big impact within Microsoft. Today, we run a very large-scale cloud service (over 200k nodes), providing log analytics for hundreds of teams across all Microsoft divisions as well as for external customers. We are looking for a strong and motivated SRE to help us continue driving the Azure Data Explorer / Synapse Real Time Analytics revolution and make it THE technology for log search and text analytics across all of Microsoft, while also providing value to our external customers.

Job Responsibility

Customer Focus - Bring unwavering customer focus and support to help our customers utilize, embed, and build deep solutions on top of Kusto tailored to their needs
Automation - Deliver automation and tooling to improve our live site management and adhere to a scale-without-scale methodology
Design - Evaluate and contribute to product and service design and architecture; help shape Site Reliability Engineering strategies; review specifications; design and improve upon core processes
Observability - Identify system problems and recommend monitoring solutions & automation to improve processing efficiency and stability
Provide engineering design across different workloads including incident & problem management, change management, security and compliance
Continuous integration/deployment - Implement, maintain, and operate the build and release pipelines that allow our developers to safely code, test, and deploy our products at very large scale
Community Building - Help us build and contribute to an exciting Azure Data Explorer community

Requirements

Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, or systems engineering
OR equivalent experience
2+ years of scripting and programming experience, including any of the following: .NET, PowerShell, Python, C#
Ability to meet Microsoft, customer and/or government security screening requirements
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Ability to work as part of an on-call rotation is a must
Ability to contribute to multiple projects/demands simultaneously
Ability to work effectively with customers both internal and external to Microsoft is a must

Nice to have

Deep understanding of cloud services
Knowledge of Kubernetes concepts and implementation in Azure (AKS) and/or on other cloud provider platforms
Working knowledge of database and big data systems
Understanding of business continuity and disaster recovery (BCDR)
Understanding and working knowledge of CI/CD pipelines
2+ years of troubleshooting experience in large cloud-based systems
Out-of-the-box, agile thinking to adapt to a changing environment
Deep knowledge of system design and architecture, and of running complex, large-scale online services
Working knowledge of Virtual Network and Private Endpoint concepts
Ability to monitor and act on telemetry data and to perform analyses that identify patterns revealing errors and unexpected problems affecting system availability, reliability, performance, and/or efficiency, with minimal guidance
