Job Description
SRE – Site Reliability Engineer

We are currently seeking a Site Reliability Engineer to join our team in Guadalajara, Jalisco (MX-JAL), Mexico (MX).

Perform L1.5 activities such as monitoring, deployment, and rollback.
Monitor the efficiency of the Azure cloud systems to prevent outages, and initiate an Incident Management bridge in case of an outage.
Troubleshoot Azure resources and escalate to Level 3 (Software Development Team).

Understand the Microsoft Azure Cloud: ideally Azure Fundamentals certified OR a Computer Science/Information Systems Management degree.
Familiar with PaaS and IaaS: VMs, Storage, EventHub, Service Fabric Cluster (SFC), Azure Kubernetes Service (AKS), CosmosDB, SQL Server, IoT Hub, Databricks, KeyVault, Datalake.
Understand the concept of Internet of Things (IoT): telemetry, ingestion, processing, data storage, reporting.
Understand the concept of tools such as Octopus, Bamboo, Terraform, Azure DevOps, Jenkins, GitHub, Ansible.
Understand the concept of container orchestration platforms (e.g. Kubernetes).
Understand the concept of scripting: PowerShell, Python.
Understand the difference between NoSQL and SQL databases, and how to maintain them.
Understand monitoring and logging systems (LogAnalytics, Splunk, ELK, Prometheus, Nagios, Zabbix, etc.).
Independent thinker: why does it break, and what can I proactively do to fix it?

Please note this is a 24/7 operations IT support team, and it is often necessary to rotate shifts. The rotation can be every one or two months, so please do not assume you will only work the standard Monday through Friday day shift.