
Site Reliability Engineer

Mexico, Guadalajara · Employment contract · Job Posted June 15, 2026

Job Description

We are currently seeking a Site Reliability Engineer (SRE) to join our team in Guadalajara, Jalisco (MX-JAL), Mexico (MX).

Job Responsibility

  • Perform L1.5 activities such as monitoring, deployment, and rollback
  • Monitor the efficiency of Azure cloud systems to prevent outages, and initiate an Incident Management bridge if an outage occurs
  • Troubleshoot Azure resources and escalate to Level 3 (Software Development Team)

Requirements

  • Understand Microsoft Azure; ideally Azure Fundamentals certified, or holding a Computer Science/Information Systems Management degree
  • Familiar with Azure PaaS and IaaS services: VMs, Storage, Event Hubs, Service Fabric Cluster (SFC), Azure Kubernetes Service (AKS), Cosmos DB, SQL Server, IoT Hub, Databricks, Key Vault, Data Lake
  • Understand Internet of Things (IoT) concepts: telemetry, ingestion, processing, data storage, reporting
  • Understand the concepts behind tools such as Octopus, Bamboo, Terraform, Azure DevOps, Jenkins, GitHub, Ansible
  • Understand the concept of container orchestration platforms (e.g., Kubernetes)
  • Understand scripting in PowerShell and Python
  • Understand the difference between NoSQL and SQL databases, and how to maintain them
  • Understand monitoring and logging systems (Log Analytics, Splunk, ELK, Prometheus, Nagios, Zabbix, etc.)
  • Independent thinker: ask why something breaks and what you can proactively do to fix it
  • Please note this is a 24/7 IT operations support team and shift rotation is often necessary; rotations can happen every one or two months, so please do not assume you will only work the standard Monday-through-Friday day shift

Similar Jobs for Site Reliability Engineer

8 matching positions

Site Reliability Engineer

Location: South Africa, Johannesburg
Salary: Not provided
Nintex
Expiration Date: Until further notice
Requirements
  • You provide guidance on infrastructure architecture and contribute to high-quality and successful product releases.
  • You contribute to your team and domain through successfully leading and consistently delivering on projects of ambiguous scope, high complexity, and critical business impact.
  • You contribute to relevant guilds, practice forums and other initiatives to improve Nintex’s DevOps and SRE discipline.
  • You have an in-depth understanding of distributed systems architecture, as well as monitoring and observability practices and tools.
  • You quickly resolve priority infrastructure issues and help other technical team members or Product Managers understand how to avoid them in the future.
  • You provide detailed estimates for work items you propose or are assigned.
  • You assist in decision-making around tooling, automation practices, and testing solutions.
  • You stay up-to-date with technology trends and use this knowledge to help your team and the broader Engineering practice.
  • You run Nintex infrastructure with IaC tools (such as Terraform) and GitHub Actions for automation, containerize our environments (Kubernetes), and leverage cloud technologies to meet our goals.
  • You build monitoring that alerts on symptoms rather than outages using tools like Prometheus, Grafana, Alertmanager and PagerDuty
Job Responsibility
  • You are highly skilled and sufficiently experienced in Nintex DevOps tools and processes to own a long-term program or technology such as Kubernetes.
  • You write scripts, tools and utilities that support and integrate with delivery pipelines and you integrate telemetry where appropriate.
  • You are called into incidents and bring trusted knowledge in your platform domain.
  • You debug and fix infrastructure issues on production environments quickly using the relevant tools and guidelines to prevent recurrence.
  • You build, promote and support infrastructure patterns and practices within Nintex.
  • You provide coaching/mentoring to other Engineers on the team
  • You lead or contribute to post-mortems for incidents, including root cause analysis and identification of preventative and remedial actions.
  • You continuously monitor our platform performance and take immediate action to improve it
  • You review and advise on appropriate design patterns to solve automation and infrastructure problems without creating technical debt.
  • You design and build complex infrastructure components for distributed systems such as Kubernetes.
What we offer
  • Global Gratitude and Recharge Days
  • Flexible, paid time off policy
  • Employee wellness programs and counseling resources
  • Meaningful peer recognition and awards
  • Paid parental leave
  • Invention/patenting assistance
  • Community impact, paid volunteer time, and opportunities
  • Intercultural learning and celebration
  • Multiple tools through which to learn and grow, and an incredible global community

Site Reliability Engineer

An Elite FinTech Firm is looking for a highly talented DevOps Engineer/Systems S...
Location: Hong Kong, Hong Kong
Salary: 1,200,000 HKD / year
Hunter Bond
Expiration Date: Until further notice
Requirements
  • Genuine passion for Linux and open source
  • Excellent knowledge of Python
  • Use of CI/CD, Docker, Ansible, Chef, Puppet
  • Knowledge of large-scale storage systems (on-prem)
Job Responsibility
  • Help architect resilient, multi-petabyte storage solutions and build new data centres
  • Automate anything and everything with Python & config tools
  • Innovate whilst bringing in new ideas
What we offer
  • Flexible hours/work options
  • Working in one of the world’s most elite teams
  • Invest heavily in cutting-edge and next-gen tech
  • Technologists only report to other technologists
  • Brand new skyline Manhattan office
  • Start-up style environment
  • Full-time

Site Reliability Engineer

As a Staff Software Engineer, you will play a key role in designing, building, a...
Location: United States, San Jose
Salary: 120,500 - 243,000 USD / year
Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • Minimum of 5 years of hands-on experience in Infra Ops, Dev Ops, or Site Reliability Engineering (SRE)
  • Proficiency with Linux systems, especially Debian-based distributions
  • Strong experience with cloud platforms such as AWS and GCP
  • Expertise in Infrastructure as Code tools like Terraform, Packer, and Ansible
  • Solid programming skills in Python and/or Golang
  • Deep understanding of containerization (Docker, Container) and orchestration tools (AWS EKS, GCP GKE)
  • Experience with GitOps workflows
  • Proven track record in implementing and maintaining CI/CD pipelines
  • Strong background in security and familiarity with security programs
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK)
Job Responsibility
  • Enhance Infrastructure as Code (IaC) and enforce best practices
  • Optimize cloud infrastructure for scalability, security, and cost-effectiveness
  • Develop internal tools to support and streamline cloud platform operations
  • Improve CI/CD pipelines and deployment workflows using FluxCD and Jenkins
  • Address container image vulnerabilities and standardize remediation processes
  • Build Amazon Machine Images (AMIs) aligned with CIS and STIG benchmarks
  • Strengthen monitoring, alerting, and observability using Prometheus, Grafana, and logging tools
  • Troubleshoot complex production issues to ensure system reliability and customer satisfaction
  • Fine-tune distributed systems such as Apache Kafka and Cassandra
  • Collaborate with development, security, and operations teams to align infrastructure with application needs
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Full-time

Site Reliability Engineer

An Elite FinTech Firm is looking for a highly talented DevOps Engineer/Systems S...
Location: United Kingdom, London
Salary: 150,000 GBP / year
Hunter Bond
Expiration Date: Until further notice
Requirements
  • Genuine passion for Linux and open source
  • Excellent knowledge of Python
  • Use of CI/CD, Docker, Ansible, Chef, Puppet
  • Knowledge of large-scale storage systems (on-prem)
Job Responsibility
  • Help architect resilient, multi-petabyte storage solutions and build new data centres
  • Automate anything and everything with Python & config tools
  • Innovate whilst bringing in new ideas
What we offer
  • Flexible hours/work options
  • Working in one of the world’s most elite teams
  • Invest heavily in cutting-edge and next-gen tech
  • Technologists only report to other technologists
  • Brand new skyline Manhattan office
  • Start-up style environment
  • Full-time

Site Reliability Engineer

Microsoft is a company where passionate innovators come to collaborate, envision...
Location: India, Bangalore
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 3+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Must pass Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Own the end-to-end readiness of Event Stream across Azure regions, including onboarding new regions, driving deployment automation, and ensuring consistent, secure, and compliant service rollout
  • Work closely with platform, infrastructure, and partner teams (e.g., Event Hubs, Kusto, Fabric platform) to deliver resilient, low-latency streaming experiences on a global scale
  • Play a key role in advancing our reliability posture, improving availability, monitoring, and incident response across regions
  • Build strong observability, telemetry, and automated recovery mechanisms to meet high availability and SLA targets
  • Region Build-out & Deployment: Onboard new regions, drive deployment automation, and ensure consistent service configuration
  • Reliability & SRE: Improve availability, resiliency, and incident response; own service health across regions
  • Observability & Operations: Enhance telemetry, monitoring, alerting, and troubleshooting capabilities
  • Cross-team Collaboration: Partner with platform and infra teams to unblock dependencies and ensure smooth rollout
  • Production Excellence: Drive root-cause analysis, repair items, and continuous improvement on service reliability
  • Full-time

Site Reliability Engineer

Location: United Kingdom, Newcastle
Salary: Not provided
Trimble Inc.
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Engineering or a related field
  • At least 5 years of technical experience with a proven ability to take full ownership of production infrastructure
  • Excellent collaboration skills, with experience leading cross-functional work
  • Demonstrated success in managing infrastructure in production environments
  • Expertise in capacity planning and cost optimisation for efficient operations
  • Extensive expertise managing cloud provider hosted infrastructure, specifically with Microsoft Azure or AWS
  • Proficient in high-level scripting languages like Python and Infrastructure as Code tools (Terraform), along with containerisation
  • Demonstrated success with Kubernetes or other containerization technologies
  • Familiarity with CI/CD pipelines and tools such as Azure DevOps, Jenkins, Argo CD, Helm, GitHub
  • Experience with monitoring tools and incident management processes, using tools such as Prometheus, Grafana, New Relic, Datadog, Splunk, CloudWatch, Sumo Logic, etc.
Job Responsibility
  • Develop and maintain scalable infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments
  • Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting
  • Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes
  • Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention
  • Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed
  • Lead incident response efforts and conduct deep-dive root cause analysis to implement long-term, innovative technical solutions
  • Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks
  • Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles
  • Participate in on-call rotations and handle critical incidents with confidence and expertise
  • Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team

Site Reliability Engineer

Shape the Future of Intelligent Operations as a Site Reliability Engineer (AI Op...
Location: India, Chennai
Salary: Not provided
Trimble Inc.
Expiration Date: Until further notice
Requirements
  • 1 to 2 years of professional experience in a DevOps, MLOps, or systems engineering environment
  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a closely related technical field
  • Direct experience with the Microsoft Azure cloud platform and its specialized ecosystem services (such as Azure ML and Azure DevOps)
  • Proficiency with Python or other scripting languages (Shell / Bash / PowerShell) for rapid system integration and task automation
  • Foundational understanding of containerization (Docker), basic orchestration concepts (Kubernetes fundamentals), and version control system workflows (Git)
  • Solid baseline knowledge of fundamental DevOps principles (CI/CD, system administration) and a basic understanding of the end-to-end machine learning model lifecycle
Job Responsibility
  • Assist in the deployment and maintenance of machine learning models in production under direct supervision, building skills in containerization and orchestration architectures
  • Support the development of robust continuous integration and deployment pipelines for ML workflows, including model versioning, automated testing, and release processes
  • Monitor production ML model performance, detect data drift, and track system health by implementing foundational logging, alerting, and metrics solutions
  • Contribute to infrastructure automation and configuration management for machine learning workloads, learning to treat infrastructure as software
  • Partner closely with ML engineers and data scientists to operationalize complex models, ensuring reliability, scale, and strict adherence to established operational patterns
What we offer
  • Structured environment to accelerate technical skills
  • Direct guidance from experienced engineering professionals
  • Projects that improve productivity, quality, safety, transparency and sustainability
  • Collaborative and supportive team
  • Entrepreneurial spirit empowering proactive doers
  • Flexible work arrangements
  • Full-time

Site Reliability Engineer

We are currently seeking a Site Reliability Engineer to join our team in Westlak...
Location: United States, Westlake
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or related disciplines (understanding reliability engineering principles, SLIs, SLOs, error budgets, and operational excellence)
  • 5+ years’ hands-on Terraform experience
  • 5+ years’ experience supporting mission-critical enterprise applications in production environments
  • 5+ years’ experience with cloud networking, security, and infrastructure architecture
  • 5+ years of hands-on experience managing hybrid cloud environments
  • 5+ years of automation experience using Python, Ansible, Shell scripting, or similar technologies
  • 5+ years’ experience building reusable infrastructure modules and automated deployment frameworks
Job Responsibility
  • Design, implement, and support highly available load balancing solutions using F5 BIG-IP, Broadcom AVI, and cloud-native load balancing services
  • Build and maintain Infrastructure-as-Code (IaC) solutions using Terraform
  • Develop automation solutions for infrastructure provisioning, configuration management, and operational workflows
  • Support and enhance CI/CD pipelines using tools such as Jenkins, Azure DevOps, GitHub Actions, or similar platforms
  • Collaborate with application, cloud, network, and platform teams to improve reliability, performance, and scalability
  • Monitor production systems and proactively identify reliability, performance, and availability risks
  • Implement Site Reliability Engineering best practices including observability, incident management, capacity planning, and resiliency engineering
  • Troubleshoot complex issues across networking, cloud infrastructure, load balancing, and application environments
  • Support hybrid infrastructure environments spanning on-premises datacenters and public cloud platforms
  • Participate in on-call rotation and provide production support for critical business applications
  • Full-time