We are seeking a Site Reliability Engineer (SRE) to support a greenfield initiative within the Trade Compliance and Innovation team. This role will serve as the primary SRE for one squad within a geographically distributed team and will support a second squad as needed. The SRE will play a key role in enabling scalable, secure, and highly reliable infrastructure from development through production while partnering closely with development and QA teams. This is a hands‑on role requiring strong DevOps and cloud infrastructure expertise, a software‑engineering mindset, and the ability to independently research, design, and implement solutions that improve system reliability and operational efficiency.
Job Responsibilities:
Apply software engineering practices to IT operations to maintain scalable, secure, and highly available production environments
Act as a bridge between development and operations by applying engineering rigor to system administration and infrastructure management
Design, build, and support infrastructure across Dev, Test, and Production environments
Develop and maintain automation using code to analyze logs, monitor systems, test environments, and respond to incidents
Implement and manage Infrastructure as Code (IaC) using Terraform following organizational best practices
Support deployments of Java and Python‑based microservices, containerized workloads, and related cloud services
Implement and manage blue‑green deployments, scaling strategies (horizontal and vertical), resiliency, and security postures
Support Azure Container Apps (ACA) and Azure Kubernetes Service (AKS) platforms
Work with messaging systems, webhooks, Azure Functions, and distributed integrations
Support monitoring, logging, and observability using enterprise tools (e.g., ELK, Grafana)
Partner closely with global Dev, QA, and SRE team members to resolve infrastructure and reliability issues
Research, learn, and apply new technologies and solutions as required
Requirements:
3–5+ years of experience in a Site Reliability Engineering, DevOps, or infrastructure‑focused role
Bachelor’s degree in Computer Science, Computer Engineering, Information Technology, or a related field
Strong hands‑on experience with Azure cloud infrastructure
Proven expertise with Infrastructure as Code (Terraform)
Strong DevOps/SRE skillset with the ability to work independently and collaborate with other SREs and Dev teams
Experience supporting Java and Python microservices in cloud environments
Experience with CI/CD pipelines, specifically GitHub Actions/Pipelines
Strong understanding of non-functional requirements (NFRs), including performance, scalability, resiliency, and security
Proficiency in one or more programming/scripting languages such as Python, Go, Java, .NET, or Node.js
Experience with infrastructure monitoring and logging platforms
Strong problem‑solving, research, and multitasking capabilities
Clear communication skills to explain technical problems and solutions to diverse stakeholders
Ability to support infrastructure needs across multiple time zones, including availability during early-morning hours in Central Standard Time (CST)
Nice to have:
Experience with Azure Kubernetes Service (AKS) and Azure Container Apps (ACA)
Experience with messaging systems, event‑driven architectures, and webhooks
Hands‑on experience with Azure Functions
Experience deploying and managing Azure OpenAI services
Familiarity with ELK stack, Grafana, or similar observability tools
Azure certifications
Experience with MLOps, LLMOps, or AI‑focused infrastructure
Prior experience supporting globally distributed teams