Lead Site Reliability Engineer Job at Capital One (Mexico City)

Job Description

We're building a Site Reliability Engineering center in Mexico City, and we're hiring a Manager-level Backend Engineer to own the reliability and operational maturity of our settlement platforms. These are batch-critical systems that process every credit and debit transaction across the network. This is a foundational role: you'll be one of the first engineers in CDMX responsible for ensuring settlement cycles complete accurately, on time, and in compliance with SOX and PCI-DSS requirements. You'll work across hybrid infrastructure (on-prem data centers and AWS), partner closely with UK-based engineers, and build the automation and observability that allow Mexico City to operate settlement.

Job Responsibility

Own reliability for batch settlement systems - ensure cycle completion windows are met, data integrity is maintained, and failures are detected before they reach downstream consumers
Build and improve observability for settlement pipelines - dashboards, alerts, and anomaly detection that make system health legible and reduce reliance on tribal knowledge
Drive automation of operational toil - certificate rotation, environment provisioning, compliance artifact generation, and manual validation steps that currently require human intervention
Partner with UK-based settlement engineers - acquire domain expertise on Durbin compliance windows, cross-border DCI routing, and acquirer/issuer SLA adherence
Participate in incident management - respond to settlement failures, drive root cause analysis, and implement durable fixes that prevent recurrence
Contribute to regulatory readiness - ensure SRE practices produce audit-ready artifacts for SOX and PCI-DSS exams without manual toil

Requirements

Professional English fluency
Bachelor's degree
At least 6 years of experience in SRE, production operations, or reliability engineering
Experience in DevOps Engineering (internship experience does not apply)
5+ years of experience in at least one of the following: Java, Python, Go
At least 4 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
3+ years of experience with container orchestration services including Docker or Kubernetes
Experience with Shell or Bash scripting
At least 3 years of Unix or Linux system administration experience

Nice to have

Experience developing automation solutions using agentic AI tools (Claude Code, Copilot CLI)
Troubleshooting and debugging skills across distributed systems
Familiarity with payments, financial services, or other regulated high-availability domains
Knowledge of or experience with networking concepts (TCP, DNS, TLS)

What we offer

Healthy Body, Healthy Mind
Save Money, Make Money
Time, Family and Advice
