United States, Charlotte · Employment contract · 119,000.00 - 224,000.00 USD / Year · Job Posted May 27, 2026
Job offer has expired
Job Responsibility
Drive and lead Site Reliability Engineering capabilities at Wells Fargo Banking Operations, igniting the practice, principles, and culture and leading by example. Mentor and coach engineers while scaling the SRE practice within Banking Operations and partnering with peer platform-embedded SRE teams
Leverage enterprise capabilities, tools, and innovation to improve availability in a complex ecosystem by maturing observability practices including monitoring, logging, distributed tracing, synthetic monitoring, and chaos engineering with a focus on actionable insights and proactive detection
Lead the evolution of our environment by introducing self-healing and autonomic capabilities, solving complex operational and systemic issues with precision, including building and training models, automating cognitive processes, and leveraging telemetry to improve the availability and reliability of the products we provide to customers
Own and automate key SRE metrics and IT Service Operations processes including customer impact, golden signals and critical user journeys, % availability of critical business flows, SLO/SLI definition and adherence, error budget management, and real-time observability dashboards
Automate incident response processes through data integration with unified communications and alerting/notification systems
Provide leadership in support responsibilities for critical applications and customer journeys onboarded to SRE, including rapid remediation of issues through Agile practices, conducting blameless postmortems, driving root cause analysis, and implementing durable solutions through continuous improvement with the goal of eliminating repeat incidents
Requirements
5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
5+ years of experience using Observability Tools with hands-on implementation of monitoring, logging, or tracing solutions utilizing Grafana, ThousandEyes, Prometheus, AppDynamics, or Splunk
3+ years of application production support experience in complex, high-availability environments
2+ years of experience with Confluence or Jira
Nice to have
Experience with Site Reliability Engineering (SRE), including SLO/SLI frameworks, error budgets, toil reduction, and production reliability engineering practices
Experience with database logging and monitoring concepts
Experience with application performance monitoring and optimization using BlazeMeter, JMeter, Splunk, AppDynamics, or similar observability platforms
Experience with scripting or programming languages such as Bash, PowerShell, Python, Shell, VBScript, or JavaScript for automation and reliability engineering use cases
Experience and understanding of AIOps and related tools such as MoogSoft or Big Panda, including event correlation and noise reduction
Experience with one or more automation tools such as Ansible or similar infrastructure-as-code/configuration management tools
Experience with container technologies (Kubernetes, Docker, PKS), with a focus on observability and reliability patterns in distributed systems
What we offer
Health benefits
401(k) Plan
Paid time off
Disability benefits
Life insurance, critical illness insurance, and accident insurance