Wells Fargo is seeking a Lead Site Reliability Engineer (SRE) to join the WIMT Platform team. This role is responsible for driving the stability, resiliency, performance, and security of mission‑critical platforms that support Wells Fargo Advisors, First Clearing firms, and FINET practices. As a Lead SRE, you will provide hands‑on technical leadership across incident management, automation, observability, and reliability engineering, with a strong focus on proactive risk mitigation and continuous improvement. You will help define and enforce reliability standards while partnering closely with Application Development, Product, Business, and Enterprise teams to ensure operational excellence throughout the full service lifecycle. This role is ideal for a highly motivated engineer with deep experience operating large‑scale production systems who takes ownership, values accountability, and is passionate about building resilient, enterprise‑grade platforms. Learn more about career areas and business divisions at https://www.wellsfargojobs.com.
Job Responsibilities:
Design and implement scalability, reliability, and observability strategies for cloud and on-premise environments
Define SLIs (Service Level Indicators), SLOs (Service Level Objectives), and Error Budgets to improve system reliability
Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
Drive adoption of non-functional requirements (NFRs) and best practices for quality and compliance across observability and performance engineering
Ensure high availability and performance of production systems through proactive monitoring and incident response
Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
Lead projects or teams, or serve as a peer mentor
Requirements:
5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
5+ years of experience leading observability and monitoring tooling (Splunk, AppDynamics, Splunk Observability, Grafana, OpenTelemetry)
5+ years in infrastructure (Windows and Linux) support
5+ years proven success in toil reduction initiatives
5+ years in cloud application management, especially OpenShift Container Platform
Nice to have:
5+ years of experience in SRE, public and private cloud technologies, Java performance tuning, and capacity optimization for mission-critical applications
Working knowledge of multiple programming languages and related technologies (e.g., Java, JavaScript, Ruby, Python, JSON, Angular, NodeJS)
Hands-on experience with cloud and platform technologies such as AWS, PCF, PKS, Kubernetes, OpenShift, Linux, Azure, Windows, and VMware
Strong verbal, written, and interpersonal communication skills for effective collaboration across teams
Ability to engage with and influence stakeholders at various organizational levels