Lead Systems Operations Engineer, Wells Fargo

Wells Fargo

Location:
United States: West Des Moines, Chandler, Charlotte, Irving, Minneapolis

Category:
IT - Administration

Contract Type:
Employment contract

Salary:

119,000.00 - 206,000.00 USD / Year

Job Description:

Wells Fargo is seeking a highly skilled and forward-thinking Lead Systems Operations Engineer to join our API SRE & Operations team within the CTO Platform Services team. This role is ideal for someone passionate about building scalable, resilient, and intelligent infrastructure solutions. You will play a key role in driving automation, reducing operational toil, and enabling self-service capabilities through cutting-edge technologies, including Generative AI and Agent development.

Job Responsibilities:

Lead complex, broad-impact initiatives, including providing high-level systems consultation for the technology teams
Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions, that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
Make decisions on technical changes and enhancements
Consult with the engineering team on change design requiring a solid understanding of the technical process controls or standards that influence and drive new initiatives
Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
Production support activities:
Incident Management: Triage incidents, engage partner teams, provide status updates, facilitate business user communication
Problem Management: Ticket management for daily tasks and efforts that are brought to support attention, Root cause analysis
Batch Management: Facilitate batch job creation, implementation, and change, Update batch schedules, Batch job documentation
Change Management: Identify the forward schedule of change to applications and environments, Review post-change implementation successes / failures and create action plans to remediate if required
Monitoring: Implementation of Alerts and Configuration - Customize alerting tools based on application specific thresholds, Enable business transaction monitoring
BCP Support: Documentation and coordination efforts to secure application resiliency prior to BCP event, Test execution during scheduled BCP events
Capacity Management: Support capacity planning initiatives and provide application information to capacity planning teams
Audit and Compliance support: Participate in audit activities and provide data to auditors on production environment variables
Automation: Configure dashboards and develop scripts to automate day-to-day tasks from a platform perspective
On-call: Provide support during deployments and carry a pager for after-hours support

Requirements:

5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
4+ years of experience leveraging observability platforms such as BigPanda, ThousandEyes, Grafana, Prometheus, ELK, Splunk Observability, and AppDynamics to enhance service reliability and performance monitoring
4+ years of experience in IT Service Management (ITSM), with a strong background in incident, problem, and change management processes
3+ years of experience working with Red Hat Enterprise Linux and Kubernetes, with a strong focus on Red Hat OpenShift Container Platform (OCP)
3+ years of experience with Site Reliability Engineering and supporting production grade
3+ years of experience with, and a solid understanding of, Apigee or similar API Management platforms
3+ years of experience with cloud-native architectures, high-availability systems, and cloud and container technologies such as GCP or Azure, plus familiarity with Kubernetes
3+ years of experience with automation and scripting, including expertise in Ansible Tower and in developing and maintaining playbooks

Nice to have:

Strong experience working in Agile methodologies / Scrum environments
Experience in project management and stakeholder engagement
Proven experience in leading cross-functional teams
Strong problem-solving and decision-making abilities
Excellent communication and collaboration skills

What we offer:

Health benefits
401(k) Plan
Paid time off
Disability benefits
Life insurance, critical illness insurance, and accident insurance
Parental leave
Critical caregiving leave
Discounts and savings
Commuter benefits
Tuition reimbursement
Scholarships for dependent children
Adoption reimbursement

Additional Information:

Job Posted:
October 05, 2025

Expiration:
October 13, 2025

Employment Type:

Full-time

Work Type:

Hybrid work
