Wells Fargo is seeking an experienced Site Reliability Engineer (SRE) with deep expertise in Apigee API Management and IBM DataPower.
Job Responsibilities
Lead complex, broad-impact initiatives, including providing high-level systems consultation to the technology teams
Serve as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
Review and analyze complex technical challenges and escalated support issues related to core business solutions that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
Make decisions on technical changes and enhancements
Consult with the engineering team on change design that requires a solid understanding of the technical process controls or standards that influence and drive new initiatives
Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
Lead daily support operations for Apigee OPDK and Apigee Hybrid to ensure platform uptime, stability, and performance
Manage and maintain core Apigee components such as Routers, Message Processors, MART, Synchronizer, UDCA, Postgres, Zookeeper, Cassandra, and runtime infrastructure
Lead operational activities for IBM DataPower, including domain management, cryptographic objects, firmware upgrades, service configuration, and cluster maintenance
Own and resolve P1/P2 high-severity incidents with quick response and deep technical troubleshooting
Perform detailed Root Cause Analysis (RCA), document post-incident reports, and drive permanent corrective actions
Lead communication handling during major incidents and coordinate with cross-functional teams
Oversee ticket lifecycle management, SLA adherence, and escalation handling across support tiers
Act as the primary technical liaison between Support, Engineering, Cloud, Network, Security, and Architecture teams
Support API proxy deployments, shared flows, developer portal configurations, and runtime troubleshooting
Operate and support IBM DataPower Gateways, including configuration of Multi-Protocol Gateways (MPGW), WS-Proxies, cryptographic profiles, XSLT, and GatewayScript services
Troubleshoot runtime, policy, routing, and security issues on DataPower appliances
Apply expert knowledge of Ansible, Python, and Unix scripting to automate tasks such as monitoring and alerting, deployment workflows, health checks, incident response, and config validation
Implement reliability improvements through Infrastructure-as-Code (IaC) using Terraform, Ansible, and GitOps
Develop automated recovery scripts and tools to reduce manual operational overhead
Establish and maintain observability using Splunk, ELK, Grafana, Prometheus, and Dynatrace/AppDynamics
Build dashboards for SLIs/SLOs, latency, error analysis, backend performance, and capacity metrics
Improve proactive alerting to reduce mean time to detect (MTTD) and mean time to recover (MTTR)
Participate in design discussions, architectural reviews, API governance activities, and platform modernization initiatives
Work with CAB (Change Advisory Board) for change planning, approvals, and execution tracking
Contribute to runbooks, SOPs, architectural diagrams, and platform knowledge base assets
Requirements
5+ years of Systems Engineering or Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Experience in monitoring, automation, and cloud platforms, with a proven track record of designing and supporting highly scalable, reliable, and secure API ecosystems