AIOps Principal Engineer Job at Wells Fargo (Charlotte / Iselin)

Job Description

Wells Fargo is seeking a Principal Engineer – AIOps to join Platform Strategy & Transformation as part of the Commercial & Corporate and Investment Management Technology (CCIBT) group. Learn more about the career areas and business divisions at wellsfargojobs.com.

This role sits at the core of CCIBT's Zero Touch Production (ZTP) transformation agenda, driving the strategy, architecture, and execution of next-generation AIOps capabilities across the enterprise. You will define and deliver intelligent, autonomous operations by leveraging AI/ML, observability, automation, and event-driven architectures to minimize manual intervention, improve resilience, and enable self-healing systems. You will partner closely with senior engineering, platform, SRE, and business leaders to accelerate AIOps adoption, embed intelligence into production ecosystems, and deliver measurable improvements in availability, efficiency, and operational risk reduction.

This is a hands-on senior developer role requiring strong development skills and the ability to build advanced automations using technologies such as Robotic Process Automation (RPA), artificial intelligence, and low-code platforms, with tools like UiPath, Microsoft Power Platform, Google ADK, LangChain, LangGraph, and Alteryx.

Job Responsibilities

Lead the strategy, design, and execution of AIOps platforms and capabilities to enable Zero Touch Production across CCIBT
Define and drive the enterprise-wide AIOps roadmap, including observability, event correlation, anomaly detection, predictive insights, and automated remediation
Architect and implement self-healing systems leveraging AI/ML, event-driven automation, and closed-loop workflows
Drive adoption of intelligent incident management, root cause analysis (RCA), noise reduction, and auto-resolution techniques
Establish target-state architecture and engineering standards for AIOps platforms, tooling, and integrations
Influence enterprise technology strategy by evaluating emerging AIOps trends, tools, and frameworks
Partner with SRE, infrastructure, cloud, and application teams to embed AIOps into SDLC, CI/CD, and production operations
Lead large-scale engineering initiatives with cross-functional and enterprise impact
Provide thought leadership on resilience engineering, reliability, automation, and production excellence
Mentor and guide senior engineers and teams on AIOps best practices, architecture, and implementation
Collaborate with risk, compliance, and governance teams to ensure secure, compliant, and auditable automation

Requirements

7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
5+ years of experience in AIOps, SRE, production engineering, or large-scale distributed systems operations
4+ years of experience with Python or other programming or scripting languages
2+ years of experience working with Generative AI, large language models (LLMs), or foundation models

Nice to have

2+ years of agentic AI and agent-building experience
Experience with AI-powered development or GitHub Copilot
Proven experience designing and implementing observability, monitoring, and automation platforms at scale
Deep expertise in AIOps platforms and tools (e.g., Prometheus, AppDynamics, Splunk, ITRS Geneos, BigPanda, OpenTelemetry ecosystems)
Strong experience with AI/ML for IT operations, including anomaly detection, event correlation, forecasting, and intelligent alerting
Hands-on experience with automation frameworks (e.g., Ansible, Terraform, or similar) and event-driven architectures
Strong understanding of SRE principles, SLIs/SLOs, error budgets, and reliability engineering practices
Experience building self-healing systems and closed-loop remediation workflows
Proficiency in cloud platforms and cloud-native architectures (Kubernetes, microservices)
Knowledge of data pipelines, streaming platforms (Kafka), and telemetry ingestion/processing
Familiarity with GenAI/LLM-assisted operations, including incident summarization, knowledge mining, and automated runbook generation
Ability to operate across complex organizational structures with strong stakeholder management and communication skills
Proven ability to define target-state architecture, operating models, and actionable roadmaps
Ability to manage multiple high-complexity engineering initiatives with significant enterprise impact
Strong analytical, problem-solving, and architectural design skills
Excellent communication and documentation skills (e.g., Confluence, Git, architecture diagrams)
Comfortable driving transformation and influencing senior leadership in a fast-paced, evolving environment

What we offer

Health benefits
401(k) Plan
Paid time off
Disability benefits
Life insurance, critical illness insurance, and accident insurance
Parental leave
Critical caregiving leave
Discounts and savings
Commuter benefits
Tuition reimbursement
Scholarships for dependent children
Adoption reimbursement
