Principal Engineer - Observability Events Management

Wells Fargo

Location:
United States, Iselin

Contract Type:
Not provided

Salary:
159000.00 - 305000.00 USD / Year

Job offer has expired

Job Description:

Wells Fargo is seeking a Principal Engineer in Technology, as part of our CTO organization, to play a key role in driving technology-led innovation at enterprise scale. Learn more about the career areas and lines of business at wellsfargojobs.com. This role will help transform Telemetry, Monitoring, and Observability across on-premises platforms as well as private and public cloud environments. Our vision is to deliver best-in-class observability capabilities that provide consumer-grade insights at a competitive cost by leveraging open-source technologies and open standards. Through this modernization, we aim to reduce the cost of failure, improve platform availability, and optimize software engineering capacity across the enterprise.

Job Responsibility:

  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies, and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership

Requirements:

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 3+ years of experience with distributed streaming and event-driven architectures, using technologies such as Kafka
  • 3+ years of experience in database design and data architecture, including relational and/or NoSQL systems, data modeling, and high-throughput data access patterns
  • 7+ years of programming experience in Java, C#, and/or Python, building scalable, production-grade backend systems and APIs

Nice to have:

  • AIOps expertise spanning alert ingestion, correlation, enrichment, and AI‑driven incident analysis within large‑scale enterprise observability platforms
  • Experience in AI/ML technologies, including Large Language Models (LLMs), prompt engineering, model integration, or applied machine learning in production environments
  • Experience designing and integrating APIs and microservices, including REST/gRPC services and service orchestration patterns
  • Knowledge of API security, OAuth, authentication/authorization platforms, and reverse proxy or gateway patterns
  • Experience working with cloud-native platforms such as OCP, including containerization, distributed systems, and scalable deployment models
  • Experience with incident and problem management, technology change management, and performance testing and tuning
  • Experience with system design and architecture for high-scale applications, including performance optimization, fault tolerance, and resiliency
  • Strong verbal, written, and interpersonal communication skills
  • Ability to influence and build relationships with LOB stakeholders, technology leadership, external service providers, and architecture teams
  • Experience with the modern software development lifecycle, including TDD, CI/CD, pairing, build automation, automated testing, Agile games, and chaos engineering
  • Knowledge of Agile methodologies and the product operating model
  • Knowledge and understanding of application or software security: web application penetration testing, secure code review, secure static code analysis
  • Knowledge and understanding of technology architecture: solutions development
  • Knowledge and understanding of complex enterprise systems and frameworks including frontends, middleware, services layer, database, backend, and downstream interfaces

What we offer:

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

Additional Information:

Job Posted:
April 25, 2026

Expiration:
April 26, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Principal Engineer - Observability Events Management

Principal Data Engineer

PointClickCare is searching for a Principal Data Engineer who will contribute to...
Location:
United States
Salary:
183200.00 - 203500.00 USD / Year
PointClickCare
Expiration Date:
Until further notice
Requirements:
  • Principal Data Engineer with at least 10 years of professional experience in software or data engineering, including a minimum of 4 years focused on streaming and real-time data systems
  • Proven experience driving technical direction and mentoring engineers while delivering complex, high-scale solutions as a hands-on contributor
  • Deep expertise in streaming and real-time data technologies, including frameworks such as Apache Kafka, Flink, and Spark Streaming
  • Strong understanding of event-driven architectures and distributed systems, with hands-on experience implementing resilient, low-latency pipelines
  • Practical experience with cloud platforms (AWS, Azure, or GCP) and containerized deployments for data workloads
  • Fluency in data quality practices and CI/CD integration, including schema management, automated testing, and validation frameworks (e.g., dbt, Great Expectations)
  • Operational excellence in observability, with experience implementing metrics, logging, tracing, and alerting for data pipelines using modern tools
  • Solid foundation in data governance and performance optimization, ensuring reliability and scalability across batch and streaming environments
  • Experience with Lakehouse architectures and related technologies, including Databricks, Azure ADLS Gen2, and Apache Hudi
  • Strong collaboration and communication skills, with the ability to influence stakeholders and evangelize modern data practices within your team and across the organization
Job Responsibility:
  • Lead and guide the design and implementation of scalable streaming data pipelines
  • Engineer and optimize real-time data solutions using frameworks like Apache Kafka, Flink, Spark Streaming
  • Collaborate cross-functionally with product, analytics, and AI teams to ensure data is a strategic asset
  • Advance ongoing modernization efforts, deepening adoption of event-driven architectures and cloud-native technologies
  • Drive adoption of best practices in data governance, observability, and performance tuning for streaming workloads
  • Embed data quality in processing pipelines by defining schema contracts, implementing transformation tests and data assertions, enforcing backward-compatible schema evolution, and automating checks for freshness, completeness, and accuracy across batch and streaming paths before production deployment
  • Establish robust observability for data pipelines by implementing metrics, logging, and distributed tracing for streaming jobs, defining SLAs and SLOs for latency and throughput, and integrating alerting and dashboards to enable proactive monitoring and rapid incident response
  • Foster a culture of quality through peer reviews, providing constructive feedback and seeking input on your own work
What we offer:
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
Employment Type:
Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Fulltime

Director, Software Engineering

Palo Alto Networks is shaping the future with technology that is transforming th...
Location:
United States, Santa Clara
Salary:
232000.00 - 320750.00 USD / Year
Palo Alto Networks
Expiration Date:
Until further notice
Requirements:
  • 15+ years in software engineering
  • 5+ years leading large, distributed organizations and managing other managers
  • Expert at leveraging and mandating AI-assist tools (Claude Code, Cursor, Windsurf, or Copilot)
  • Proven track record building multi-tenant, high-throughput systems (Go, Python, K8s, Kafka) on GCP or AWS
  • Deep expertise in high-volume data pipelines, synthetic monitoring, and real-time analytics
  • Ability to translate complex architectural roadmaps into clear, high-impact business outcomes for executive stakeholders
  • BS or MS in Computer Science or a related technical field
Job Responsibility:
  • Define and execute the technical roadmap for the ADEM platform
  • Lead the transition from traditional monitoring to Agentic AI workflows
  • Champion a modern SDLC by mandating and integrating AI-powered development tools
  • Own the architecture that processes billions of telemetry events daily
  • Define and execute the roadmap for an AI-first ADEM ecosystem
  • Lead the design of high-scale, cloud-native services
  • Oversee the delivery of high-throughput telemetry pipelines
  • Ensure enterprise-grade reliability across a global footprint
  • Partner with Product Management, Data Science, and Security teams
  • Serve as the primary technical authority for ADEM in executive forums
What we offer:
  • Restricted stock units
  • Bonus
  • Employee benefits (as per linked policy)
Employment Type:
Fulltime

Principal AIOps Engineer

We’re building a world of health around every individual — shaping a more connec...
Location:
United States
Salary:
144200.00 - 288400.00 USD / Year
CVS Health
Expiration Date:
July 01, 2026
Requirements:
  • 10+ years of experience in SRE and production operations supporting highly available services, along with experience with the Product model
  • Proven technical leadership: ability to set direction, lead cross-team initiatives, and advise stakeholders through architecture reviews, tradeoffs, and operational readiness
  • Strong programming/scripting skills (Python preferred) and experience building automation, integrations, and APIs
  • Experience integrating observability platforms and event sources across hybrid environments (cloud/on-prem) and operating production-grade monitoring/event management at scale
  • Strong ServiceNow experience as an ITSM system of record (Incident/Problem/Change; CMDB/asset concepts). Ability to build and operate integrations at scale (REST, webhooks, event management) to support automation and auditability
  • Python (preferred) for automation and data/ML pipelines; experience building integrations, services, and operational tooling
  • Workflow orchestration and integrations (ServiceNow APIs, event pipelines, runbook automation) with strong reliability, security, and auditability practices
  • Observability: Prometheus/Grafana, OpenTelemetry, ELK/Splunk/Datadog (or equivalent)
Job Responsibility:
  • Lead the AIOps strategy, roadmap, and operating model (intake, triage, automation lifecycle, KPIs) to measurably improve MTTR, alert quality, and operational efficiency
  • Own the observability-to-AIOps pipeline (metrics, logs, traces, events) and drive standardization of telemetry, service health models, and actionable alerting across teams and platforms
  • Design and implement event intelligence: correlation, deduplication, suppression, anomaly detection, incident clustering, and probable-cause analysis using topology/CMDB context
  • Advise operations, service owners, and leadership stakeholders; lead change enablement, adoption, and value measurement for AIOps and agentic automation across the organization
  • Develop ServiceNow-centric AIOps integrations (ITSM + ITOM/Event Management where applicable): event ingestion, alert-to-incident policies, enrichment, assignment/routing, approvals, change workflows, and closure updates for auditable closed-loop ops
  • Establish governance for operational AI (risk controls, approvals, auditability, data access, prompt/response logging, evaluation, and continuous improvement) in partnership with security, compliance, and operations
  • Build and operationalize agentic AI workflows for incident triage and resolution: signal summarization, similar-incident retrieval, knowledge article drafting, ticket updates, stakeholder communications, and human-in-the-loop remediation
  • Enable closed-loop automation and self-healing by connecting AIOps detections to orchestrated actions (runbooks/workflows), with clear approvals, safety checks, and rollback paths
  • Partner with NOC/SOC, infrastructure, and application owners to onboard services into AIOps, define service models, and improve signal quality, escalation paths, and operational readiness
What we offer:
  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
  • Bonus, commission or short-term incentive program
  • Equity award program
Employment Type:
Fulltime

Principal Engineer

We seek a Principal Engineer to be the technical leader for the Everyday Rewards...
Location:
Australia, Sydney
Salary:
Not provided
Woolworths Supermarkets
Expiration Date:
Until further notice
Requirements:
  • A minimum of 10+ years of experience as an engineer, most recently as a Principal Engineer, Senior Technical Architect, or Senior Tech Lead, ideally with an eCommerce, loyalty, or high-volume consumer platform background
  • Strong technical skills in .NET (C#, ASP.NET MVC, WebAPI, Entity Framework), GraphQL, and Node.js
  • Extensive cloud platform experience (GCP/Azure) and familiarity with containerisation (Kubernetes/Docker Swarm)
  • Expertise in DevSecOps, CI/CD, infrastructure as code, and delivery automation
  • Experience with web security (OWASP Top 10), microservice architectures, NoSQL databases, and event messaging/queues (Kafka, RabbitMQ, Azure Event Hubs)
  • Proven ability to influence multiple teams, raise engineering standards, and facilitate team growth
Job Responsibility:
  • Own and evolve the architectural runway for the Everyday Rewards tribe, defining target-state architecture and managing pragmatic delivery steps
  • Serve as the primary technical reference point, providing hands-on guidance on solution design, implementation tradeoffs, and engineering practices across squads
  • Drive tactical solutioning for complex problems, including partner integrations, loyalty processing, real-time event pipelines, and customer-facing digital experiences
  • Identify and address cross-squad technical concerns (consistency, shared components, standards, technical debt) with a delivery-aware lens
  • Establish and continuously improve engineering standards (testing, observability, CI/CD, secure-by-design) and produce fit-for-purpose solution architectures
What we offer:
  • Team discounts across our range of Woolworths Group brands you know and love and a robust rewards program that celebrates and incentivises purpose-driven work
  • A global business with endless career possibilities around every corner and across every discipline – with valuable exposure to a vast and exciting business network
  • A progressive and flexible 'work from anywhere' policy that gives you the opportunity to harmonise work, life and your wellbeing
  • A range of programs to help you prioritise and manage your wellbeing, including 24/7 access to the Sonder app
Employment Type:
Fulltime

Principal Software Engineer

We are looking for a Principal Software Engineer to spearhead the architecture, ...
Location:
India, Pune; Kolkata
Salary:
Not provided
Bentley Systems
Expiration Date:
Until further notice
Requirements:
  • 12+ years of professional experience in software engineering, with proficiency in .NET and C# and a strong focus on distributed cloud systems
  • Deep expertise in Azure, Kubernetes, containerization, microservices, and cloud operations
  • Proven architectural leadership in large-scale ETL, orchestration frameworks, workflow engines, and distributed processing systems
  • Strong experience with event-driven architecture and messaging systems (e.g., Kafka, Service Bus, RabbitMQ)
  • Strong grounding in reliability engineering: observability, tracing, metrics, logs, CI/CD, and operational automation
Job Responsibility:
  • Architectural Leadership: Evolve end-to-end architecture for cloud-based ETL workflows and engineering data synchronization
  • Architect event-driven systems using microservices, container orchestration, and state-machine-driven execution
  • Lead design of multi-tenant services optimized for global scale, performance, and cost-efficiency
  • Cloud Platform & Distributed Systems: Design cloud-native pipelines using Kubernetes, focusing on zero-downtime rollouts and secure configuration management
  • Implement distributed locking and conflict-resolution mechanisms for high-concurrency data sync
  • Ensure robust observability through distributed tracing, automated diagnostics, and structured logging
  • Data & ETL Workflow Architecture: Drive ETL design for engineering data, focusing on schema management, versioning, and domain-model mapping
  • Master workflow orchestration using engines like Temporal or Step Functions to ensure idempotency and transactional integrity
  • Reliability, Resilience & Operational Excellence: Define resilience strategies, including SLOs, self-healing workflows, circuit breakers, and failure isolation patterns
  • Champion operational health through architecture reviews, capacity planning, and cloud cost governance
What we offer:
  • A great Team and culture
  • An exciting career as an integral part of a world-leading software company providing solutions for architecture, engineering, and construction
  • An attractive salary and benefits package
  • A commitment to inclusion, belonging and colleague wellbeing through global initiatives and resource groups
  • A company committed to making a real difference by advancing the world’s infrastructure for better quality of life, where your contributions help build a more sustainable, connected, and resilient world

NaaS Architect Principal

The NaaS Architect Principal is central to BT International's network transforma...
Location:
Spain, Madrid
Salary:
Not provided
Plusnet
Expiration Date:
Until further notice
Requirements:
  • Strategic Architecture Leadership – Proven ability to define and communicate network architectural vision, with track record driving large-scale network transformation programs in service provider or cloud environments
  • Network Architecture Expertise – Deep understanding of service provider networks including SDN, segment routing, MPLS, BGP and overlay technologies, combined with cloud-native networking and container networking patterns
  • Platform Engineering Mindset – Strong understanding of platform-as-a-product principles, building self-service capabilities and treating internal teams as customers with clear SLAs
  • API & Integration Architecture – Extensive experience designing API-driven architectures using RESTful, gRPC and event-driven patterns, with knowledge of industry standards including TMF, MEF and CAMARA
  • Technical Depth – Hands-on background in network engineering with coding capability in at least one language (Python, Go) and participation in technical spike or proof of concept work
  • Automation & Infrastructure-as-Code – Strong background in network automation, infrastructure-as-code (Terraform, Ansible) and GitOps with Flux/Argo CD
  • Cloud-Native & Multi-Cloud – Experience with cloud-native patterns including Kubernetes, containers and orchestration, operating across multi-vendor and multi-cloud environments
  • Observability & Network Operations – Knowledge of observability systems (ELK, Prometheus, Grafana, gNMI), telemetry pipelines, event streaming platforms (Kafka), orchestration platforms (Itential, NetBox) and traffic engineering controllers
  • Telco Transformation Context – Experience navigating organizational and technical challenges of telco network modernization while maintaining operational continuity
  • Zero Touch Operations – Knowledge of intent-based networking, automated remediation, workflow-driven operations and compliance management that enable zero-touch networking principles
Job Responsibility:
  • Define and lead the architectural strategy for NaaS platform evolution, establishing target state architectures that balance functional requirements with non-functional requirements including scalability, resilience, security and cost optimization
  • Work hand in hand with product engineering squads to provide hands-on architectural guidance, working directly with engineers to deliver product excellence as well as technical spikes and proof-of-concepts
  • Drive API-first architecture across network services, establishing patterns for exposing network capabilities through modern integration approaches including RESTful APIs, gRPC and event-driven patterns, with alignment to industry standards including TMF, MEF and CAMARA
  • Lead vendor rationalization strategy across network equipment vendors, cloud providers and orchestration platforms, reducing vendor dependencies through strategic build vs buy decisions and phasing out unnecessary third-party systems in favor of composable in-house capabilities
  • Champion modern architecture patterns including infrastructure-as-code, GitOps, automated provisioning and cloud-native networking that enable continuous delivery and operational excellence
  • Establish observability frameworks for network services including telemetry pipelines, metrics collection, distributed tracing and logging strategies that enable proactive operations and rapid troubleshooting
  • Collaborate with platform engineering teams to build Internal Developer Platform capabilities that abstract network complexity and provide self-service access to network functions
  • Drive architectural governance through design reviews and conformance processes, ensuring solutions align with platform standards while empowering product team autonomy
  • Provide technical thought leadership on network architecture including SDN underlay, control plane, management plane APIs and telemetry, translating industry trends and technological advances into roadmaps that align with BT International's platform strategy
  • Mentor architects and engineers, fostering architectural thinking and technical leadership capability across both the architecture and product engineering organizations

Executive Director, Agentic AI

The Executive Director, Agentic AI will define and lead the enterprise strategy,...
Location:
United States, Sacramento
Salary:
175100.00 - 334750.00 USD / Year
CVS Health
Expiration Date:
May 30, 2026
Requirements:
  • 12+ years in software engineering, platforms, or AI/ML, with 5+ years in senior leadership roles
  • Hands-on experience delivering AI systems at enterprise scale (not just experimentation)
  • Deep understanding of LLMs, SLMs, RAG, embeddings, and vector databases; agent frameworks and orchestration patterns; and distributed systems, APIs, and event-driven architectures
  • Proven ability to operate in regulated, high-availability environments
  • Strong executive communication and stakeholder-management skills
Job Responsibility:
  • Define the enterprise Agentic AI vision and roadmap, aligned to business outcomes (cost reduction, revenue growth, productivity, experience uplift)
  • Establish clear differentiation between LLM tools, copilots, workflows, and autonomous/multi-agent systems
  • Identify and prioritize high-value agentic use cases (e.g., customer support resolution, claims/prior auth automation, contract leakage reduction, operational orchestration, developer productivity)
  • Own the design and evolution of the Agentic AI Platform, including multi-agent frameworks (planner, executor, verifier, critic, retriever agents); tool/function calling and API orchestration; RAG, memory, state management, and context persistence; and human-in-the-loop / human-on-the-loop controls
  • Define standards for agent lifecycle management (design, testing, deployment, observability, rollback)
  • Partner with Digital Platform and Integration teams to ensure agents are API-first, event-driven, and scalable
  • Lead delivery of production-grade agentic solutions, not POCs
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
Employment Type:
Fulltime