
Principal Engineer – Network Tools and Observability

Location: United States, Irving · Employment contract · Salary: 159000.00 - 305000.00 USD / Year · Job posted: June 28, 2026

Job Description

Wells Fargo is seeking a Principal Engineer to lead the strategy, architecture, and delivery of the enterprise network tooling stack, with a strong emphasis on observability and Agentic AI. This role ensures the bank’s networks – including datacenter, branch, cloud, WAN/SD-WAN, and SASE/edge – are instrumented with robust, scalable, and secure tooling to support operational excellence, compliance, and cost efficiency. The engineer will drive integration of telemetry, topology, and analytics platforms to accelerate incident resolution, enable data-driven planning, and support Secure by Design operations. This position sits within the Innovation and Transformation team in Secure Network Services, part of the Cyber Security organization.

Job Responsibility

  • Strategy & Roadmap: Advise leadership on multi-year strategy for network tools and observability (NPMD, telemetry pipelines, topology, synthetic monitoring), aligned with SRE, Security, and Cloud initiatives
  • Architecture: Design and evolve the observability stack, selecting tools and technologies (e.g., SNMP, flow/IPFIX, syslog, streaming telemetry/gNMI, OpenTelemetry, APM/NPM, log analytics, time-series DB) for scalability across on-prem, cloud, and branch environments
  • Delivery Leadership: Oversee implementation and integration with ITSM/CMDB, CI/CD, configuration compliance, and ITIL-based incident/problem/change workflows
  • Cross-Functional Collaboration: Partner with NetOps, SRE, SecOps, Cloud, and Lines of Business to translate requirements into architecture and measurable outcomes
  • Blueprints & Standards: Define baselines for telemetry and data collection: SNMP polling and SNMP traps; syslog ingestion and correlation; flow data (NetFlow, sFlow, IPFIX, etc.); packet capture tools for deep traffic analysis; integration with vendor-based controllers and SaaS platforms. Define tooling standards across platforms
  • Enterprise Impact: Lead resolution of complex challenges requiring deep evaluation across multiple domains; translate strategic business objectives and the enterprise technology landscape into engineering solutions; provide vision and technical direction to leadership for innovative, large-scale solutions
  • Operational Excellence: Optimize processes and drive robust automation; identify inefficiencies and promote continuous improvement
  • Leadership & Mentorship: Align stakeholders through clear technical communication; mentor teams and deliver knowledge transfers to upskill SMEs; present technical concepts to senior leaders

Requirements

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 7+ years in large‑scale network engineering/operations, with 5+ years focused on tooling and observability

Nice to have

  • Expert knowledge of enterprise networks (L2/L3, BGP/EVPN, DC fabrics, SD‑WAN, SASE, firewalls, LB, DNS/DHCP/IPAM) and cloud networking (Azure/AWS/GCP: VNet/VPC, Transit, Direct Connect/ExpressRoute)
  • Deep experience with telemetry & tooling: SNMP, NetFlow/IPFIX/sFlow, syslog, streaming telemetry (gNMI/NETCONF/RESTCONF), synthetic monitoring, path analytics, and topology
  • Hands-on experience with tools such as ThousandEyes, AppDynamics, Splunk/Elastic, Prometheus/Grafana, ServiceNow
  • Automation and integration skills: Python, Ansible, Terraform, CI/CD, REST APIs, webhooks, event pipelines
  • Strong grasp of SRE principles (SLIs/SLOs/error budgets), reliability and resilience (RTO/RPO, DR, chaos engineering)
  • 5+ years of technical design experience
  • 5+ years of network or network security design experience
  • Strong knowledge of end-to-end network architecture and protocols
  • Excellent communication and technical writing skills
  • Ability to define business requirements and align them with tooling strategy
  • Ability to translate complex technical needs into clear, actionable collaboration
  • Proven leadership and mentoring capabilities

What we offer

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement


Similar Jobs for Principal Engineer – Network Tools and Observability

8 matching positions

Principal Network Engineer, Operations & Observability

The System Engineer job family has responsibility for infrastructure/technical p...
Location: United States, Englewood
Salary: 60.24 - 89.60 USD / Hour
Company: American Nursing Care
Expiration Date: Until further notice
Requirements
  • Bachelor of Arts degree or equivalent experience
  • 10 years of professional IT experience in an IT technical or infrastructure field
  • 5+ years Unix operational experience (Solaris, AIX, Linux)
  • 5+ years Windows Server operational experience
Job Responsibility
  • Platform Lifecycle Management
  • Enterprise Architecture and Strategy
  • Future-State Vision
  • Strategy and Roadmap
  • Architectural Standards
  • Collaboration and Operational Model
  • Develops organizational policies, standards, and guidelines for methods and tools
  • Determines testing policy
  • Sets the release policy for the organization
  • Maintain primary responsibility for strategic planning, technical roadmap development, standards and architecture
What we offer
  • medical
  • prescription drug
  • dental
  • vision plans
  • life insurance
  • paid time off
  • tuition reimbursement
  • retirement plan benefit(s) including 401(k), 403(b), and other defined benefits offerings
Employment type: Full-time

Principal Architect - Cloud and Observability

We're building a world of health around every individual — shaping a more connec...
Location: United States
Salary: 144200.00 - 288400.00 USD / Year
Company: CVS Health
Expiration Date: June 29, 2026
Requirements
  • 10+ years in infrastructure, cloud architecture, platform engineering, or SRE
  • 8+ years of architecture work in observability, cloud infrastructure, or both at a large enterprise
  • Solid experience with at least two of Azure, AWS, or GCP -- including networking, identity, compute, and storage
  • 5+ years with Kubernetes in production (OpenShift, EKS, AKS, or GKE)
  • 5+ years with OpenTelemetry or similar frameworks (collectors, SDKs, semantic conventions, pipeline design)
  • 5+ years with observability platforms: Grafana/Mimir/Loki/Tempo, Prometheus, Datadog, Splunk, Dynatrace, or comparable tools
  • Experience defining SLOs/SLIs and building alerting strategies at an organizational level
  • Proven track record writing architecture standards that other teams adopted and followed
  • Able to communicate clearly with both engineers and senior leadership
Job Responsibility
  • Own the enterprise observability reference architecture covering metrics, logs, traces, and events across all environments (cloud and on-prem)
  • Drive the OpenTelemetry-first instrumentation strategy -- standard libraries, semantic conventions, collector topologies (DaemonSet, gateway, sidecar), and pipeline design
  • Build and operate telemetry pipelines on Grafana Mimir, Loki, and Tempo, including multi-tenant configurations, retention policies, and capacity planning
  • Define how we measure reliability: SLOs, SLIs, error budgets, and alerting frameworks -- consistently across all lines of business
  • Own the integration between observability tooling and incident management (ServiceNow ITOM, xMatters)
  • Drive telemetry schema standards to ensure teams emit data that is useful downstream, not just technically compliant
  • Build and maintain reference architectures for our hybrid footprint: OpenShift on-prem with KVM/libvirt and Dell PowerFlex storage, plus Azure, AWS, and GCP
  • Lead standards work around workload identity and federation using SPIFFE/SPIRE and cloud-native IAM patterns to move away from static secrets
  • Provide guidance on compute runtime selection -- containers vs. VMs vs. bare metal vs. serverless -- with a clear decision framework for teams
  • Help teams connect autoscaling and capacity planning behavior to actual telemetry signals
What we offer
  • medical, dental, and vision coverage
  • paid time off
  • retirement savings options
  • wellness programs
  • other resources, based on eligibility
  • bonus, commission or short-term incentive program
  • equity award program
Employment type: Full-time

Principal Network Architect

As Principal Network Architect at Stacuity, you will be the senior design author...
Location: Isle of Man, Douglas
Salary: Not provided
Company: Stacuity
Expiration Date: Until further notice
Requirements
  • Minimum CCNP level (or equivalent) across major enterprise networking vendors
  • Excellent written and spoken English
  • Strong grounding in high-availability and failover principles at both Layer 2 and Layer 3
  • Deep, practical expertise in IP routing at scale, including BGP (iBGP and eBGP) and IS-IS
  • Solid experience with tunnelling technologies such as GRE and VXLAN, and Layer 2 networking including HSRP/VRRP/VARP, VLANs and trunking, STP, and ARP/NDP
  • Working knowledge of VPN technologies and IPsec
  • Hands-on firewall configuration and management experience
  • Hands-on configuration and troubleshooting experience across major networking vendors
  • Experience with SD-WAN and interconnection fabrics
  • Experience with monitoring and observability tooling
Job Responsibility
  • Act as the design authority for Stacuity’s global network
  • Author and deliver High-Level Design (HLD) and Low-Level Design (LLD) documentation
  • Design and evolve global WAN, LAN, and Internet edge architecture
  • Define network standards
  • Evaluate emerging network technologies
  • Be hands-on alongside engineers
  • Lead network projects from a design perspective
  • Build and maintain hardware Bills of Materials (BoM)
  • Act as a senior escalation point for complex network incidents
  • Provide technical leadership and mentorship
Employment type: Full-time

Principal Engineer

Wells Fargo is seeking a Principal Engineer to join the Consumer Technology grou...
Location: United States, Irving; Chandler; Charlotte
Salary: Not provided
Company: Wells Fargo
Expiration Date: July 02, 2026
Requirements
  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 7+ years of alarm scripting or alerting tool(s) experience
  • 3+ years of experience setting up distributed tracing across an internet topology for full health check and with the ability to pinpoint problem source
Job Responsibility
  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
Employment type: Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location: United States, Pleasanton, California
Salary: 251000.00 - 314500.00 USD / Year
Company: BlackLine
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment type: Full-time

Principal Product Engineer

We're looking for a Principal Product Engineer who pairs deep engineering craft ...
Location: Portugal, Lisbon
Salary: Not provided
Company: Tripadvisor
Expiration Date: Until further notice
Requirements
  • 10+ years of software experience, with significant time still spent hands-on in code - track record of shipping product-impacting work end-to-end, not just owning a layer
  • Real depth in the mobile app ecosystem (iOS and/or Android, with strong fluency in Swift and/or Kotlin and the surrounding ecosystem - offline sync, push, auth, persistence, networking, REST/GraphQL) - and credible breadth beyond it
  • Demonstrated breadth: you've worked seriously in at least one of {web frontend, backend services, data/infra, platform tooling} alongside mobile, and can hold your own in code review there
  • Strong product judgment: you've made calls about what not to build, and can defend them with evidence
  • Comfort troubleshooting in production across stacks - crash analysis, latency tracing, release-health debugging
  • Excellent cross-functional collaboration: you make the people around you better
Job Responsibility
  • Identify, scope, and ship the changes that move business metrics - across mobile, web, services, and data layers
  • Architect long-lasting systems that hold up under real production conditions: performance, reliability, scalability, offline behavior, consistency
  • Lead technical design reviews across teams, weighing trade-offs not just in code but in product impact, time-to-ship, and operational cost
  • Drive operational maturity wherever it's weakest - release management, observability, incident response, performance monitoring - including in the mobile apps
  • Partner with PMs, designers, and engineering leaders to shape what we build, why, and in what order; you're a peer in those conversations, not a downstream implementer
  • Set the technical bar for the org by example: write the prototype, prove the pattern, then teach it
  • Communicate trade-offs clearly to engineers, product partners, and senior stakeholders
What we offer
  • Competitive compensation packages (routinely benchmarked against the latest industry data), including base salary and annual bonuses
  • “Work your way” with flexibility to suit your lifestyle. Tripadvisor Group takes a remote-friendly approach to collaboration across a worldwide team, with the option to join on-site as often as you’d like or as required by your team.
  • Flexible schedule. Work-life balance is ingrained in our culture by design. Trust and accountability make it work.
  • Donation matching. Give back? Give more! We match qualifying charitable donations annually.
  • Tuition assistance. Want to level up your career? We love to hear it! Receive annual support for qualified programs.
  • Lifestyle benefit. An annual benefit to spend on yourself. Use it on travel, wellness, or whatever suits you.
  • Travel perks. We believe that travel is employee development, so we provide discounts and more.
  • Employee assistance program. We’re here for you with resources and programs to help you through life’s challenges.
  • Health benefits. We offer great coverage and competitive premiums.
  • Generous referral scheme. Help us grow and be rewarded with generous awards for referring successful candidates.

Principal Software Engineer

Join Mastercard's Operations Automation Program as a Principal Engineer for our ...
Location: United Kingdom, London
Salary: Not provided
Company: Mastercard
Expiration Date: Until further notice
Requirements
  • 10+ years of experience in Software Engineering and DevOps roles, including at least 2 years in a technical leadership capacity
  • Strong Linux systems administration background
  • Deep familiarity with cloud environments (AWS, Azure, or GCP)
  • Proven experience with container orchestration and tooling (Kubernetes, Helm, Docker Compose)
  • Hands-on experience with Terraform Enterprise, Ansible and Chef
  • Strong understanding of CI/CD pipelines (GitHub Actions, Jenkins etc.)
  • Proficient in scripting and programming (Bash, Go, Python, Ruby)
  • Experience with monitoring and observability platforms (Prometheus, Grafana, Splunk, Dynatrace)
  • Excellent communication skills and the ability to work effectively with Product, Developers and Operations teams
  • Demonstrated ownership mindset, prioritization skills, and ability to thrive in a fast-paced, multi-tasking environment
Job Responsibility
  • Automate, provision and manage public and on-prem cloud infrastructure, including containerized and virtualized systems
  • Plan, build, and optimize CI/CD pipelines to enable fast, safe, and repeatable delivery of complex distributed systems across a global network of data centres
  • Collaborate in cross-functional teams to automate deployments, troubleshoot complex issues, and support new product initiatives
  • Champion Infrastructure as Code principles using tools like Terraform Enterprise, Ansible and Chef
  • Drive observability and reliability through monitoring, logging, and alerting systems (Prometheus, Grafana, Druid, Splunk etc.)
  • Foster innovation and continuous improvement, adopting new tools and practices to increase efficiency, scalability, and cost optimization
  • Hands-on coding of automation efforts in Go
  • Partner with stakeholders to align DevOps initiatives with business objectives, including uptime, deployment velocity, and cost efficiency
Employment type: Full-time

Principal DevOps Engineer

We are looking for Principal Engineer to join our Cloud-NGFW engineering team. Y...
Location: India, Bengaluru
Salary: Not provided
Company: Palo Alto Networks
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
  • 10+ years of professional experience in DevOps, SRE, or Infrastructure Engineering, with a Bachelor's degree
  • Strong proficiency in Linux/Unix systems administration, internals, networking, and troubleshooting
  • Expertise in at least one programming/scripting language (e.g., Python, Go, Bash)
  • Hands-on experience with at least one major cloud platform (AWS, Azure) and its core services
  • Proven experience with containerization and orchestration technologies
  • Demonstrable experience building and managing CI/CD pipelines (e.g., GitLab Actions, Jenkins)
  • Strong hands-on experience with infrastructure as code (e.g., Terraform, Ansible)
Job Responsibility
  • Design, build, and maintain scalable, highly available, and secure infrastructure to support our global security services
  • Spearhead the transition to autonomous operational processes by developing and implementing Infrastructure as Code (IaC) practices
  • Define and govern SLIs/SLOs/SLAs to ensure rigorous service standards and lead the 'Error Budget' conversation to balance feature velocity with system stability
  • Build and optimize CI/CD pipelines that empower developers to ship code multiple times a day with high confidence and reliability
  • Implement and enhance monitoring, alerting, and observability solutions to improve system visibility and proactively reduce Mean Time to Resolution (MTTR)
  • Drive a culture of continuous improvement through blameless postmortems and root cause analysis, and collaborate with cross-functional teams to implement preventative measures
  • Automate repetitive operational tasks using scripting and configuration management tools to improve efficiency and reduce manual error
Employment type: Full-time