Principal Network Engineer, Operations & Observability

American Nursing Care

Location:
United States, Englewood

Contract Type:
Not provided

Salary:

60.24 - 89.60 USD / Hour

Job Description:

The System Engineer job family has responsibility for infrastructure/technical planning, implementation and support activities for systems owned by the CommonSpirit Health Technology Infrastructure team. Specific responsibilities include installing and supporting system hardware and software, performing system upgrades, and evaluating and installing patches and software updates. Responsibilities also include operational support activities such as resolving software and hardware related problems, managing backup and recovery activity, administering technology layers, managing monitoring and alerting functions, performing capacity planning and conducting version management. System Engineers work closely with architects, infrastructure support, database administrators and application support teams to ensure seamless and quality IT support for CommonSpirit Health customers and alignment with CommonSpirit Health's IT standards, controls and governance.

To be successful, individuals must possess a combination of technical, business and leadership skills. This requires an understanding of customers' business needs, processes, and functions. They also require a solid knowledge of IT infrastructure, architecture, applications development and support, networks, and computer operations. In addition, individuals working in this job family must possess excellent communication skills and the ability to influence others.

The Principal System Engineer is considered a subject matter expert in the enterprise and multiple technology areas, platforms and functions. They are responsible for maintaining a deep awareness and understanding of emerging trends and technologies in IT and Healthcare. Assignments span the enterprise as the principal system engineer will be responsible for standards and technical roadmap development. This senior role focuses on the strategic architecture, administration, and continuous improvement of the organization's network operations tooling ecosystem. The scope includes platforms such as Cisco Catalyst Center, AKIPS, NNMi, OMi, ThousandEyes, and other related network management, monitoring, and observability technologies.

Job Responsibility:

  • Platform Lifecycle Management
  • Enterprise Architecture and Strategy
  • Future-State Vision
  • Strategy and Roadmap
  • Architectural Standards
  • Collaboration and Operational Model
  • Develops organizational policies, standards, and guidelines for methods and tools
  • Determines testing policy
  • Sets the release policy for the organization
  • Maintain primary responsibility for strategic planning, technical roadmap development, standards and architecture
  • Oversees efforts with key vendors to understand future application product plans
  • Perform project resourcing, oversight and management
  • Coaches the team on personal development and develops training strategies and schedules
  • Initiates methods and approaches to meet defined business objectives
  • Works on, and may lead, multiple projects that may span the enterprise
  • Identify automation opportunities and implement scripted solutions
  • Serves as an escalation point for complex requests and issues
  • Provide overall ownership of the Change Management process for the System Engineer team
  • Provide exceptional customer service to CommonSpirit end users
  • Act as primary conduit to oversee support efforts between System Engineering and other CommonSpirit Health teams
  • Schedule and manage all performance tuning and troubleshooting efforts
  • Review, recommend and monitor the source code/versioning management function
  • Provide overall technical ownership for all support and project issues and responsibilities within the System Engineering teams
  • Design, implement and maintain a comprehensive monitoring and alerting process across all Technology Infrastructure platforms
  • Utilize standard tools and methodology to develop system and support performance metrics
  • Demonstrate comprehensive knowledge & expertise with CommonSpirit business processes and routines
  • Perform crisis management during high-severity operational incidents
  • Perform meeting facilitation for staff, operational support and project meetings
  • Complete assignments as required by the Director / Manager
  • May require on-call coverage responsibilities

Requirements:

  • Bachelor of Arts degree or equivalent experience
  • 10 years of professional IT experience in an IT technical or infrastructure field
  • 5+ years Unix operational experience (Solaris, AIX, Linux)
  • 5+ years Windows Server operational experience

Nice to have:

  • Healthcare industry experience

What we offer:
  • medical
  • prescription drug
  • dental
  • vision plans
  • life insurance
  • paid time off
  • tuition reimbursement
  • retirement plan benefit(s) including 401(k), 403(b), and other defined benefits offerings

Additional Information:

Job Posted:
May 10, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Principal Network Engineer, Operations & Observability

Senior Principal Backend Engineer

As an Observability Architect for the Platform Engineering team, you will collab...
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • Previous experience in building and managing large scale telemetry systems, OTEL, TSDB
  • Previous experience building large scale data ingestion pipelines
  • Software development in Java, Python
  • Serious analytical skills across different levels of the stack: Network, .Net/Java, Operating System
Job Responsibility
  • Regularly tackle the largest and most complex problems on the team, from technical design to Solution
  • Deliver solutions that are used by other teams and products
  • Determine plans-of-attack on large projects
  • Routinely tackle complex architecture challenges and apply architectural standards and start using them on new projects
  • Lead code reviews and documentation, as well as take on complex bug fixes, especially on high-risk problems
  • Set the standard for thorough, meaningful code reviews
  • Partner across Engineering teams to take on company-wide initiatives spanning multiple projects
  • Transfer your depth of knowledge from your current language to excel as a Software Engineer
  • Mentor more junior members of the team
What we offer
  • Health and wellbeing resources
  • Paid volunteer days

Principal Performance Engineer

This is the definitive performance ownership role for the Matillion Data Product...
Location:
India, Hyderabad
Salary:
Not provided
Matillion
Expiration Date
Until further notice
Requirements
  • Proven experience as a Principal Engineer specializing in performance for large distributed systems, ideally within a high-growth SaaS or data platform environment
  • Deep expertise in Java and Spring Boot, with strong JVM performance engineering skills, including GC tuning, heap/thread optimization, and low-latency design
  • Strong systems-level understanding of technologies like Kubernetes, container runtimes, Python performance, network behavior, and cloud architecture (AWS)
  • The ability to influence engineers across teams without direct authority, shaping architecture, patterns, threading models, and concurrency to eliminate bottlenecks
  • Expertise with modern observability stacks (e.g., OTel, Prometheus, Grafana, Datadog) and hands-on experience designing and running meaningful load, chaos, and stress scenarios
Job Responsibility
  • Define and drive the Performance Vision and Strategy, establishing clear, measurable targets for latency, throughput, cost efficiency, and workload scalability across the Data Productivity Cloud
  • Conduct deep-dive profiling and complex root cause analysis across distributed systems, investigating API latency, memory/CPU pressure, queue back-pressure, and network bottlenecks
  • Build, maintain, and own repeatable, scalable benchmarking frameworks covering agent performance, concurrency, workflow orchestration throughput, and end-to-end user journeys
  • Drive significant Cost Performance and Efficiency improvements by utilizing telemetry and empirical data to reduce runtime operational costs (compute, memory, storage, network)
  • Act as the technical performance expert to influence feature design, technical architecture, and implementation patterns across engineering teams to prevent regressions and embed performance guardrails
What we offer
  • Company Equity
  • 27 days paid time off
  • 12 days of Company Holiday
  • 5 days paid volunteering leave
  • Group Mediclaim (GMC)
  • Enhanced parental leave policies
  • MacBook Pro
  • Access to various tools to aid your career development
  • Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Principal Cloud Infrastructure Engineer

As Highspot continues to scale rapidly, building a robust and efficient platform...
Location:
United States, Seattle
Salary:
188696.00 - 282609.00 USD / Year
Highspot
Expiration Date
Until further notice
Requirements
  • 15+ years of experience in software or infrastructure engineering
  • At least 5 years focused on platform engineering or cloud infrastructure at scale
  • Proven success designing and operating internal developer platforms in AWS environments
  • Expert-level experience with Kubernetes, including provisioning, cluster lifecycle management, workload orchestration, and multi-tenant design
  • Strong expertise in Terraform, GitOps tools (e.g., ArgoCD), and CI/CD systems (e.g., GitHub Actions, Spinnaker)
  • Deep understanding of cloud networking, IAM, service meshes, and container orchestration at scale
  • Familiar with the CNCF landscape and how to leverage open-source tools to solve platform problems
  • Passion for developer experience
  • Track record of technical leadership, mentoring, and influencing engineering culture at a large scale
  • Bachelor's or Master’s in Computer Science or related discipline, or equivalent practical experience
Job Responsibility
  • Design and build scalable platform capabilities that empower engineering teams to ship features reliably, securely, and quickly
  • Create and maintain developer-facing tools and paved paths (e.g., CI/CD pipelines, Kubernetes platforms, observability stacks, secrets management)
  • Implement Infrastructure-as-Code and GitOps patterns to promote consistency, automation, and compliance across environments
  • Collaborate with product, security, and compliance stakeholders to build platform services that meet SLAs and governance standards
  • Drive efforts to standardize and simplify infrastructure across cloud environments (AWS, Azure), enabling secure multi-cloud operation
  • Lead incident response, reliability engineering, and observability improvements that ensure platform uptime and performance
  • Act as a technical mentor and thought leader, guiding teams on infrastructure architecture, platform adoption, and best practices
  • Define and execute on a strategic roadmap to evolve the internal platform in line with company growth and technology direction
What we offer
  • Comprehensive medical, dental, vision, disability, and life benefits
  • Health Savings Account (HSA) with employer contribution
  • 401(k) Matching with immediate vesting on employer match
  • Flexible PTO
  • 8 paid holidays and 5 paid days for Annual Holiday Week
  • Quarterly Recharge Fridays (paid days off for mental health recharge)
  • 18 weeks paid parental leave
  • Access to Coaches and Therapists through Modern Health
  • 2 volunteer days per year
  • Commuting benefits
  • Fulltime

Principal Cloud Infrastructure Engineer

As Highspot continues to scale rapidly, building a robust and efficient platform...
Location:
Canada, Vancouver
Salary:
170435.00 - 230435.00 CAD / Year
Highspot
Expiration Date
Until further notice
Requirements
  • 15+ years of experience in software or infrastructure engineering
  • At least 5 years focused on platform engineering or cloud infrastructure at scale
  • Proven success designing and operating internal developer platforms in AWS and/or Azure environments
  • Expert-level experience with Kubernetes, including provisioning, cluster lifecycle management, workload orchestration, and multi-tenant design
  • Strong expertise in Terraform, GitOps tools (e.g., ArgoCD), and CI/CD systems (e.g., GitHub Actions, Spinnaker)
  • Deep understanding of cloud networking, IAM, service meshes, and container orchestration at scale
  • Familiar with the CNCF landscape and how to leverage open-source tools to solve platform problems
  • Passion for developer experience
  • Track record of technical leadership, mentoring, and influencing engineering culture at a large scale
  • Bachelor's or Master’s in Computer Science or related discipline, or equivalent practical experience
Job Responsibility
  • Design and build scalable platform capabilities that empower engineering teams to ship features reliably, securely, and quickly
  • Create and maintain developer-facing tools and paved paths (e.g., CI/CD pipelines, Kubernetes platforms, observability stacks, secrets management)
  • Implement Infrastructure-as-Code and GitOps patterns to promote consistency, automation, and compliance across environments
  • Collaborate with product, security, and compliance stakeholders to build platform services that meet SLAs and governance standards
  • Drive efforts to standardize and simplify infrastructure across cloud environments (AWS, Azure), enabling secure multi-cloud operation
  • Lead incident response, reliability engineering, and observability improvements that ensure platform uptime and performance
  • Act as a technical mentor and thought leader, guiding teams on infrastructure architecture, platform adoption, and best practices
  • Define and execute on a strategic roadmap to evolve the internal platform in line with company growth and technology direction
What we offer
  • Comprehensive medical, dental, vision, disability, and life benefits
  • Group Retirement Savings Plan (RRSP) and matching employer contributions (DPSP) with immediate vesting
  • Flexible PTO
  • Generous Holiday Schedule + 5 Days for Annual Holiday Week
  • Quarterly Recharge Fridays (paid days off for mental health recharge)
  • Flexible work schedules
  • Access to Coaches and Therapists through Modern Health
  • 2 Volunteer days per year
  • Monthly transportation allowance for employees that work in our Vancouver Hub location
  • Eligible for bonuses and stock options
  • Fulltime

NaaS Architect Principal

The NaaS Architect Principal is central to BT International's network transforma...
Location:
Spain, Madrid
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Strategic Architecture Leadership – Proven ability to define and communicate network architectural vision, with track record driving large-scale network transformation programs in service provider or cloud environments
  • Network Architecture Expertise – Deep understanding of service provider networks including SDN, segment routing, MPLS, BGP and overlay technologies, combined with cloud-native networking and container networking patterns
  • Platform Engineering Mindset – Strong understanding of platform-as-a-product principles, building self-service capabilities and treating internal teams as customers with clear SLAs
  • API & Integration Architecture – Extensive experience designing API-driven architectures using RESTful, gRPC and event-driven patterns, with knowledge of industry standards including TMF, MEF and CAMARA
  • Technical Depth – Hands-on background in network engineering with coding capability in at least one language (Python, Go) and participation in technical spike or proof of concept work
  • Automation & Infrastructure-as-Code – Strong background in network automation, infrastructure-as-code (Terraform, Ansible) and GitOps with Flux/Argo CD
  • Cloud-Native & Multi-Cloud – Experience with cloud-native patterns including Kubernetes, containers and orchestration, operating across multi-vendor and multi-cloud environments
  • Observability & Network Operations – Knowledge of observability systems (ELK, Prometheus, Grafana, gNMI), telemetry pipelines, event streaming platforms (Kafka), orchestration platforms (Itential, NetBox) and traffic engineering controllers
  • Telco Transformation Context – Experience navigating organizational and technical challenges of telco network modernization while maintaining operational continuity
  • Zero Touch Operations – Knowledge of intent-based networking, automated remediation, workflow-driven operations and compliance management that enable zero-touch networking principles
Job Responsibility
  • Define and lead the architectural strategy for NaaS platform evolution, establishing target state architectures that balance functional requirements with non-functional requirements including scalability, resilience, security and cost optimization
  • Work hand in hand with product engineering squads to provide hands-on architectural guidance, working directly with engineers to deliver product excellence as well as technical spikes and proof-of-concepts
  • Drive API-first architecture across network services, establishing patterns for exposing network capabilities through modern integration approaches including RESTful APIs, gRPC and event-driven patterns, with alignment to industry standards including TMF, MEF and CAMARA
  • Lead vendor rationalization strategy across network equipment vendors, cloud providers and orchestration platforms, reducing vendor dependencies through strategic build vs buy decisions and phasing out unnecessary third-party systems in favor of composable in-house capabilities
  • Champion modern architecture patterns including infrastructure-as-code, GitOps, automated provisioning and cloud-native networking that enable continuous delivery and operational excellence
  • Establish observability frameworks for network services including telemetry pipelines, metrics collection, distributed tracing and logging strategies that enable proactive operations and rapid troubleshooting
  • Collaborate with platform engineering teams to build Internal Developer Platform capabilities that abstract network complexity and provide self-service access to network functions
  • Drive architectural governance through design reviews and conformance processes, ensuring solutions align with platform standards while empowering product team autonomy
  • Provide technical thought leadership on network architecture including SDN underlay, control plane, management plane APIs and telemetry, translating industry trends and technological advances into roadmaps that align with BT International's platform strategy
  • Mentor architects and engineers, fostering architectural thinking and technical leadership capability across both the architecture and product engineering organizations

Senior Principal Cloud Developer

The role involves designing and building innovative Agentic AI applications and ...
Location:
United States, San Jose
Salary:
157500.00 - 361500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 10-15 years of experience in developing highly scalable cloud and cloud-native applications using technology stacks, architecture, design, development, and support
  • at least one year of recent multi-agent Agentic and RAG GenAI Software Development experience applied to Networking and/or Observability domains
  • experience developing Network Observability software for large scale Network Monitoring, Network Performance, Network Configuration or Network Capacity Management products
  • deep understanding and experience in Networking Protocol and Networking Best Practices for Enterprise and Service Provider networks
  • proven skills and programming experience in Golang, scalable concurrent processing, REST, Data Caching Services, DB schema design and data access technologies
  • experience in building, orchestrating, and deploying highly scalable REST based stateless APIs/web services for web applications in Kubernetes environment
  • familiarity with code versioning tools such as Git
  • knowledge of Network and NetFlow Logs processing and indexing
  • ability to communicate with senior Executives and with customers
Job Responsibility
  • design and build large scale distributed systems
  • apply best practices for high availability, scalability, resilience, performance, and security requirements in the cloud
  • transition proof-of-concept implementations into R&D teams to accelerate new product delivery
  • create technical content such as designs, specifications, and initial software implementations
  • mentor less-experienced staff members
  • collect product feedback from field interactions to provide input into Engineering and Product Management
  • maintain knowledge of OpsRamp SaaS product and roadmap, as well as competition
  • collaborate with product team to translate functional requirements into technical solutions
  • develop monitoring solutions using tools and services that are part of the cloud infrastructure
  • facilitate CI/CD by integrating development processes
What we offer
  • comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • personal and professional development programs
  • unconditional inclusion and flexibility to manage work and personal needs
  • Fulltime

Principal Site Reliability Engineering Manager

Are you a Principal Site Reliability Engineering Manager interested in improving...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • 3+ years of people management experience
  • 5+ years of experience planning, designing, implementing, and delivering large initiatives spanning multiple engineers as the primary owner, including operating and improving production services at scale
  • Experience leading reliability engineering for developer-facing or platform services, including incident response, automation/toil reduction, and observability (metrics/logs/tracing) built on top of mature observability platforms and practices
  • Experience working across disciplines, groups, and teams to align reliability priorities and delivery plans
  • Experience architecting, deploying, and operating enterprise scale distributed cloud services (Azure preferred), including containerization and orchestration
  • Experience operating engineering systems outer loop processes (CI/CD, build, and release platforms) with reliability, safety, and governance practices
Job Responsibility
  • Partner with engineers, product managers, and partner teams to design, operate, and maintain reliable and resilient services, with clear operational requirements (monitoring, alerting, runbooks, capacity, and failure modes)
  • Drive cross-org alignment through partnerships and co-development following the “One Microsoft” philosophy, including shared reliability standards and operational tooling
  • Build, grow, and retain a team of Site Reliability Engineers
  • Provide mentorship and coaching on reliability engineering, incident response, and pragmatic automation—within and beyond your team
  • Define, implement, and operate SLOs/SLIs and error budgets for critical engineering systems services; use them to guide prioritization and continuous improvement
  • Lead incident management for your services, including on-call health, escalation paths, blameless post incident reviews, modeling follow-through on corrective and preventive actions
  • Drive automation to reduce toil and improve operational efficiency across build, validation, and deployment systems (e.g., self-healing, safe rollouts, and automated remediation)
  • Establish observability (metrics, logs, traces), capacity planning, and performance management to meet reliability and latency goals at scale
  • Foster a diverse and inclusive culture where everyone can bring their full and authentic self, while holding a high bar for customer impact and reliability
  • Fulltime