Observability Operations Engineer

Technologent

Location:
United States, Phoenix

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are looking for a Senior Systems Engineer – Observability & Infrastructure to support Linux-based infrastructure and large-scale containerized environments within an enterprise technology ecosystem. This role focuses on platform stability, Elasticsearch administration, Kubernetes operations, and observability maturity across distributed systems.

Job Responsibility:

  • Manage and support Linux-based infrastructure and containerized environments (Docker, Kubernetes)
  • Administer, scale, and optimize large-scale Elasticsearch clusters, including performance tuning and troubleshooting
  • Provide end-to-end system administration support across development, staging, and production environments
  • Perform deep-dive troubleshooting across infrastructure, networking, and observability components
  • Support ITSM processes, including incident, change, and problem management
  • Manage hardware and software lifecycle activities
  • Ensure platform stability, high availability, and performance optimization
  • Collaborate with platform engineering and SRE teams to enhance observability capabilities
  • Support deployment, upgrades, and operational governance of monitoring and logging tools
  • Contribute to automation and continuous operational improvements

Requirements:

  • Deep knowledge of Linux systems administration
  • Strong hands-on experience with Docker and Kubernetes in production environments
  • Experience administering Elasticsearch in enterprise-scale environments
  • Strong troubleshooting and root cause analysis skills across distributed systems
  • Solid understanding of networking fundamentals (TCP/IP, DNS, routing, load balancing, firewalls)
  • Experience supporting ITSM processes and infrastructure lifecycle management

Nice to have:

  • Familiarity with observability concepts such as distributed tracing, metrics, monitoring, and logging
  • Experience managing large-scale Elasticsearch deployments
  • Knowledge of OpenTelemetry / OpenTracing
  • Hands-on experience with observability and monitoring tools such as: Jaeger, Kibana, Grafana, Prometheus, Splunk, Dynatrace, Kafka
  • Experience with Rancher or similar Kubernetes management platforms

Additional Information:

Job Posted:
February 19, 2026

Work Type:
On-site work

Similar Jobs for Observability Operations Engineer

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location:
United States, Hartford
Salary:
130295.00 - 260590.00 USD / Year
CVS Health
Expiration Date:
Until further notice
Requirements:
  • 7+ Years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ Years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ Years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting languages such as Ansible, PowerShell, Bash for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility:
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer:
  • Affordable medical plan options
  • A 401(k) plan (including matching company contributions)
  • An employee stock purchase plan
  • No-cost programs for all colleagues, including wellness screenings, tobacco cessation, and weight management programs
  • Confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Fulltime

Senior Software Engineer, Observability

The Observability team at Airtable ensures that engineers have the tools they ne...
Location:
United States, San Francisco; New York; Seattle
Salary:
196000.00 - 270000.00 USD / Year
Airtable
Expiration Date:
Until further notice
Requirements:
  • 6+ years of software engineering experience
  • 3+ years focused on observability or infrastructure at scale
  • Demonstrated success implementing and running production-grade logging, metrics, or tracing systems
  • Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes)
  • Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse
  • Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling
  • Experience mentoring engineers and collaborating across multiple teams
  • Strong communication skills
  • Eagerness to own high-impact initiatives
  • Proven ability to balance short-term fixes with long-term strategic vision
Job Responsibility:
  • Architect and scale core observability systems
  • Lead the design and evolution of logging, metrics, and tracing pipelines
  • Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK stack)
  • Guide and mentor a growing team of infrastructure engineers
  • Define and uphold coding standards and operational excellence
  • Partner with Deploy Infrastructure, Service Orchestration, and Product teams
  • Align infrastructure decisions with business goals
  • Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
  • Optimize performance and cost of large-scale data pipelines
  • Shape the observability roadmap
What we offer:
  • Opportunity to receive benefits
  • Restricted stock units
  • May include incentive compensation
  • Comprehensive benefit offerings
  • Fulltime

Senior Observability Engineer

Coralogix is a modern, full-stack observability platform transforming how busine...
Location:
Germany, Berlin
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in Site Reliability, DevOps, or Platform Engineering with a focus on observability
  • Proven expertise with at least one major observability platform (e.g., Prometheus, Victoria Metrics, OpenSearch)
  • Hands-on experience with Kubernetes, including deep knowledge of controllers, operators, and Helm
  • Experience writing Kubernetes controllers (controller-runtime, KubeBuilder)
  • Strong programming skills in Go or Python (Rust is a plus)
  • Experience designing, scaling, and operating observability systems at enterprise scale
  • Familiarity with at least one major cloud provider (AWS, Azure, or GCP)
  • Strong understanding of distributed systems, telemetry pipelines, and instrumentation standards (e.g., OpenTelemetry)
  • Excellent communication skills with the ability to explain complex topics to diverse stakeholders
Job Responsibility:
  • Design, implement, and maintain observability features such as Alerting, SLOs, Reporting, and Synthetic Tests
  • Manage and scale OpenTelemetry Collectors and other observability agents across Kubernetes environments
  • Write and maintain Kubernetes Controllers using frameworks like controller-runtime and KubeBuilder
  • Operate and optimize the internal Coralogix account, ensuring proper usage, cost efficiency, and best practices adoption
  • Define and enforce observability guidelines and standards across the organization
  • Partner with engineering teams to embed observability by default into products and services
  • Control observability-related costs while maximizing performance, visibility, and value
  • Contribute to upstream projects such as OpenTelemetry, helping shape industry standards
  • Explore and implement cutting-edge observability technologies, including eBPF-based approaches
  • Fulltime

Senior Security Operations Engineer II

As a Senior Security Operations Engineer, you’ll play a key role in ensuring the...
Location:
United States, Scottsdale
Salary:
Not provided
Axon
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in operations, site reliability, or infrastructure engineering roles
  • Strong experience securing and managing cloud environments (e.g., AWS, Azure) and containerized workloads
  • Deep understanding of Linux systems, networking, distributed systems, and their associated security controls
  • Proficiency in automation, scripting, and security tooling integration to streamline operations and enforcement
  • Experience with security monitoring, alerting, SIEM platforms, and observability tools
  • Solid grasp of CI/CD practices with integrated security testing and compliance checks
  • Experience managing Kubernetes clusters and running containerized workloads in production
  • Experience deploying and administering any of the following: scalable cloud-native secrets solutions such as AWS KMS or Azure Key Vault; PKI solutions such as EJBCA, Smallstep, or Venafi; or vaulting solutions such as HashiCorp Vault
Job Responsibility:
  • Implementing and improving automated security checks in CI/CD pipelines to prevent vulnerabilities from reaching production
  • Writing, reviewing, and maintaining security-focused infrastructure-as-code for scalable and compliant deployments
  • Investigating security incidents, performing root cause analysis, and implementing long-term mitigation strategies
  • Collaborating with developers to develop new features, services, and infrastructure requirements
  • Enhancing security observability through improved log collection, metrics, and alerting configurations
  • Maintaining and improving security runbooks, incident response playbooks, and internal security tooling for operational efficiency
  • Resolve security and infrastructure incidents by taking part in high-impact, high-visibility incidents as a participant and, ideally, as an incident commander
  • Maintain and secure critical infrastructure components such as PKI (Public Key Infrastructure) and IAM (Identity & Access Management) systems, ensuring reliability, scalability, and compliance with organizational and industry security standards
  • Build and maintain secure, reliable, and scalable infrastructure that protects core services and sensitive data
  • Troubleshoot and resolve complex operational and system-level issues across environments
What we offer:
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
  • Fulltime

Senior DevOps Engineer (Observability)

You will enable our machine learning team, data engineers, and applications team...
Location:
United States, New York
Salary:
180000.00 - 225000.00 USD / Year
EvolutionIQ
Expiration Date:
Until further notice
Requirements:
  • 7+ years of DevOps experience
  • Extensive experience designing and running production systems on GCP
  • Deep exposure to and familiarity with networking concepts, Kubernetes clusters, Docker, containerized development, Terraform, Helm, Dagster (DE), and ArgoCD
  • Experience with production operations and working with product engineering teams
  • Experience integrating with SIEM and security software, such as vulnerability scanners
  • You know the critical questions to ask in order to understand a client’s business problem and can show the business impact of your technical solutions
  • Team player who is solutions-oriented
  • You have crisp written and verbal communication skills
Job Responsibility:
  • Improve and further our observability stack across GCP infrastructure and applications
  • Drive consistency and operational excellence across all teams
  • Enable the data engineering team to use Dagster efficiently
  • Leverage tools like Terraform, GitHub Actions, Helm, and ArgoCD to build efficient infrastructure-as-code pipelines
  • Ensure industry-standard security controls in our cloud environments
  • Institute a culture of reliability in a federated ownership environment
What we offer:
  • Medical, dental, vision, short & long-term disability, life insurance and AD&D, and 401k matching
  • Additional family, wellness, and pet benefits
  • Paid time off and sick leave, 100% paid parental leave (16 weeks for primary caregivers and 12 weeks for secondary caregivers)
  • We offer a flexible schedule for new parents returning to work
  • Catered lunches, happy hours, pet-friendly spaces, and monthly technology stipend
  • $1,000/year for each employee for professional development, as well as opportunities for tuition reimbursement
  • An annual bonus plan and company equity plan (RSUs) are also included in our compensation package
  • Fulltime

Monitoring & Observability Engineer

The Monitoring & Observability Engineer is a senior level position responsible f...
Location:
India, Chennai; Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 3-7 years of relevant experience in an Engineering & IT role
  • 2+ years of hands-on working experience, including a strong understanding of UI/UX principles and best practices
  • Proficient in JavaScript, TypeScript, HTML, CSS, React, and Node.js
  • Experience with backend technologies and databases (e.g., MongoDB)
  • Experience with Python Programming
  • Experience with version control systems (e.g., Git)
  • Strong problem-solving and analytical skills
  • Excellent communication and collaboration skills
  • Create modular and reusable React components to streamline development and maintain consistency across the application
  • Continuously improve existing applications, addressing bugs, and implementing new features
Job Responsibility:
  • Drive best-in-class monitoring using a range of tools across all regions of the Global Consumer Bank
  • Drive POCs and incubate new features and capabilities
  • Be forward looking and ensure long term strategic success
  • Work closely with the monitoring operations teams, production support, performance test teams, operations, and application owners to deliver best-in-class monitoring
  • Explain complicated performance bottlenecks to stakeholders
  • Understand complicated application architecture, including Java app servers, Web Servers, Cloud (PCF, AWS, Google), Kubernetes, TIBCO, mainframe
  • Build advanced dashboards and queries
  • Be a subject matter expert for the Global Consumer Bank, including conducting brown bags and office hours
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Fulltime

Federal Observability Engineer

You will be part of a larger technical team, working as an Observability Enginee...
Location:
United States, Hill AFB
Salary:
105500.00 - 243000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • US Citizenship Required
  • Secret Clearance Required
  • DD8750 - Security Plus or higher Security Certification (CISSP, CASP, etc)
  • Bachelor's degree preferred or Associate degree holder (technical field) with 6-8 years working experience in related fields
  • Strong understanding of cloud computing platforms (AWS, Azure, GCP)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Proficiency in scripting languages (Python, Go, Bash)
  • Experience with SQL and NoSQL databases
  • Knowledge of networking protocols (TCP/IP, HTTP)
  • Proven experience with the OpsRamp platform is a strong plus
Job Responsibility:
  • Designing, implementing, and maintaining observability infrastructure in an OpsRamp environment
  • Working as part of a larger technical team supporting HPE's PCE environment and Cloud infrastructure for a Federal Customer
  • Configuring and managing data sources, defining and monitoring key performance indicators (KPIs), and analyzing performance trends
  • Configuring log collection, aggregation, and analysis within the OpsRamp platform
  • Creating and managing alerts, defining escalation paths, and integrating with incident management systems
  • Developing and implementing automated workflows and remediation actions within the OpsRamp platform
  • Designing and building custom dashboards and reports to provide key insights into system health and performance
  • Integrating OpsRamp with other monitoring and observability tools as needed
  • Ensuring data quality and integrity within the OpsRamp platform
  • Troubleshooting and resolving performance issues, application errors, and other operational problems
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime