Senior Observability Infrastructure Engineer

Netherlands, Amsterdam · Job Posted May 05, 2026

Job Description

We are looking for an experienced Observability Infrastructure Engineer to join our Platform Engineering organization. You will be part of the team responsible for building and running the observability pillars on premises and on Kubernetes. Our systems collect, process, and store the logs, metrics, and traces that allow hundreds of product teams to monitor their services in real time. This is a role for a builder and a problem solver who enjoys deep technical troubleshooting across distributed systems and then turns recurring issues into automated, repeatable solutions. You will work in a large-scale environment where we manage petabytes of data and thousands of servers. We are currently in the middle of a major transformation focused on automating operations and enabling self-service for our users.

Job Responsibility

  • Build the next generation of our platform: Design and implement the future architecture of our logging and metrics systems.
  • Own infrastructure operations: You will take full ownership of our hybrid infrastructure, managing the lifecycle of over 1,500 servers across both bare-metal and Kubernetes environments.
  • Automate to reduce toil: You will write code in Go or Python to eliminate manual operational tasks.
  • Optimize for scale and performance: You will dive deep into performance bottlenecks within our distributed tracing and logging pipelines.
  • Reliability and Engineering: You will participate in on-call rotations, but your primary focus will be engineering solutions that stop alerts from firing in the first place.

Requirements

  • 10+ years of experience in the observability domain or in a relevant platform/infrastructure domain.
  • Observability Stack Expertise: You have hands-on experience operating core telemetry data stores at scale, e.g. Elasticsearch/OpenSearch/VictoriaLogs/ClickHouse for logging, Prometheus/VictoriaMetrics for metrics, and Grafana Tempo for distributed tracing.
  • Linux Experience: You understand the operating system at a kernel level and can debug complex networking, file system, and performance issues on both bare metal and virtualized hardware.
  • Production Kubernetes Experience: Proven hands-on experience operating and troubleshooting production workloads on Kubernetes (on-prem and/or cloud), including strong day-to-day use of kubectl and Kubernetes primitives (e.g. Namespaces, Pods, Deployments/StatefulSets, Services, Ingress, ConfigMaps/Secrets).
  • Software Engineering Mindset: You are proficient in Go or Python and do not just write scripts; you build tools and automation platforms that treat infrastructure as code.

Nice to have

  • Experience with large-scale, multi-tenant isolation and quota or cost governance approaches for telemetry platforms.
  • Familiarity with regulated environments where security, auditability, and data handling requirements shape platform design decisions.

Similar Jobs for Senior Observability Infrastructure Engineer

8 matching positions

Senior Infrastructure Engineer / Observability Specialist

Location: Remote - Anywhere in Australia (Will be required to travel to Canberra...
Location: Australia, Sydney
Salary: Not provided
Company: FinXL
Expiration Date: Until further notice
Requirements
  • Must be Australian Citizen and be able to obtain Baseline Security Clearance
  • Cloud Expertise: Proficiency in AWS, Azure, or Google Cloud platforms
  • Observability Concepts: Deep understanding of metrics, logs, and traces, including the design of alerting systems
  • Automation: Experience in scripting with Python, Bash, or PowerShell
  • Containerisation: Knowledge of Kubernetes and Docker
  • Soft Skills: Strong negotiation and communication skills to assist with project planning and problem resolution
Job Responsibility
  • Configure and support observability tools including Dynatrace, Amazon CloudWatch, Amazon CloudTrail, AWS Config, and Azure Monitor
  • Take ownership of observability monitoring policies, standards, and documentation
  • Perform fault diagnosis and root cause analysis with timely remedial action
  • Drive change and uplift IT teams through education and "evangelising" monitoring concepts
  • Provide support for AWS S3, cloud backups, and AWS RDS databases as needed
  • Lead incident response through to conclusion and manage assigned service queues

Senior Software Engineer - Cloud Infrastructure & Observability

Location: India, Bengaluru
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • 15+ years in software engineering with a track record of architecting distributed systems or platforms at scale
  • Strong hands‑on experience in Golang and one scripting language (e.g., Python or Shell)
  • Experience operating observability at petabyte-scale ingestion and hundreds of millions of series
  • Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
  • Deep experience building systems of scale and operating cloud infrastructure with Kubernetes; strong proficiency with service mesh technologies (Istio/Envoy), infrastructure‑as‑code (Terraform), and experience in multi‑cloud (AWS, GCP)
  • Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
  • Proven experience integrating security as part of infrastructure and platform development
  • Exceptional cross‑functional communication; effective collaboration with both technical and non‑technical stakeholders
Job Responsibility
  • Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
  • Extend and harden open‑source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
  • Implement features such as pre‑aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
  • Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
  • Integrate security into infrastructure and platform services; ensure robust multi‑tenant, multi‑cluster, and multi‑cloud designs
  • Contribute improvements back to open source and CNCF‑aligned projects
What we offer
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off in accordance with local leave policies
  • Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

We are building a next-generation observability and cloud platform that is high-...
Location: United Kingdom, Cambridge
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • Extensive experience with software engineering with a track record of architecting distributed systems or platforms at scale
  • Strong hands-on experience in Golang and one scripting language (e.g., Python or Shell)
  • Experience operating observability at petabyte-scale ingestion and hundreds of millions of series
  • Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
  • Deep experience building systems of scale and operating cloud infrastructure with Kubernetes; strong proficiency with service mesh technologies (Istio/Envoy), infrastructure-as-code (Terraform), and experience in multi-cloud (AWS, GCP)
  • Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
  • Proven experience integrating security as part of infrastructure and platform development
  • Exceptional cross-functional communication; effective collaboration with both technical and non-technical stakeholders
Job Responsibility
  • Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
  • Extend and harden open-source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
  • Implement features such as pre-aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
  • Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
  • Integrate security into infrastructure and platform services; ensure robust multi-tenant, multi-cluster, and multi-cloud designs
  • Contribute improvements back to open source and CNCF-aligned projects
What we offer
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off work for vacation and other personal reasons
  • Full-time

Senior Infrastructure Engineer – End User Compute (OS Engineering)

Wells Fargo is seeking a Senior Infrastructure Engineer -OS engineering/Azure Cl...
Location: India, Bengaluru
Salary: Not provided
Company: Wells Fargo
Expiration Date: June 21, 2026
Requirements
  • 4+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 4+ years in infrastructure engineering with strong focus on Windows OS engineering in large enterprise environments
  • Deep understanding of Windows internals (services, kernel components, memory, policies, drivers, certificates, update service)
  • Strong hands-on experience with modern endpoint management technologies: Azure AD / Entra ID, Zero Trust, and certificate services
  • Advanced PowerShell engineering skills, including modular script design, strong logging and telemetry, error handling and retry logic, and idempotent execution patterns
Job Responsibility
  • Lead or participate in high level technical concepts spanning technology and business
  • Develop specifications for complex infrastructure systems, design and test solutions
  • Contribute to the testing of business, application and technical infrastructure requirements
  • Drive solutions to reduce recovery
  • Review and analyze solutions for cloud security, secrets management and key rotations
  • Design, code, test, debug and document programs using Agile development practices
  • Design complex system upgrades
  • Resolve troublesome trends as they develop
  • Develop a long range plan designed to resolve problems and prevent them from recurring
  • Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
  • Full-time

Senior Infrastructure Engineer

Location: India, Hyderabad
Salary: Not provided
Company: Alter Domus
Expiration Date: Until further notice
Requirements
  • At least 7-8 years of experience in Infrastructure administration
  • Strong hands-on experience with Linux systems (RHEL, Ubuntu, SUSE)
  • Proven expertise in Linux system administration, performance tuning, and troubleshooting
  • Experience with monitoring and alerting tools such as Prometheus, Grafana, Zabbix, Nagios, ELK Stack, or Splunk
  • Strong understanding of observability, logging, and system health monitoring
  • Proficiency in scripting and automation tools (Bash, Python, Ansible)
  • Good understanding of networking fundamentals (TCP/IP, DNS, HTTP)
  • Familiarity with virtualization and cloud platforms
  • Experience in capacity planning and performance optimization
  • Strong troubleshooting and analytical skills
Job Responsibility
  • Manage Linux infrastructure (RHEL, CentOS, Ubuntu, SUSE), including installation, patching, upgrades, and troubleshooting
  • Perform OS-level monitoring and tuning (CPU, memory, disk, processes, kernel)
  • Maintain file systems and storage configurations (LVM, NFS, ext4, xfs)
  • Design and manage monitoring and alerting frameworks
  • Configure and tune alerts for system, application, and network performance
  • Ensure high availability and continuous system monitoring
  • Administer monitoring tools such as Prometheus, Grafana, Zabbix, Nagios, ELK, and Splunk
  • Build dashboards and track system health, KPIs, and performance trends
  • Perform proactive health checks and capacity planning
  • Troubleshoot complex issues and perform root cause analysis (RCA)
What we offer
  • Support for professional accreditations such as ACCA and study leave
  • Flexible arrangements, generous holidays, plus an additional day off for your birthday
  • Continuous mentoring along your career progression
  • Active sports, events and social committees across our offices
  • 24/7 support available from our Employee Assistance Program
  • The opportunity to invest in our growth and success through our Employee Share Plan
  • Full-time

Senior Observability Engineer

As a Senior Observability Engineer, you’ll lead the design and ongoing improveme...
Location: New Zealand
Salary: Not provided
Company: Foodstuffs South Island Limited
Expiration Date: June 21, 2026
Requirements
  • Deep hands-on experience with enterprise observability platforms (e.g. Grafana Cloud)
  • Strong Prometheus experience, including querying and alerting approaches
  • Experience implementing code-driven platform configuration (e.g. GitOps, Terraform, CI/CD pipelines)
  • Solid understanding of cloud environments across AWS and/or GCP
  • Exposure to Kubernetes and modern platform or infrastructure environments
  • Experience using scripting or infrastructure tooling to support scalable operations
Job Responsibility
  • Define and embed standards for monitoring, alerting, and dashboard design across teams
  • Drive the transition from legacy monitoring tools to modern, cloud-based platforms
  • Develop and manage data ingestion pipelines across cloud, infrastructure, and network environments
  • Build and maintain dashboards that provide meaningful operational and performance insights
  • Design alerting frameworks with clear routing, prioritisation and ITSM integration
  • Implement automation and code-driven configuration to improve consistency and reduce manual effort
What we offer
  • Flexible working
  • Additional paid parental leave
  • Free period products
  • Financial health checks
  • Southern Cross health insurance for you and your family after a qualifying period
  • Discounts at local gyms
  • Access to online wellbeing tools
  • Prayer and privacy room onsite
  • Continuous learning and professional growth
  • Access to our extensive online training library

Senior Observability Engineer

Our client, a large professional services firm, is looking to hire a Senior Obse...
Location: United States
Salary: Not provided
Company: ClearBridge Technology Group
Expiration Date: Until further notice
Requirements
  • Hands-on Grafana dashboard and data source experience
  • Experience with Grafana Loki and LogQL
  • Experience deploying and configuring OpenTelemetry Collector
  • Grafana Alloy experience preferred
  • Logstash experience preferred, especially syslog parsing, Grok, filtering, and normalization
  • Experience collecting syslog from network devices
  • Experience with Cisco or similar network device log formats
  • Experience collecting network telemetry such as CPU, memory, uptime, interface status, bandwidth, errors, discards, device health, and alarms
  • Familiarity with SNMP, syslog, network device CLI configuration, and collector-based monitoring
  • Ability to troubleshoot ingestion, parsing, dropped logs, pipeline health, and telemetry flow issues
Job Responsibility
  • Support a short-term Grafana-based observability migration proof of concept
  • Collect, parse, normalize, and validate log data and network telemetry from infrastructure devices such as routers, switches, firewalls, wireless controllers, and other network appliances
  • Support collector deployment, syslog ingestion, telemetry collection, dashboard validation, troubleshooting, documentation, and handoff for a multi-site proof of concept
  • Full-time

Azure Infrastructure Senior Engineer

Codec is looking for a Senior Azure Infrastructure Engineer who thrives on turni...
Location: Ireland, Dublin
Salary: Not provided
Company: Codec UK
Expiration Date: Until further notice
Requirements
  • 3+ years of hands-on experience with Microsoft Azure infrastructure in production environments
  • 5+ years overall experience in IT infrastructure or cloud engineering roles
  • Strong hands-on proficiency with Infrastructure as Code: Terraform, Bicep, and PowerShell
  • Practical experience building and maintaining CI/CD pipelines using Azure DevOps Pipelines and/or GitHub Actions for infrastructure deployments
  • Solid understanding of IaC testing approaches, deployment strategies, and secret management using Azure Key Vault
  • Good working knowledge of Azure core infrastructure services: Virtual Networks, VPN/ExpressRoute, Azure Firewall, Network Security Groups, Load Balancers, Virtual Machines, Storage accounts, and Compute services
  • Working knowledge of Microsoft Entra ID: conditional access policies, identity management, Privileged Identity Management (PIM), and SSO configuration
  • Working knowledge of the Microsoft Defender suite (Defender for Cloud, Defender for Endpoint, Defender for Identity)
  • Familiarity with Microsoft Well-Architected Framework (WAF) and Cloud Adoption Framework (CAF) principles
  • Strong collaboration skills
Job Responsibility
  • Implement Azure infrastructure solutions based on designs and architectural patterns
  • Build, maintain, and continuously improve Infrastructure as Code using Terraform, Bicep, and PowerShell
  • Design and manage CI/CD pipelines for infrastructure deployment using Azure DevOps Pipelines and GitHub Actions
  • Implement automated testing for IaC deployments and ensure robust secret management practices
  • Configure and manage Microsoft Entra ID components
  • Implement and configure the Microsoft Defender suite
  • Deploy and configure core Azure infrastructure services
  • Collaborate closely with Project Managers to ensure tasks are properly tracked
  • Work alongside the Cloud Solution Architect during customer engagements
  • Produce and maintain clear, comprehensive technical documentation and runbooks
  • Full-time