CrawlJobs Logo

Senior Observability Infrastructure Engineer

adyen.com Logo

Adyen

Location Icon

Location:
Netherlands , Amsterdam

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

We are looking for an experienced Observability Infrastructure Engineer to join our Platform Engineering organization. You will be part of the team responsible for building and running Observability pillars on premise and on Kubernetes. Our systems collect, process, and store the logs, metrics, and traces that allow hundreds of product teams to monitor their services in real time. This is a role for a builder and a problem solver who enjoys deep technical troubleshooting across distributed systems and then turns recurring issues into automated, repeatable solutions. You will work in a large-scale environment where we manage petabytes of data and thousands of servers. We are currently in the middle of a major transformation: focusing on automation of operations and enabling self service for our users.

Job Responsibility:

  • Build the next generation of our platform: Design and implement the future architecture of our logging and metrics systems.
  • Own infrastructure operations: You will take full ownership of our hybrid infrastructure, managing the lifecycle of over 1,500 servers across both bare-metal and Kubernetes environments.
  • Automate to reduce toil: You will write code in Go or Python to eliminate manual operational tasks.
  • Optimize for scale and performance: You will dive deep into performance bottlenecks within our distributed tracing and logging pipelines.
  • Reliability and Engineering: You will participate in on-call rotations, but your primary focus will be engineering solutions that stop alerts from firing in the first place.

Requirements:

  • 10+ years of experience in the observability domain or in a relevant platform/infrastructure domain.
  • Observability Stack Expertise: You have hands-on experience operating core telemetry data stores at scale e.g. Elasticsearch/Opensearch/VictoriaLogs/Clickhouse for logging, Prometheus/ VictoriaMetrics for metrics and Grafana Tempo for distributed tracing.
  • Linux Experience: You understand the operating system at a kernel level and can debug complex networking, file system, and performance issues on both bare metal and virtualized hardware .
  • Production Kubernetes Experience: Proven hands-on experience operating, and troubleshooting production workloads on Kubernetes (on-prem and/or cloud), including strong day-to-day use of kubectl and Kubernetes primitives (e.g. Namespaces, Pods, Deployments/StatefulSets, Services, Ingress, ConfigMaps/Secrets)
  • Software Engineering Mindset: You are proficient in Go or Python and do not just write scripts
  • you build tools and automation platforms that treat infrastructure as code.

Nice to have:

  • Experience with large scale, multi tenant isolation and quota or cost governance approaches for telemetry platforms.
  • Familiarity with regulated environments where security, audibility, and data handling requirements shape platform design decisions.

Additional Information:

Job Posted:
May 05, 2026

Employment Type:
Fulltime
Work Type:
On-site work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Senior Observability Infrastructure Engineer

Senior Software Engineer, Observability

The Observability team at Airtable ensures that engineers have the tools they ne...
Location
Location
United States , San Francisco; New York; Seattle
Salary
Salary:
196000.00 - 270000.00 USD / Year
airtable.com Logo
Airtable
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 6+ years of software engineering experience
  • 3+ years focused on observability or infrastructure at scale
  • Demonstrated success implementing and running production-grade logging, metrics, or tracing systems
  • Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes)
  • Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse
  • Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling
  • Experience mentoring engineers and collaborating across multiple teams
  • Strong communication skills
  • Eagerness to own high-impact initiatives
  • Proven ability to balance short-term fixes with long-term strategic vision
Job Responsibility
Job Responsibility
  • Architect and scale core observability systems
  • Lead the design and evolution of logging, metrics, and tracing pipelines
  • Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK stack)
  • Guide and mentor a growing team of infrastructure engineers
  • Define and uphold coding standards and operational excellence
  • Partner with Deploy Infrastructure, Service Orchestration, and Product teams
  • Align infrastructure decisions with business goals
  • Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
  • Optimize performance and cost of large-scale data pipelines
  • Shape the observability roadmap
What we offer
What we offer
  • Opportunity to receive benefits
  • Restricted stock units
  • May include incentive compensation
  • Comprehensive benefit offerings
  • Fulltime
Read More
Arrow Right

Senior Frontend Infrastructure Engineer

Coralogix is a modern, full-stack observability platform transforming how busine...
Location
Location
Israel , Ramat Gan
Salary
Salary:
Not provided
coralogix.com Logo
Coralogix
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of experience in frontend development with one of the major frameworks
  • Strong expertise in architecting frontend infrastructure
  • Proficiency with tools and workflows like GitHub Actions, lint tools etc.
  • Experience with monorepo workspaces (e.g Lerna, Nx)
  • Experience with optimizing build times and creating efficient workflows
  • Strong understanding of HTML5, CSS3, SCSS, JavaScript (ES6+), and TypeScript
  • Experience with RESTful APIs and WebSockets
  • Strong understanding of UI/UX best practices
  • Experience with unit testing and frontend testing frameworks (e.g., Jest, Playwright)
  • Strong problem-solving skills and attention to detail
Job Responsibility
Job Responsibility
  • Design and maintain scalable web applications
  • Implement and optimize design systems
  • Improve build processes for efficiency
  • Create automated workflows
  • Write clean and testable code
  • Collaborate with cross-functional teams to deliver features
  • Debug and upgrade software
  • Stay current with emerging frontend technologies
  • Fulltime
Read More
Arrow Right

Senior Software Engineer, Platform Observability

Everlaw is looking for a Senior Software Engineer that brings experience in buil...
Location
Location
United States , Oakland
Salary
Salary:
164000.00 - 208000.00 USD / Year
everlaw.com Logo
Everlaw
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • BS or MS in Computer Science, or equivalent coursework
  • At least 3 years of experience building logging, metrics, and tracing infrastructure
  • Proficiency in coding in a language such as C, C++, C#, Java, Python, Javascript, Go or Rust
  • Experience with Infrastructure as Code and container solutions to manage cloud environments (ex: Terraform, Ansible, Docker, etc)
  • At least 1 year of experience leading multi-developer efforts, including planning, technical breakdown, and coordination
  • Excellent communication and collaboration skills
  • Please note that at this time, Everlaw is not sponsoring U.S. employment visas for this role. Due to federal contract requirements, Everlaw may only hire US citizens for this position.
Job Responsibility
Job Responsibility
  • Build observability strategies to support application and infrastructure metrics, logs, traces, dashboards, and alerts
  • Develop and maintain infrastructure as code (IAC) using tools such as Terraform and Ansible
  • Monitor usage trends to identify opportunities to optimize efficiency and performance of our metrics database and logging tools
  • Improve our on-call and incident management processes by encouraging deeper understanding, communication, and trust
  • Support developer projects by influencing design and implementation of infrastructure features as well as providing technical guidance
  • Support compliance efforts by promoting continuous documentation of our processes and involvement in audits
  • Provide Technical Mentorship to other engineers by both sharing your technical knowledge and becoming an expert in an area of our code base.
What we offer
What we offer
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Work in Uptown Oakland, just steps from the BART line and dozens of restaurants and walking distance to Lake Merritt
  • Fulltime
Read More
Arrow Right

Senior Director of Engineering, Infrastructure

Senior Director of Engineering role leading the Infrastructure group at PagerDut...
Location
Location
United States , San Francisco
Salary
Salary:
233000.00 - 392000.00 USD / Year
https://www.pagerduty.com Logo
PagerDuty
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Proven experience in senior engineering leadership roles, managing multiple layers of managers
  • Significant experience as a hands-on technical contributor earlier in your career
  • Deep knowledge of modern infrastructure and software delivery: high availability, distributed systems, public cloud (AWS), microservices, containers, CI/CD pipelines, observability, and automation
  • Track record of building and scaling high-performing, inclusive engineering organizations
Job Responsibility
Job Responsibility
  • Define and drive the multi-year strategy for PagerDuty's infrastructure and platform foundations
  • Strong ownership of PagerDuty's reliability patterns and practices
  • Bar raiser for all engineering functions
  • Lead, mentor, and scale a diverse team of Engineering Managers, Senior Managers, and technical leaders across multiple geographies
  • Ensure the reliability, scalability, and security of PagerDuty's global SaaS platform
  • Partner with peers in Engineering, Product, and Security to deliver large cross-functional initiatives
  • Champion engineering excellence: CI/CD maturity, observability best practices, operational rigor, and incident readiness
  • Manage budgets, headcount, and vendor relationships to optimize infrastructure investments
  • Represent Infrastructure externally with customers and partners, and internally with executives, as a trusted voice on technical and business tradeoffs
  • Foster a culture of inclusion, accountability, collaboration, and growth
What we offer
What we offer
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime
Read More
Arrow Right

Senior Director of Engineering, Infrastructure

Senior Director of Engineering to lead the Infrastructure group at PagerDuty, se...
Location
Location
United States , Atlanta
Salary
Salary:
233000.00 - 392000.00 USD / Year
https://www.pagerduty.com Logo
PagerDuty
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Proven experience in senior engineering leadership roles, managing multiple layers of managers
  • Significant experience as a hands-on technical contributor earlier in your career
  • Deep knowledge of modern infrastructure and software delivery: high availability, distributed systems, public cloud (AWS), microservices, containers, CI/CD pipelines, observability, and automation
  • Track record of building and scaling high-performing, inclusive engineering organizations
Job Responsibility
Job Responsibility
  • Define and drive the multi-year strategy for PagerDuty's infrastructure and platform foundations
  • Strong ownership of PagerDuty's reliability patterns and practices
  • Lead, mentor, and scale a diverse team of Engineering Managers, Senior Managers, and technical leaders across multiple geographies
  • Ensure the reliability, scalability, and security of PagerDuty's global SaaS platform
  • Partner with peers in Engineering, Product, and Security to deliver large cross-functional initiatives
  • Champion engineering excellence: CI/CD maturity, observability best practices, operational rigor, and incident readiness
  • Manage budgets, headcount, and vendor relationships to optimize infrastructure investments
  • Represent Infrastructure externally with customers and partners, and internally with executives
  • Foster a culture of inclusion, accountability, collaboration, and growth
What we offer
What we offer
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Paid volunteer time off: 20 hours per year
  • Fulltime
Read More
Arrow Right

Senior Infrastructure Engineer - Postgres

ClickHouse is expanding its cloud data platform across AWS, GCP, and Azure—addin...
Location
Location
United States
Salary
Salary:
140000.00 - 208000.00 USD / Year
clickhouse.com Logo
ClickHouse
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years in SRE, DevOps, or infrastructure engineering, with a track record of running distributed, production-grade systems
  • Solid understanding of Postgres operations, scaling, and performance tuning
  • Deep hands-on experience across AWS, with exposure to GCP and Azure
  • comfortable navigating multi-cloud topologies
  • Proficient with Terraform, Kubernetes, and container-based infrastructure
  • Strong Go development skills (or willingness to write and own production Go code)
  • Familiar with tools like Prometheus, Grafana, Loki, OpenTelemetry, or equivalents
  • Deep understanding of SLOs, incident response, and continuous improvement in service reliability
  • You operate with a founder’s mentality — hands-on, resourceful, and willing to dive deep to get things done. You take pride in hard work, autonomy, and shipping impactful systems
Job Responsibility
Job Responsibility
  • Lead reliability and operations for ClickHouse’s Postgres integration — upgrades, patching, maintenance, and scaling
  • Design and implement automation for provisioning, deployments, and service lifecycle management across AWS, GCP, and Azure
  • Develop infrastructure-as-code using Terraform and modern CI/CD tooling to ensure consistent, repeatable deployments
  • Contribute Go-based tooling and services that improve automation, observability, and developer experience
  • Own observability and monitoring, ensuring robust alerting, metrics, and tracing across environments
  • Drive incident management and postmortem practices that strengthen reliability and learning loops
  • Collaborate cross-functionally with platform, networking, and product teams to improve service operability
  • Mentor and enable engineers, helping the team scale effectively as customer adoption grows
What we offer
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Fulltime
Read More
Arrow Right

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location
Location
United States , San Francisco
Salary
Salary:
180000.00 - 270000.00 USD / Year
plaid.com Logo
Plaid
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility
Job Responsibility
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer
What we offer
  • medical, dental, vision, and 401(k)
  • Fulltime
Read More
Arrow Right

Senior Software Engineer (Infrastructure) - HyperDX

Join us in revolutionizing Observability for Developers! We’re on a mission to r...
Location
Location
Netherlands
Salary
Salary:
Not provided
clickhouse.com Logo
ClickHouse
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of backend engineering experience
  • Strong TypeScript and Node.js skills (bonus for additional languages)
  • Deep understanding of APIs, event-driven systems, and high-throughput data pipelines
  • Proficiency in SQL and experience working with analytical databases (ClickHouse experience a plus)
  • Experience with Docker and Kubernetes, plus Helm for managing production deployments
  • Experience with infrastructure-as-code (Terraform, Pulumi, or similar)
  • Familiarity with CI/CD pipelines, monitoring systems, and production-grade alerting practices
  • A passion for building reliable, maintainable, cloud-native systems
Job Responsibility
Job Responsibility
  • Build the core platform: Design and implement backend systems and APIs that power HyperDX, enabling engineers to ingest, query, and analyze observability data at massive scale
  • Scale deployments and infrastructure: Architect, deploy, and maintain cloud-native systems that ensure reliability, scalability, and performance. You’ll use Kubernetes, Helm, and infrastructure-as-code to make deployments simple and resilient
  • Ensure maintainability and operational excellence: Define best practices for CI/CD, monitoring, logging, and alerting. Drive automation across testing, scaling, and incident response to keep our platform healthy and developer-friendly
  • Engineer for scale: Design and operate ingestion and data processing pipelines that remain performant, resilient, and observable—even as we grow to petabyte-level workloads
  • Engage with the community: Collaborate with open-source contributors and customers, solve their challenges, and incorporate their feedback into our roadmap
What we offer
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Fulltime
Read More
Arrow Right