The Observability team at Airtable ensures that engineers have the tools they need to measure performance, monitor reliability, and debug issues in real time. The mission is to provide actionable insights into errors and crashes, fueling a better and more reliable experience for millions of users. The team builds logging, metrics, and tracing systems leveraged by nearly every engineering team. They also work on LLM observability for AI-powered features, providing visibility into prompts, model calls, and RAG components.
Job Responsibilities:
Architect and scale core observability systems
Lead the design and evolution of logging, metrics, and tracing pipelines
Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK Stack)
Guide and mentor a growing team of infrastructure engineers
Define and uphold coding standards and operational excellence
Partner with Deploy Infrastructure, Service Orchestration, and Product teams
Align infrastructure decisions with business goals
Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
Optimize performance and cost of large-scale data pipelines
Shape the observability roadmap
Extend observability to LLM and AI features
Instrument prompts, model calls, and RAG pipelines
Design online and offline evaluation loops for LLM quality
Build dashboards and alerts for AI feature performance
Partner with AI and Product teams to define SLOs for AI features
Requirements:
6+ years of software engineering experience
3+ years focused on observability or infrastructure at scale
Demonstrated success implementing and running production-grade logging, metrics, or tracing systems
Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes)
Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse
Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling
Experience mentoring engineers and collaborating across multiple teams
Strong communication skills
Eagerness to own high-impact initiatives
Proven ability to balance short-term fixes with long-term strategic vision
A passion for enabling engineering organizations through reliable, intuitive observability tools
Commitment to measuring success by team velocity and confidence
Nice to have:
Experience with LLM observability for AI-powered features
Experience instrumenting prompts, model calls, and RAG pipelines
Experience designing evaluation loops for LLM quality
Experience building dashboards and alerts for token usage, error rates, guardrail triggers, and model performance