Airflow Reliability Engineer

India, Hyderabad · Job Posted February 01, 2026

Job Description

As an Airflow Reliability Engineer on the Customer Reliability Engineering (CRE) team at Astronomer, you will have the opportunity to become an Apache Airflow expert, learning directly from leaders of the Airflow project. You’ll provide Apache Airflow expertise directly to customers to help them make the best possible use of our managed Airflow service.

CRE is Astronomer’s support team, but we look a little different from most support teams. Our customers are sophisticated organizations that rely on Apache Airflow for mission-critical work, and they need and expect a high level of expertise to keep it running consistently. Nearly every ticket you work on requires a combination of strong technical knowledge and customer empathy: understanding what the customer needs and how to get them there. Every day brings a new challenge and something new to learn.

This role is based in Hyderabad and requires working in shifts, typically early morning or evening IST; the exact schedule will be set during hiring.

Job Responsibility

  • Learn and build expertise across several software engineering disciplines, including Airflow and data engineering, Kubernetes, and cloud engineering
  • Gain exposure to the big picture: learn about product, engineering, customer relationship management, and more
  • Solve challenging Airflow problems for our customers
  • Spend up to 20% of your time on side projects that contribute to Astronomer’s overall success
  • Work on a modern, sophisticated, cloud-native product
  • Work directly with our customers’ data engineers, system admins, DevOps teams, and management
  • Provide feedback from your experience that can shape the direction of the Airflow project
  • Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance
  • Participate remotely within a fully distributed team
  • Help maintain 24/7 coverage through a specified 6-hour pager period during your workday
  • Participate in paid on-call rotation for weekend coverage

Requirements

  • 5 years of professional experience (any industry)
  • 3 years of experience with Python
  • 1+ year with Apache Airflow
  • Experience with Kubernetes, Docker, and containers
  • Customer support experience
  • Experience working with a distributed system with any major cloud provider (AWS, GCP, Azure)
  • Problem-solving and troubleshooting abilities
  • Ability to work autonomously and independently
  • Strong written and verbal communication skills for connecting with customers through our ticketing system and over Zoom

Nice to have

  • Familiarity with SQL and PostgreSQL
  • Familiarity with Databricks, Snowflake, Redshift, dbt, or other similar data engineering tools

Similar Jobs for Airflow Reliability Engineer (8 matching positions)

Senior Data Reliability Engineer

Your Mission Call of Duty is one of the most iconic and successful video game f...
Location: Canada, Vancouver
Salary: Not provided
Company: Activision
Expiration Date: Until further notice
Requirements
  • 10+ years of programming experience
  • Extensive experience working in Python
  • Familiarity with Go
  • Strong experience with data technologies such as SQL, Spark, and Airflow
  • Hands-on experience building observability systems using tools like OpenTelemetry, Prometheus, Loki, and Grafana
  • Experience with dashboarding and alerting for production systems
  • Secure automation of testing and deployments using GitHub Actions / Workflows (GitOps)
  • Experience with Linux system administration in production environments
  • Cloud-native deployment experience using Kubernetes, Helm, and ArgoCD
  • Experience supporting model deployments (batch and online APIs)
Job Responsibility
  • Create the ML data pipeline used for our models, including the ML templates, model observability, the metrics and KPIs used to monitor model efficacy, and the automated retraining required as the data drifts
  • Design and operate large-scale, highly-available data pipelines and platforms for high-volume game telemetry
  • Ensure the integrity, trustworthiness, and quality of Anti-Cheat data
  • Partner closely with Machine Learning teams to support batch, streaming, online inference workflows, automated testing of ML artifacts, and observability and maintenance of automated deployment pipelines
  • Define and maintain GitOps workflows for secure, automated testing, integration, and deployment
  • Build comprehensive observability (metrics, logs, dashboards, alerts) into data pipelines and services
  • Own operational excellence, including incident response, root-cause analysis, and post-mortems
  • Contribute to deployment and release strategies such as canary, blue/green, and shadow deployments
What we offer
  • Medical, dental, vision, health savings account or health reimbursement account, healthcare spending accounts, dependent care spending accounts, life and AD&D insurance, disability insurance
  • 401(k) with Company match, tuition reimbursement, charitable donation matching
  • Paid holidays and vacation, paid sick time, floating holidays, compassion and bereavement leaves, parental leave
  • Mental health & wellbeing programs, fitness programs, free and discounted games, and a variety of other voluntary benefit programs like supplemental life & disability, legal service, ID protection, rental insurance, and others
  • If the Company requires that you move geographic locations for the job, then you may also be eligible for relocation assistance
Employment type: Full-time

Data Pipeline Engineer +Airflow

We are looking for a Data Pipeline Engineer to design, build, and operate scalab...
Location: India, Pune City
Salary: Not provided
Company: Wissen
Expiration Date: Until further notice
Requirements
  • Strong hands-on experience with dbt
  • Strong hands-on experience with Apache Spark
  • Experience with Dremio/Trino or similar lakehouse query engines
  • Experience with Airflow and/or Dagster
  • Understanding of data catalogs and lineage (e.g., OpenLineage, DataHub, Apache Polaris)
  • Proficiency in Python
  • Experience with Git-based development and CI/CD
Job Responsibility
  • Build and maintain data transformation pipelines using dbt/Spark
  • Develop and optimize large-scale, CPU-intensive data processing using Apache Spark/Dremio
  • Orchestrate workflows using Airflow and/or Dagster
  • Implement data quality checks, testing, and monitoring for pipelines
  • Support schema evolution, backfills, and incremental processing
  • Ensure pipelines meet SLAs for freshness, reliability, and performance
  • Expertise in, or working knowledge of, Dremio (semantic layer, virtual datasets, Reflections)
Employment type: Full-time

Senior Site Reliability Engineer

Archer is an aerospace company based in San Jose, California building an all-ele...
Location: United States, San Jose
Salary: 133,400 - 200,000 USD per year
Company: Archer Aviation
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or a similar role with a strong focus on operational excellence
  • Deep expertise in Amazon EKS, including cluster provisioning, management, and troubleshooting
  • Extensive experience with observability tools and practices, including Prometheus, Grafana, ELK stack, or similar
  • Proven track record in designing and implementing robust data pipelines (e.g., Kafka, Airflow, Spark)
  • Strong background in CI/CD methodologies and tools (e.g., Jenkins, GitLab CI, ArgoCD)
  • Expert-level knowledge of cloud platforms (AWS preferred), including infrastructure-as-code principles
  • Comprehensive understanding of security best practices for cloud environments, applications, and data
  • Proficiency in Docker for containerization and orchestration
  • Advanced scripting and programming skills in Python, Bash, and PowerShell
  • Solid understanding of networking concepts, distributed systems, and operating systems
Job Responsibility
  • Implement and maintain the infrastructure and pipeline required for an internal LLM-powered chat service, potentially leveraging platforms like OpenRouter or similar alternatives
  • Implement and maintain highly available, scalable, and secure cloud-native infrastructure on Amazon Elastic Kubernetes Service (EKS)
  • Develop and implement comprehensive observability strategies, including monitoring, logging, and alerting, to ensure the health and performance of our systems
  • Architect and optimize data pipelines to ensure efficient and reliable data flow across various platforms
  • Drive the continuous improvement of our CI/CD pipelines, promoting best practices for automated testing, deployment, and release management
  • Champion cloud-first strategies, leveraging the full capabilities of cloud platforms for infrastructure, services, and operations
  • Implement and enforce robust security practices across our infrastructure, applications, and data
  • Design and maintain Docker-based containerization solutions for our applications
  • Develop and maintain automation scripts and tools using Python, Bash, and PowerShell
  • Collaborate with development teams to ensure reliability is built into the software development lifecycle from inception
Employment type: Full-time

Data Reliability Engineer

We’re looking for a Data Reliability Engineer to help keep our trading and data ...
Location: Ireland, Dublin
Salary: Not provided
Company: Susquehanna International Group
Expiration Date: Until further notice
Requirements
  • Degree in a technical or business discipline, or 1+ years of equivalent industry experience
  • Demonstrated experience with Python or equivalent language
  • Excellent analytical and troubleshooting skills; self-motivated and curious
  • Willing to work alternating early and late shifts
  • Experience with change management and incident management procedures
  • Experience with technical documentation and support cases
Job Responsibility
  • Ensure Platform Reliability - Monitor and maintain trading-critical Airflow DAGs and Python-based pipelines, ensuring jobs run on time and within SLAs
  • Incident Response & Recovery - Triage, troubleshoot, and resolve failures quickly; validate downstream impacts and maintain tested rollback/recovery procedures
  • Change & Release Management - Act as a release gatekeeper: review code and config changes, enforce safe deployment standards, and coordinate risk-aware releases via Git(lab) and Octopus Deploy
  • Collaboration & Communication - Partner with quants and engineers to assess change impacts, document runbooks, and communicate operational updates and risks
  • Continuous Improvement - Enhance monitoring, alerting, and automation; track KPIs and drive initiatives that strengthen platform resilience and reduce incident recurrence

Site Reliability Engineer - Data Platform Operation

Join our Data & AI Platform team as a Site Reliability Engineer (SRE) – Platform...
Location: Brazil, Sao Paulo
Salary: Not provided
Company: Amaris Consulting
Expiration Date: Until further notice
Requirements
  • Academic background: Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field (minimum 3 years of experience)
  • Experience: 5+ years hands-on with cloud platforms (Azure, AWS, GCP), programming (Bash, PowerShell, Terraform, Python, Java), and Infrastructure as Code (IaC)
  • English language: Professional working proficiency in English and the local language
  • Tools / software: Deep expertise in Azure, Databricks, Unity Catalog, Kubernetes, Helm, Docker, Power BI, Datadog, Grafana, GitHub, Azure DevOps, ArgoCD, Airflow, SSIS, Power Query, and relational/NoSQL databases
  • AI experience: Experience supporting enterprise Data & AI platforms
  • Soft skills: analytical problem-solving, effective communication and active listening, and being a team player with respect for others
  • Strong troubleshooting and platform monitoring skills
  • Automation (Python, PowerShell, CLI, KQL, Terraform)
Job Responsibility
  • Support, manage, and maintain Azure resources: Azure SQL, Synapse, Data Factory, Databricks, Unity Catalog
  • Monitor Azure workloads, troubleshoot incidents, alerts, and performance bottlenecks
  • Implement and manage RBAC, identity & access policies, and compliance controls
  • Optimize Azure cost and performance using Azure Monitor, Datadog, and Cost Management tools
  • Automate tasks using PowerShell, Azure CLI, Terraform, and Python
  • Utilize Git, GitHub Actions, and Airflow for workflow automation
  • Provide L2/L3 support for data pipelines, reporting, and cloud services
  • Conduct incident response, root cause analysis (RCA), and proactive issue resolution
  • Collaborate with Cloud Engineering, Data Engineers, BI Developers, and Cloud Architects
  • Follow ITSM processes: Incident, Change, and Problem Management
What we offer
  • An international community bringing together 110+ different nationalities
  • An environment where trust has a central place: 70% of our key leaders started their careers at the first level of responsibility
  • A robust training system with our internal Academy and 250+ available modules
  • A vibrant workplace that frequently gathers for internal events (afterworks, team buildings, etc.)
  • Strong commitments to CSR, notably through participation in our WeCare Together program

Managed Airflow Platform (MAP) Support Engineer

Salary: Not provided
Company: Kloud9
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • 3+ years of experience in large-scale production-grade platform support, including participation in on-call rotations
  • 3+ years of hands-on experience with cloud platforms like AWS, Azure, or GCP
  • 2+ years of experience developing and supporting data pipelines using Apache Airflow including DAG lifecycle management and scheduling best practices
  • Experience troubleshooting task failures, scheduler issues, and performance bottlenecks, and managing error handling
  • Strong programming proficiency in Python, especially for developing and troubleshooting RESTful APIs
  • 1+ years of experience in observability using the ELK stack (Elasticsearch, Logstash, Kibana) or Grafana Stack
  • 2+ years of experience with DevOps and Infrastructure-as-Code tools such as GitHub, Jenkins, Docker, and Terraform
  • 2+ years of hands-on experience with Kubernetes, including managing and debugging cluster resources and workloads within Amazon EKS
  • Exposure to Agile and test-driven development is a plus
Job Responsibility
  • Evangelize and cultivate adoption of Global Platforms, open-source software and agile principles within the organization
  • Ensure solutions are designed and developed using a scalable, highly resilient cloud native architecture
  • Ensure the operational stability, performance, and scalability of cloud-native platforms through proactive monitoring and timely issue resolution
  • Diagnose infrastructure and system issues across cloud environments and Kubernetes clusters, and lead efforts in troubleshooting and remediation
  • Collaborate with engineering and infrastructure teams to manage configurations, resource tuning, and platform upgrades without disrupting business operations
  • Maintain clear, accurate runbooks, support documentation, and platform knowledge bases to enable faster onboarding and incident response
  • Support observability initiatives by improving logging, metrics, dashboards, and alerting frameworks
  • Advocate for operational excellence and drive continuous improvement in system reliability, cost-efficiency, and maintainability
  • Work with product management to support product / service scoping activities
  • Work with leadership to define delivery schedules of key features through an agile framework
What we offer
  • Kloud9 provides a robust compensation package and a forward-looking opportunity for growth in emerging fields

Senior Data Engineer

We are looking for an experienced Senior Data Engineer to design, build, and opt...
Location: Kosovo
Salary: Not provided
Company: Valtech
Expiration Date: Until further notice
Requirements
  • Strong hands-on experience with Apache Spark and Delta Lake
  • Strong programming skills in Python and SQL
  • Proven experience building batch and streaming data pipelines and production-grade data platforms
  • Solid understanding of data modeling, data quality, and governance principles
  • Experience with one or more major cloud platforms, with preference for Microsoft Azure / Fabric, as well as AWS or GCP
  • Familiarity with modern data platforms such as Databricks and Snowflake
  • Experience with lakehouse architectures and distributed data systems
  • Strong understanding of scalability, reliability, and performance considerations in data pipelines
  • Strong problem-solving skills focused on scalability and reliability
  • Collaborative approach to working in cross-functional teams
Job Responsibility
  • Design and implement scalable data platforms and pipelines across cloud environments (Azure/Fabric, AWS, GCP, Databricks, Snowflake)
  • Develop reliable batch, streaming, and near-real-time pipelines using technologies such as Spark and Delta Lake
  • Build ingestion, transformation, and curation workflows for both structured and unstructured data
  • Implement modern data architectures, including lakehouse patterns and medallion layering (bronze, silver, gold)
  • Deliver high-quality datasets that support analytics, machine learning, causal modeling, and optimization systems
  • Enable data pipelines for GenAI use cases (including LLMs, RAG pipelines, and vector-based data flows)
  • Design scalable logical and physical data models for analytical and operational use cases
  • Orchestrate workflows using tools such as Airflow, dbt, Lakeflow, or equivalents
  • Apply modern architecture patterns, including event-driven and streaming architectures
  • Ensure adherence to best practices in data governance, lineage, quality, and access control (RBAC/ABAC)
What we offer
  • Health insurance that guarantees fast access to contracted health services
  • Vacation Plan
  • Subsidy for study materials, trainings, conferences and events that will contribute to your development
  • Hybrid Working model
  • Performance Evaluation Process that paves the roadmap for a personal and professional career development
  • Refreshments and fruit in the office
  • Team gatherings and parties organized and subsidized by the company
Employment type: Full-time

Senior Software Engineer - Data Products

We are seeking a highly skilled Senior Software Engineer with deep expertise in ...
Location: India, Bengaluru
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 8+ years of experience in software and/or data engineering with expertise in big data technologies such as Apache Spark and Apache Airflow.
  • Expertise with at least one of the following: Apache Druid, StarRocks, or Trino.
  • Strong understanding of SOLID principles and distributed systems architecture.
  • Proven experience in distributed data processing, data warehousing, and real-time data pipelines.
  • Advanced SQL skills, with expertise in query optimization for large datasets.
  • Exceptional problem-solving abilities and the capacity to work independently or collaboratively.
  • Excellent verbal and written communication skills.
  • Experience with cloud platforms such as AWS, GCP, or Azure, and containerization tools like Docker and Kubernetes (preferred).
  • Familiarity with additional big data technologies, including Hadoop and Kafka.
Job Responsibility
  • Design and build APIs and backend services using Spring Boot to support data products and analytics workflows.
  • Write clean, maintainable, and efficient code, ensuring adherence to best practices through code reviews.
  • Design, develop, and maintain data pipelines and ETL workflows using Apache Spark and Apache Airflow.
  • Optimize data storage, retrieval, and processing systems to ensure reliability, scalability, and performance.
  • Develop and fine-tune complex queries and analytics solutions using Druid, Trino, and StarRocks for large-scale datasets.
  • Monitor, troubleshoot, and improve data systems to minimize downtime and maximize efficiency.
  • Partner with data scientists, software engineers, and other teams to deliver integrated, high-quality solutions.
  • Provide technical guidance and mentorship to junior engineers, promoting best practices in software and data engineering.
What we offer
  • Global access to mental health and financial wellness support and resources.
  • Local benefits including statutory and voluntary benefits which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension).
  • Time off in accordance with local leave policies.
Employment type: Full-time