CrawlJobs

Kafka Operations Administrator

United States, Seattle · 157,500.00 USD / year · Job posted March 21, 2026

Job Responsibility

  • Deploy, configure, and manage Kafka clusters and related services to meet SLA requirements
  • Participate in 24x7 on-call rotation to respond to incidents, alerts, and escalations
  • Triage, diagnose, and remediate production incidents, coordinating with stakeholders, developers, and infrastructure teams
  • Implement automation for provisioning, scaling, server/data backups, and disaster recovery
  • Maintain monitoring, alerting thresholds, dashboards, and Kafka ecosystem health
  • Harden Kafka deployments: configure TLS, ACLs, RBAC, encryption, and vulnerability remediation
  • Perform routine maintenance: Kafka ecosystem upgrades (controllers, brokers, Kafka Connect, and Schema Registry), rolling restarts, etc.
  • Create and maintain runbooks, runbook automation, and post-incident reports
  • Optimize performance and resource utilization; benchmark and tune clusters
  • Support Kafka Connect and Schema Registry services and troubleshoot connector issues
  • Contribute to CI/CD pipeline improvements for infrastructure and deployment automation

Requirements

  • Production-grade Apache Kafka operations experience: managing, maintaining, and upgrading Kafka clusters in production environments, with a focus on high availability, disaster recovery, failover, and overall reliability
  • Proficiency in installing and configuring monitoring systems: Grafana (including building dashboards), Prometheus, Splunk, and JMX metrics
  • Automation and orchestration experience: Terraform, Ansible, Helm, Kubernetes (EKS/AKS/GKE)
  • Strong Linux system administration experience, including troubleshooting, automation, and scripting for efficient infrastructure management
  • Experience in production support following ITIL processes, including participating in 24x7 on-call rotations and documenting incidents and postmortems
  • Experience supporting JVM tuning, GC analysis, and network and disk I/O diagnostics
  • Experience with TCP/IP, routing, switching, and firewall configuration relevant to Kafka operations

Nice to have

  • Deep Kafka performance tuning and capacity planning experience
  • Knowledge of message delivery semantics and guarantees (at-least-once, exactly-once)
  • Cloud-native security/compliance experience (IAM, VPC, KMS, Security Groups)
  • Certifications: Confluent Certified Administrator, AWS/Azure/GCP certifications
  • Experience with Apache Kafka in KRaft mode, including setup, configuration, troubleshooting, and cluster management
  • Experience with containerization and container orchestration tools: Docker, Kubernetes
  • Experience with CI/CD pipelines and Git-based workflows
  • Experience building custom Kafka Connect libraries and an understanding of data serialization formats (e.g., Avro, JSON)
  • Knowledge of networking concepts across on-prem VMs and cloud environments, ensuring seamless integration and communication between services
  • Strong understanding of topic management and security best practices for streaming platforms: TLS, ACLs, RBAC, encryption at rest/in transit
  • Kafka ecosystem tooling experience: Kafka Connect, Schema Registry


Similar jobs for Kafka Operations Administrator (8 matching positions)

Kafka Operations Administrator

Location: United States, Seattle; St. Louis; TX
Salary: 157,500.00 USD / year
Company: Realign (realign-llc.com)
Expiration date: Until further notice
Employment type: Full-time

Sr Principal Site Reliability Engineer (Sovereign Cloud)

Palo Alto Networks runs a large infrastructure and is one of the largest GCP cus...
Location: Bulgaria, Sofia
Salary: Not provided
Company: Palo Alto Networks (paloaltonetworks.com)
Expiration date: Until further notice
Requirements
  • 10+ years as an engineer in Infrastructure, Operations, DevOps, or System Engineering
  • 7+ years building high availability, scalable cloud-native applications on AWS and GCP
  • BS or MS in Computer Science, a related field, or equivalent professional experience required
  • Expertise in configuration management with a framework such as Ansible, Terraform, Helm
  • Passion for infrastructure and monitoring as code
  • Solid experience in container workloads and Kubernetes
  • Familiarity with PKI and networking concepts
  • In-depth knowledge of different security controls (App-ID, User-ID, security profiles, URL categories, content, SSL decryption, firewall MFA, etc.)
  • Linux administration, internals, and network troubleshooting
  • Proficiency with programming languages like Golang or Python along with shell scripting to automate tasks
Job Responsibility
  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build and operate reliable, secure Cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate the reliable deployment of robust services
  • Orchestrate end-to-end monitoring and alerting
  • Participate in on-call rotations to support critical business and production systems
  • Lead root cause analysis of critical business and production issues
Employment type: Full-time

Devops Lead

We are seeking a highly experienced and motivated Lead DevOps with 10+ years of ...
Location: India (Chennai, Tamil Nadu; Pune, Maharashtra)
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration date: Until further notice
Requirements
  • 10+ years of experience
  • Deep expertise in containerization technologies (Docker) and orchestration platforms (Kubernetes, OpenShift)
  • Extensive experience and deep understanding of Java, J2EE, Microservices, Spring Boot, and Spring Cloud frameworks
  • Strong experience in designing, implementing, and maintaining robust CI/CD pipelines using GitHub, Ansible, Tekton, Harness
  • Extensive hands-on experience with Infrastructure as Code (IaC) principles and tools (Terraform, Ansible)
  • Expert in implementing and managing monitoring, logging, tracing, and alerting solutions using Prometheus, Grafana, ELK stack, Splunk, ITRS/AppDynamics, Jaeger
  • Proficient in automation
  • Proficient in leveraging AI tools and Generative AI for DevOps
  • Ability to proactively identify system risks, performance bottlenecks, and security vulnerabilities
  • Provide architectural and technical leadership
Job Responsibility
  • Drive DevOps culture and best practices
  • Ensure operational excellence of critical systems
  • Design, implement, and maintain CI/CD pipelines
  • Automate provisioning, configuration, and management of infrastructure
  • Implement and manage monitoring, logging, tracing, and alerting solutions
  • Automate repetitive operational tasks
  • Leverage AI tools for troubleshooting and Generative AI for DevOps processes
  • Proactively identify system risks, performance bottlenecks, and security vulnerabilities
  • Provide architectural and technical leadership
  • Install, configure, performance tune, and troubleshoot application and web servers
Employment type: Full-time

Data Engineer

We are looking for a Data Engineer to support and enhance critical data operatio...
Location: United States, Greenville
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration date: Until further notice
Requirements
  • Experience working in data engineering or data platform operations roles
  • Strong hands-on knowledge of Python for scripting, automation, and operational support
  • Experience managing or supporting Apache Kafka environments and related streaming data workflows
  • Familiarity with Snowflake administration, performance tuning, and resource management
  • Solid understanding of ETL processes, including data ingestion, transformation, and delivery concepts
  • Experience working with AWS services in support of modern data infrastructure
  • Knowledge of Terraform or similar infrastructure-as-code tools for environment management and automation
Job Responsibility
  • Oversee the health and performance of data pipelines that run across Snowflake, Kafka, and connected platforms
  • Investigate operational issues affecting data ingestion, transformation, or downstream delivery and drive timely resolution
  • Maintain stable batch and streaming processes by improving resiliency, uptime, and overall execution efficiency
  • Administer Snowflake resources, including warehouses, databases, permissions, and usage optimization
  • Manage Kafka infrastructure by tuning clusters, topics, partitions, and consumer group behavior for reliable throughput
  • Create and maintain automated solutions for deployment, monitoring, failure recovery, and routine workflow support
  • Develop operational scripts and utilities using Python, Bash, and related tools to reduce manual effort and improve consistency
  • Contribute to CI/CD practices that strengthen the release and maintenance process for data infrastructure
  • Partner with engineering and analytics teams to improve pipeline design, data performance, and delivery accuracy
  • Support data governance, security, compliance, and data quality standards through validation checks and alerting frameworks
What we offer
  • Medical
  • Vision
  • Dental
  • Life and disability insurance
  • 401(k) plan
Employment type: Full-time

Sr. DevOps Engineer - VOIS

We are seeking an experienced DevOps professional to provide global IT integrati...
Location: Egypt, Giza
Salary: Not provided
Company: Vodafone (vodafone.com)
Expiration date: Until further notice
Requirements
  • Experienced DevOps or platform engineer with strong hands-on expertise in Kubernetes and containerisation
  • Proficient in working with Docker, Kafka, API platforms, and microservices architectures
  • Comfortable supporting and troubleshooting databases such as PostgreSQL, DynamoDB, or Cassandra
  • Skilled in at least one programming or scripting language such as Python, Ruby, or Go
  • Knowledgeable in UNIX/Linux administration and core infrastructure concepts such as DNS and proxy services
  • Familiar with cloud platforms, with exposure to AWS or Google Cloud being advantageous
  • Able to prioritise multiple incidents and tasks while maintaining service quality and stakeholder trust
  • Collaborative problem-solver who values documentation, knowledge sharing, and continuous improvement
Job Responsibility
  • Lead system integration and implementation of DevOps tools across global environments
  • Design, implement, and support Kubernetes- and Docker-based platforms for hosting microservices and APIs
  • Own migration and upgrade activities for tools, platforms, and services, ensuring minimal disruption
  • Maintain application and server availability across UNIX and Windows environments
  • Manage incidents and service requests, ensuring resolution within agreed service level agreements
  • Troubleshoot complex technical issues, including database-related incidents
  • Maintain accurate technical documentation and contribute to the team knowledge base
  • Collaborate with internal stakeholders and external vendors to resolve issues and improve service quality
  • Apply ITIL-aligned service management practices in day-to-day operations
What we offer
  • Exposure to global-scale platforms supporting Vodafone operations worldwide
  • Opportunities to work with modern DevOps, container, and cloud technologies
  • Collaborative environment that values innovation, learning, and service excellence
  • Chance to influence platform reliability and customer experience at scale
Employment type: Full-time

DevOps Engineer

We are seeking a skilled and experienced individual to fill a unique role that c...
Location: India, Chennai
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration date: Until further notice
Requirements
  • 7+ years of relevant experience
  • Hands-on DevOps & Infrastructure Engineering Expertise
  • Secret & Certificate Management: Proven hands-on experience with HashiCorp Vault (installation, configuration, policy management, integrations)
  • Container Orchestration: In-depth hands-on experience with Kubernetes and Helm, including YAML configuration, troubleshooting Pods/Jobs/Deployments, and integrations with secrets management (CyberArk, HashiCorp)
  • Storage Management: Practical experience with Kubernetes PVCs, Persistent Volumes, S3, and/or enterprise NAS solutions (e.g., SONiC NAS)
  • Monitoring & Logging: Strong hands-on experience with Prometheus, Grafana, and the ELK Stack (setup, dashboard creation, query optimization, alert configuration)
  • Scripting & Automation: High proficiency in Python, Bash, or Go for automation, tooling development, and system administration
  • Cloud Platforms: Extensive hands-on experience with at least one major cloud provider (AWS, Azure, GCP)
  • Infrastructure as Code (IaC): Proficiency with IaC tools such as Terraform or Ansible
  • CI/CD: Experience designing, implementing, and maintaining CI/CD pipelines (e.g., Jenkins, GitHub Actions)
Job Responsibility
  • Hands-on DevOps Engineering
  • Implementation: implementation and ongoing management of secure, scalable, and resilient infrastructure components
  • Secret & Certificate Management: Administer and maintain secret and certificate management solutions using HashiCorp Vault, including policy definition and integration
  • Workflow Orchestration: Deploy, monitor, and troubleshoot data orchestration workflows
  • Messaging Systems: Implement and manage messaging technology such as Kafka and Solace
  • Build Automation: Implement and optimize build and deployment processes using Gradle
  • Container Orchestration: Design, implement, and manage container orchestration platforms with Kubernetes and Helm, including integration with CyberArk and HashiCorp for secrets management. Create, debug, and troubleshoot Kubernetes Pods, Jobs, and Deployments using YAML
  • Storage Management: Configure and manage persistent storage solutions including PVC, SONiC NAS, and S3
  • Monitoring & Logging: Implement, configure, and utilize comprehensive monitoring and logging solutions (Prometheus, Grafana, ELK Stack) to ensure system health and proactively identify issues, including those relevant to applications
  • Automation & Scripting: Develop robust automation scripts and tools using Python, Bash, Go, or similar languages to streamline operations and enhance efficiency
Employment type: Full-time

Data Engineer

We are looking for a Data Engineer to support a long-term contract assignment in...
Location: United States, Beverly Hills
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration date: Until further notice
Requirements
  • 3+ years of experience in data engineering, data migration, system onboarding, or similar project-based data work
  • Strong hands-on ability with structured data sets, spreadsheets, data templates, and ETL-oriented processes
  • Proficiency with Python and experience working with technologies such as Apache Spark, Hadoop, and Kafka
  • Understanding of identity and access management concepts, including users, groups, and directory-based administration in Active Directory
  • Ability to investigate data discrepancies, resolve quality issues, and maintain high levels of accuracy across integrated systems
  • Experience partnering with both technical and non-technical stakeholders, including operational teams such as Facilities
  • Preferred background with access control or identity platforms such as Genea, Okta, CCure, or similar tools
  • Genea expertise with substantial hands-on administrative experience is highly valued
Job Responsibility
  • Collaborate with Facilities and cross-functional partners to collect, cleanse, and verify access control information before deployment activities begin
  • Reconcile user, badge, and permission records across legacy tools, Workday, Active Directory, and related platforms to maintain consistent data alignment
  • Build and validate migration files, import templates, and assignment lists needed for loading records into Genea and associated systems
  • Execute data upload activities with internal stakeholders and external vendors, then perform detailed checks to confirm completeness and accuracy
  • Translate site and business access needs into structured mappings that connect users with the appropriate access groups and permissions
  • Coordinate with Identity and Security teams to ensure access group design aligns with Active Directory, Okta, and established governance standards
  • Support go-live and cutover efforts by preparing final data sets, applying last-minute updates, and assisting teams during rollout windows
  • Maintain clear documentation for templates, mappings, validation steps, and repeatable processes while incorporating lessons learned for future deployments
  • Provide post-launch support by troubleshooting data issues, correcting access assignments, and helping sites transition into steady-state operations
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Company 401(k) plan
Employment type: Full-time

Senior DevOps Engineer - Assistant Vice President

Join a world-class technology team at the heart of global finance. The Global Cu...
Location: India, Pune
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration date: Until further notice
Requirements
  • Deep, practical experience with Docker and Kubernetes for deploying and managing enterprise-scale applications
  • Hands-on proficiency with tools like Terraform or Ansible
  • Proven experience designing and maintaining sophisticated CI/CD pipelines using tools like Jenkins or TeamCity
  • Strong experience with monitoring and logging stacks such as Prometheus, Grafana, or ELK to ensure system health and performance
  • Solid understanding of cloud-native architecture and experience deploying applications on platforms like OpenShift, AWS, Azure, or GCP
  • Proficiency in Java (especially with frameworks like Spring Boot) and/or Python
  • Hands-on experience with the configuration, administration, and troubleshooting of messaging technologies such as IBM MQ, RabbitMQ, or Apache Kafka
  • Strong background in administering IBM WebSphere Application Server (WAS), including clustering and admin scripting
  • Experience with relational and/or NoSQL databases (e.g., Oracle, PostgreSQL, MongoDB)
  • Strong background in Linux/Unix administration and shell scripting
Job Responsibility
  • Design, implement, and manage robust, scalable, and secure application systems in coordination with the global technology team
  • Develop and maintain resilient CI/CD pipelines to automate builds, testing, and deployments, ensuring rapid and reliable delivery
  • Automate infrastructure provisioning and configuration management using Infrastructure as Code (IaC) principles and tools
  • Architect and manage containerized applications using Docker and Kubernetes on private and public cloud platforms (OpenShift, AWS, Azure, GCP)
  • Implement and refine observability strategies using industry-standard monitoring, logging, and tracing tools (e.g., Prometheus, Grafana, ELK)
  • Analyze and tune application performance, troubleshoot complex issues in distributed systems, and ensure high availability in an always-on service environment
  • Collaborate with cross-functional teams to integrate security best practices throughout the development lifecycle (DevSecOps)
Employment type: Full-time