
Senior Infrastructure Kafka Engineer

United States, Phoenix · Employment contract · Job posted April 27, 2026

Job Description

We are seeking a Senior Infrastructure - Kafka Engineer to join a high-performing data engineering team supporting large-scale, event-driven data platforms. This role is ideal for a seasoned engineer with deep experience in Apache Kafka/Confluent Kafka, messaging platforms, SQL/NoSQL databases, and cloud infrastructure who can lead engineering, operations, and automation efforts across complex enterprise environments. This is a 6-month contract-to-hire opportunity with a hybrid work model based in Phoenix, AZ. The ideal candidate is a hands-on infrastructure engineer with strong experience designing resilient Kafka environments, building real-time data pipelines, and supporting production systems in fast-paced enterprise settings.

Job Responsibilities

  • Administer, configure, and troubleshoot Kafka clusters across on-prem and cloud environments, including broker and cluster configuration, partitioning, and performance tuning
  • Design and implement scalable, highly available Kafka infrastructure, including disaster recovery and multi-environment strategies
  • Integrate Kafka with upstream and downstream systems using Kafka Connect and related connectors, including MQ, MongoDB, Oracle, SQL Server, PostgreSQL, and MySQL
  • Build and support real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming and Kafka Streams
  • Automate infrastructure provisioning and configuration across environments using Terraform and modern DevOps practices
  • Deploy and manage Kafka components and clients in production and disaster recovery environments, ensuring resilience and recoverability
  • Lead a small team of engineers and technicians in monitoring, diagnosis, and remediation of infrastructure issues
  • Implement and maintain comprehensive monitoring, logging, and alerting using tools such as Splunk, Datadog, and Grafana
  • Perform proactive health checks and capacity planning to identify and resolve issues before they impact service
  • Serve as a primary point of contact for daily operations, major incidents, and escalations related to Kafka and associated infrastructure
  • Develop, maintain, and continuously improve runbooks and playbooks for incident response, maintenance, and recurring operational tasks
  • Analyze support trends and incident patterns to reduce downtime and drive root-cause resolution
  • Ensure infrastructure and platform changes comply with internal standards, security policies, and applicable regulatory requirements
  • Partner with security, networking, application, and data engineering teams to design and operate secure, compliant, event-driven architectures
  • Contribute to standards, best practices, and technical documentation for Kafka, messaging, and integration patterns
  • Participate in agile ceremonies and help influence technical direction for streaming and integration platforms

Requirements

  • 7+ years of experience in infrastructure engineering with a strong focus on:
    • Kafka administration across on-prem and cloud environments
    • Kafka ecosystem components including brokers, topics, consumer groups, replication, and failover
    • Messaging systems such as MQ
    • SQL and NoSQL database integration
  • Proven experience designing, deploying, and scaling Kafka clusters and connector infrastructure in production and DR environments
  • Hands-on experience building real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming
  • Strong proficiency with at least one major cloud platform: AWS, GCP, or Azure
  • Experience with event-driven architectures, containerization, and DevOps practices
  • Experience with observability and monitoring tools such as Splunk, Datadog, and Grafana
  • Solid understanding of networking, Linux/Windows operating systems, and core diagnostic tools
  • Proficiency with source control tools such as SVN and Git
  • Scripting and programming experience with languages such as PowerShell, Bash, Python, or Perl
  • Demonstrated ability to analyze complex issues, make sound decisions with limited information, and drive issues through resolution
  • Strong communication, customer service, and collaboration skills with the ability to work effectively across cross-functional technical teams

Nice to have

  • Experience with additional enterprise monitoring and infrastructure support tools
  • Experience working in highly regulated enterprise environments
  • Prior exposure to large-scale data engineering or integration platforms


Similar Jobs for Senior Infrastructure Kafka Engineer

8 matching positions

Kafka Infrastructure Engineer

We are seeking a highly skilled and motivated Senior Infrastructure Engineer to ...
Location: United States, Phoenix; Johnston; Iselin; Westwood; Plano
Salary: 125,000 - 155,000 USD per year
Citizens Bank
Expiration Date: Until further notice
Requirements
  • 7 or more years of experience in Kafka administration on-prem and cloud, messaging systems, and database integration
  • Proficiency with cloud platforms such as AWS, GCP, or Azure, event-driven architecture, DevOps, and containerization
  • Experience deploying Kafka clients and brokers in production and disaster recovery environments
  • Proven ability to scale Kafka clusters and connector infrastructure
  • Hands-on experience building real-time data pipelines using Kafka producers and Spark Streaming consumers
  • Familiarity with monitoring tools such as Splunk, Datadog, and Grafana
  • Strong knowledge of source control systems such as SVN and Git
  • Solid understanding of networking protocols, operating systems, and diagnostic tools
  • Proficiency in scripting languages such as PowerShell, Bash, Python, and Perl
  • Strong analytical and decision-making skills, even with limited information
Job Responsibilities
  • Lead a team of engineers and technicians in monitoring, diagnosing, and resolving infrastructure issues using event-based management
  • Administer and troubleshoot Kafka clusters, including configuration and performance tuning
  • Integrate Kafka with various systems using connectors such as MQ, MongoDB, Oracle, SQL Server, PostgreSQL, and MySQL
  • Automate infrastructure setup across environments using Terraform
  • Provide senior-level support and troubleshooting across a wide range of technologies
  • Collaborate within agile teams to drive modern development practices and product vision
  • Serve as the primary point of contact for daily operations and incident management
  • Conduct proactive monitoring to identify and mitigate potential service disruptions
  • Document actions, create reports, and establish escalation procedures
  • Audit support tickets to identify patterns and reduce downtime
What we offer
  • Medical, dental and vision coverage
  • Retirement benefits
  • Maternity/paternity leave
  • Flexible work arrangements
  • Education reimbursement
  • Wellness programs
Full-time

Senior Infrastructure Engineer - GenAI

We are seeking an experienced Senior Backend Engineer to design, develop, and ma...
Location: India, Chennai
Salary: Not provided
Citi
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
  • 4–6 years of experience in backend engineering with a focus on scalable production systems
  • 2+ years of hands-on experience with containerization, Kubernetes, and cloud infrastructure in production environments
  • Demonstrated experience with AI/ML model deployment and serving in production systems
  • Strong backend development experience in Python, plus familiarity with Go, Node.js, or Java for building scalable web services and APIs
  • Hands-on experience with containerization using Docker and orchestration platforms including Kubernetes, OpenShift, and AWS ECS in production environments
  • Proficient with cloud infrastructure, particularly AWS services (Lambda, ECS, EKS, S3, RDS, ElastiCache) and serverless architectures
  • Experience with CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or similar tools, including Infrastructure as Code with Terraform or CloudFormation
  • Strong knowledge of databases including PostgreSQL, MongoDB, Redis, and experience with vector databases for AI applications
  • Familiarity with message queues (RabbitMQ, Apache Kafka, AWS SQS/SNS) and event-driven architectures
Job Responsibilities
  • Design and implement scalable backend services and APIs for generative AI applications using microservices architecture and cloud-native patterns
  • Build and maintain model serving infrastructure with load balancing, auto-scaling, caching, and failover capabilities for high-availability AI services
  • Deploy and orchestrate containerized AI workloads using Docker, Kubernetes, ECS, and OpenShift across development, staging, and production environments
  • Develop serverless AI functions using AWS Lambda, ECS Fargate, and other cloud services for scalable, cost-effective inference
  • Implement robust CI/CD pipelines for automated deployment of AI services, including model versioning and gradual rollout strategies
  • Create comprehensive monitoring, logging, and alerting systems for AI service performance, reliability, and cost optimization
  • Integrate with various LLM APIs (OpenAI, Anthropic, Google) and open-source models, implementing efficient batching and optimization techniques
  • Build data pipelines for training data preparation, model fine-tuning workflows, and real-time streaming capabilities
  • Ensure adherence to security best practices, including authentication, authorization, API rate limiting, and data encryption
  • Collaborate with AI researchers and product teams to translate AI capabilities into production-ready backend services
Full-time

Senior Infrastructure Engineer

Coralogix is seeking a Senior Infrastructure Engineer to join our Core SRE team ...
Location: Germany, Berlin
Salary: Not provided
Coralogix
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in DevOps, SRE, platform engineering, or infrastructure roles
  • Deep understanding of Kubernetes: API, CNI, scheduling, container runtimes, etc.
  • Strong hands-on experience with Kafka and Istio (or similar technologies) and core networking protocols (HTTP, gRPC, TLS)
  • Proven experience managing large-scale cloud infrastructure (AWS, GCP, etc.)
  • Experience in incident response and troubleshooting complex distributed systems
  • Some software engineering experience, preferably in Golang
  • Passion for automation, performance tuning, and operational excellence
Job Responsibilities
  • Act as a hands-on technical leader with deep expertise in modern cloud infrastructure
  • Serve as a go-to person on the team, leading through influence rather than hierarchy
  • Collaborate cross-functionally to refine requirements and propose innovative, scalable solutions
  • Drive long-term, high-impact infrastructure projects across multiple teams, from design to implementation, within defined timelines
  • Contribute to improving system reliability, performance, and cost-efficiency at scale
Full-time

Senior Infrastructure Engineer

Coralogix is seeking a Senior Infrastructure Engineer to join our Core SRE team ...
Location: Israel, Ramat Gan
Salary: Not provided
Coralogix
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in DevOps, SRE, platform engineering, or infrastructure roles
  • Deep understanding of Kubernetes: API, CNI, scheduling, container runtimes, etc.
  • Strong hands-on experience with Kafka and Istio (or similar technologies) and core networking protocols (HTTP, gRPC, TLS)
  • Proven experience managing large-scale cloud infrastructure (AWS, GCP, etc.)
  • Experience in incident response and troubleshooting complex distributed systems
  • Some software engineering experience, preferably in Golang
  • Passion for automation, performance tuning, and operational excellence
Job Responsibilities
  • Act as a hands-on technical leader with deep expertise in modern cloud infrastructure
  • Serve as a go-to person on the team, leading through influence rather than hierarchy
  • Collaborate cross-functionally to refine requirements and propose innovative, scalable solutions
  • Drive long-term, high-impact infrastructure projects across multiple teams, from design to implementation, within defined timelines
  • Contribute to improving system reliability, performance, and cost-efficiency at scale
Full-time

Senior Infrastructure Engineer

Senior Infrastructure - Kafka Engineer, Enterprise Data Engineering. We are seek...
Location: United States, Phoenix; Westwood; Johnston; Iselin; Plano
Salary: 125,000 - 145,000 USD per year
Citizens Bank
Expiration Date: Until further notice
Requirements
  • 7 or more years of experience in Kafka administration on-prem and cloud, messaging systems, and database integration
  • Proficiency with cloud platforms such as AWS, GCP, or Azure, event-driven architecture, DevOps, and containerization
  • Experience deploying Kafka clients and brokers in production and disaster recovery environments
  • Proven ability to scale Kafka clusters and connector infrastructure
  • Hands-on experience building real-time data pipelines using Kafka producers and Spark Streaming consumers
  • Familiarity with monitoring tools such as Splunk, Datadog, and Grafana
  • Strong knowledge of source control systems such as SVN and Git
  • Solid understanding of networking protocols, operating systems, and diagnostic tools
  • Proficiency in scripting languages such as PowerShell, Bash, Python, and Perl
  • Strong analytical and decision-making skills, even with limited information
Job Responsibilities
  • Lead a team of engineers and technicians in monitoring, diagnosing, and resolving infrastructure issues using event-based management
  • Administer and troubleshoot Kafka clusters, including configuration and performance tuning
  • Integrate Kafka with various systems using connectors such as MQ, MongoDB, Oracle, SQL Server, PostgreSQL, and MySQL
  • Automate infrastructure setup across environments using Terraform
  • Provide senior-level support and troubleshooting across a wide range of technologies
  • Collaborate within agile teams to drive modern development practices and product vision
  • Serve as the primary point of contact for daily operations and incident management
  • Conduct proactive monitoring to identify and mitigate potential service disruptions
  • Document actions, create reports, and establish escalation procedures
  • Audit support tickets to identify patterns and reduce downtime
What we offer
  • Comprehensive medical, dental and vision coverage
  • Retirement benefits
  • Maternity/paternity leave
  • Flexible work arrangements
  • Education reimbursement
  • Wellness programs
  • Competitive pay
  • Opportunity to earn an annual discretionary bonus
Full-time

Senior Software Engineer - Infrastructure Reliability

We are seeking a Senior Software Engineer to join our Security Product team, foc...
Location: India, Bangalore
Salary: Not provided
JFrog
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in software engineering, including at least 3 years focused on debugging and solving infrastructure-level problems in distributed systems
  • Strong proficiency in Go; familiarity with Python and Helm is a plus
  • Deep hands-on experience with RabbitMQ or similar message brokers (Kafka, ActiveMQ), including queue management, clustering, monitoring, and production troubleshooting
  • Solid working knowledge of Kubernetes (pod lifecycle, resource management, networking, debugging CrashLoopBackOff / OOMKilled scenarios) and Docker
  • Experience investigating production incidents and conducting post-incident reviews with clear root cause analysis and follow-through
  • Strong understanding of Linux systems, networking fundamentals, and cloud infrastructure (AWS, Azure, or GCP)
  • Ability to read and interpret logs, thread dumps, heap dumps, and system metrics to isolate root causes under time pressure
  • Excellent analytical and problem-solving skills with a methodical approach to debugging
  • Strong written and verbal communication skills, including the ability to produce clear incident reports, root cause analyses, and playbooks and to communicate effectively across engineering, SRE, and customer-facing teams
Job Responsibilities
  • Investigate system outages and production failures across customer environments (SaaS and self-hosted), spanning RabbitMQ, Kubernetes, Docker, Postgres, and cloud infrastructure (AWS, Azure, GCP)
  • Identify recurring failure patterns and systemic weaknesses from incident data and drive them to resolution, whether by writing Go code yourself (resilience features, infrastructure fixes, observability) or by collaborating with service owners to prioritize and address reliability gaps
  • Lead and participate in post-incident reviews: document root causes and corrective actions, and follow through to ensure issues are properly resolved
  • Collaborate with production engineering and SRE teams to develop and maintain operational playbooks and runbooks that reduce time-to-resolution
  • Diagnose root causes across the full stack, including message queue failures, container lifecycle issues, cloud networking, disk and memory pressure, and deployment topology mismatches
  • Design and implement data migrations and lifecycle management for infrastructure components such as queue management and vhost operations
  • Emit and monitor operational metrics to proactively detect infrastructure degradation and measure service reliability

Senior Cloud Infrastructure Engineer

HPE Aruba Networking is a leading provider of next-generation networking solutio...
Location: United States, San Jose
Salary: 133,500 - 307,000 USD per year
Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • Minimum of around 6 years of industry experience
  • Minimum education: BS or MS in Computer Science or a related field
  • Proven record of developing and releasing cloud applications in production environments
  • Experience with DevOps and cloud infrastructure deployment and automation using Python, Terraform, Ansible, GitOps, GitLab, and Jenkins/Spinnaker
  • Experience with RDBMS (Postgres), GraphQL, and NoSQL (Cassandra, OpenSearch, ClickHouse, etc.)
  • Experience with cloud stack components such as Redis, Kafka, RabbitMQ, and Hazelcast
  • Experience developing with Kubernetes and Docker containers
  • Programming experience with shell scripting, Python, Golang, or Java
  • Ability to apply various techniques to scale an application in a cloud environment
  • Demonstrated ability to work with QA and remote teams
Job Responsibilities
  • Participate in architecture and design discussions
  • Develop scalable applications that run on top of Next Generation Central
  • Contribute to multiple technical programs simultaneously
What we offer
  • Health benefits
  • Comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • Personal and professional development programs
  • Inclusion and diversity initiatives
  • Exciting and fun work culture
  • Innovation and growth opportunities
Full-time

Senior Software Engineer, Streaming Infrastructure

Join us in building the future of finance. Our mission is to democratize finance...
Location: United States, Bellevue
Salary: 196,000 - 230,000 USD per year
Robinhood
Expiration Date: Until further notice
Requirements
  • 5+ years of professional experience in software engineering, including building distributed systems at scale
  • A background in tools like Kafka, Flink, and Debezium
  • Proficiency in designing and implementing event-driven architectures and stream processing systems
  • A passion for platform engineering and creating great experiences for other developers
  • Strong communication and collaboration skills to work across technical teams
Job Responsibilities
  • Design and operate distributed data streaming platforms that scale to billions of events per day
  • Develop secure, performant, and highly reliable systems using technologies like Kafka, Flink, and Debezium
  • Collaborate closely with product, infrastructure, data, and ML teams to ensure the platform supports diverse use cases
  • Build tools and documentation to deliver a smooth, empowering experience for internal developers
  • Mentor and support other engineers to drive architectural decisions and long-term technical strategy
What we offer
  • Performance-driven compensation with multipliers for outsized impact, bonus programs, equity ownership, and 401(k) matching
  • 100% paid health insurance for employees with 90% coverage for dependents
  • Lifestyle wallet: a highly flexible benefits spending account for wellness, learning, and more
  • Employer-paid life & disability insurance, fertility benefits, and mental health benefits
  • Time off to recharge including company holidays, paid time off, sick time, parental leave, and more
  • Exceptional office experience with catered meals, events, and comfortable workspaces
Full-time