Kafka Infrastructure Engineer

Location: United States, Phoenix · Employment: Contract · Salary: 125000.00 - 155000.00 USD / Year · Job posted: May 16, 2026

Job Description

We are seeking a highly skilled and motivated Senior Infrastructure Engineer to join our Enterprise Data Engineering team. This full-time role is ideal for candidates with hands-on experience in infrastructure technologies, including Apache Kafka (Confluent Kafka and Kafka Streams), MQ, SQL and NoSQL databases, and cloud engineering. If you are passionate about building efficient, scalable platforms, we would love to hear from you.

Job Responsibility

  • Lead a team of engineers and technicians in monitoring, diagnosing, and resolving infrastructure issues using event-based management
  • Administer and troubleshoot Kafka clusters, including configuration and performance tuning
  • Integrate Kafka with various systems using connectors for MQ, MongoDB, Oracle, SQL Server, PostgreSQL, and MySQL
  • Automate infrastructure setup across environments using Terraform
  • Provide senior-level support and troubleshooting across a wide range of technologies
  • Collaborate within agile teams to drive modern development practices and product vision
  • Serve as the primary point of contact for daily operations and incident management
  • Conduct proactive monitoring to identify and mitigate potential service disruptions
  • Document actions, create reports, and establish escalation procedures
  • Audit support tickets to identify patterns and reduce downtime
  • Ensure compliance with internal policies and industry standards
  • Develop and maintain runbooks and playbooks for operational excellence

Requirements

  • 7 or more years of experience in Kafka administration (on-prem and cloud), messaging systems, and database integration
  • Proficiency with cloud platforms (AWS, GCP, or Azure), event-driven architecture, DevOps, and containerization
  • Experience deploying Kafka clients and brokers in production and disaster recovery environments
  • Proven ability to scale Kafka clusters and connector infrastructure
  • Hands-on experience building real-time data pipelines using Kafka producers and Spark Streaming consumers
  • Familiarity with monitoring tools such as Splunk, Datadog, and Grafana
  • Strong knowledge of source control systems such as SVN and Git
  • Solid understanding of networking protocols, operating systems, and diagnostic tools
  • Proficiency in scripting languages such as PowerShell, Bash, Python, and Perl
  • Strong analytical and decision-making skills, even with limited information
  • Ability to work independently and track issues to identify trends
  • Excellent communication and customer service skills
  • Sound judgment and autonomy in handling both emergency and routine situations
  • Experience collaborating with technical teams for issue resolution
  • Bachelor's degree in Computer Science, Computer Engineering, or Electronics Engineering

Nice to have

  • Experience with monitoring tools
  • Working experience in the financial services industry

What we offer

  • Medical, dental and vision coverage
  • Retirement benefits
  • Maternity/paternity leave
  • Flexible work arrangements
  • Education reimbursement
  • Wellness programs

Similar Jobs for Kafka Infrastructure Engineer

8 matching positions

Senior Infrastructure Kafka Engineer

We are seeking a Senior Infrastructure - Kafka Engineer to join a high-performin...
Location: United States, Phoenix
Salary: Not provided
Technologent
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in infrastructure engineering with a strong focus on:
      • Kafka administration across on-prem and cloud environments
      • Kafka ecosystem components, including brokers, topics, consumer groups, replication, and failover
      • Messaging systems such as MQ
      • SQL and NoSQL database integration
  • Proven experience designing, deploying, and scaling Kafka clusters and connector infrastructure in production and DR environments
  • Hands-on experience building real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming
  • Strong proficiency with at least one major cloud platform: AWS, GCP, or Azure
  • Experience with event-driven architectures, containerization, and DevOps practices
  • Experience with observability and monitoring tools such as Splunk, Datadog, and Grafana
  • Solid understanding of networking, Linux/Windows operating systems, and core diagnostic tools
Job Responsibility
  • Administer, configure, and troubleshoot Kafka clusters across on-prem and cloud environments, including broker and cluster configuration, partitioning, and performance tuning
  • Design and implement scalable, highly available Kafka infrastructure, including disaster recovery and multi-environment strategies
  • Integrate Kafka with upstream and downstream systems using Kafka Connect and related connectors, including MQ, MongoDB, Oracle, SQL Server, PostgreSQL, and MySQL
  • Build and support real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming and Kafka Streams
  • Automate infrastructure provisioning and configuration across environments using Terraform and modern DevOps practices
  • Deploy and manage Kafka components and clients in production and disaster recovery environments, ensuring resilience and recoverability
  • Lead a small team of engineers and technicians in monitoring, diagnosis, and remediation of infrastructure issues
  • Implement and maintain comprehensive monitoring, logging, and alerting using tools such as Splunk, Datadog, and Grafana
  • Perform proactive health checks and capacity planning to identify and resolve issues before they impact service
  • Serve as a primary point of contact for daily operations, major incidents, and escalations related to Kafka and associated infrastructure
Employment type: Full-time

Senior DevOps / Voice Infrastructure Engineer

As we grow and take on exciting new challenges, we’re on the lookout for excepti...
Salary: Not provided
Mad Devs
Expiration Date: Until further notice
Requirements
  • 3+ years of hands-on experience with Asterisk or FreeSWITCH
  • Deep knowledge of SIP, RTP, SRTP protocols
  • Experience with SIP proxies (Kamailio or OpenSIPS)
  • WebRTC integrations
  • Trunk configuration, dialplan design, codec negotiation
  • GCP and/or AWS hands-on experience (2+ years)
  • Kubernetes (GKE or EKS) in production environments
  • Terraform: custom modules, multi-environment setups
  • Docker, Docker Compose
  • CI/CD: GitHub Actions, ArgoCD / Flux
Job Responsibility
  • Design, deploy, and maintain SIP/VoIP infrastructure (Asterisk, FreeSWITCH, Kamailio) for AI Agents
  • Integrate voice platforms with cloud services (GCP, AWS) and internal AI pipelines
  • Ensure high availability and low latency of voice services (HA, load balancing, failover)
  • Manage cloud infrastructure via IaC (Terraform) and container orchestration in Kubernetes
  • Set up call quality monitoring (MOS, jitter, packet loss) and alerting with Grafana / VictoriaMetrics
  • Build and optimize CI/CD pipelines (GitHub Actions, ArgoCD) for voice services
  • Harden voice infrastructure security: encryption (SRTP, TLS), toll fraud prevention, DoS protection
  • Integrate with PSTN/SIP trunk providers, manage DID numbers and call routing
What we offer
  • Flexible working hours
  • Remote-first culture
  • Long-term projects
  • Salary in dollars
  • Professional communities
  • Onsite business trips
  • Training budget
  • Paid conferences
Employment type: Full-time

Financial Infrastructure Engineer

We're working with a high growth AI company who is building a payment platform t...
Location: United States, Daly City
Salary: 150000.00 USD / Year
Career Movement
Expiration Date: Until further notice
Requirements
  • 5+ years building scalable backend systems, ideally in payments, fintech, billing, or financial infrastructure
  • 3+ years owning production backend or platform systems
  • Direct experience integrating with payment gateways (Stripe, Adyen, or similar) and/or banking APIs
  • Familiarity with PCI DSS compliance and secure handling of sensitive financial data
  • Strong experience working with transactional databases such as PostgreSQL or MySQL
  • Familiarity with message queues, streaming platforms, and distributed systems tools such as Kafka, SQS, RabbitMQ, Temporal, or Kubernetes
  • Python proficiency
  • Systems thinker who prioritizes reliability, observability, and automation over workarounds
Job Responsibility
  • Automate payout orchestration to eliminate manual, human-triggered workflows
  • Build an immutable, auditable operational ledger
  • Design settlement ingestion and reconciliation pipelines
  • Improve observability across the full payout-to-processor-to-bank flow
  • Fix recurring payment bugs at the root cause
  • Reduce manual treasury overhead
  • Abstract processor dependencies to avoid vendor lock-in
Employment type: Full-time

Staff Observability Data Infrastructure Engineer

CVS Health is seeking a highly skilled Observability Data Infrastructure Enginee...
Location: United States, Work at Home, Maryland
Salary: 130295.00 - 260590.00 USD / Year
CVS Health
Expiration Date: June 30, 2026
Requirements
  • 7+ years of experience building and operating log, metric, and trace pipelines in Data, Security Data, or Observability Engineering roles
  • 5+ years of hands-on experience with Databricks, Apache Spark, or other large-scale distributed data platforms
  • 5+ years of experience working across cloud platforms (AWS, Azure, or GCP), including storage, compute, and event-driven services
  • 5+ years of production experience using SQL and Python in data-intensive environments
  • 3+ years of experience with enterprise observability platforms (Splunk, Datadog, Elastic, or equivalent)
  • 3+ years of experience with high-throughput ingestion and streaming technologies such as Cribl, Vector, or Kafka
  • 3+ years of experience designing telemetry systems aligned to OpenTelemetry (OTEL) or similar standards
  • Bachelor's degree from an accredited university or equivalent work experience (HS diploma + 4 years of relevant experience)
Job Responsibility
  • Design, build, and operate high-volume log, metric, and trace pipelines using Databricks, cloud data lakes, and distributed processing engines
  • Architect and evolve an Observability Lakehouse aligned with OpenTelemetry (OTEL) data models and standards
  • Implement ingestion and transformation workflows using technologies such as Cribl, Vector, Jenkins, GitHub Actions, or equivalent tools
  • Normalize, model, and enrich telemetry data to support detection engineering, forensics, and operational analytics
  • Develop scalable ETL/ELT frameworks, Delta Lake architectures, and automated data quality validation for unstructured and semi-structured data
  • Partner with Security Engineering, SRE, Cloud, and SOC teams to improve enterprise visibility and detection accuracy
  • Build and maintain CI/CD pipelines and reusable Infrastructure-as-Code (IaC) patterns for observability platform deployment
  • Identify and resolve performance, latency, cost, and reliability issues across telemetry pipelines
  • Contribute to engineering standards, documentation, and knowledge sharing across observability and security platforms
What we offer
  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
  • Bonus, commission or short-term incentive program
  • Equity award program
Employment type: Full-time

Senior Software Engineer - Infrastructure Reliability

We are seeking a Senior Software Engineer to join our Security Product team, foc...
Location: India, Bangalore
Salary: Not provided
JFrog
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in software engineering, with at least 3+ years focused on debugging and solving infrastructure-level problems in distributed systems
  • Strong proficiency in Go; familiarity with Python and Helm is a plus
  • Deep hands-on experience with RabbitMQ or similar message brokers (Kafka, ActiveMQ), including queue management, clustering, monitoring, and production troubleshooting
  • Solid working knowledge of Kubernetes (pod lifecycle, resource management, networking, debugging CrashLoopBackOff / OOMKilled scenarios) and Docker
  • Experience investigating production incidents and conducting post-incident reviews with clear root cause analysis and follow-through
  • Strong understanding of Linux systems, networking fundamentals, and cloud infrastructure (AWS, Azure, or GCP)
  • Ability to read and interpret logs, thread dumps, heap dumps, and system metrics to isolate root causes under time pressure
  • Excellent analytical and problem-solving skills with a methodical approach to debugging
  • Strong written and verbal communication skills: the ability to produce clear incident reports, root cause analyses, and playbooks, and to communicate effectively across engineering, SRE, and customer-facing teams
Job Responsibility
  • Investigate system outages and production failures across customer environments (SaaS and self-hosted), spanning RabbitMQ, Kubernetes, Docker, Postgres, and cloud infrastructure (AWS, Azure, GCP)
  • Identify recurring failure patterns and systemic weaknesses from incident data, and drive them to resolution, whether by writing Go code yourself (resilience features, infrastructure fixes, observability) or by collaborating with service owners to prioritize and address reliability gaps
  • Lead and participate in post-incident reviews: document root causes and corrective actions, and follow through to ensure issues are properly resolved
  • Collaborate with production engineering and SRE teams to develop and maintain operational playbooks and runbooks that reduce time-to-resolution
  • Diagnose root causes across the full stack - message queue failures, container lifecycle issues, cloud networking, disk and memory pressure, and deployment topology mismatches
  • Design and implement data migrations and lifecycle management for infrastructure components such as queue management and vhost operations
  • Emit and monitor operational metrics to proactively detect infrastructure degradation and measure service reliability

Senior Infrastructure Engineer - GenAI

We are seeking an experienced Senior Backend Engineer to design, develop, and ma...
Location: India, Chennai
Salary: Not provided
Citi
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
  • 4–6 years of experience in backend engineering with focus on scalable, production systems
  • 2+ years of hands-on experience with containerization, Kubernetes, and cloud infrastructure in production environments
  • Demonstrated experience with AI/ML model deployment and serving in production systems
  • Strong backend development experience in Python, plus familiarity with Go, Node.js, or Java for building scalable web services and APIs
  • Hands-on experience with containerization using Docker and orchestration platforms including Kubernetes, OpenShift, and AWS ECS in production environments
  • Proficient with cloud infrastructure, particularly AWS services (Lambda, ECS, EKS, S3, RDS, ElastiCache) and serverless architectures
  • Experience with CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or similar tools, including Infrastructure as Code with Terraform or CloudFormation
  • Strong knowledge of databases including PostgreSQL, MongoDB, Redis, and experience with vector databases for AI applications
  • Familiarity with message queues (RabbitMQ, Apache Kafka, AWS SQS/SNS) and event-driven architectures
Job Responsibility
  • Design and implement scalable backend services and APIs for generative AI applications using microservices architecture and cloud-native patterns
  • Build and maintain model serving infrastructure with load balancing, auto-scaling, caching, and failover capabilities for high-availability AI services
  • Deploy and orchestrate containerized AI workloads using Docker, Kubernetes, ECS, and OpenShift across development, staging, and production environments
  • Develop serverless AI functions using AWS Lambda, ECS Fargate, and other cloud services for scalable, cost-effective inference
  • Implement robust CI/CD pipelines for automated deployment of AI services, including model versioning and gradual rollout strategies
  • Create comprehensive monitoring, logging, and alerting systems for AI service performance, reliability, and cost optimization
  • Integrate with various LLM APIs (OpenAI, Anthropic, Google) and open-source models, implementing efficient batching and optimization techniques
  • Build data pipelines for training data preparation, model fine-tuning workflows, and real-time streaming capabilities
  • Ensure adherence to security best practices, including authentication, authorization, API rate limiting, and data encryption
  • Collaborate with AI researchers and product teams to translate AI capabilities into production-ready backend services
Employment type: Full-time

Data Engineer (Kafka)

Altamira is seeking a Data Engineer to design, build, and operate high-performan...
Location: United States, Dayton, OH
Salary: Not provided
Altamira Technologies
Expiration Date: Until further notice
Requirements
  • Active TS/SCI clearance
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • Experience in data engineering, distributed systems, or backend engineering roles
  • Hands-on experience with Apache Kafka in production environments
  • Experience building and supporting real-time data pipelines
  • Strong proficiency in Java, Python, Scala, or similar programming languages
  • Experience working in AWS or hybrid cloud environments
  • Strong Linux systems administration and troubleshooting skills
  • Ability to work effectively in secure, mission-focused environments
Job Responsibility
  • Design, deploy, and operate Apache Kafka clusters in classified and hybrid environments
  • Build and maintain reliable, scalable, and secure data streaming pipelines
  • Develop and optimize producers, consumers, and stream processing applications
  • Configure and manage topics, partitions, replication, and retention policies
  • Monitor, tune, and troubleshoot Kafka performance, availability, and latency
  • Integrate streaming platforms with databases, storage systems, and analytics tools
  • Implement data governance, retention, and access control policies
  • Automate deployment and management of streaming infrastructure
  • Collaborate with platform, infrastructure, and application teams to support data requirements
  • Support system accreditation, compliance, and security requirements
Employment type: Full-time

Software Engineer, Infrastructure

At Ramp, we’re rethinking how modern finance teams function in the age of AI. We...
Location: United States, New York, NY; San Francisco, CA
Salary: 184800.00 - 374900.00 USD / Year
Ramp
Expiration Date: Until further notice
Requirements
  • 2+ years of experience shipping high-quality architectures for critical systems preferred
  • Production experience in AWS, GCP, or Azure
  • An ability to think through customer requirements and come up with high-impact ways to quickly solve their problems
  • Expertise in production deployment of Infrastructure-as-Code, such as Terraform
  • Proficiency in an object-oriented programming language
  • Deep experience in one of the following:
      • Large-scale SQL database administration (e.g. PostgreSQL, MySQL)
      • Real-time queue systems (e.g. Kafka, Celery, SQS, Temporal)
      • Container Orchestration/Web Server Administration (ECS/Kubernetes, Load Balancing, Gunicorn, Flask)
Job Responsibility
  • Influence and implement the next generation of Ramp's database, real-time queue, or container orchestration infrastructure
  • Work across our engineering organization to introduce and scale best practices with cloud-native technologies like Cloudflare, Amazon ALB, Service Discovery, ECS/EKS, Celery, Kafka, Amazon Aurora PostgreSQL, ElastiCache Redis, and S3
  • Build abstractions within Terraform to simplify the architecture and increase velocity and ownership
  • Find solutions to Ramp's toughest scaling, performance, and low latency problems
  • Participate in an on-call rotation to resolve critical production events
What we offer
  • 100% medical, dental & vision insurance coverage for you, with partial coverage for your dependents
  • One Medical annual membership
  • 401k (including employer match on contributions made while employed by Ramp)
  • Flexible PTO
  • Fertility HRA (up to $10,000 per year)
  • Parental Leave
  • Unlimited AI token usage
  • Pet insurance
  • Centralized home-office equipment ordering for all employees
Employment type: Full-time