Kafka Infrastructure Engineer Job at Citizens Bank (Phoenix)

Senior Infrastructure Kafka Engineer

We are seeking a Senior Infrastructure - Kafka Engineer to join a high-performin...

Location

United States, Phoenix

Salary:

Not provided

Technologent

Expiration Date

Until further notice

Requirements

7+ years of experience in infrastructure engineering with a strong focus on:
Kafka administration across on-prem and cloud environments
Kafka ecosystem components including brokers, topics, consumer groups, replication, and failover
Messaging systems such as MQ
SQL and NoSQL database integration
Proven experience designing, deploying, and scaling Kafka clusters and connector infrastructure in production and DR environments
Hands-on experience building real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming
Strong proficiency with at least one major cloud platform: AWS, GCP, or Azure
Experience with event-driven architectures, containerization, and DevOps practices
Experience with observability and monitoring tools such as Splunk, Datadog, and Grafana
Solid understanding of networking, Linux/Windows operating systems, and core diagnostic tools

Job Responsibility

Administer, configure, and troubleshoot Kafka clusters across on-prem and cloud environments, including broker and cluster configuration, partitioning, and performance tuning
Design and implement scalable, highly available Kafka infrastructure, including disaster recovery and multi-environment strategies
Integrate Kafka with upstream and downstream systems using Kafka Connect and related connectors, including MQ, MongoDB, Oracle, SQL Server, PostgreSQL, and MySQL
Build and support real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming and Kafka Streams
Automate infrastructure provisioning and configuration across environments using Terraform and modern DevOps practices
Deploy and manage Kafka components and clients in production and disaster recovery environments, ensuring resilience and recoverability
Lead a small team of engineers and technicians in monitoring, diagnosis, and remediation of infrastructure issues
Implement and maintain comprehensive monitoring, logging, and alerting using tools such as Splunk, Datadog, and Grafana
Perform proactive health checks and capacity planning to identify and resolve issues before they impact service
Serve as a primary point of contact for daily operations, major incidents, and escalations related to Kafka and associated infrastructure

Fulltime

Senior DevOps / Voice Infrastructure Engineer

As we grow and take on exciting new challenges, we’re on the lookout for excepti...

Location

Salary:

Not provided

Mad Devs

Expiration Date

Until further notice

Requirements

3+ years of hands-on experience with Asterisk or FreeSWITCH
Deep knowledge of SIP, RTP, SRTP protocols
Experience with SIP proxies — Kamailio or OpenSIPS
WebRTC integrations
Trunk configuration, dialplan design, codec negotiation
GCP and/or AWS hands-on experience (2+ years)
Kubernetes (GKE or EKS) in production environments
Terraform — custom modules, multi-environment setups
Docker, Docker Compose
CI/CD: GitHub Actions, ArgoCD / Flux

Job Responsibility

Design, deploy, and maintain SIP/VoIP infrastructure (Asterisk, FreeSWITCH, Kamailio) for AI Agents
Integrate voice platforms with cloud services (GCP, AWS) and internal AI pipelines
Ensure high availability and low latency of voice services (HA, load balancing, failover)
Manage cloud infrastructure via IaC (Terraform) and container orchestration in Kubernetes
Set up call quality monitoring (MOS, jitter, packet loss) and alerting with Grafana / VictoriaMetrics
Build and optimize CI/CD pipelines (GitHub Actions, ArgoCD) for voice services
Harden voice infrastructure security: encryption (SRTP, TLS), toll fraud prevention, DoS protection
Integrate with PSTN/SIP trunk providers, manage DID numbers and call routing

What we offer

Flexible working hours
Remote-first culture
Long-term projects
Salary in dollars
Professional communities
Onsite business trips
Training budget
Paid conferences

Fulltime

Financial Infrastructure Engineer

We're working with a high growth AI company who is building a payment platform t...

Location

United States, Daly City

Salary:

150000.00 USD / Year

Career Movement

Expiration Date

Until further notice

Requirements

5+ years building scalable backend systems, ideally in payments, fintech, billing, or financial infrastructure
3+ years owning production backend or platform systems
Direct experience integrating with payment gateways (Stripe, Adyen, or similar) and/or banking APIs
Familiarity with PCI DSS compliance and secure handling of sensitive financial data
Strong experience working with transactional databases such as PostgreSQL or MySQL
Familiarity with message queues, streaming platforms, and distributed systems tools such as Kafka, SQS, RabbitMQ, Temporal, or Kubernetes
Python proficiency
Systems thinker who prioritizes reliability, observability, and automation over workarounds

Job Responsibility

Automate payout orchestration to eliminate manual, human-triggered workflows
Build an immutable, auditable operational ledger
Design settlement ingestion and reconciliation pipelines
Improve observability across the full payout-to-processor-to-bank flow
Fix recurring payment bugs at the root cause
Reduce manual treasury overhead
Abstract processor dependencies to avoid vendor lock-in

Fulltime

Senior Software Engineer - Infrastructure Reliability

We are seeking a Senior Software Engineer to join our Security Product team, foc...

Location

India, Bangalore

Salary:

Not provided

JFrog

Expiration Date

Until further notice

Requirements

7+ years of experience in software engineering, with at least 3+ years focused on debugging and solving infrastructure-level problems in distributed systems
Strong proficiency in Go; familiarity with Python and Helm is a plus
Deep hands-on experience with RabbitMQ or similar message brokers (Kafka, ActiveMQ) - including queue management, clustering, monitoring, and production troubleshooting
Solid working knowledge of Kubernetes (pod lifecycle, resource management, networking, debugging CrashLoopBackOff / OOMKilled scenarios) and Docker
Experience investigating production incidents and conducting post-incident reviews with clear root cause analysis and follow-through
Strong understanding of Linux systems, networking fundamentals, and cloud infrastructure (AWS, Azure, or GCP)
Ability to read and interpret logs, thread dumps, heap dumps, and system metrics to isolate root causes under time pressure
Excellent analytical and problem-solving skills with a methodical approach to debugging
Strong written and verbal communication skills - ability to produce clear incident reports, root cause analyses, and playbooks, and to communicate effectively across engineering, SRE, and customer-facing teams

Job Responsibility

Investigate system outages and production failures across customer environments (SaaS and self-hosted), spanning RabbitMQ, Kubernetes, Docker, Postgres, and cloud infrastructure (AWS, Azure, GCP)
Identify recurring failure patterns and systemic weaknesses from incident data, and drive them to resolution - whether by writing Go code yourself (resilience features, infrastructure fixes, observability) or by collaborating with service owners to prioritize and address reliability gaps
Lead and participate in post-incident reviews - document root causes, corrective actions, and follow through to ensure issues are properly resolved
Collaborate with production engineering and SRE teams to develop and maintain operational playbooks and runbooks that reduce time-to-resolution
Diagnose root causes across the full stack - message queue failures, container lifecycle issues, cloud networking, disk and memory pressure, and deployment topology mismatches
Design and implement data migrations and lifecycle management for infrastructure components such as queue management and vhost operations
Emit and monitor operational metrics to proactively detect infrastructure degradation and measure service reliability

Senior Infrastructure Engineer - GenAI

We are seeking an experienced Senior Backend Engineer to design, develop, and ma...

Location

India, Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or related technical field, or equivalent practical experience
4–6 years of experience in backend engineering with focus on scalable, production systems
2+ years of hands-on experience with containerization, Kubernetes, and cloud infrastructure in production environments
Demonstrated experience with AI/ML model deployment and serving in production systems
Strong experience with backend development using Python, with familiarity in Go, Node.js, or Java for building scalable web services and APIs
Hands-on experience with containerization using Docker and orchestration platforms including Kubernetes, OpenShift, and AWS ECS in production environments
Proficient with cloud infrastructure, particularly AWS services (Lambda, ECS, EKS, S3, RDS, ElastiCache) and serverless architectures
Experience with CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or similar tools, including Infrastructure as Code with Terraform or CloudFormation
Strong knowledge of databases including PostgreSQL, MongoDB, Redis, and experience with vector databases for AI applications
Familiarity with message queues (RabbitMQ, Apache Kafka, AWS SQS/SNS) and event-driven architectures

Job Responsibility

Design and implement scalable backend services and APIs for generative AI applications using microservices architecture and cloud-native patterns
Build and maintain model serving infrastructure with load balancing, auto-scaling, caching, and failover capabilities for high-availability AI services
Deploy and orchestrate containerized AI workloads using Docker, Kubernetes, ECS, and OpenShift across development, staging, and production environments
Develop serverless AI functions using AWS Lambda, ECS Fargate, and other cloud services for scalable, cost-effective inference
Implement robust CI/CD pipelines for automated deployment of AI services, including model versioning and gradual rollout strategies
Create comprehensive monitoring, logging, and alerting systems for AI service performance, reliability, and cost optimization
Integrate with various LLM APIs (OpenAI, Anthropic, Google) and open-source models, implementing efficient batching and optimization techniques
Build data pipelines for training data preparation, model fine-tuning workflows, and real-time streaming capabilities
Ensure adherence to security best practices, including authentication, authorization, API rate limiting, and data encryption
Collaborate with AI researchers and product teams to translate AI capabilities into production-ready backend services

Fulltime

Data Engineer (Kafka)

Altamira is seeking a Data Engineer to design, build, and operate high-performan...

Location

United States, Dayton, OH

Salary:

Not provided

Altamira Technologies

Expiration Date

Until further notice

Requirements

Active TS/SCI clearance
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
Experience in data engineering, distributed systems, or backend engineering roles
Hands-on experience with Apache Kafka in production environments
Experience building and supporting real-time data pipelines
Strong proficiency in Java, Python, Scala, or similar programming languages
Experience working in AWS or hybrid cloud environments
Strong Linux systems administration and troubleshooting skills
Ability to work effectively in secure, mission-focused environments

Job Responsibility

Design, deploy, and operate Apache Kafka clusters in classified and hybrid environments
Build and maintain reliable, scalable, and secure data streaming pipelines
Develop and optimize producers, consumers, and stream processing applications
Configure and manage topics, partitions, replication, and retention policies
Monitor, tune, and troubleshoot Kafka performance, availability, and latency
Integrate streaming platforms with databases, storage systems, and analytics tools
Implement data governance, retention, and access control policies
Automate deployment and management of streaming infrastructure
Collaborate with platform, infrastructure, and application teams to support data requirements
Support system accreditation, compliance, and security requirements

Fulltime

Software Engineer, Infrastructure

At Ramp, we’re rethinking how modern finance teams function in the age of AI. We...

Location

United States, New York, NY; San Francisco, CA

Salary:

184800.00 - 374900.00 USD / Year

Ramp

Expiration Date

Until further notice

Requirements

2+ years of experience shipping high-quality architectures for critical systems preferred
Production experience in AWS, GCP, or Azure
An ability to think through customer requirements and come up with high-impact ways to quickly solve their problems
Expertise in a production deployment of Infrastructure-as-Code (i.e., Terraform)
Proficiency in an object-oriented programming language
Deep experience in one of the following:
Large-scale SQL database administration (e.g. PostgreSQL, MySQL)
Real-time queue systems (e.g. Kafka, Celery, SQS, Temporal)
Container Orchestration/Web Server Administration (ECS/Kubernetes, Load Balancing, Gunicorn, Flask)

Job Responsibility

Influence and implement the next generation of Ramp's database, real-time queue, or container orchestration infrastructure
Work across our engineering organization to introduce and scale best practices with cloud-native technologies like Cloudflare, Amazon ALB, Service Discovery, ECS/EKS, Celery, Kafka, Amazon Aurora PostgreSQL, ElastiCache Redis, and S3
Build abstractions within Terraform to simplify the architecture and increase velocity and ownership
Find solutions to Ramp's toughest scaling, performance, and low latency problems
Participate in an On Call rotation to solve critical production events

What we offer

100% medical, dental & vision insurance coverage for you
Partially covered for your dependents
One Medical annual membership
401k (including employer match on contributions made while employed by Ramp)
Flexible PTO
Fertility HRA (up to $10,000 per year)
Parental Leave
Unlimited AI token usage
Pet insurance
Centralized home-office equipment ordering for all employees

Fulltime

Software Engineer, Infrastructure - Analytics

The Scaling team designs, builds, and operates critical infrastructure that enab...

Location

United States, San Francisco

Salary:

295000.00 - 445000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Strong proficiency in Python/Rust and backend software development, ideally in large codebases
Experience with distributed systems and scalable data processing infrastructure, including technologies like Kafka, Spark, Trino/Presto, Iceberg
Hands-on experience operating services in Kubernetes, with familiarity in tools like Terraform and Helm
Comfort working across the stack - from low-level infrastructure components to application logic - and making trade-offs to move quickly
A focus on building systems that are both technically sound and easy for others to use
Curiosity and adaptability in fast-changing environments, especially in high-growth orgs

Job Responsibility

Design, build, and operate scalable backend systems that support various ML research workflows, including observability and analytics
Develop reliable infrastructure that supports both streaming and batch data processing at scale
Create internal-facing tools and applications as needed
Debug and improve performance of services running on Kubernetes, including operational tooling and observability
Collaborate with engineers and researchers to deliver reliable systems that meet real-world needs in production
Help improve system reliability by participating in the on-call rotation and responding to critical incidents

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime
