We’re not just building better tech. We’re rewriting how data moves and what the world can do with it. With Confluent, data doesn’t sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them.
The Kora Group Coordinator team is responsible for the core coordination layer that ensures large fleets of consumers and transactional workloads within Kora remain balanced, fault-tolerant, and predictable. We are transitioning from broker-embedded coordination to a standalone, cloud-native service, prompting us to rethink how group membership, state, and metadata are stored, replicated, and recovered at scale.
The challenges we address are at the intersection of distributed systems, storage, and cloud operations: designing replicated state machines, managing high-volume, low-latency traffic, handling failures and rebalances gracefully, and achieving all of this across multiple regions and tenants while meeting strict reliability and performance requirements.
Job Responsibilities:
Work on the core coordination layer of a modern cloud data platform: a highly available, low-latency service that keeps large fleets of distributed workloads balanced, fault-tolerant, and predictable
Tackle real distributed systems challenges: scaling coordination and metadata services to thousands of nodes, hardening them against failures, and making them observable and easy to operate
Contribute to enhanced reliability and performance for all Confluent customers through the improvements you deliver
Requirements:
Design and implement high‑scale, low‑latency features in cloud-native distributed coordination services
Own complex projects end to end: refine ambiguous requirements into clear milestones, drive execution across components and services, and deliver production‑ready changes with appropriate test coverage and observability
Lead reliability and operational improvements for core services, including health checks, incident reduction, resilience testing, rollout and migration strategies, and dashboards and alerts
Raise the technical bar through thoughtful design documents, code reviews and technical mentorship of other engineers
Collaborate cross‑functionally with product, platform, security, and other engineering teams to define requirements, evaluate trade‑offs and deliver designs that balance reliability, cost, and customer impact
Contribute to operational excellence by improving runbooks, debugging workflows, tooling, and automation around service health, incidents and rollouts
Nice to have:
Experience with high‑throughput data infrastructure, streaming systems or large‑scale platform services
Experience with migration projects, such as moving state from tightly coupled components to external services, multi‑phase rollouts or transitions from legacy to new architectures