Platform Software Engineers (PSWEs) design and build the distributed systems that power Braze's massive-scale background processing platform. We own Sidekiq at Braze—processing more than a trillion jobs daily across Kubernetes clusters worldwide. Our work spans autoscaling systems, metrics pipelines, reliable job execution, and internal frameworks that make distributed processing safe for application teams. Braze operates at a massive scale: 3.3 billion monthly active users, hundreds of billions of data points collected each month, and billions of messages sent daily. Our tech stack is rooted in Ruby on Rails, Go, MongoDB, Redis, and Kafka. As a PSWE, you'll collaborate with application teams to evolve the Sidekiq platform they depend on and improve the reliability, performance, and developer experience.
Job Responsibilities:
Develop Braze’s embedded frameworks that enable large-scale distributed processing
Design, build, and operate internal software frameworks that power Braze’s asynchronous and background processing systems at massive scale
Evolve and extend frameworks built on technologies such as Sidekiq to reliably execute over a trillion jobs per day across a globally distributed platform
Own scaling behavior, reliability guarantees, failure modes, and operational safety of these systems
Provide opinionated abstractions, tooling, and guardrails that allow application teams to use distributed processing safely without needing to manage underlying complexity
Improve observability, debuggability, and operational ergonomics for large-scale job-processing systems
Manage incidents: Be on a PagerDuty rotation to respond to availability incidents and provide support for other engineers
Use your on-call shift to prevent incidents from ever happening
Hold a retrospective on everything that happens, turning lessons into system improvements, changes, and automation
Requirements:
5+ years of distributed systems development or platform/infrastructure experience
Think about systems: interfaces, boundaries, edge cases, failure modes, behaviors, and specific implementations
Have an urge to collaborate, document, and deliver quickly
Collaborate across global remote teams, often working asynchronously
Document everything so you don't need to learn the same thing (or plan the same work) twice
Deliver fast to delight our customers, even internal ones
Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
Have a desire to solve everyday challenges facing software engineers and automate their toil away
Have an excellent ability to manage multiple tasks and expectations at once
Experience working on large-scale API-driven systems
Experience with application and systems observability
Experience with distributed systems, message queues, or background job processing, with a strong focus on Sidekiq
Strong Ruby and Rails experience; Go experience is helpful (these are our primary languages)
Interest in reliability engineering—failure modes, retry semantics, idempotency
What we offer:
Competitive compensation that may include equity
Retirement and Employee Stock Purchase Plans
Flexible paid time off
Comprehensive benefit plans covering medical, dental, vision, life, and disability
Family services that include fertility benefits and equal paid parental leave
Professional development supported by formal career pathing, learning platforms, and a yearly learning stipend
A curated in-office employee experience, designed to foster community, team connections, and innovation
Opportunities to give back to your community, including an annual company-wide Volunteer Week and donation matching
Employee Resource Groups that provide supportive communities within Braze