Platform Infrastructure Engineers (PIEs) are responsible for managing, maintaining, and evolving the foundational infrastructure that supports our internal Infrastructure-as-a-Service platform. PIEs specialize in building robust, scalable, and highly available systems such as Kubernetes clusters, Kafka ecosystems, and cloud environments. They apply sound engineering principles, operational discipline, and mature automation to ensure a reliable infrastructure foundation for all platform services and applications.
Our team helps improve automation and infrastructure reliability, and it empowers Braze’s other engineering teams to easily leverage the infrastructure products and platforms we create.
Braze operates at a massive scale, with over 3.3 billion monthly active users across our customers. We collect hundreds of billions of data points each month and send billions of messages to end users daily. We use a diverse technology stack rooted in Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more.
As a Platform Software Engineer at Braze, you will collaborate with your team and consumer engineering teams to build and continuously improve the Infrastructure-as-a-Service platform that every other team at Braze depends on.
Job Responsibilities:
Design and Manage Infrastructure:
Build, optimize, and manage foundational systems such as Kubernetes clusters, Kafka ecosystems, and cloud resources (e.g., EC2, S3)
Develop automation frameworks for provisioning and maintaining infrastructure at scale
Design scalable architectures to support seamless operations of platform services
Ensure Reliability and Performance:
Implement high-availability and fault-tolerant infrastructure strategies
Collaborate with Platform Software Engineers and Product teams to establish and meet Service Level Objectives (SLOs) for infrastructure components
Continuously monitor and optimize infrastructure performance to meet evolving demands
Incident Response and Resilience:
Be part of a PagerDuty rotation to respond to infrastructure-related incidents
Implement failover strategies, backups, and disaster recovery plans to mitigate risks
Conduct root cause analyses and retrospectives to improve system resilience
Collaboration and Knowledge Sharing:
Partner with Platform Software Engineers to integrate infrastructure with service abstractions and APIs
Document processes, tools, and best practices to streamline development and operations
Share expertise and mentor team members to foster a culture of operational excellence
Innovate and Automate:
Stay ahead of emerging trends in infrastructure technology and integrate innovative solutions
Reduce manual tasks by developing automated solutions for infrastructure provisioning, scaling, and maintenance
Optimize for performance, security, and scalability in all aspects of infrastructure design
Requirements:
5+ years of experience managing and scaling large-scale infrastructure systems in production environments
Proven expertise with Kubernetes, Kafka, cloud services (AWS/GCP/Azure), and configuration management tools
Proficiency in infrastructure as code (IaC) tools like Terraform, Ansible, or similar
Strong understanding of network architecture, security, and performance tuning
Familiarity with containerization, service discovery, and load-balancing technologies
Excellent ability to manage multiple tasks and expectations at once
Focused on building robust, scalable systems that enhance developer productivity
Collaborative and communicative, with a strong desire to document and share knowledge
Committed to continuous improvement, staying ahead of technological advancements
An urge to collaborate, document, and deliver quickly:
Collaborate across global remote teams, often working asynchronously
Document everything so you don't need to learn the same thing (or plan the same work) twice
Deliver fast to delight our customers, even internal ones
What we offer:
Competitive compensation that may include equity
Retirement and Employee Stock Purchase Plans
Flexible paid time off
Comprehensive benefit plans covering medical, dental, vision, life, and disability
Family services that include fertility benefits and equal paid parental leave
Professional development supported by formal career pathing, learning platforms, and a yearly learning stipend
A curated in-office employee experience, designed to foster community, team connections, and innovation
Opportunities to give back to your community, including an annual company-wide Volunteer Week and donation matching
Employee Resource Groups that provide supportive communities within Braze