You will be an integral part of our engineering team, collaborating closely with backend, data, and AI/ML engineers to design and implement infrastructure solutions that support our rapid growth and evolving product needs. You'll work directly with our security and compliance teams to ensure our infrastructure meets stringent healthcare regulations, including HIPAA and SOC 2. We are a remote-first company, although relocation is encouraged; team members located in the San Francisco Bay Area come into the office at least once a week. You must be able to work during core hours in the Pacific time zone. For compliance reasons, we cannot employ candidates located outside the United States.
Job Responsibilities:
Design, implement, and maintain highly available, scalable, and secure cloud infrastructure on Google Cloud Platform (GCP) to support our Clinical Data Intelligence Platform and SMART on FHIR applications
Develop and implement Infrastructure as Code (IaC) solutions to automate provisioning, configuration, and management of our environments
Build and optimize CI/CD pipelines using tools like GitHub Actions to enable rapid and reliable deployment of our applications and services
Implement and manage monitoring, alerting, and logging solutions with a focus on OpenTelemetry to ensure system health, identify performance bottlenecks, and proactively address issues
Collaborate with engineering teams to optimize application performance, reliability, and cost efficiency
Ensure strict adherence to security best practices and compliance requirements (e.g., HIPAA, SOC 2) across all infrastructure components and processes
Manage and improve database infrastructure (e.g., PostgreSQL, AlloyDB, Cloud SQL) for performance and scalability
Participate in the on-call rotation to maintain the stability and availability of our production systems
Requirements:
7+ years of experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering roles
Deep expertise in cloud platforms, with significant experience in Google Cloud Platform (GCP) services (e.g., Google Kubernetes Engine (GKE), Cloud Run, Cloud SQL, AlloyDB, Pub/Sub, Cloud Storage, Compute Engine)
Strong proficiency with Infrastructure as Code (IaC) concepts and tools
Extensive experience with CI/CD pipeline development and management, specifically with GitHub Actions
Solid understanding of containerization technologies, especially Docker and Kubernetes
Proficiency in scripting languages (e.g., Python, Bash) for automation and system management
Experience with monitoring, logging, and alerting tools, with a focus on OpenTelemetry
Demonstrated knowledge of database administration and optimization, particularly PostgreSQL, AlloyDB, and Cloud SQL
A strong commitment to information security and privacy, with experience implementing and maintaining systems that comply with frameworks such as HIPAA and SOC 2
Excellent problem-solving skills and the ability to troubleshoot complex infrastructure issues
Clear communication, documentation, and collaboration skills
Nice to have:
Familiarity with healthcare data standards (e.g., FHIR, HL7) and experience supporting SMART on FHIR applications