As a Software Engineer on the Core Infrastructure team at Harvey, you'll play a critical role in designing and building new infrastructure systems while also scaling and strengthening our existing infrastructure. Our infrastructure is the foundation that powers every user interaction with Harvey, processing billions of prompt tokens and millions of daily requests across our global legal AI platform.
Job Responsibilities:
Design and build scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions
Own and evolve our multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management
Lead technical initiatives around observability, incident response, and operational excellence — building systems that enable rapid detection and resolution of issues
Architect and optimize our distributed systems for reliability, including load balancing, quota management, and failover mechanisms
Partner with Product Engineering and Security teams to ensure our infrastructure is an accelerant, not a constraint
Drive infrastructure-as-code practices using tools like Terraform and Pulumi to enable reproducible, auditable deployments
Mentor junior and intern engineers, and raise the technical bar across the organization through code reviews, design reviews, and technical leadership
Requirements:
4+ years of experience in Infrastructure Engineering or Platform Engineering in a production environment
Long track record of building and scaling complex, large-scale distributed systems
Deep proficiency with cloud infrastructure platforms (Azure preferred; GCP or AWS experience transfers well)
Strong fluency in Infrastructure as Code (IaC) tools — Terraform, Pulumi, or CloudFormation
Solid understanding of Kubernetes, container orchestration, networking, and cloud security at scale
Experience with observability tools (Datadog, Sentry) and incident response practices (PagerDuty, Incident.io)
Strong programming skills in Python, Go, or similar languages
Excellent problem-solving skills, a "spidey sense" of where things could go wrong, and a commitment to operational excellence
Nice to have:
Experience building infrastructure for AI/ML workloads or high-throughput inference systems
Background with distributed rate limiting, load balancing, or quota management systems
Experience operating multi-tenant platforms with strict security and compliance requirements
Track record of leading complex cross-functional projects and delivering measurable impact