Serval is building an AI platform to automate complex IT workflows for modern enterprises. As a Software Engineer, Infrastructure, you'll build and scale the foundational systems that power Serval's AI agents and workflow automation platform. A critical part of this role is enabling and supporting self-hosted deployments for enterprise customers who require on-premises or private cloud installations. This role is for engineers with deep expertise in distributed systems, infrastructure-as-code, production operations, and customer-facing infrastructure support who want to shape the technical architecture of a fast-growing platform.
Job Responsibilities:
Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines
Write and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments
Build and maintain deployment packages, installation scripts, and infrastructure templates that enable customers to self-host Serval in their own environments
Provide technical guidance and troubleshooting support to enterprise customers deploying and operating self-hosted instances of Serval
Ensure high availability, performance, and reliability of production systems through monitoring, alerting, incident response, and capacity planning
Build internal tools and platforms that enable product engineers to deploy, test, and operate services efficiently
Collaborate with engineering teams to design resilient, scalable architectures that support both cloud-hosted and self-hosted deployment models
Profile and optimize system performance, including compute, storage, networking, and database layers
Implement security best practices and ensure infrastructure meets enterprise compliance requirements for both managed and self-hosted deployments
Requirements:
3+ years building and operating large-scale distributed systems in production environments
Strong experience writing and maintaining Terraform for infrastructure provisioning and management
Deep knowledge of at least one major cloud provider (AWS, GCP, or Azure), including compute, networking, storage, and managed services
Experience building, packaging, and supporting self-hosted or on-premises software deployments for enterprise customers
Proficiency in Python, Go, or similar languages for building automation, tooling, and infrastructure services
Strong understanding of networking, databases, containerization (Docker, Kubernetes), and orchestration systems
Experience with monitoring, logging, alerting, and incident management tools (e.g., Datadog, Prometheus, Grafana, PagerDuty)
Ability to communicate technical concepts clearly to customers and provide infrastructure support and guidance
Ability to debug complex system issues, analyze performance bottlenecks, and implement effective solutions
Nice to have:
Experience with Kubernetes in production, including cluster management and workload orchestration
Background in CI/CD systems, build pipelines, and deployment automation
Experience with workflow orchestration systems such as Temporal, including long-running workflows, retries, and failure handling
Experience with data infrastructure (streaming systems like Kafka, data warehouses, ETL pipelines)
Knowledge of security and compliance frameworks (SOC 2, ISO 27001, GDPR)
Experience supporting enterprise customers with complex deployment requirements
Previous work at a high-growth startup or experience scaling infrastructure rapidly