As a Reliability Engineer, you will own the reliability, performance, and operational excellence of Proscia’s on-premise installations at customer sites. Our platform powers high-resolution digital pathology and AI-assisted workflows in clinical and research environments, often running on customer-managed infrastructure. You’ll ensure these deployments are stable, performant, secure, and continuously improving. This is a hands-on role focused on on-premise, container-based deployments, systems performance, and real-world operational problem-solving in complex customer environments.
Job Responsibilities:
Deploy, configure, and support Proscia’s container-based application stack in on-premise customer environments
Own system reliability across customer installations, including uptime, performance, backup/recovery, and upgrade workflows
Diagnose and resolve production incidents, performing deep root cause analysis across application, container, host, storage, and networking layers
Optimize performance for large image datasets and AI workloads running on customer-managed compute infrastructure
Improve installation automation, configuration management, and repeatability across diverse environments
Develop and refine monitoring, logging, and alerting patterns appropriate for customer-hosted deployments
Collaborate closely with Engineering, Customer Success, and Support to translate field learnings into product and operational improvements
Document best practices and create operational playbooks for internal teams and customers
Leverage AI tools (e.g., Claude, code assistants, automation frameworks) to streamline troubleshooting, scripting, and operational workflows
Requirements:
Deep hands-on experience deploying and operating containerized applications in production environments with tools such as Docker and Docker Compose
Strong Linux systems expertise (process management, networking, storage, security hardening, performance tuning)
Expert troubleshooting skills in distributed systems across application, container, and infrastructure layers
Experience with enterprise networking technologies and the ability to troubleshoot issues and recommend fixes in customer infrastructure
Familiarity with operating software in customer-managed or on-premise environments
Experience supporting data-intensive systems, ideally involving large image files or compute-heavy workloads
Working knowledge of observability practices (logs, metrics, tracing) and pragmatic monitoring approaches in non-cloud-native environments
Comfort working directly with customers or customer-facing teams to resolve high-impact issues
Demonstrated AI fluency: hands-on experience using tools like Claude, ChatGPT, GitHub Copilot, or similar AI systems to enhance productivity, automate tasks, and solve technical problems
A mindset aligned with Proscia’s values: ownership, speed, simplification, and a willingness to challenge the status quo
Nice to have:
Experience with healthcare or regulated environments
Exposure to Kubernetes (for hybrid or future-state deployments)
Experience with infrastructure automation or configuration management tools
Familiarity with database performance tuning for large datasets
Experience supporting GPU-enabled workloads
What we offer:
Competitive pay
Savings, schedule, and insurance options that promote long-term health and personal growth
Office environment designed for creativity and agility