Site Reliability Engineer (DevOps) - Netherlands
Mist AI is the AI-native networking solution from HPE Juniper Networking. Our Software Engineering team is seeking a Site Reliability Engineer to join our talented team and build high-quality technology solutions that revolutionize networking, powered by Artificial Intelligence in the cloud. Mist AI provides services through SaaS applications to many Fortune 100 and Fortune 500 customers.
You will take ops projects from concept through to launch. You will be responsible for maintaining and improving the company's production environment for rapid scaling and outstanding performance, and for helping us maintain stellar uptime and reliability. The improvements you implement will be felt by the entire organization.
To be successful, you need a hunger to learn and the ability to adapt to new technology quickly. We demand people who are naturally curious, can self-start, and share learnings and outcomes effectively with a distributed team. You need to be a builder at heart.
Job Responsibilities:
Apply your passion for infrastructure as code and continuous deployment to build scalable and highly reliable systems
Define and own KPIs around system availability, quality and scale
Partner with our developers and quality engineering teams to automate the monitoring, alerting, availability and scalability of our applications and systems
Ensure system availability and business continuity by implementing redundant servers/services
Manage after-hours infrastructure updates and maintenance
Proactively research and propose the use of new concepts, processes, technologies, and tools
Partner with software developers to create Mist standards for Microservices (APIs, schemas, serialization, data stores and best practices)
Run secure and scalable applications for highly available, multi-region AWS and GCP deployments
Ship code several times per week
Be a part of our On-Call rotation
Own disaster recovery and business continuity plans
Requirements:
An extensive background in developing and operating large-scale cloud-based distributed applications
Direct experience developing/running applications on AWS or Google Cloud
Laser focus and the ability to design infrastructure solutions for scalability, reliability, high availability, performance, security, software maintainability, and operational excellence
The ability to 'fix the plane while in flight' (not just support greenfield solutions)
The ability to prioritize existing technical and infrastructure debt, and the experience to build and execute a plan to pay it off
Experience delivering web-scale infrastructure for a global market at high release velocity
A deep understanding of distributed system design and dependency management
Solid experience with at least two of the following languages: Go, Java, Python
10+ years of industry experience managing infrastructure
5 years of Kubernetes administration in a large-scale SaaS environment
5 years of maintaining production systems on AWS or GCP
3 years of implementing, managing, and monitoring metrics specific to SaaS applications
3 years of using infrastructure-as-code tools (e.g., Terraform, AWS CloudFormation, Google Cloud Deployment Manager)
Experience working with or contributing directly to open source projects
An understanding of, and experience in, leading or managing technology products
An understanding of machine learning techniques and tools, and the ability to translate business requirements into data models and implement them in scalable, production-ready systems
Experience working with failure-based testing
Experience working in a test-driven development environment