As a Senior DevOps / SRE / Platform Engineer, you will be a key technical leader responsible for the reliability, scalability, and security of the entire GEEIQ platform. You'll tackle our biggest infrastructure challenges, from scaling our Kubernetes clusters to maturing our observability stack and refining our deployment pipelines. We are looking for an experienced and pragmatic engineer who is passionate about building robust, automated, and secure systems. You will work alongside our existing DevOps engineer, mentoring them while driving our platform's technical direction. Your goal is to empower our software engineers to ship features quickly and confidently, knowing the underlying platform is rock-solid. This is a hands-on role where you will solve complex operational problems and build the foundation for our next stage of growth.
Job Responsibilities:
Own, manage, and evolve our AWS cloud infrastructure, ensuring it is scalable, cost-effective, and secure
Lead the architecture and hands-on implementation of our infrastructure using Terraform, maintaining and elevating our Infrastructure as Code (IaC) standards
Take charge of managing, scaling, and securing our heavily used Kubernetes (EKS) clusters and the microservices they run
Administer and optimize our core data services (RDS, Elasticsearch, MongoDB) from an operational perspective, focusing on performance, backups, and resilience
Re-architect and refine our CI/CD pipelines in GitHub Actions to make them faster, more reliable, and more secure
Champion developer productivity by building tools, automating workflows, and reducing friction in the development lifecycle
Lead the charge on improving our observability strategy. Design and implement a robust monitoring, logging, and alerting framework using tools like Grafana, Prometheus, and native AWS services
Enhance our incident response processes, contribute to on-call rotations, and foster a culture of blameless post-mortems
Drive infrastructure security best practices across the board, playing a critical role in our journey towards SOC2 compliance
Implement and manage security controls related to IAM, network security (VPCs, security groups), vulnerability scanning, and secrets management
Requirements:
Extensive hands-on experience in a DevOps, SRE, or Platform Engineering role, managing production systems in a cloud environment
Deep expertise with AWS and its core services (e.g., EKS, RDS, Lambda, EC2, S3, IAM, VPC)
Proven, expert-level proficiency with Terraform for managing complex infrastructure as code
Extensive experience managing production workloads on Kubernetes, including cluster management, scaling, and security
Demonstrated ability to design, build, and significantly improve CI/CD pipelines, with specific experience in GitHub Actions
A strong track record of building out and improving observability stacks (monitoring, logging, tracing)
Experience implementing security controls and working within compliance frameworks (experience with SOC2 is a major plus)
Proven ability to mentor and collaborate with other engineers
Strong proficiency in at least one scripting language (e.g., Python, Go, Bash). Familiarity with JavaScript/TypeScript is a plus
Operational knowledge of managing databases like RDS (Postgres/MySQL), MongoDB, and Elasticsearch is a huge plus
An automation-first approach to everything, with a passion for reducing manual toil
A practical, pragmatic, and hands-on approach to problem-solving
Excellent collaboration and communication skills, with the ability to work effectively with software engineers
Nice to have:
Familiarity with JavaScript/TypeScript
Operational knowledge of managing databases like RDS (Postgres/MySQL), MongoDB, and Elasticsearch
Experience with SOC2
What we offer:
GEEIQ Day - 1 extra day of paid leave per year on top of annual leave allowance
Regular Socials - paid socials
Flexible Hours - core business hours 10am to 5pm
Remote Working - 5 additional days per year (can be used to work from home or for International Working)