At Tote, we’re on a mission to deliver a seamless and reliable digital experience for racing fans across the UK and beyond. As a Site Reliability Engineer (SRE), you’ll play a critical role in keeping our online platforms and infrastructure fast, stable, and scalable — especially during the most exciting moments in the racing calendar. This is an opportunity to shape how we build, monitor, and continuously improve our systems while working in a collaborative, forward-thinking engineering culture.
Job Responsibilities:
Monitor live production systems, using observability tools to detect potential issues before they impact users
Take proactive steps to optimise system performance and stability
Analyse telemetry data, identify bottlenecks, and drive improvements across infrastructure and applications
Lead the development of SRE strategy, defining standards, best practices, and ways of working
Work closely with engineering, operations, and product teams to shape SLAs, SLOs, and error budgets
Design and implement performance testing strategies to simulate peak traffic
Build intuitive dashboards, refine alerting systems, and create tools that provide clear visibility into system health
Work alongside software engineers to design scalable solutions
Work with compliance teams to meet internal and regulatory standards
Work with operations to ensure smooth deployment and monitoring
Play a crucial role in incident management, from leading real-time response efforts to conducting thorough post-incident reviews
Requirements:
Deep understanding of system reliability, performance optimisation, and cloud-native architectures
Strong hands-on experience with modern observability tools such as Grafana, Prometheus, and OpenTelemetry
Solid grasp of distributed systems and networking fundamentals
Confident working with infrastructure-as-code tools (like Terraform) and container orchestration platforms such as Kubernetes
Experience in cloud environments, ideally AWS
Comfortable coding in at least one modern programming language or platform, such as Go or .NET
Calm, analytical mindset for high-pressure situations
Advocate for modern engineering practices, championing DevOps culture, CI/CD pipelines, and automation
Strong communication skills
What we offer:
Competitive basic salary
Discretionary bonus scheme
Company share option plan
Contributory pension scheme
Life insurance (4 x basic salary)
Simply Health Cash Plan
Holiday entitlement (33 days inclusive of bank holidays)
Study support and opportunities for progression and development