Our Site Reliability Engineering team is growing, and we are looking for a highly experienced Staff Site Reliability Engineer to help shape the future of reliability, scalability, and performance at AlphaSense. This is a hands-on, high-impact role where you will architect core reliability platforms, lead by example in incident response, and drive cultural adoption of SRE best practices across our global engineering organization. Our mission is to engineer our platform to the reliability standards of mission-critical systems, targeting 99.99% uptime, while continuously enhancing our systems and processes. This role is key to that mission and goes beyond traditional system maintenance; it’s about pioneering the platforms, practices, and culture that enable engineering to scale effectively. You will act as a force multiplier, mentoring fellow engineers, influencing architectural decisions, and setting the technical bar for reliability across the company.
Job Responsibilities:
Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture
Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence and ensuring blameless postmortems lead to lasting improvements
Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing
Requirements:
8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
at least 3 of those years operating in a Senior+ SRE position
strong background in running production SaaS systems at scale
proficiency in at least one programming/scripting language (Python, Go, or similar)
hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
familiarity with advanced observability (OTEL, continuous profiling)
proven incident management experience, including leading high-severity incidents and postmortems
strong troubleshooting skills across the full stack