Senior Platform Reliability Engineer Job at Bsport (Barcelona)

Senior Platform Engineer, Storage

We’re looking for a Senior Platform Engineer specializing in storage services to...

Location

Ireland, Dublin

Salary:

102000.00 - 124000.00 EUR / Year

dbt Labs

Expiration Date

Until further notice

Requirements

Significant experience designing and operating relational data and object storage platforms in production
Hands-on experience with one or more cloud providers (AWS, Azure, GCP) and declarative Infrastructure as Code (Terraform preferred)
Programming/scripting ability in Python, Go, Rust or Bash
Excellent communication skills and experience working asynchronously on a fully remote, distributed team

Job Responsibility

Design, operate, and scale storage-based infrastructure systems across multiple tenancy models (single vs. multi-tenant) and public clouds (AWS, Azure, and GCP)
Deepen our team’s expertise in one or more areas, including relational databases, search, caching, queuing, and streaming, helping strengthen platform scalability, security, and developer experience
Partner with Architecture, Release Engineering, Network, Compute, and Security teams to provide a seamless platform for application teams
Leverage tools and languages such as Terraform, Kubernetes, Helm, Argo CD, Python, SQL, Go, Bash, and Datadog
Participate in a balanced on-call rotation in an environment that values continuous improvement, helping to improve reliability and reduce operational toil

What we offer

Equity Stake
Unlimited PTO
Excellent healthcare coverage
Paid parental leave
Wellness and home office stipends

Fulltime

Senior Platform Engineer

Glide is looking for a Senior Platform Engineer to join our Infrastructure team ...

Location

Salary:

Not provided

Glide

Expiration Date

Until further notice

Requirements

5+ years of experience as a platform engineer/SRE
3+ years experience building and maintaining highly available and scalable distributed data sources
Experience with Google Cloud Platform services like Cloud SQL, Cloud Run, AlloyDB, or equivalent
Experience orchestrating complex systems with Kubernetes
Proficiency in TypeScript development
Strong SQL skills (can speak to covering index optimization strategies)
Experience designing, building and running data-intensive event-driven architectures
You are a clear and effective communicator, be it when you write code, write emails, or explain complex technical issues to non-technical co-workers
Passionate and self-motivated, with a demonstrated ability to work in a fast-paced and evolving environment

Job Responsibility

Managing our existing infrastructure in GCP
Driving our platform evolution as the complexity and sophistication of our product increase
Managing our GitHub/GitHub Actions-based build pipeline
Provide build, test, and runtime infrastructure to service teams
Ensure patterns are established (e.g., for database throttling and request rate limiting) to protect Glide’s uptime
Monitor infrastructure costs and coordinate improvements when necessary
Drive SRE tooling and best practices around observability and alerting
Write, review, and maintain code primarily in TypeScript
Write architecture briefs and proposals, carry out code experiments, and build prototypes to learn how we can achieve reliable scale with our systems
Provide technical leadership, mentorship, pairing opportunities, and code review to encourage the growth of others

What we offer

competitive salary and benefits package
a supportive and dynamic remote work environment
opportunities for career growth

Fulltime

Senior Data Engineer – Data Engineering & AI Platforms

We are looking for a highly skilled Senior Data Engineer (L2) who can design, bu...

Location

India, Chennai, Madurai, Coimbatore

Salary:

Not provided

OptiSol Business Solutions

Expiration Date

Until further notice

Requirements

Strong hands-on expertise in cloud ecosystems (Azure / AWS / GCP)
Excellent Python programming skills with data engineering libraries and frameworks
Advanced SQL capabilities including window functions, CTEs, and performance tuning
Solid understanding of distributed processing using Spark/PySpark
Experience designing and implementing scalable ETL/ELT workflows
Good understanding of data modeling concepts (dimensional, star, snowflake)
Familiarity with GenAI/LLM-based integration for data workflows
Experience working with Git, CI/CD, and Agile delivery frameworks
Strong communication skills for interacting with clients, stakeholders, and internal teams

Job Responsibility

Design, build, and maintain scalable ETL/ELT pipelines across cloud and big data platforms
Contribute to architectural discussions by translating business needs into data solutions spanning ingestion, transformation, and consumption layers
Work closely with solutioning and pre-sales teams for technical evaluations and client-facing discussions
Lead squads of L0/L1 engineers—ensuring delivery quality, mentoring, and guiding career growth
Develop cloud-native data engineering solutions using Python, SQL, PySpark, and modern data frameworks
Ensure data reliability, performance, and maintainability across the pipeline lifecycle—from development to deployment
Support long-term ODC/T&M projects by demonstrating expertise during technical discussions and interviews
Integrate emerging GenAI tools where applicable to enhance data enrichment, automation, and transformations

What we offer

Opportunity to work at the intersection of Data Engineering, Cloud, and Generative AI
Hands-on exposure to modern data stacks and emerging AI technologies
Collaboration with experts across Data, AI/ML, and cloud practices
Access to structured learning, certifications, and leadership mentoring
Competitive compensation with fast-track career growth and visibility

Fulltime

Senior Site Reliability Engineer Cloud Platform

Zilliz is a fast-growing startup developing the industry’s leading vector databa...

Location

Salary:

175000.00 - 225000.00 USD / Year

Zilliz

Expiration Date

Until further notice

Requirements

4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems
Proficiency in scripting languages such as Python, Go, or Java
Strong knowledge of container orchestration technologies like Kubernetes and Docker
Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools
Experience with infrastructure as code tools such as Terraform or Ansible
Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo
Proven ability to troubleshoot complex distributed systems and resolve issues promptly
Bachelor’s degree or above in computer science, software engineering, or other relevant disciplines
Ability to thrive in a fast-paced, startup environment and handle multiple projects simultaneously

Job Responsibility

Work at the intersection of development and site reliability, creating SRE tools and systems and supporting existing infrastructure and platforms
Ensure the reliability, availability, and performance of Zilliz’s distributed database systems
Develop and implement strategies for monitoring, incident management, and disaster recovery
Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention
Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness
Collaborate with software engineers to enhance system reliability, scalability, and performance
Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes
Actively contribute to the Milvus Vector Database open-source community, focusing on improving reliability and operational efficiency

Fulltime

Senior Distributed Systems Engineer - Platform Engineering

For our Platform Engineering team, we are looking for programmers with strong in...

Location

Poland

Salary:

Not provided

RTB House

Expiration Date

Until further notice

Requirements

Excellent understanding of how complex IT systems work - from the hardware level, through software, to algorithms
Ability to proactively define requirements, ask appropriate questions and draw conclusions that will combine technical constraints and business needs
Ability to lead the design and implementation of a solution
Experience in leading project teams
Willingness to be involved in topics that go beyond programming and design, such as responsibility for technical areas or communication with other teams
Proactive attitude, independence in taking action
Extensive experience in programming and readiness to implement key system elements as well as involvement in code reviews
Good knowledge of methods of creating concurrent programs and distributed systems
Ability to critically analyze created solutions in terms of performance (from estimating the theoretical performance of designed systems to detecting and removing actual performance problems in production)
C1 level in English and Polish

Job Responsibility

Plan and then lead, hands-on, further development within a given technical area such as deployment, monitoring, databases, or load balancing, in the context of existing infrastructure within RTB House
Coordinate the work of a project team of 3-4 people, also making arrangements with other teams and units within RTB House
Ensure the reliability and scalability of the solutions built

What we offer

Attractive compensation
Work in a team of enthusiasts who are willing to share their knowledge and experience
Flexible cooperation conditions - we do not have core hours, we do not have holiday limits
Access to the latest technologies and the possibility of real use of them in a large-scale and highly dynamic project

Senior Platform Engineer - CI/CD & AI Automation (AI-first)

Groupon is undergoing a critical platform transformation, modernizing its core d...

Location

Czechia, Prague

Salary:

Not provided

Groupon

Expiration Date

Until further notice

Requirements

5+ years of dedicated experience in Platform Engineering, DevOps, or Infrastructure roles
Deep expertise building, scaling, and migrating CI/CD systems, with strong practical experience in Jenkins and/or GitHub Actions
Expertise in scripting and automation (Python, Go, or Bash)
Solid understanding of container technologies, Kubernetes, and cloud build systems
Proven experience leveraging AI tooling (e.g., Claude Code, code analysis) to meaningfully increase developer output and optimize platform work
Excellent communication and ability to drive technical decisions across multiple platform and product teams

Job Responsibility

Platform Transformation: Lead the design, planning, and execution of the Jenkins-to-GitHub Actions migration across a large portfolio of microservices
Pipeline Engineering: Design and optimize high-performance, secure, and observable CI/CD workflows across GitHub Actions, Jenkins, and Kubernetes environments
AI-First Automation: Drive an AI-First workflow by leveraging tools (e.g., Copilot, code generation) to eliminate infrastructure toil, accelerate development, and analyze pipeline failures
Core Automation: Develop robust platform automation (e.g., Python, Go, Bash) to improve build efficiency, artifact caching, reliability, and repository hygiene
Security & Compliance: Harden CI/CD infrastructure with robust controls for secrets management, RBAC, audit logging, and secure runner design
Observability: Implement and enhance CI/CD observability using tools like Prometheus, Grafana, and OpenTelemetry to provide deep insights into performance and reliability
Technical Leadership: Mentor engineers and partner across Cloud, Security, and Developer Experience teams to define and evolve our end-to-end delivery platform architecture

Senior Distributed Systems Engineer - Ad Display Platform Engineering

The Bidding Platform organization is the core of the RTB business, processing ov...

Location

Poland

Salary:

Not provided

RTB House

Expiration Date

Until further notice

Requirements

7+ years of hands-on experience in software engineering
Proficiency in programming
Excellent understanding of how complex IT systems work (from the hardware level, through software, to algorithmics)
Very good knowledge of fundamental Internet protocols and technologies (DNS, HTTP, cookies and others)
Good knowledge of basic methods of creating concurrent programs and distributed systems (from thread level to geo-distributed clusters level)
Practical ability to observe, monitor and analyse the operation of production systems (and draw valuable conclusions from it)
The ability to critically analyze the solutions created in terms of performance (from estimating the theoretical performance of the designed systems to detecting and removing actual performance problems in production)
General knowledge of issues (typical problems and methods of solving them) in the areas of 'high scalability' and 'high availability'
C1 level in English and Polish

Job Responsibility

Implement and maintain (in all aspects, including setting up the environment, writing configuration code, and monitoring production) high-quality backend services for displaying ads globally, focusing on extreme performance and scalability
Develop tools (deployment, testing platforms, web performance and reliability monitoring), and critical optimizations to drive measurable improvements in critical user performance metrics for ad rendering and display
Write, test, and deploy robust, efficient, and well-documented code in Java/Python, ensuring adherence to the highest coding and performance standards
Participate in code reviews, knowledge sharing sessions, and help implement technical standards and best practices within the team

What we offer

Projects focused on extreme performance and high code quality – solid code reviews are our standard
Collaboration within an interdisciplinary, self-sufficient team (including DevOps, database experts, backend developers, product designers, and QA engineers)
Hardware and software tailored to your preferences (e.g., MacBook, AI tool licenses)
Flexible working conditions – no core hours, fully remote cooperation possible

Senior Site Reliability Engineer

As a Site Reliability Engineer, you will focus on ensuring that the Prolific pla...

Location

United Kingdom

Salary:

Not provided

Prolific

Expiration Date

Until further notice

Requirements

5+ years with Google Cloud Platform, GKE, and the Kubernetes ecosystem with experience with Terraform and Terragrunt
Strong programming skills in Python
Strong experience in observability principles and tooling
Experience in GitOps flows and platforms for Kubernetes, such as ArgoCD
Deep understanding of system architecture and scalability principles
Strong collaboration and communication skills to work with cross-functional teams

Job Responsibility

Develop and maintain highly available infrastructure using modern infrastructure-as-code techniques, with a focus on Terragrunt and Terraform
Manage and optimise Kubernetes clusters and their workloads with a focus on reliability and performance
Participate in incident response and remediation, working with relevant product teams and stakeholders to resolve production issues efficiently, including creating and maintaining runbooks
Review and optimise other areas of our tooling stack, such as CI/CD or release strategies
Foster a culture of continuous improvement, such as enhancing documentation and upskilling teams in cloud architecture and Kubernetes
Improve observability and alerting systems across our application and infrastructure, ensuring proactive detection of system degradation
Collaborate with Engineering teams to foster an SRE culture, including contributing to defining SLOs, SLAs, and error budgets
Design and implement automation strategies to ensure managed services remain up-to-date, secure, and performant
Lead and support initiatives that automate processes to improve system efficiency and resilience and to reduce toil
Organise, support, and respond to on-call incidents

What we offer

competitive salary
benefits
remote working
impactful, mission-driven culture

Fulltime
