Site Reliability Engineering (SRE) Job at Fyld (Lisboa)

Principal Site Reliability Engineer

We are looking for a reliability expert who is passionate about scaling Cloud se...

Location

United States, San Francisco; Mountain View

Salary:

170,800.00 - 274,300.00 USD / Year

Atlassian

Expiration Date

Until further notice

Requirements

Expert-level proficiency, with 8+ years of experience, in at least Java
Expert-level proficiency, with 5+ years of experience, in public cloud offerings (AWS components such as EC2, CloudFormation, RDS/Aurora, caches, and SQS, or their equivalents in, e.g., GCP or Azure)
Expert-level proficiency, with 5+ years of experience, in operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
Experience in driving large, complex, cross-organizational initiatives from inception to completion
Excellent communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
Experience in leadership positions, able to influence others and drive impactful outcomes through delegation
An ability and desire to mentor and coach engineers

Job Responsibility

Advocate for reliability methodologies
Work with a variety of platform, product, and SRE teams both to build reliability into our platform and to drive adoption of those practices in our products
Analyze and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency

What we offer

Health and wellbeing resources
Paid volunteer days
Equity
Bonuses
Commissions

Full-time

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...

Location

France, Paris

Salary:

Not provided

Ledger

Expiration Date

Until further notice

Requirements

8+ years of cloud engineering at scale, in organizations operating SaaS solutions
Proficiency working in Unix/Linux environments and with Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, Argo CD, Ansible, configuration management, etc.
Strong knowledge of observability practices, with experience implementing and managing logging, monitoring, and alerting frameworks with solutions such as Datadog or Prometheus/Grafana/Loki.
Experience with cross-functional work and a collaborative approach to building key relationships across the organization and defining project scope, goals, plans, and deliverables
Customer-focused, with the ability to identify and understand the needs of both internal and external customers
Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
Excellent presentation and written communication skills
Ability to deal with ambiguity, high levels of pressure, and rapidly changing environments
Engineering degree.

Job Responsibility

Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines
Perform integration of platform software components
Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems
Influence and create standards & best practices in support of service level objectives
Automate key SRE metrics including SLOs/SLAs and error budgets
Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
Apply analytics to past incidents and usage patterns to predict issues and take proactive action
Ensure control of technical debt and promote quality practices
Follow SRE and chaos engineering approaches across all strategic systems to predict and prevent outages, in coordination with Service Design, and improve solution availability

What we offer

Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
Flexibility: A hybrid work policy
Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
Well-being: Personal development, coaching & fitness with our dedicated partners
Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
High tech: Access to high performance office equipment and gadgets, including Apple products
Transport: Ledger reimburses part of your preferred means of transportation
Discounts: Employee discount on all our products.

Full-time

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...

Location

India, Bangalore

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Minimum 2 years of experience managing or leading cloud operations teams
Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
Familiarity with modern CI/CD automation and tools
Excellent communication, stakeholder management, and team-building skills
Experience scaling SRE practices in high-growth or large-scale environments
Ability to balance long-term reliability initiatives with short-term delivery needs.

Job Responsibility

Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
Define and track key reliability metrics, and report on team performance and system health to leadership
Contribute to hiring, onboarding, and career development for SREs.

What we offer

Health & Wellbeing benefits for physical, financial, and emotional wellbeing
Personal & Professional Development programs
Unconditional inclusion in the workplace.

Full-time

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team responsible for Private and Public...

Location

Singapore, Singapore

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor’s degree or equivalent work experience
6+ years of relevant work experience
Highly motivated self-starter with excellent interpersonal and communication skills
Certification or formal training in site reliability engineering concepts and practices
Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
4+ years of experience in Python (preferred) or Java on large-scale systems, alongside Linux-based scripting languages
Experience working on observability, logging and metrics toolsets
Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, and EKS
Experience with public cloud technologies such as AWS, GCP or Azure
Experience with Secrets products such as HashiCorp Vault or CyberArk

Job Responsibility

Working across container and secrets products, in both public and private cloud, as well as cloud-native products
Architecting and building tools and platforms that provide capabilities for SRE
Collaborating with multiple stakeholders and partners across Engineering and Operations, as well as partner teams within the wider Citi organisation
Actively owning production-level incidents until resolution.

What we offer

Equal opportunity employer
Accessibility support for persons with disabilities.

Full-time