Site Reliability Engineer 2

PagerDuty

Location:
Portugal, Lisbon

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Join us. At PagerDuty, you'll tackle complex problems, collaborate with kind and ambitious people, and help build a more equitable world—all in a flexible, award-winning workplace. We are seeking a Site Reliability Engineer 2 to join our Release Engineering team in our Lisbon office. As part of our growing tech hub in Portugal, you will be instrumental in building and maintaining our platform engineering solutions, focusing on CI/CD enablement, Kubernetes infrastructure, and developer tooling. You'll work closely with development teams to improve deployment workflows and platform reliability.

Job Responsibility:

  • Deploy, configure, monitor and optimize highly available Kubernetes clusters on AWS/EKS
  • Help maintain the overall health of the platform, including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
  • Continuously strive to improve the internal developer experience and the software development lifecycle
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Participate in a 24/7 on-call rotation

Requirements:

  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Experience with Kubernetes and container orchestration
  • Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
  • Proficiency in at least one programming language (e.g. Python, Ruby, Go)
  • Experience with Infrastructure as Code (e.g. Terraform, CloudFormation)

Nice to have:

  • Experience with monitoring, observability, and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk, Prometheus, Grafana)
  • Knowledge of configuration management systems (e.g. Ansible, Chef, Puppet)
  • Experience in automating releases, continuous integration/delivery systems and relevant tools (e.g. Jenkins, CircleCI, Travis CI, Buildkite)
  • Experience with GitOps practices and tools like ArgoCD

What we offer:

  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty: company-wide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Paid volunteer time off: 20 hours per year
  • Company-wide hack weeks
  • Mental wellness programs

Additional Information:

Job Posted:
April 26, 2025

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Site Reliability Engineer 2

Principal Site Reliability Engineer

We are looking for a reliability expert who is passionate about scaling Cloud se...
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements:
  • Expert-level proficiency, with 10+ years of experience, in one or more prominent languages such as Java, Go, or Python
  • Expert-level proficiency, with 7+ years of experience, in public cloud offerings (including at least 2 years on GCP)
  • Expert-level proficiency, with 7+ years of experience, in operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
  • Excellent written and verbal communication skills, and the ability to explain complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
  • An ability and desire to mentor and coach engineers
Job Responsibility:
  • Analyse and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency
  • Cross team and functional boundaries to advocate for reliability methodologies
  • Work with a variety of platform, product and SRE teams to both build reliability into our platform and drive adoption of those practices into our products
  • Be the driving force for change

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements:
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs.
Job Responsibility:
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs.
What we offer:
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace.
  • Full-time

Site Reliability Engineer

Corporate Tools is looking for a Site Reliability Engineer. You will be a tradit...
Location:
United States
Salary:
175,000 USD / year
Corporate Tools
Expiration Date
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience
  • 5+ years of experience in software engineering
  • 2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles
  • Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi
  • Strong proficiency with Kubernetes, Docker, and container orchestration in production environments
  • Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic
  • Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts
  • Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached)
  • Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement
  • Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis
Job Responsibility:
  • Stop problems before they start
  • Fix issues quickly and learn from them
  • Help keep systems steady, secure, and running
  • Work closely with DevOps engineers to build out tools and automation
  • Take ownership
What we offer:
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days of paid time off (PTO) accrued annually, plus 4 holidays
  • After 3 years, PTO increases to 29 days
  • Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
  • Paid Parental Leave
  • 401(k) with up to 6% company matching and no vesting period
  • Quarterly allowance
  • Open concept office with friendly coworkers
  • Creative environment where you can make a difference
  • Full-time

Site Reliability Engineer

We are recruiting a Junior SRE for a company that provides an advanced data, ope...
Location:
Portugal, Lisboa
Salary:
Not provided
Precise
Expiration Date
Until further notice
Requirements:
  • Up to 2-3 years of experience in a Site Reliability Engineering (SRE), DevOps, or Production Engineering role, with a deep understanding of SRE principles and best practices
  • Incident management expertise, including triaging, escalation, and resolution of high-severity outages
  • Proficiency in at least one coding language (Python or Java) for automation and debugging
  • Hands-on experience with Kubernetes (K8s) for managing and orchestrating containerized applications
  • Cloud experience (AWS preferred) with exposure to key services like EC2, S3, Lambda, and CloudWatch
  • Excellent communication skills to articulate technical challenges and solutions effectively
  • Strong troubleshooting and problem-solving skills, with experience diagnosing complex production issues
  • Ability to stay calm under pressure, multitask, and prioritize effectively in fast-moving environments
  • Fluency in English (spoken and written) is required
  • Must have the legal right to work in the country
  • Full-time

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...
Location:
Singapore, Singapore
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree or equivalent work experience
  • 3+ years of relevant work experience
  • Highly motivated self-starter with good interpersonal and communication skills
  • Certification or formal training in site reliability engineering concepts and practices would be beneficial
  • Prior experience working towards SLIs, SLOs and observability capabilities
  • 2+ years of experience in Python and Linux-based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, and EKS
  • Experience with Secrets products such as HashiCorp Vault or CyberArk beneficial but not essential
  • Experience with CI/CD and automation tools such as Terraform, Jenkins, and Ansible
Job Responsibility:
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production-level incidents until resolution
  • Full-time

Site Reliability Engineer (Developer Experience)

KnowBe4’s Site Reliability Engineers help ensure that our platforms are reliable...
Location:
India, Kochi
Salary:
Not provided
KnowBe4
Expiration Date
Until further notice
Requirements:
  • BS/MS/Ph.D. or equivalent, plus 2 years of experience
  • Comfortable maintaining existing scripts in one or more programming languages (e.g. Python, Ruby, JavaScript)
  • Experience maintaining infrastructure in AWS
  • Experience maintaining workflows for continuous integration and continuous deployment (CI/CD) - GitLab is preferred
  • Effective communication skills
  • Ability to easily adapt while working on competing projects
  • Demonstrated ability to learn new technologies quickly
Job Responsibility:
  • Work with other Site Reliability Engineers to build highly scalable and resilient applications and infrastructure in AWS
  • Maintain and improve extensible infrastructure-as-code using Terraform
  • Learn, maintain, and improve our existing deployment strategies
  • Deliver effective observability, monitoring, and alerting patterns for KnowBe4’s applications and infrastructure
  • Assist in identifying and resolving production incidents
  • Correct deficiencies in our current applications and infrastructure
  • Implement solutions to complex technical problems
What we offer:
  • Company-wide bonuses based on monthly sales targets
  • Employee referral bonuses
  • Adoption assistance
  • Tuition reimbursement
  • Certification reimbursement
  • Certification completion bonuses
  • Full-time

Staff Engineer, Site Reliability

LearnUpon is looking for a Staff Site Reliability Engineer to join our team in I...
Location:
Ireland, Dublin
Salary:
Not provided
LearnUpon
Expiration Date
Until further notice
Requirements:
  • 7+ years of experience in a software or Ops role
  • 5+ years of cloud engineering experience, including at least 2 years with AWS
  • Experience deploying Microservice environments, using containerisation technologies such as Kubernetes and Docker
  • Experience in designing and implementing Observability tech stacks
  • Have championed the benefits of Observability to Engineering teams
  • Can design an SLO/SLI implementation that balances the needs of different teams
  • Familiar with cost analysis of Observability metrics gathering, Engineering effort, and tooling
  • Experience building and supporting large-scale distributed systems that back a consumer app or website with associated requirements of performance, security and disaster recovery
  • Experience implementing IaC (e.g. CloudFormation, Terraform), automation tooling (e.g. Puppet, Ansible), and CI/CD (e.g. Jenkins, Travis CI, GitLab)
  • Able to effectively communicate technical ideas to and collaborate with both technical and non-technical peers
Job Responsibility:
  • Identifying opportunities to improve and scale our infrastructure for performance, observability, maintainability, and cost, by creating innovative solutions
  • Leading our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management
  • Driving the processes to maintain resilient, scalable and cost-effective infrastructure
  • Working with other Engineering teams to provide infrastructure solutions that meet their ongoing requirements
  • Building tools focused on measuring, monitoring and alerting, with an eye towards self-service in order to promote Engineers’ ownership of observability
  • Reacting quickly to changing customer and business needs
  • Participating in the on-call rota
  • Mentoring junior talent
What we offer:
  • Work in a fun and supportive environment with regular team events
  • Excellent career progression
  • Structured learning environment
  • Competitive salary and company ESOP
  • Private health insurance
  • 26 days annual leave
  • Full-time

Field Service Reliability Engineer

Founded in 1985, ATS is a company with a presence in the United States, Mexico a...
Location:
United States, Milwaukee, Wisconsin
Salary:
50.96 - 65.19 USD / Hour
Advanced Technology Products
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in engineering (ABET accredited)
  • Eight or more years of reliability experience across 2 or more manufacturing sites
  • Demonstrated ability to apply the full array of reliability tool sets
  • Strong technical understanding of electrical or mechanical components, tools, and designs
  • Ability to complete a failure mode and effects analysis (FMEA), cause-and-effect diagrams, root cause failure analysis, life-cycle costing, and risk analysis
  • Ability to research and apply new equipment technologies and trends
  • Robust problem-solving, mathematical, analytical, and decision-making skills
  • Proficiency with computers, maintenance systems, and applications, including Microsoft Office
  • Excellent verbal communication, facilitation, and presentation skills
  • Strong reporting and technical writing capability
Job Responsibility:
  • Extensive travel required (local and national)
  • Promotes and adheres to the ATS safety culture
  • Engages in various work environments and industries to lead reliability centered maintenance efforts
  • Mentors and coaches customer personnel, and provides reliability best practices for applications in customer facilities
  • Identifies top potential issues leading to lost production and preventable maintenance spending, and communicates findings to leadership
  • Provides solutions to root cause deficiencies and demonstrates the economic benefits of their correction
  • Actively drives the implementation of equipment improvement projects
  • Identifies and implements current and new processes / technologies to increase equipment performance and uptime
  • Champions systems and best practice procedures towards a proactive manufacturing culture
  • Analyzes equipment performance, failure data, and corrective maintenance history to develop and deploy engineering solutions, improved maintenance strategies, preventative maintenance optimization, and other reliability techniques
  • Full-time