Senior Site Reliability Engineer Job at Atlassian

Senior Site Reliability Engineer

Baxter International is seeking a skilled Senior Principal Site Reliability Engi...

Location

United States, Deerfield

Salary:

96,000 - 132,000 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy, and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflows for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to identify improvement opportunities
Standardize site metrics for stakeholders, reporting on KPIs including SLAs, availability, capacity utilization, service metrics, and cost utilization
Work closely with DevOps engineers to automate infrastructure provisioning and deployment processes

What we offer

Healthcare benefits
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan
Flexible Spending Accounts
Educational assistance programs
Paid holidays
Paid time off
Paid parental leave
Commuting benefits
Employee Discount Program

Fulltime

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...

Location

United States, Deerfield

Salary:

96,000 - 132,000 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills
Applicants must be authorized to work for any employer in the U.S.
Unable to sponsor or take over sponsorship of an employment visa at this time.

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy, and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflows for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to identify improvement opportunities
Standardize site metrics for stakeholders, reporting on KPIs including SLAs, availability, capacity utilization, service metrics, and cost utilization
Work closely with DevOps engineers to automate infrastructure provisioning and deployment processes

What we offer

Support for Parents
Continuing Education/Professional Development
Employee Health & Well-Being Benefits
Paid Time Off
2 Days a Year to Volunteer
Medical and dental coverage starting day one
Insurance coverage for basic life, accident, short-term and long-term disability
Business travel accident insurance
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan

Fulltime

Senior Site Reliability Engineer

Architect, develop, and troubleshoot large-scale infrastructure, maintain and im...

Location

United States, San Francisco

Salary:

180,960 - 230,900 USD / Year

Atlassian

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Software Engineering, Information Technology or a closely related field
Four years of experience as a Site Reliability Engineer architecting, developing, and troubleshooting large-scale infrastructure using programming languages such as PowerShell, Python, or Bash
Experience with networking technologies such as TCP/IP, or with security
Four years of experience in automation development and infrastructure-as-code implementation using tools such as Terraform, AWS CloudFormation, Ansible, or Salt
Knowledge of Linux and Windows systems
Experience with cloud technologies in AWS, GCP, or Azure
Experience with continuous integration and continuous delivery/deployment (CI/CD) practices, and with monitoring and observability practices
Must pass a technical interview

Job Responsibility

Architect, develop, and troubleshoot large-scale infrastructure using programming languages such as PowerShell, Python, or Bash, and networking technologies such as TCP/IP, as well as security
Provide real-time feedback on production systems
Work with product family and platform developers to maintain and improve services and performance with a strong customer focus
Use a variety of data collection, enrichment, analytics, and visualization tools to support our complex systems
Own automation development and infrastructure-as-code implementation using tools such as Terraform, AWS CloudFormation, Ansible, and/or Salt
Build solutions that enhance availability, performance, and stability for hundreds of Atlassian enterprise customers in the cloud, and automate repetitive work
Help secure the cloud architecture through penetration testing, vulnerability resolution, and compliance audit responses
Own continuous integration and continuous delivery/deployment (CI/CD) practices, as well as monitoring and observability practices

What we offer

Health and well-being resources
Paid volunteer days

Fulltime

Senior Site Reliability Engineer

As a Senior Site Reliability Engineer on the Platform team, you will identify is...

Location

United States, Denver; San Francisco

Salary:

138,000 - 191,000 USD / Year

Checkr

Expiration Date

Until further notice

Requirements

Degree in Computer Science (or related field)
6+ years of experience building tools with Python (preferred), Go, or Ruby
6+ years of experience in maintaining and observing production customer-facing environments in AWS or Azure
6+ years of experience as a member of an incident response team
Deep understanding of the fundamental infrastructure and platform concepts behind a micro-service architecture, REST APIs, and asynchronous queueing models
Experience with observability platforms and frameworks like Datadog, Splunk, Grafana, Prometheus, or OpenTelemetry
Strong collaboration, documentation, communication, and project management skills
Experience with Docker containers, Kubernetes orchestration, and Terraform
Experience driving platform adoption across engineering teams, guided by a self-service and product-first approach
A passion for customer-centricity and building relationships with other teams

Job Responsibility

Collaborate, drive, and execute architectural discussions with cross-functional teams
Lead cross-team projects and the SRE team's technical roadmap to support engineering teams and Checkr customers
Design, build, ship, and maintain the core observability libraries, tools, and patterns used by all of Checkr’s engineering teams
Proactively engage across teams to foster service reliability, efficiency, and scalability
Troubleshoot complex production issues across the stack, with respect to performance, availability, and data quality
Present detailed technical information and benefits of the Checkr platform to a wide array of customers, including operations, developers, technical architects, and executives

What we offer

A fast-paced and collaborative environment
Learning and development allowance
Competitive cash and equity compensation and opportunities for advancement
100% medical, dental, and vision coverage
Up to $25K reimbursement for fertility, adoption, and parental planning services
Flexible PTO policy
Monthly wellness stipend, home office stipend
In-office perks such as lunch four times a week, commuter stipend, and an abundance of snacks and beverages

Fulltime

Senior Vice President, Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...

Location

Singapore, Singapore

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor’s degree or equivalent work experience
8+ years of relevant work experience
Highly motivated self-starter with excellent interpersonal and communication skills, able to communicate effectively at multiple levels of seniority
Certification or formal training in site reliability engineering concepts and practices
Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
5+ years of experience in Python (preferred) or Java on large-scale systems, alongside Linux-based scripting languages
Experience working on observability, logging, and metrics toolsets
Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, and EKS
Experience with public cloud technologies such as AWS, GCP, or Azure
Experience with secrets-management products such as HashiCorp Vault or CyberArk

Job Responsibility

Working across container and secrets products in public and private cloud, as well as cloud-native products
Architecting and building tools and platforms that provide capabilities for SRE
Collaborating with multiple stakeholders and partners across Engineering and Operations, as well as partner teams within the wider Citi organization
Actively owning production-level incidents through to resolution

Fulltime

Senior Site Reliability Engineer

We are seeking an experienced Senior Site Reliability Engineer (L3) to join our ...

Location

India, Chennai

Salary:

Not provided

Arcadia

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
8–10+ years of experience in SRE/DevOps/Cloud Engineering, with deep hands-on exposure to AWS and Kubernetes
Strong hands-on experience with the following:
Terraform and Infrastructure as Code
AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
Jenkins with Groovy, GitHub Actions, ArgoCD, FluxCD
Kubernetes troubleshooting and operations
Prometheus/Grafana/Datadog observability stacks
Proven ability to operate in high-scale, high-uptime, multi-environment production systems
Experience building automation via Python/Bash and reducing operational toil
Strong understanding of incident management, root cause analysis, and reliability engineering principles

Job Responsibility

Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
Lead all aspects of Kubernetes operations including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch, ensuring actionable alerting, useful dashboards, and metrics aligned with SLOs/SLIs
Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
Manage database operations across MySQL and PostgreSQL including backups, performance tuning, replication, and operational runbooks
Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems

What we offer

Competitive compensation and employee stock options
Hybrid/remote-first working model (India-based role, with global collaboration)
Flexible leave policy
Comprehensive medical insurance (self + family members)
Annual performance cycle + quarterly recognition awards
A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation

Fulltime

Senior Site Reliability Engineer

AutoRABIT is the leader in DevSecOps for SaaS platforms such as Salesforce. Its ...

Location

India, Hyderabad

Salary:

25.00 - 30.00 INR / Year

AutoRABIT

Expiration Date

Until further notice

Requirements

6+ years of experience in SRE, DevOps, or related roles
Solid hands-on experience with AWS services (EKS, ECS, EC2, RDS, S3, Redis, etc.)
Proficient in writing Terraform infrastructure scripts
Strong scripting skills in Python using Boto3
Deep understanding of monitoring/logging tools (ELK, CloudWatch, TrendMicro)
Experience building and managing CI/CD pipelines (CodeBuild, CodePipeline)
Knowledge of infrastructure security and incident response practices
Willing to work in rotational shifts and rotational week-offs
Bachelor’s degree in computer science or a related field
AWS certifications are preferred

Job Responsibility

Provision and manage AWS infrastructure using Terraform
Write AWS Lambda functions (Python3 + Boto3) to automate operational tasks
Set up monitoring, logging, and alerting with ELK, TrendMicro, and AWS CloudWatch
Configure alerts for performance and security anomalies
Develop and maintain CI/CD pipelines using AWS CodeBuild and CodePipeline
Troubleshoot production issues and contribute to blameless postmortems
Contribute to system hardening and security compliance efforts
Adhere to established internal controls

Fulltime

Senior Software Engineer, Site Reliability

Babylist is looking for a Senior Software Engineer, Site Reliability to join our...

Location

United States; Canada

Salary:

186,818 - 224,183 USD; CAD / Year

Babylist

Expiration Date

Until further notice

Requirements

8+ years of experience as a Site Reliability Engineer or similar role
Experience supporting high-traffic consumer-facing websites
Proficiency with Terraform
Strong experience working with AWS cloud-based infrastructure and services
Proficiency with Docker and Kubernetes
Solid understanding of cloud-native systems design
Troubleshooting and debugging skills
Experience designing and supporting CI systems
Familiarity with monitoring and alerting best practices
Proven experience in on-call management best practices

Job Responsibility

Manage and build our AWS infrastructure using Infrastructure as Code (IaC) tools like Terraform
Improve the speed and reliability of our Continuous Integration (CI) systems
Provide support to developers in troubleshooting issues
Establish, communicate, and support best practices for monitoring and alerting

What we offer

Company-paid medical, dental, and vision insurance
Retirement savings plan with company matching and flexible spending accounts
Generous paid parental leave and PTO
Remote work stipend
Perks for physical, mental, and emotional health, parenting, childcare, and financial planning

Fulltime

Senior Site Reliability Engineer

Atlassian

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
March 19, 2025

Similar Jobs for Senior Site Reliability Engineer

Senior Site Reliability Engineer

Senior Site Reliability Engineer

Senior Site Reliability Engineer

Senior Site Reliability Engineer

Senior Vice President, Cloud Security Site Reliability Engineer

Senior Site Reliability Engineer

Senior Site Reliability Engineer

Senior Software Engineer, Site Reliability
