Senior DevOps / Site Reliability Engineer

N-iX

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Our client is a leader in sustainable packaging solutions, leveraging cutting-edge cloud technologies to enhance production, operational excellence, and innovation. Join our team and contribute to eco-friendly advances with state-of-the-art technology and DevOps practices.

Job Responsibility:

  • Cloud Infrastructure: Architect, implement, and manage Microsoft Azure resources including App Services, Virtual Machines, Container Instances, AKS, SQL Server/Instance, and Azure SQL
  • DevOps Automation: Design and maintain CI/CD workflows using Git, GitHub Actions, SonarQube Cloud, Terraform, and Docker
  • SRE Practices: Develop and monitor SLOs, SLIs, and golden signals; instrument applications and infrastructure; build Datadog dashboards for real-time business and incident reporting
  • Incident Management: Lead incident response, root cause analysis, and post-mortem documentation. Maintain high availability and rapid recovery for business-critical systems
  • Monitoring & Observability: Extensive use of Datadog for monitoring, logging, and performance analytics
  • Configuration Management: Work with Shell, YAML, JSON, and Python for scripting, automation, and configuration
  • System Administration: Administer Ubuntu, RHEL, CentOS, and (entry-level) Windows Server environments
  • Collaboration: Utilize Atlassian Suite (Jira, Confluence) for documentation, ticketing, and project tracking; contribute to ITSM/ITIL frameworks
  • AI & Productivity Tools: Integrate and leverage tools like Claude, GitHub Copilot, and other AI productivity solutions
  • Reporting: Create dashboards and business reports to provide actionable insights and drive continuous improvement

Requirements:

  • Microsoft Azure (App Services, VM, Container Instances, AKS, SQL Server, Azure SQL)
  • Git, GitHub, GitHub Actions
  • SonarQube Cloud, Terraform, Docker
  • Datadog (extensive), SRE concepts (SLOs, SLIs, golden signals, instrumentation)
  • Incident management, dashboard development, business reporting
  • Shell scripting, YAML/JSON configs, Python
  • Ubuntu, RHEL, CentOS, Windows Server (entry-level)
  • Atlassian Suite (Jira/Confluence)
  • ITSM / ITIL familiarity
  • AI tools (Claude, GitHub Copilot, etc.)
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 5+ years of demonstrated experience in DevOps, SRE, or cloud engineering roles
  • Analytical thinker, problem solver, and proactive communicator
  • Strong collaboration skills, especially across cross-functional and remote teams
  • Ability to thrive in a fast-paced, innovative business environment

Nice to have:

  • Microsoft Azure: App Insights, IoT Hub, Azure DevOps, API Management
  • DevOps: Ansible, Argo CD, CodeRabbit, Artifactory
  • SRE Practices: Capacity planning, cost optimization
  • Programming Languages: JavaScript, PowerShell
  • Other Tools: Snowflake, PagerDuty, Salesforce MuleSoft Anypoint

What we offer:

  • Flexible working format: remote, office-based, or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Additional Information:

Job Posted:
March 19, 2026

Work Type:
On-site work

Similar Jobs for Senior DevOps / Site Reliability Engineer

Senior Site Reliability Engineer

Baxter International is seeking a skilled Senior Principal Site Reliability Engi...
Location:
United States, Deerfield
Salary:
96,000 - 132,000 USD / year
Baxter
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, IT, or related field (or equivalent experience)
  • Prior experience in Site Reliability Engineering and cloud-based infrastructure management
  • Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
  • Azure administration and operations experience, with certifications a plus
  • Knowledge of related technologies, including cloud, encryption, and security protocols
  • Systems administration experience in Windows and Linux environments
  • Proven problem-solving skills and experience with scripting and automation tools
  • Ability to create accurate documentation and reports, with excellent communication skills
Job Responsibility:
  • Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
  • Manage and administer Azure resources, including virtual machines, databases, and networking components
  • Define and document operating procedures to ensure required security, privacy, and other compliance standards are maintained for digital solutions deployed in the cloud
  • Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
  • Define and refine operations SLAs to maintain a high level of customer satisfaction
  • Establish non-functional requirements to meet SLAs
  • Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
  • Define key performance indicators that can be monitored, measured, and used to identify improvement opportunities
  • Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
  • Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes
What we offer:
  • Healthcare benefits
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan
  • Flexible Spending Accounts
  • Educational assistance programs
  • Paid holidays
  • Paid time off
  • Paid parental leave
  • Commuting benefits
  • Employee Discount Program
Job Type:
Full-time

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...
Location:
United States, Deerfield
Salary:
96,000 - 132,000 USD / year
Baxter
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, IT, or related field (or equivalent experience)
  • Prior experience in Site Reliability Engineering and cloud-based infrastructure management
  • Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
  • Azure administration and operations experience, with certifications a plus
  • Knowledge of related technologies, including cloud, encryption, and security protocols
  • Systems administration experience in Windows and Linux environments
  • Proven problem-solving skills and experience with scripting and automation tools
  • Ability to create accurate documentation and reports, with excellent communication skills
  • Applicants must be authorized to work for any employer in the U.S.
  • Baxter is unable to sponsor, or take over sponsorship of, an employment visa at this time
Job Responsibility:
  • Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
  • Manage and administer Azure resources, including virtual machines, databases, and networking components
  • Define and document operating procedures to ensure required security, privacy, and other compliance standards are maintained for digital solutions deployed in the cloud
  • Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
  • Define and refine operations SLAs to maintain a high level of customer satisfaction
  • Establish non-functional requirements to meet SLAs
  • Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
  • Define key performance indicators that can be monitored, measured, and used to identify improvement opportunities
  • Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
  • Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes
What we offer:
  • Support for Parents
  • Continuing Education/Professional Development
  • Employee Health & Well-Being Benefits
  • Paid Time Off
  • 2 Days a Year to Volunteer
  • Medical and dental coverage starting day one
  • Insurance coverage for basic life, accident, short-term and long-term disability
  • Business travel accident insurance
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan
Job Type:
Full-time

Senior Site Reliability Engineer

We are seeking an experienced Senior Site Reliability Engineer (L3) to join our ...
Location:
India, Chennai
Salary:
Not provided
Arcadia
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
  • 8–10+ years of experience in SRE/DevOps/Cloud Engineering, with deep hands-on exposure to AWS and Kubernetes
  • Strong hands-on experience with:
    ◦ Terraform and Infrastructure as Code
    ◦ AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
    ◦ Jenkins with Groovy, GitHub Actions, ArgoCD, FluxCD
    ◦ Kubernetes troubleshooting and operations
    ◦ Prometheus/Grafana/Datadog observability stacks
  • Proven ability to operate in high-scale, high-uptime, multi-environment production systems
  • Experience building automation via Python/Bash and reducing operational toil
  • Strong understanding of incident management, root cause analysis, and reliability engineering principles
Job Responsibility:
  • Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
  • Lead all aspects of Kubernetes operations including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
  • Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
  • Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
  • Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch, ensuring actionable alerting, dashboards, and metrics aligned with SLOs/SLIs
  • Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
  • Manage database operations across MySQL and PostgreSQL including backups, performance tuning, replication, and operational runbooks
  • Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
  • Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
  • Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems
What we offer:
  • Competitive compensation and employee stock options
  • Hybrid/remote-first working model (India-based role, with global collaboration)
  • Flexible leave policy
  • Comprehensive medical insurance (self + family members)
  • Annual performance cycle + quarterly recognition awards
  • A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation
Job Type:
Full-time

Senior Site Reliability Engineer

AutoRABIT is the leader in DevSecOps for SaaS platforms such as Salesforce. Its ...
Location:
India, Hyderabad
Salary:
25 - 30 INR / year
AutoRABIT
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience in SRE, DevOps, or related roles
  • Solid hands-on experience with AWS services (EKS, ECS, EC2, RDS, S3, Redis, etc.)
  • Proficient in writing Terraform infrastructure scripts
  • Strong scripting skills in Python using Boto3
  • Deep understanding of monitoring/logging tools (ELK, CloudWatch, Trend Micro)
  • Experience building and managing CI/CD pipelines (CodeBuild, CodePipeline)
  • Knowledge of infrastructure security and incident response practices
  • Willingness to work rotational shifts with rotating week-offs
  • Bachelor’s degree in computer science or a related field
  • AWS certifications are preferred
Job Responsibility:
  • Provision and manage AWS infrastructure using Terraform
  • Write AWS Lambda functions (Python 3 and Boto3) to automate operational tasks
  • Set up monitoring, logging, and alerting with ELK, Trend Micro, and AWS CloudWatch
  • Configure alerts for performance and security anomalies
  • Develop and maintain CI/CD pipelines using AWS CodeBuild and CodePipeline
  • Troubleshoot production issues and contribute to blameless postmortems
  • Contribute to system hardening and security compliance efforts
  • Adhere to established internal controls
Job Type:
Full-time

Senior Site Reliability Engineer

HiveWatch is seeking a Staff Site Reliability Engineer to join our Platform Team...
Location:
United States, El Segundo
Salary:
183,000 - 235,000 USD / year
HiveWatch
Expiration Date:
Until further notice
Requirements:
  • 7+ years of software engineering experience with strong coding skills in production environments
  • 5+ years of SRE, DevOps, or production operations experience
  • Expertise with cloud platforms (AWS preferred) and containerized applications (Docker, Kubernetes)
  • Experience with Infrastructure as Code (Terraform, CloudFormation, or similar)
  • Proficiency in at least one object-oriented programming language in our tech stack (Java, Kotlin, Python)
  • Hands-on experience with relational databases and SQL performance optimization
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or equivalent)
  • Strong debugging skills across distributed systems and microservices architectures
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
Job Responsibility:
  • Own the reliability of mission-critical systems including production monitoring, alerting, and capacity planning
  • Debug and resolve complex production issues across the full stack, from infrastructure to application code
  • Participate in a regular on-call rotation to provide 24/7 coverage for critical systems
  • Perform root cause analysis requiring deep code-level investigation and implement preventive measures
  • Build automation and tooling to reduce operational toil and improve system reliability
  • Maintain CI/CD pipelines and observability infrastructure, and optimize database performance
  • Increase the resiliency, scalability, and maintainability of production environments
  • Establish on-call procedures and disaster recovery processes
  • Provide technical leadership and mentorship to foster engineering excellence and reliability culture
What we offer:
  • Comprehensive health coverage: medical, dental, vision, and life insurance
  • Cutting-edge work in an emerging field with huge growth potential
  • Competitive compensation packages designed to reward top talent
  • A modern, newly renovated HQ right on Main Street in El Segundo, CA
  • 401(k) with a 4% company match to help you invest in your future (match launches in 2026)
  • Flexible paid time off so you can recharge when you need it
  • Additional benefits include ClassPass credits and a discount on pet insurance
  • A family-friendly, compassionate culture that values balance and belonging
  • Eligible to participate in HiveWatch Equity Incentive Plan
Job Type:
Full-time

Senior Site Reliability Engineer

You'll join the team primarily responsible for making our self-hosted product of...
Location:
United States
Salary:
200,000 - 220,000 USD / year
Tines
Expiration Date:
Until further notice
Requirements:
  • 5-8 years in an SRE or similar role
  • Experience architecting, maintaining, and supporting systems with containerized applications, ideally k8s
  • Experience with troubleshooting deployment issues, creating clear documentation, and designing robust escalation paths
  • Comfortable learning new technologies
  • Experience with Ruby, Rails, React, TypeScript, Postgres, Redis and Docker
  • Customer obsessed and willing to go deep into unfamiliar stacks to find root causes
  • Authorized to work for any employer in the U.S.
Job Responsibility:
  • Making our self-hosted product offering as easy as possible for customers to install and operate
  • Owning all of the supporting services and tools that our self-hosted customers rely on
  • Identifying and fixing availability risks and monitoring gaps
  • Enabling software engineers to build new product features that work seamlessly across cloud and self-hosted environments
  • Using our own product extensively to automate infrastructure maintenance and to build DevOps tooling for customer deployments
  • Identifying areas for improvement in our containerized architecture and deployment strategies
  • Mentoring other engineers in container orchestration and Kubernetes best practices
  • Acting as a subject matter expert for critical self-hosted customer issues
What we offer:
  • Competitive salary
  • Startup equity & extended exercise window
  • Matching retirement plans
  • Home office setup
  • Private healthcare plans
  • 25 days annual leave
  • Extra company holidays
  • Generous parental leave programs
  • Flexibility in how and where you work
  • Phone and home Internet allowance
Job Type:
Full-time

Senior Site Reliability/DevOps Engineer

AutoRABIT is looking for a Senior Site Reliability/DevSecOps Engineer to help de...
Location:
United States
Salary:
175,000 - 200,000 USD / year
AutoRABIT
Expiration Date:
Until further notice
Requirements:
  • Design, implement, and maintain scalable, resilient, and secure infrastructure using AWS
  • Develop and manage infrastructure as code using Terraform
  • Implement and manage CI/CD pipelines to automate deployments and ensure smooth delivery of applications
  • Monitor system performance, identify bottlenecks, and implement solutions to improve reliability and performance
  • Troubleshoot, resolve, and perform RCAs for incidents, while ensuring minimal disruption to services
  • Collaborate with development teams to ensure applications are designed for reliability and performance
  • Working experience with shell scripting (Bash), Python, or equivalent is required
  • Good knowledge of programming languages such as Python, Go, or Java
  • Working experience with configuration management tools such as Ansible or Chef
  • Implement and maintain monitoring, logging, and alerting systems to ensure the health and performance of our infrastructure
Job Responsibility:
  • Contribute to the development and maintenance of frameworks for monitoring, automation and code to increase the scalability and reliability of the service
  • Assist both internal and customer-facing teams with deploying new software releases and with VPN and related security infrastructure integration
  • Assist with resolution of AutoRABIT service or customer issues as required
  • Participate in and practice sustainable incident response and blameless postmortems
  • Contribute to the automation of manual tasks, such as the provisioning of users in production and test environments
  • Help and develop peers’ capabilities through knowledge sharing, mentoring, and collaboration
  • Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve
  • Participate in a regular on-call or rotational schedule to support AutoRABIT servers, including on weekends and holidays
Job Type:
Full-time

Senior Platform Engineer - AWS

We’re currently looking for a skilled and enthusiastic Senior Platform Engineer ...
Location:
Germany, Hamburg or Berlin
Salary:
73,000 - 90,000 EUR / year
About You
Expiration Date:
Until further notice
Requirements:
  • 5+ years of professional experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE), with a significant focus on cloud infrastructure
  • Fluency in scripting languages (e.g., Python, Go, Bash) for system automation, tooling development, and operational tasks
  • Deep expertise in managing and scaling production workloads within a major public cloud provider (e.g., AWS, Azure, or GCP), including strong familiarity with core services like Compute, Networking, Identity & Access Management (IAM), and Managed Database
  • Proven mastery of Infrastructure-as-Code (IaC) using AWS CloudFormation and/or Terraform in complex, multi-account environments
  • Demonstrated experience designing, implementing, and maintaining robust CI/CD pipelines
  • Solid knowledge of monitoring and logging solutions
  • Excellent communication and documentation skills, with the ability to articulate complex technical issues to technical stakeholders
Job Responsibility:
  • Own and evolve the Commerce Cloud’s AWS infrastructure through the application of Infrastructure-as-Code (IaC) principles to ensure scalability, high availability, and cost efficiency
  • Design, implement, and optimize CI/CD pipelines and operational workflows utilizing tools such as GitLab CI, AWS CloudFormation, and Terraform
  • Establish and enforce comprehensive, high-quality documentation for all infrastructure, operational playbooks, and critical architecture decisions
  • Act as a subject matter expert and trusted advisor, partnering with application development teams to architect and provision infrastructure that meets their specific workload requirements
  • Drive collaborative efforts with GCP Platform Engineers on cross-cloud initiatives and work closely with Information Security Engineers to design and implement security controls and governance policies
  • Spearhead the evaluation and adoption of emerging cloud and platform technologies, continuously seeking opportunities to improve platform performance and developer experience
What we offer:
  • Hybrid working
  • Sports courses
  • Free access to code.talks
  • Exclusive employee discounts
  • Free drinks
  • Language courses
  • Free Laracasts account
  • Company parties
  • Help in the relocation process
  • Mobility subsidy
Job Type:
Full-time