SRE Ansible developer

Realign

Location:
Canada, Toronto

Contract Type:
Not provided

Salary:
155000.00 USD / Year

Requirements:

  • Design and implement automation scripts using Ansible for infrastructure provisioning and configuration management
  • Develop and maintain monitoring solutions leveraging Dynatrace for application and system performance
  • Configure and optimize ITRS monitoring tools to ensure proactive alerting and incident management
  • Collaborate with development and operations teams to improve system reliability and scalability
  • Automate deployment pipelines and integrate with CI/CD processes for faster releases
  • Troubleshoot performance issues and implement solutions to enhance system resilience
  • Ensure compliance with security and operational standards across environments
  • Document automation workflows, monitoring configurations, and best practices for knowledge sharing
  • Total Experience: 6-8 years

Additional Information:

Job Posted:
March 21, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for SRE Ansible developer

Python Developer - Site Reliability Engineering (SRE)

We are seeking a skilled Python Developer with experience in the Site Reliabilit...
Location:
Canada, Montreal
Salary:
Not provided
NTT DATA
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience with Python development
  • 6 years of experience working with Infrastructure as Code (Terraform and Ansible)
  • Experience with CI/CD pipelines, preferably GitHub Actions and Jenkins
  • Strong understanding of object-oriented design and development principles
  • Proficiency in Linux/Unix environments
  • Experience working with database technologies (preferably NoSQL), including data modeling, testing, and performance tuning
  • Ability to write reusable, optimized, maintainable, and well‑documented code following industry best practices
  • Experience implementing open-source monitoring and observability tools such as Prometheus, Grafana, Splunk, or OpenTelemetry
  • Strong problem‑solving skills and ability to take ownership of tasks and drive them independently to closure
  • Understanding of networking concepts (TCP/IP, DNS, Load Balancing)
Job Responsibilities:
  • Develop high-quality software for public cloud service provider (CSP) infrastructure across different public cloud areas
  • Develop and enhance automation workflows for public cloud service providers (CSPs), initially focused on Azure, and integrate them with in-house tooling
  • Integrate automation workflows into CI/CD pipelines using GitHub Actions and Jenkins
  • Build proof-of-concept solutions in new areas of cloud and automation development
  • Provide technical support and debugging for application failures in both on-premises and cloud environments
  • Participate in all phases of the Software Development Life Cycle (SDLC), including analysis, design, coding, testing, and deployment
  • Evaluate, onboard, and implement emerging DevOps and automation tools to improve efficiency
  • Build and integrate observability into cloud platforms and solutions using open-source tools (Prometheus, Grafana, OpenTelemetry)
  • Identify, highlight, and reduce operational toil through automation, architectural improvements, and process optimization
  • Collaborate with global teams to understand requirements, develop high‑quality code, and deliver cloud-focused projects

SRE Developer

We are looking for a proactive SRE Developer with 3–5 years of experience to man...
Location:
India, Bangalore South
Salary:
Not provided
Wissen
Expiration Date:
Until further notice
Requirements:
  • Strong hands-on experience in SRE or DevOps operations
  • Expertise in CI/CD tools such as GitHub Actions, GitLab CI, Jenkins, Azure DevOps
  • Experience with monitoring and observability tools (Grafana, Prometheus, ELK, Splunk, Datadog, New Relic, etc.)
  • Good understanding of cloud platforms (AWS, Azure, or GCP)
  • Practical experience using AI tools in daily engineering workflows (CursorAI, ChatGPT, GenAI tools, automation assistants)
  • Ability to identify repetitive operational tasks and automate using AI or scripts
  • Familiarity with AI-driven troubleshooting and documentation
  • Proficiency in Python, Bash, PowerShell, or similar scripting languages
  • Exposure to Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, ARM, or Ansible
Job Responsibilities:
  • Handle business-as-usual (BAU) SRE operations, including incident management, root cause analysis, problem resolution, and service restoration
  • Manage and maintain CI/CD pipelines and deployment automation across environments
  • Improve system reliability, scalability, and performance through automation and proactive monitoring
  • Implement and manage observability solutions including logging, metrics, alerting, and dashboards
  • Utilize AI tools (CursorAI, Generative AI, automation copilots) for faster troubleshooting, documentation, code generation, and incident analysis
  • Collaborate with engineering, product, and security teams to ensure smooth releases and secure infrastructure
  • Reduce manual operational effort through AI-assisted automation and scripting
  • Drive DevOps best practices and continuous improvement initiatives
Employment Type:
Full-time

Browser Infrastructure Engineer

Infrastructure Engineer for Browser Development builds reliable, automated, and ...
Location:
Serbia, Belgrade
Salary:
Not provided
Perplexity
Expiration Date:
Until further notice
Requirements:
  • 3+ years in software development infrastructure, preferably Chromium browsers
  • Hands-on DevOps and SRE experience, including monitoring and incident management
  • Proficiency in k8s, Terraform, Datadog, Sentry, AWS, Unix, TeamCity
  • Strong CI/CD implementation skills
  • Ability to thrive in Agile teams with excellent communication
Job Responsibilities:
  • Set up and maintain CI/CD pipelines for builds and testing (TeamCity, Jenkins, etc.)
  • Support and evolve Chromium browser development infrastructure (k8s, Terraform, Ansible)
  • Configure monitoring and alerting systems (Sentry, Datadog)
  • Manage cloud infrastructure (AWS), Linux servers, and virtual environments
  • Develop automation scripts in Bash, Python, and Go
  • Ensure high availability, resilience, and security of development infrastructure
  • Collaborate with developers to optimize workflows and resolve incidents
What we offer:
  • Dynamic team with growth and learning opportunities
Employment Type:
Full-time

Senior+ Site Reliability Engineer

Crusoe is building the most reliable, energy-efficient, AI-optimized cloud platf...
Location:
United States, San Francisco
Salary:
172000.00 - 209000.00 USD / Year
Crusoe
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in cloud operations, SRE, or related roles
  • Background working with GPU workloads, high-performance computing, or latency/throughput-sensitive systems
  • Strong knowledge of Unix/Linux systems (kernel/user space) and networking including debugging complex issues in live systems
  • Understanding of cloud platforms and infrastructure fundamentals (Kubernetes, AWS/GCP, virtualization, distributed systems)
  • Familiarity with incident management practices and operational frameworks (SRE/ITIL/etc.)
  • Experience with monitoring and alerting tools (Prometheus, Grafana) or a strong willingness to learn
  • Familiarity with infrastructure-as-code and configuration management tools such as Terraform and Ansible
  • Basic scripting and automation experience (Go, Python, C, C++, or similar)
  • Strong communication skills, with the ability to clearly articulate technical issues to diverse stakeholders
  • Ability to stay calm, focused, and effective in fast-moving or high-pressure situations
Job Responsibilities:
  • Collaborate with cross-functional teams to define and refine availability metrics for Crusoe’s cloud infrastructure, including establishing, tracking, and improving SLIs and SLOs
  • Assist in incident response by identifying, diagnosing, and resolving service disruptions, and support post-incident processes through RCA documentation and participation in post-incident reviews
  • Build, operate, and monitor infrastructure health using Crusoe’s observability stack (Prometheus, Grafana, Alertmanager, OpenTelemetry)
  • Identify and communicate reliability risks, performance bottlenecks, and early indicators of potential incidents that could impact service availability
  • Develop automation and tooling to reduce operational toil, minimize manual intervention, and enhance service recovery and self-healing capabilities
  • Partner with compute, network, storage, and platform teams to improve service resilience and strengthen disaster recovery readiness
  • Contribute to knowledge sharing, process improvements, and the development of operational best practices across the organization
  • Participate in ongoing training, mentorship, and professional development to grow into advanced SRE responsibilities
What we offer:
  • Industry competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSAs
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
Employment Type:
Full-time

Lead Service Reliability Engineer

As Service Reliability Engineer (SRE) in DAMO service line, you will take a mult...
Location:
Singapore, Singapore
Salary:
Not provided
Thoughtworks
Expiration Date:
Until further notice
Requirements:
  • You can program with one or more high-level languages such as Python, Golang, Ruby, or Java, as well as shell scripting
  • You are familiar with DevOps and GitOps practices, driving the integration of observability automation into CI/CD pipelines, e.g., GitLab, Jenkins, CircleCI, or equivalent
  • You have in-depth knowledge of configuration management and Infrastructure as Code (IaC) tools such as Terraform, Ansible, ARM, and CloudFormation for provisioning and managing infrastructure
  • You have expertise in observability, logging, tracing, and monitoring tools such as Grafana (Loki and Tempo), Prometheus, Graylog, Jaeger, Zipkin, the ELK stack, or equivalent
  • You have a strong understanding of container-based architecture and hands-on experience with orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.
  • You have in-depth experience in application and infrastructure performance tuning and scaling to handle heavy loads under different scenarios, e.g., periodic traffic loads and tsunami patterns
  • You have a good understanding of essential concepts such as quality gates encompassing SLI/SLO/SLA, chaos engineering, golden signals, blameless postmortem methodologies, synthetic monitoring, distributed tracing, end-user monitoring and performance testing
  • You have experience with network load balancing, security tech stacks, Transport Layer Security (TLS) and certificate management, and an understanding of standard networking protocols and configurations
  • You have strong communication and articulation skills, and are proficient in English
  • You are able to convey resolutions to audiences with varying degrees of technical/business proficiency and bring them to consensus
Job Responsibilities:
  • You will be responsible for understanding requirements or SRE goals in depth from both tech and business perspectives
  • You will provide solutions to improve reliability, including identifying and implementing mechanisms and architectures that enable fault tolerance and faster median time to respond and median time to detect
  • You will be responsible for enhancing the incident management process, including the development of an incident prioritization matrix, triage, communication, mitigation, post-mortem analysis and implementation of corrective actions
  • You will manage client stakeholder expectations and queries during production incidents, provide detailed technical analysis of issues and remediation plans for mitigation and future prevention, and act as the interface for C-level executives if or when needed
  • You will be a liaison with client engineering teams, build trust and productive relationships with senior client stakeholders and team leads to influence them in making better decisions
  • You will be responsible for identifying opportunities for enhancing system performance and reliability in alignment with business SLAs, SLOs, KPIs and objectives, and provide guidance and assistance to SRE teams in implementing the identified improvements
  • As an SRE expert, you will collaborate with Thoughtworks application development leads and solution architects, recommending changes in system design and adopting best practices for improved reliability from day one
  • You will oversee and mentor other SREs on the team, contributing to their growth and development
What we offer:
  • There is no one-size-fits-all career path
  • Your career is supported by interactive tools, numerous development programs, and teammates who want to help you grow
Employment Type:
Full-time

Senior Infrastructure Engineer

Location:
India, Putlibowli
Salary:
Not provided
Randstad
Expiration Date:
May 16, 2026
Requirements:
  • Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, Ansible, Dynatrace
  • Build and manage CI/CD pipelines
  • Improve infrastructure provisioning and configuration through automation
  • Monitor the health, performance, and reliability of production systems and applications
  • Design, implement, and maintain automated monitoring solutions, using tools such as Datadog
  • Define and monitor service level objectives (SLOs), service level indicators (SLIs), and error budgets
  • Implement effective alerting systems
  • Lead root cause analysis (RCA) and post-mortem investigations
  • Respond to production incidents, diagnose root causes, and implement corrective actions
  • Create and maintain playbooks and documentation for incident response
Job Responsibilities:
  • Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, Ansible, Dynatrace to automate deployment and management of infrastructure
  • Build and manage CI/CD pipelines to ensure efficient and reliable application deployments
  • Improve infrastructure provisioning and configuration through automation, minimizing manual interventions and reducing human error
  • Monitor the health, performance, and reliability of production systems and applications
  • Design, implement, and maintain automated monitoring solutions, using tools such as Datadog
  • Define and monitor service level objectives (SLOs), service level indicators (SLIs), and error budgets to ensure system reliability and availability meet customer expectations
  • Implement effective alerting systems to identify and address potential issues before they impact users
  • Lead root cause analysis (RCA) and post-mortem investigations after incidents to identify improvements and avoid recurrence
  • Respond to production incidents, diagnose root causes, and implement corrective actions
  • Create and maintain playbooks and documentation for incident response, troubleshooting, and recovery processes
Employment Type:
Full-time

Principal Site Reliability Engineer

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest ...
Location:
United States, Santa Clara
Salary:
151600.00 - 245300.00 USD / Year
Palo Alto Networks
Expiration Date:
Until further notice
Requirements:
  • BS or MS in Computer Science, a related field, or equivalent professional experience or equivalent military experience
  • Expertise in configuration management with a framework such as Ansible, Terraform, Helm, Kubernetes
  • Proficient in Python and/or Go
  • Expertise in managing applications in Kubernetes clusters with autoscaling enabled
  • Experience in Production Engineering, DevOps, or Site Reliability
  • Expertise in the public cloud (GCP or AWS), especially in GCP
  • Strong Linux administration, internals, and network troubleshooting
  • Proficiency with programming languages like Python, Golang, and shell scripting to automate tasks
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Excellent written and verbal communication, able to collaborate and rally support
Job Responsibilities:
  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build, and operate reliable, secure Cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate robust deployment of services
  • Orchestrate end-to-end monitoring and alerting
  • Participate with SRE and Dev teams in the on-call rotation
  • Lead root cause analysis of critical business and production issues
What we offer:
  • Restricted stock units and a bonus
Employment Type:
Full-time

Principal Site Reliability Engineer (AIOps)

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest ...
Location:
United States, Santa Clara
Salary:
151600.00 - 245300.00 USD / Year
Palo Alto Networks
Expiration Date:
Until further notice
Requirements:
  • BS or MS in Computer Science, a related field, or equivalent professional experience
  • Expertise in configuration management with a framework such as Ansible, Terraform, Helm
  • Experience in Production Engineering, DevOps, or Site Reliability
  • Expertise in private or public cloud
  • Strong Linux administration, internals, and network troubleshooting
  • Proficiency with programming languages like Python, Golang, and shell scripting to automate tasks
  • Familiarity with CI/CD pipelines, GitLab and GitHub preferred
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Excellent written and verbal communication, able to collaborate and rally support
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
Job Responsibilities:
  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build and operate reliable, secure Cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate robust deployment of services
  • Orchestrate end-to-end monitoring and alerting
  • Participate with SRE and Dev teams in the on-call rotation
  • Lead root cause analysis of critical business and production issues
What we offer:
  • Restricted stock units
  • Bonus
  • Employee benefits
Employment Type:
Full-time