SRE Ansible developer

Canada, Toronto 155000.00 USD / Year · Job Posted March 21, 2026

Requirements

  • Design and implement automation scripts using Ansible for infrastructure provisioning and configuration management
  • Develop and maintain monitoring solutions leveraging Dynatrace for application and system performance
  • Configure and optimize ITRS monitoring tools to ensure proactive alerting and incident management
  • Collaborate with development and operations teams to improve system reliability and scalability
  • Automate deployment pipelines and integrate with CI/CD processes for faster releases
  • Troubleshoot performance issues and implement solutions to enhance system resilience
  • Ensure compliance with security and operational standards across environments
  • Document automation workflows, monitoring configurations, and best practices for knowledge sharing
  • Total Experience: 6-8 years

Similar Jobs for SRE Ansible developer

8 matching positions

SRE Developer

We are looking for a proactive SRE Developer with 3–5 years of experience to man...
Location: India, Bangalore South
Salary: Not provided
Wissen
Expiration Date: Until further notice
Requirements
  • Strong hands-on experience in SRE or DevOps operations
  • Expertise in CI/CD tools such as GitHub Actions, GitLab CI, Jenkins, Azure DevOps
  • Experience with monitoring and observability tools (Grafana, Prometheus, ELK, Splunk, Datadog, New Relic, etc.)
  • Good understanding of cloud platforms (AWS, Azure, or GCP)
  • Practical experience using AI tools in daily engineering workflows (CursorAI, ChatGPT, GenAI tools, automation assistants)
  • Ability to identify repetitive operational tasks and automate using AI or scripts
  • Familiarity with AI-driven troubleshooting and documentation
  • Proficiency in Python, Bash, PowerShell, or similar scripting languages
  • Exposure to Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, ARM, or Ansible
Job Responsibility
  • Handle SRE BAU operations including incident management, root cause analysis, problem resolution, and service restoration
  • Manage and maintain CI/CD pipelines and deployment automation across environments
  • Improve system reliability, scalability, and performance through automation and proactive monitoring
  • Implement and manage observability solutions including logging, metrics, alerting, and dashboards
  • Utilize AI tools (CursorAI, Generative AI, automation copilots) for faster troubleshooting, documentation, code generation, and incident analysis
  • Collaborate with engineering, product, and security teams to ensure smooth releases and secure infrastructure
  • Reduce manual operational effort through AI-assisted automation and scripting
  • Drive DevOps best practices and continuous improvement initiatives
  • Fulltime

Python Developer - Site Reliability Engineering (SRE)

We are seeking a skilled Python Developer with experience in the Site Reliabilit...
Location: Canada, Montreal
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • 3+ years of experience with Python development
  • 6 years of experience working with Infrastructure as Code (Terraform and Ansible)
  • Experience with CI/CD pipelines, preferably GitHub Actions and Jenkins
  • Strong understanding of object-oriented design and development principles
  • Proficiency in Linux/Unix environments
  • Experience working with database technologies (preferably NoSQL), including data modeling, testing, and performance tuning
  • Ability to write reusable, optimized, maintainable, and well‑documented code following industry best practices
  • Experience implementing open-source monitoring and observability tools such as Prometheus, Grafana, Splunk, or OpenTelemetry
  • Strong problem‑solving skills and ability to take ownership of tasks and drive them independently to closure
  • Understanding of networking concepts (TCP/IP, DNS, Load Balancing)
Job Responsibility
  • Develop quality software working with public cloud service provider (CSP) infrastructure across different Public Cloud areas
  • Develop, enhance, and integrate automation workflows for Public Cloud Service Providers (CSP), initially focused on Azure, and integrate with in-house tooling
  • Integrate automation workflows into CI/CD pipelines using GitHub Actions and Jenkins
  • Build proof-of-concept solutions in new areas of cloud and automation development
  • Provide technical support and debugging for application failures in both on-premises and cloud environments
  • Participate in all phases of the Software Development Life Cycle (SDLC), including analysis, design, coding, testing, and deployment
  • Evaluate, onboard, and implement emerging DevOps and automation tools to improve efficiency
  • Build and integrate observability into cloud platforms and solutions using open-source tools (Prometheus, Grafana, OpenTelemetry)
  • Identify, highlight, and reduce operational toil through automation, architectural improvements, and process optimization
  • Collaborate with global teams to understand requirements, develop high‑quality code, and deliver cloud-focused projects

Technical Architect

Lead the design, modernization, and implementation of scalable, secure, and resi...
Location: United States, Armonk
Salary: 247319.00 - 250000.00 USD / Year
The New York Times
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or equivalent in Computer Science, Information Technology, Engineering, or a related field, and five (5) years of experience as a Consultant Architect, Virtualization Architect, Senior Cloud Architect, or in a related role
  • Five (5) years of experience must include utilizing Hybrid Cloud, AWS, Azure, Red Hat Linux, Terraform, Ansible, Python, VMware Cloud Foundation (VCF) Stack
Job Responsibility
  • Lead the design, modernization, and implementation of scalable, secure, and resilient hybrid cloud and containerized infrastructure platforms
  • Define and lead the technical architecture strategy for hybrid cloud, container orchestration (Kubernetes, Red Hat OpenShift, VMware Tanzu), and virtualized environments (VMware, Nutanix, Red Hat)
  • Architect secure and scalable infrastructure across private, public, and hybrid cloud ecosystems
  • Evaluate, design, and implement solutions for computing, storage, networking, identity, and availability zones across global regions
  • Design and implement Kubernetes and Red Hat OpenShift clusters across multi-cloud and on-prem environments, including CI/CD integration, policy enforcement, and workload orchestration
  • Define governance, observability, and security patterns for containerized workloads
  • Lead Infrastructure-as-Code (IaC) initiatives using Terraform, Ansible, GitOps, GitHub, PowerShell, and Python
  • Enable self-service infrastructure capabilities through automation frameworks and developer platforms
  • Partner with DevSecOps, SRE, Infrastructure Operations, Security, and Datacenter Operation teams to scope, define, size, and execute application onboarding, modernization, and consolidation initiatives
  • Mentor engineering teams and influence enterprise architecture (EA) roadmaps
  • Fulltime

DevOps & Infrastructure Support Engineer

Your opportunity: At Schwab, you’re empowered to make an impact on your career. ...
Location: United States, Austin
Salary: 57.21 - 67.79 USD / Hour
Charles Schwab
Expiration Date: June 20, 2026
Requirements
  • 5+ years in production support, reliability engineering, or platform operations within an enterprise environment
  • Hands-on experience supporting business-critical systems with high uptime requirements
  • Experience with Java and/or .NET application stacks
  • Experience with WebSphere, IIS, and enterprise middleware
  • Strong Linux and Windows operational experience
  • Solid administration skills (RHEL, CentOS, or Ubuntu) with a strong grasp of file systems and permissions
  • Scripting: Strong hands-on experience writing Bash scripts
  • Development experience with PowerShell, Python, Bash, Java
  • Familiarity with SQL and NoSQL databases and with messaging platforms (RabbitMQ, IBM MQ)
  • Log Analysis: Proven ability to troubleshoot complex issues using system and application logs
Job Responsibility
  • Systems Administration: Maintain enterprise Linux environments, focusing specifically on file systems, permissions, and system configurations
  • Troubleshooting: Thoroughly analyze system and application logs to diagnose and resolve complex issues across multiple environments
  • Own production stability, availability, and performance for a portfolio of Java, .NET, batch jobs, and web-based applications running on Linux, Windows, on-prem, and PCF
  • Automation & CI/CD: Write and maintain Bash scripts to automate routine operational tasks, and support continuous integration and deployment (CI/CD) pipelines
  • Configuration Management: Utilize configuration management tools to streamline, automate, and standardize environments
  • Application Support: Support Java applications, WebSphere Application Server, and manage workloads in Cloud Foundry environments
  • Observability: Use Splunk and Grafana for log aggregation, creating dashboards, proactive monitoring, and alerting
  • Networking Integration: Configure and troubleshoot core networking components necessary for application delivery, including DNS, firewall rules, and load balancer routing
  • Automation & Toil Reduction: Design and build automation (scripts, tooling, frameworks) to eliminate repetitive operational tasks
  • Improve self-service diagnostics, alert hygiene, and recovery automation
What we offer
  • 401(k) with company match and Employee stock purchase plan
  • Paid time for vacation and volunteering, and a 28-day sabbatical after every 5 years of service for eligible positions
  • Paid parental leave and family building benefits
  • Tuition reimbursement
  • Health, dental, and vision insurance
  • Bonus or incentive opportunities
  • Fulltime

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...
Location: United States, Santa Clara
Salary: 126000.00 - 203500.00 USD / Year
Palo Alto Networks
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
  • Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
  • Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
  • Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
  • Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
  • Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
  • Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
  • Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
  • Strong problem-solving skills and ability to work across teams
Job Responsibility
  • Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
  • Lead improvements across production systems, including performance, availability, and incident response
  • Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
  • Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
  • Partner with development teams to improve system reliability, observability, and cloud-native design patterns
  • Define and implement monitoring, alerting, and observability strategies across distributed systems
  • Lead incident response efforts, including root cause analysis and long-term remediation strategies
  • Identify and eliminate operational toil through automation and system improvements
  • Mentor engineers and contribute to raising the bar for production engineering practices
What we offer
  • Restricted stock units
  • Bonus
  • Fulltime

Sr Principal Site Reliability Engineer (Sovereign Cloud)

Palo Alto Networks runs a large infrastructure and is one of the largest GCP cus...
Location: Bulgaria, Sofia
Salary: Not provided
Palo Alto Networks
Expiration Date: Until further notice
Requirements
  • 10+ years as an engineer in Infrastructure, Operations, DevOps, or System Engineering
  • 7+ years building high availability, scalable cloud-native applications on AWS and GCP
  • BS or MS in Computer Science, a related field, or equivalent professional experience required
  • Expertise in configuration management with a framework such as Ansible, Terraform, or Helm
  • Passion for infrastructure and monitoring as code
  • Solid experience in container workloads and Kubernetes
  • Familiarity with PKI and networking concepts
  • In-depth knowledge of different security controls (App-ID, User-ID, security profiles, URL category, content, SSL decryption, firewall MFA, etc.)
  • Linux administration, internals, and network troubleshooting
  • Proficiency with programming languages like Golang or Python along with shell scripting to automate tasks
Job Responsibility
  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build and operate reliable, secure Cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate the deployment of robust services
  • Orchestrate end-to-end monitoring and alerting
  • Participate in on-call rotations to support critical business and production systems
  • Lead root cause analysis of critical business and production issues
  • Fulltime

Senior Ansible Automation & Platform Engineer

The Senior Ansible Automation & Platform Engineer is a strategic member of the o...
Location: United States, Austin; Mountain View; Warren
Salary: Not provided
General Motors
Expiration Date: Until further notice
Requirements
  • 7–12+ years in Architecture, DevOps, SRE, Platform Engineering, or Infrastructure Engineering
  • Expert-level proficiency with Ansible (playbooks, roles, collections, Jinja2, modules)
  • Hands-on experience designing and operating Ansible Automation Platform (AAP)
  • Strong experience with Terraform, Chef, or other IaC tools
  • Deep Linux engineering background and configuration management expertise
  • Expert in integrating automation with ServiceNow (CMDB, ITSM, workflows)
  • Exceptional scripting skills (Python, Bash, PowerShell)
  • Experience with AWS/Azure/GCP automation
  • Experience with Kubernetes, containerization, and orchestration
  • Experience with CI/CD pipelines (GitHub Actions, GitLab, Jenkins, Azure DevOps)
Job Responsibility
  • Architect, design, and operate the Ansible Automation Platform (AAP) including controller, execution environments, mesh architecture, and collections strategy
  • Define and maintain the Ansible Platform roadmap, including feature evolution, lifecycle management, scalability planning, and enterprise adoption milestones
  • Establish platform governance: coding standards, role/playbook patterns, collections, testing frameworks, and security guardrails
  • Build and maintain Execution Environments (EEs) optimized for performance, security, and dependency management
  • Lead platform upgrades, migrations, and cross-environment standardization
  • Design enterprise-grade Ansible automation frameworks with reusable roles, collections, and modular playbooks
  • Build automation for provisioning, configuration management, patching, compliance, and cloud infrastructure
  • Integrate Ansible with Terraform, CI/CD pipelines, GitOps workflows, and event-driven automation systems
  • Implement self-service automation capabilities for developers, operations, and business teams
  • Integrate agentic AI systems to enhance automation workflows, including AI-driven playbook generation and validation, automated remediation recommendations, intelligent change-impact analysis, and AI-assisted troubleshooting and root-cause analysis
What we offer
  • Relocation benefits (may be eligible)
  • Fulltime

Senior Site Reliability Engineer, Wikimedia Enterprise

The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to jo...
Location: United States
Salary: 116633.00 - 181243.00 USD / Year
Wikimedia Foundation
Expiration Date: Until further notice
Requirements
  • Automation & Configuration Management: Experience with Infrastructure as Code and automation tools (e.g., Terraform, Ansible) and proficiency in at least one programming language (e.g., Python, Go, or similar)
  • Cloud Infrastructure: Experience designing, operating, and optimizing cloud-based systems across platforms such as AWS, Azure, or GCP, including scalability, reliability, and cost efficiency
  • CI/CD & Deployment Practices: Experience building and maintaining CI/CD pipelines and GitOps workflows (e.g., GitLab or similar, ArgoCD), with familiarity in progressive delivery approaches such as canary and blue-green deployments
  • Incident Management & Reliability Operations: Experience with incident response, on-call practices, and leading postmortems, with a focus on continuous improvement and operational excellence
  • SRE Principles & Observability: Strong understanding of SRE best practices, including SLOs, SLIs, and error budgets, along with experience in observability (metrics, logging, and distributed tracing, e.g., Prometheus, OpenTelemetry)
  • Collaboration & Communication: Ability to work effectively in a distributed, cross-functional environment, with strong documentation and communication skills
  • Proven experience operating highly available, large-scale distributed systems, with a deep understanding of reliability, scalability, and failure modes
  • Ownership mindset: Takes end-to-end responsibility for system reliability, proactively identifying and addressing risks before they impact users
  • Bias for automation: Continuously seeks to reduce operational toil through automation and scalable solutions
  • Continuous improvement mindset: Actively learns from incidents and drives improvements through blameless postmortems and iterative enhancements
Job Responsibility
  • Define, track, and improve Service Level Objectives (SLOs), SLIs, and error budgets to ensure reliability targets are met
  • Build and enhance observability systems (metrics, logs, and distributed tracing) to enable proactive detection and faster troubleshooting
  • Drive reliability engineering practices, including capacity planning, load testing, and resilience validation (e.g., chaos testing)
  • Improve developer experience (DevEx) by enabling self-service infrastructure and streamlining deployment workflows
  • Partner with engineering team members to embed reliability best practices early in the development lifecycle
  • Design, implement, and optimize CI/CD and GitOps workflows using tools such as GitLab (or similar) and ArgoCD (or similar), enabling automated, reliable deployments with support for progressive delivery strategies like canary and blue-green releases
  • Implement secure-by-default infrastructure and enforce best practices (e.g., IAM, secrets management, encryption)
  • Continuously optimize infrastructure cost and efficiency using FinOps principles while maintaining performance and availability
  • Establish and track operational metrics such as MTTR, MTTD, and incident frequency to drive continuous improvement
  • Reduce operational toil by identifying repetitive work and implementing automation-first solutions
  • Fulltime