Manager of Runbook Automation Support

PagerDuty

Location:
Chile, Santiago

Contract Type:
Not provided

Salary:

Not provided

Job Description:

PagerDuty is looking for a Manager of Runbook Automation Support, reporting directly to the Director of Customer Support Engineering and overseeing our team of Runbook Automation Technical Support Engineers. Our Technical Support Engineers help our customers with technical issues about our Runbook Automation SaaS and Self-Hosted products. Your responsibilities will include managing a distributed team across our offices while working closely with leadership and your peers to maintain a high standard of customer support across our global offices.

Job Responsibility:

  • Directly manage our team of 10 Support Engineers in Santiago and Lisbon, and provide support to the Support Engineers in Melbourne
  • Work with your peer managers and the Director of Support Engineering to maintain consistency across all teams with process and quality of Support
  • Conduct regular 1:1 meetings with all of your direct reports, providing assistance as needed and constructive feedback to help your team be successful
  • Conduct regular performance reviews of all of your direct reports
  • Onboard and train new hires, using your team as resources where necessary
  • Manage Runbook Automation support metrics and workload, including shift schedules to ensure sufficient resources and a high standard of quality
  • Manage your team's contributions to our internal support documentation
  • Recruit and hire top talent as the team and business scale

Requirements:

  • Excellent written and verbal communication skills
  • Hands-on experience in a technical support capacity, supporting customers using on-premise and SaaS solutions
  • Leadership/management experience within a support or similar environment
  • Hands-on experience managing customer issues through a ticketing solution (such as Salesforce)
  • Prior experience taking calls directly from customers in a technical support capacity
  • Be willing to occasionally alter your schedule to be available for your team as needed
  • Experience writing code in a popular scripting language such as Ruby, Python, Perl, or others
  • Experience with RDBMS such as MySQL and PostgreSQL
  • Know your way around Unix systems and tools
  • The ability to be highly organized in instructing and advising others while also staying on top of your own work
  • An excellent work ethic and attention to detail

Nice to have:

  • Experience working with tools that integrate with PagerDuty, such as Nagios, Zenoss, Zabbix and ServiceNow
  • Experience working closely with team members in a different time zone
  • Experience collaborating with engineering teams on bug tickets and fixes

What we offer:

  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Paid volunteer time off: 20 hours per year
  • Company-wide hack weeks
  • Mental wellness programs

Additional Information:

Job Posted:
November 12, 2025

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Manager of Runbook Automation Support

Practice Support Manager

As a Support Manager, you will lead a technical support team to ensure the seaml...
Location:
United States
Salary:
Not provided
Aledade, Inc.
Expiration Date
Until further notice
Requirements
  • 4+ years of experience as a Technical Support Manager in a healthcare or technology setting
  • Strong leadership and communication skills, with an ability to collaborate and negotiate effectively with cross-functional partners
  • Experience leading a team and managing team performance
  • Demonstrated experience in a technical operations role with a deep understanding of Incident Management frameworks (e.g., ITIL)
  • Proven ability to define, implement, and manage SLAs
  • Hands-on experience with enterprise monitoring and alerting systems (e.g., Datadog, PagerDuty)
  • Experience creating and maintaining operational runbooks and documentation
  • Competency in healthcare data standards (e.g., HL7 SIU/ADT, HIPAA X12 837, or CCDA)
  • Proficiency in constructing new and complex SQL queries for troubleshooting and analysis
  • Proven experience using Python, AWS CLI, and Bash
Job Responsibility
  • Lead and Develop Support Team: Manage a team of Support Analysts and Specialists, overseeing their daily activities and promoting their professional development
  • Drive Performance and Accountability: Monitor Key Performance Indicators (KPIs) for the team, offering regular feedback and coaching to ensure continuous improvement and accountability
  • Foster Talent and Collaboration: Onboard, train, and mentor new and existing team members to cultivate a high-performing, cohesive, and collaborative team environment
  • Optimize Team Operations: Develop and implement tools to effectively monitor and manage team capacity and performance, ensuring optimal resource allocation and productivity
  • Monitor SLAs: Oversee Service Level Agreements (SLAs) for troubleshooting issues, ensuring alignment with daily operational practices and standards
  • Own Incident Management Lifecycle: Manage the complete life cycle of incident management, from initial alert through resolution and post-mortem analysis, guaranteeing timely remediation and transparent communication with all stakeholders, including practice teams
  • Develop and Operationalize Runbooks: Create and maintain detailed runbooks for recurring issues and establish standard operating procedures to promote consistent and efficient incident response
  • Optimize Monitoring Systems: Utilize and refine monitoring tools such as Datadog and PagerDuty to proactively detect and mitigate potential issues before they impact our operations
  • Lead High-Priority Incident Resolution: Facilitate high-priority incident resolution calls, guiding cross-functional teams in swiftly addressing and resolving issues
  • Serve as a Point of Escalation: Act as the escalation point for the support team, addressing complex technical issues that require in-depth knowledge of our applications, Tableau, and data interfaces
What we offer
  • Flexible work schedules and the ability to work remotely are available for many roles
  • Health, dental and vision insurance paid up to 80% for employees, dependents and domestic partners
  • Robust time-off plan (21 days of PTO in your first year)
  • Two paid volunteer days and 11 paid holidays
  • 12 weeks paid parental leave for all new parents
  • Six weeks paid sabbatical after six years of service
  • Educational Assistant Program and Clinical Employee Reimbursement Program
  • 401(k) with up to 4% match
  • Stock options
  • Fulltime

Support Operations Technical Program Manager

We are seeking a highly skilled, dynamic, and motivated Technical Program Manage...
Location:
United States, RTP
Salary:
Not provided
VAST Data
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field
  • Minimum of 5 years of experience in technical program management or project management, preferably in a customer support or operations environment
  • Experience with support tools and technologies is highly desirable
  • Technical Acumen: Solid technical background with experience in data storage, software development, or IT operations is advantageous
  • Proven track record of successfully managing complex technical programs and projects
  • Demonstrated ability to structure and assess problems, execute high-level directives, and manage program impacts
  • Proven track record in program management, execution, change management, and cross-functional stakeholder management
  • Strong executive presence, communication, and facilitation skills, with experience engaging and partnering at all levels, including executives
  • Strength in researching, understanding, distilling, and communicating complex business issues, ideas, and analyzing business impact
  • Familiarity with, and ability to drive, Agile development and release methodologies, especially for the automation of business processes and improvements/redesigns
Job Responsibility
  • Lead and manage technical support programs, helping to organize and track work via sprints or otherwise holding CS team members accountable for incremental progress
  • Manage prioritized projects outlined in the Customer Success roadmap
  • Collaborate with cross-functional teams (including R&D, product management, and sales) to drive successful program adoption, KPIs and iterative improvements
  • Develop and maintain key performance indicators (KPIs) to measure the effectiveness of the programs you own
  • Provide regular reports and insights to senior management and the customer support team to communicate progress and blockers
  • Identify opportunities for process improvements within customer support and own follow-up where needed
  • Implement best practices to optimize efficiency, responsiveness, and overall customer satisfaction
  • Streamline and simplify manual and repeatable work for Customer Success team members and business partners who engage with us
  • Keep CS runbooks and SOPs up-to-date
  • Communicate between customer support and engineering teams
  • Fulltime

Customer Support Technical Operations Manager

We’re looking for a Technical Operations Manager to build and lead the team behi...
Location:
United States; Canada, San Francisco; New York; Portland; Remote
Salary:
168900.00 - 211100.00 USD / Year
Mercury
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in a technical operations, systems, or tooling engineering role (or adjacent)
  • 5+ years of hands-on people management (ideally leading technical roles/teams)
  • Deep expertise with Zendesk configuration and administration
  • Strong technical literacy (APIs, integrations, cloud services, data pipelines, internal tooling)
  • Demonstrated experience managing sprints, backlogs, and coordinating cross-team delivery
  • Experience with incident management, root cause analysis, and reliability engineering concepts
  • Excellent communication skills, able to translate technical complexity to non-technical audiences
  • Vendor / third-party management experience
  • Strong problem-solving mindset, bias for action, and proven track record of scaling systems and teams
  • Brings fresh, creative thinking to complex operational challenges
Job Responsibility
  • Lead and grow the Technical Ops team
  • Own the domain from intraday management to long-term planning
  • Define and maintain actionable metrics, dashboards, and SLAs to drive performance and accountability
  • Architect, configure, and manage the lifecycle of core systems (e.g. Zendesk, internal tools), including change control, QA, and rollback strategies
  • Stay ahead of industry trends to evolve tooling and operational processes - continuously seek improvements in efficiency and scale
  • Detect and remediate system weaknesses and single points of failure
  • Lead incident response, post-mortems, and escalation practices
  • Oversee sprint planning and execution for Ops initiatives & tasks
  • Manage the backlog, dependencies, prioritization, and cross-functional coordination
  • Ensure documentation, runbooks, and internal processes are up to date
What we offer
  • base salary
  • equity (stock options)
  • benefits
  • Fulltime

Application Support Engineer

This is a fantastic opportunity to join our Managed Services Practice, deliverin...
Location:
United Kingdom, Bristol; London; Manchester; Swansea
Salary:
36000.00 - 50800.00 GBP / Year
Made Tech
Expiration Date
Until further notice
Requirements
  • Eligibility for SC (security check) clearance (requires 5 years' UK residency and 5 years' employment history)
  • Experience of common IT Service Management (ITSM) tooling (e.g., ServiceNow, ZenDesk, PagerDuty, JIRA Service Desk)
  • Experience of working with agile methodologies and agile ways of working
  • Experience of incident management
  • Experience of cloud technologies (e.g., AWS, Azure, GCP)
  • Experienced with at least one programming language
  • Demonstrable knowledge of SOLID principles, Object-Oriented programming and TDD
  • Familiarity with IaC (Infrastructure as Code) such as Terraform
Job Responsibility
  • Taking part in proactive knowledge transfer activities with incumbent suppliers
  • Code review and quality analysis including the review of complete services and implementation of code scanning tooling
  • Reviewing and improving technical documentation (architecture overviews, deployment process definition, incident resolution runbooks)
  • Ensuring all requests for support are dealt with according to set standards and procedures, and suggesting process improvements
  • Participating in incident investigation/root cause analysis and delivering technical solutions within agreed SLAs
  • Implementing application enhancements to improve business performance
  • Providing out of hours support via on-call rota
  • Automating and improving the monitoring of application performance including setting up cloud and application level monitoring tooling
  • Updating documentation (knowledge base articles, playbooks, service definitions)
  • Applying test-driven development, ensuring appropriate test coverage
What we offer
  • 30 days Holiday
  • Flexible Parental Leave
  • Remote Working (part time remote working for all staff)
  • Paid counselling (as well as financial and legal advice)
  • Flexible benefit platform (includes Smart Tech scheme, Cycle to work scheme, individual benefits allowance for Health care cash plan or Pension plan)
  • Optional social and wellbeing calendar of events
  • Additional compensation payment for Out of Hours on-call rota
  • Fulltime

Systems Administrator Specialist

The Systems Specialist supports the IT team by managing and maintaining cloud in...
Location:
United States, Bloomington
Salary:
Not provided
Turner Mining Group
Expiration Date
Until further notice
Requirements
  • 4+ years of experience in IT operations or systems administration
  • Hands-on experience with Azure cloud infrastructure
  • AWS or GCP familiarity a plus
  • Strong scripting skills in PowerShell, Python, and/or JavaScript
  • Experience managing identity platforms (Azure AD/Entra), SSO, SAML, LDAP, and user provisioning
  • Proficiency managing Windows and Apple devices
  • Intune or MDM experience preferred
  • Strong documentation and communication skills
  • Comfortable working in a small, collaborative team environment
Job Responsibility
  • Support user provisioning, license management, and access reviews in Entra ID
  • Maintain SSO configurations and permission mappings for SaaS platforms (Salesforce, Slack, ADP, etc.)
  • Assist with periodic security and compliance audits
  • Develop and maintain automation scripts and tools using PowerShell, Python, and/or JavaScript
  • Deploy and manage automation workflows (Azure Automation, Power Automate, etc.)
  • Document automation processes, scripts, and runbooks
  • Support cloud infrastructure, primarily in Azure
  • Monitor system performance, assist with patching, backups, and incident troubleshooting
  • Contribute to implementing secure, scalable infrastructure solutions
  • Maintain connectors and integrations between core business systems (Salesforce, Sage Intacct, ADP Workforce Now, Slack, etc.)
  • Fulltime

AI Azure Enterprise Automation Engineer

Baptist Health Information Services is looking for an Enterprise Automation Engi...
Location:
United States, Jacksonville
Salary:
Not provided
Baptist Health (Florida)
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree or Equivalent Experience
  • Over 5 years of Information Technology Experience Required
  • Experience designing or implementing AI-driven automation agents that support IT operations, observability, or cloud management by autonomously identifying and resolving issues
  • Familiarity with Large Language Model (LLM) integration (e.g., OpenAI, Claude, Gemini) for code generation, decision support, or infrastructure recommendations
  • Exposure to multi-agent orchestration frameworks such as LangChain, AutoGen, or Microsoft Autonomous Agents for coordinating complex, layered workflows
  • Integration of AI agents into DevOps workflows or incident response tooling
  • Understanding of prompt engineering, retrieval-augmented generation (RAG), or vector database utilization (e.g., Azure Cognitive Search, Weaviate) in the context of enterprise systems
  • Contributions to open-source automation or AI platforms that demonstrate thought leadership or technical innovation
  • Familiarity with healthcare IT standards and constraints (e.g., HIPAA compliance, identity management in clinical workflows) as they apply to automation and AI integration
  • Azure VMs, Virtual Networks, Storage Accounts, Azure AD
Job Responsibility
  • Expert level engineering skills across a broad range of technology stacks and programming languages
  • As an SRE at Baptist Health you will be a member of a team dedicated to improving our resiliency, reliability, observability, and scalability through different methodologies and tools
  • You will have the drive to improve and define how we automate, observe, scale, and operate enterprise services
  • Design and build infrastructure & systems that provide high levels of scalability, reliability, performance, and security across Azure and on-prem environments
  • Automate manual processes by designing and implementing end-to-end automation pipelines that reduce operational friction, eliminate repetitive tasks, and enforce consistency through Infrastructure-as-Code and CI/CD practices
  • Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for all core services
  • Improve observability of all enterprise services with actionable monitoring, logging, and alerting using tools like Azure Monitor, Application Insights, and SolarWinds
  • Develop playbooks and runbooks to guide operations teams and support staff in managing infrastructure efficiently and safely
  • Partner with Digital Cloud Development Operations, Application Development, and Product teams to ensure new systems are designed for reliability and maintainability
  • Work closely with vendors and cloud providers (Azure, AWS, GCP) to optimize infrastructure and troubleshoot escalated issues
  • Fulltime

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs.
Job Responsibility
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs.
What we offer
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace.
  • Fulltime

AWS Cloud Infra Module Lead

Candidate should be able to work with Global operations & Core Practices teams t...
Location:
India, Noida
Salary:
Not provided
Sopra Steria
Expiration Date
Until further notice
Requirements
  • Strong experience in container orchestration and server automation tools such as Kubernetes, Docker, AWS EKS, Ansible & Terraform
  • Experience in deploying and managing highly scalable fault resilient systems
  • Infrastructure as code (IAC) using Terraform
  • CI/CD pipeline automation using Jenkins/GitLab CI/Travis
  • Scripting - Automation using Shell, Python, Groovy scripts
  • Strong Knowledge and experience of AWS services: Compute Services (EC2 Creation), AWS KeyPair creation, Route 53, Storage / IAM, VPN setup, ELB Creation, CloudWatch, CloudTrail, Cloud Formation
  • In depth knowledge of Monitoring tools like ICINGA, Prometheus
  • Able to design, maintain & support DR & Failover architectures, OS/APP Patching & Upgrades
  • Incident management experience using runbooks & Troubleshooting
Job Responsibility
  • Operate, industrialize & strengthen operational practices
  • Lead tasks independently & mentor junior team members
  • Design, maintain & support DR & Failover architectures, OS/APP Patching & Upgrades
  • Manage Incident troubleshooting using runbooks
  • Work with Global operations & Core Practices teams
What we offer
  • Inclusive and respectful work environment
  • Open to people with disabilities
  • 24*7 rotational shift including night shift as per business needs
  • Availability outside business hours for on-call/standby weekday/weekend
  • Fulltime