Manager of Runbook Automation Support

PagerDuty

Location:
Chile, Santiago

Contract Type:
Not provided

Salary:
Not provided

Job Description:

PagerDuty is looking for a Manager of Runbook Automation Support, reporting directly to the Director of Customer Support Engineering and overseeing our team of Runbook Automation Technical Support Engineers. Our Technical Support Engineers help customers resolve technical issues with our Runbook Automation SaaS and Self-Hosted products. Your responsibilities will include managing a distributed team while working closely with leadership and your peers to maintain a high standard of customer support across our global offices.

Job Responsibility:

  • Directly manage our team of 10 Support Engineers in Santiago and Lisbon, and provide support to the Support Engineers in Melbourne
  • Work with your peer managers and the Director of Support Engineering to maintain consistency across all teams with process and quality of Support
  • Conduct regular 1:1 meetings with all of your direct reports, providing assistance as needed and constructive feedback to help your team be successful
  • Conduct regular performance reviews of all of your direct reports
  • Onboard and train new hires, using your team as resources where necessary
  • Manage Runbook Automation support metrics and workload, including shift schedules to ensure sufficient resources and a high standard of quality
  • Manage your team's contributions to our internal support documentation
  • Recruit and hire top talent as the team and business scale

Requirements:

  • Excellent written and verbal communication skills
  • Hands-on experience in a technical support capacity, supporting customers using on-premise and SaaS solutions
  • Leadership/management experience within a support or similar environment
  • Hands-on experience managing customer issues through a ticketing solution (such as Salesforce)
  • Prior experience taking calls directly from customers in a technical support capacity
  • Be willing to occasionally alter your schedule to be available for your team as needed
  • Experience writing code in a popular scripting language such as Ruby, Python, Perl, or others
  • Experience with RDBMS such as MySQL and PostgreSQL
  • Know your way around Unix systems and tools
  • The ability to be highly organized in instructing and advising others while also staying on top of your own work
  • An excellent work ethic and attention to detail

Nice to have:

  • Experience working with tools that integrate with PagerDuty, such as Nagios, Zenoss, Zabbix and ServiceNow
  • Experience working closely with team members in a different time zone
  • Experience collaborating with engineering teams on bug tickets and fixes

What we offer:

  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Paid volunteer time off: 20 hours per year
  • Company-wide hack weeks
  • Mental wellness programs

Additional Information:

Job Posted:
November 12, 2025

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Manager of Runbook Automation Support

Practice Support Manager

As a Support Manager, you will lead a technical support team to ensure the seaml...
Location:
United States
Salary:
Not provided
Aledade, Inc.
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience as a Technical Support Manager in a healthcare or technology setting
  • Strong leadership and communication skills, with an ability to collaborate and negotiate effectively with cross-functional partners
  • Experience leading a team and managing team performance
  • Demonstrated experience in a technical operations role with a deep understanding of Incident Management frameworks (e.g., ITIL)
  • Proven ability to define, implement, and manage SLAs
  • Hands-on experience with enterprise monitoring and alerting systems (e.g., Datadog, PagerDuty)
  • Experience creating and maintaining operational runbooks and documentation
  • Competency in healthcare data standards (e.g., HL7 SIU/ADT, HIPAA X12 837, or CCDA)
  • Proficiency in constructing new and complex SQL queries for troubleshooting and analysis
  • Proven experience using Python, AWS CLI, and Bash
Job Responsibility:
  • Lead and Develop Support Team: Manage a team of Support Analysts and Specialists, overseeing their daily activities and promoting their professional development
  • Drive Performance and Accountability: Monitor Key Performance Indicators (KPIs) for the team, offering regular feedback and coaching to ensure continuous improvement and accountability
  • Foster Talent and Collaboration: Onboard, train, and mentor new and existing team members to cultivate a high-performing, cohesive, and collaborative team environment
  • Optimize Team Operations: Develop and implement tools to effectively monitor and manage team capacity and performance, ensuring optimal resource allocation and productivity
  • Monitor SLAs: Oversee Service Level Agreements (SLAs) for troubleshooting issues, ensuring alignment with daily operational practices and standards
  • Own Incident Management Lifecycle: Manage the complete life cycle of incident management, from initial alert through resolution and post-mortem analysis, guaranteeing timely remediation and transparent communication with all stakeholders, including practice teams
  • Develop and Operationalize Runbooks: Create and maintain detailed runbooks for recurring issues and establish standard operating procedures to promote consistent and efficient incident response
  • Optimize Monitoring Systems: Utilize and refine monitoring tools such as Datadog and PagerDuty to proactively detect and mitigate potential issues before they impact our operations
  • Lead High-Priority Incident Resolution: Facilitate high-priority incident resolution calls, guiding cross-functional teams in swiftly addressing and resolving issues
  • Serve as a Point of Escalation: Act as the escalation point for the support team, addressing complex technical issues that require in-depth knowledge of our applications, Tableau, and data interfaces
What we offer:
  • Flexible work schedules and the ability to work remotely are available for many roles
  • Health, dental and vision insurance paid up to 80% for employees, dependents and domestic partners
  • Robust time-off plan (21 days of PTO in your first year)
  • Two paid volunteer days and 11 paid holidays
  • 12 weeks paid parental leave for all new parents
  • Six weeks paid sabbatical after six years of service
  • Educational Assistant Program and Clinical Employee Reimbursement Program
  • 401(k) with up to 4% match
  • Stock options
Employment Type:
Full-time

Support Operations Technical Program Manager

We are seeking a highly skilled, dynamic, and motivated Technical Program Manage...
Location:
United States, RTP
Salary:
Not provided
VAST Data
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field
  • Minimum of 5 years of experience in technical program management or project management, preferably in a customer support or operations environment
  • Experience with support tools and technologies is highly desirable
  • Technical Acumen: Solid technical background with experience in data storage, software development, or IT operations is advantageous
  • Proven track record of successfully managing complex technical programs and projects
  • Demonstrated ability to structure and assess problems, execute high-level directives, and manage program impacts
  • Proven track record in program management, execution, change management, and cross-functional stakeholder management
  • Strong executive presence, communication, and facilitation skills, with experience engaging and partnering at all levels, including executives
  • Strength in researching, understanding, distilling, and communicating complex business issues, ideas, and analyzing business impact
  • Familiarity with, and ability to drive, Agile development and release methodologies, especially for the automation of business processes and improvements/redesigns
Job Responsibility:
  • Lead and manage technical support programs, helping to organize and track work via sprints or otherwise holding CS team members accountable for incremental progress
  • Manage prioritized projects outlined in the Customer Success roadmap
  • Collaborate with cross-functional teams (including R&D, product management, and sales) to drive successful program adoption, KPIs and iterative improvements
  • Develop and maintain key performance indicators (KPIs) to measure the effectiveness of the programs you own
  • Provide regular reports and insights to senior management and the customer support team to communicate progress and blockers
  • Identify opportunities for process improvements within customer support and own follow-up where needed
  • Implement best practices to optimize efficiency, responsiveness, and overall customer satisfaction
  • Streamline and simplify manual and repeatable work for Customer Success team members and business partners who engage with us
  • Keep CS runbooks and SOPs up-to-date
  • Communicate between customer support and engineering teams
Employment Type:
Full-time

Application Support Engineer

This is a fantastic opportunity to join our Managed Services Practice, deliverin...
Location:
United Kingdom, Bristol; London; Manchester; Swansea
Salary:
36000.00 - 50800.00 GBP / Year
Made Tech
Expiration Date:
Until further notice
Requirements:
  • Eligibility for SC (security check) clearance (requires 5 years' UK residency and 5 years' employment history)
  • Experience of common IT Service Management (ITSM) tooling (e.g., ServiceNow, ZenDesk, PagerDuty, JIRA Service Desk)
  • Experience of working with agile methodologies and agile ways of working
  • Experience of incident management
  • Experience of cloud technologies (e.g., AWS, Azure, GCP)
  • Experienced with at least one programming language
  • Demonstrable knowledge of SOLID principles, Object-Oriented programming and TDD
  • Familiarity with IaC (Infrastructure as Code) such as Terraform
Job Responsibility:
  • Taking part in proactive knowledge transfer activities with incumbent suppliers
  • Code review and quality analysis including the review of complete services and implementation of code scanning tooling
  • Reviewing and improving technical documentation (architecture overviews, deployment process definition, incident resolution runbooks)
  • Ensuring all requests for support are dealt with according to set standards and procedures, and suggesting process improvements
  • Participating in incident investigation/root cause analysis and delivering technical solutions within agreed SLAs
  • Implementing application enhancements to improve business performance
  • Providing out of hours support via on-call rota
  • Automating and improving the monitoring of application performance including setting up cloud and application level monitoring tooling
  • Updating documentation (knowledge base articles, playbooks, service definitions)
  • Applying test-driven development, ensuring appropriate test coverage
What we offer:
  • 30 days Holiday
  • Flexible Parental Leave
  • Remote Working (part time remote working for all staff)
  • Paid counselling (as well as financial and legal advice)
  • Flexible benefit platform (includes Smart Tech scheme, Cycle to work scheme, individual benefits allowance for Health care cash plan or Pension plan)
  • Optional social and wellbeing calendar of events
  • Additional compensation payment for Out of Hours on-call rota
Employment Type:
Full-time

Systems Administrator Specialist

The Systems Specialist supports the IT team by managing and maintaining cloud in...
Location:
United States, Bloomington
Salary:
Not provided
Turner Mining Group
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience in IT operations or systems administration
  • Hands-on experience with Azure cloud infrastructure
  • AWS or GCP familiarity a plus
  • Strong scripting skills in PowerShell, Python, and/or JavaScript
  • Experience managing identity platforms (Azure AD/Entra), SSO, SAML, LDAP, and user provisioning
  • Proficiency managing Windows and Apple devices
  • Intune or MDM experience preferred
  • Strong documentation and communication skills
  • Comfortable working in a small, collaborative team environment
Job Responsibility:
  • Support user provisioning, license management, and access reviews in Entra ID
  • Maintain SSO configurations and permission mappings for SaaS platforms (Salesforce, Slack, ADP, etc.)
  • Assist with periodic security and compliance audits
  • Develop and maintain automation scripts and tools using PowerShell, Python, and/or JavaScript
  • Deploy and manage automation workflows (Azure Automation, Power Automate, etc.)
  • Document automation processes, scripts, and runbooks
  • Support cloud infrastructure, primarily in Azure
  • Monitor system performance, assist with patching, backups, and incident troubleshooting
  • Contribute to implementing secure, scalable infrastructure solutions
  • Maintain connectors and integrations between core business systems (Salesforce, Sage Intacct, ADP Workforce Now, Slack, etc.)
Employment Type:
Full-time

AI Azure Enterprise Automation Engineer

Baptist Health Information Services is looking for an Enterprise Automation Engi...
Location:
United States, Jacksonville
Salary:
Not provided
Baptist Health (Florida)
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree or Equivalent Experience
  • Over 5 years of Information Technology Experience Required
  • Experience designing or implementing AI-driven automation agents that support IT operations, observability, or cloud management by autonomously identifying and resolving issues
  • Familiarity with Large Language Model (LLM) integration (e.g., OpenAI, Claude, Gemini) for code generation, decision support, or infrastructure recommendations
  • Exposure to multi-agent orchestration frameworks such as LangChain, AutoGen, or Microsoft Autonomous Agents for coordinating complex, layered workflows
  • Integration of AI agents into DevOps workflows or incident response tooling
  • Understanding of prompt engineering, retrieval-augmented generation (RAG), or vector database utilization (e.g., Azure Cognitive Search, Weaviate) in the context of enterprise systems
  • Contributions to open-source automation or AI platforms that demonstrate thought leadership or technical innovation
  • Familiarity with healthcare IT standards and constraints (e.g., HIPAA compliance, identity management in clinical workflows) as they apply to automation and AI integration
  • Azure VMs, Virtual Networks, Storage Accounts, Azure AD
Job Responsibility:
  • Expert level engineering skills across a broad range of technology stacks and programming languages
  • As an SRE at Baptist Health you will be a member of a team dedicated to improving our resiliency, reliability, observability, and scalability through different methodologies and tools
  • You will have the drive to improve and define how we automate, observe, scale, and operate enterprise services
  • Design and build infrastructure & systems that provide high levels of scalability, reliability, performance, and security across Azure and on-prem environments
  • Automate manual processes by designing and implementing end-to-end automation pipelines that reduce operational friction, eliminate repetitive tasks, and enforce consistency through Infrastructure-as-Code and CI/CD practices
  • Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for all core services
  • Improve observability of all enterprise services with actionable monitoring, logging, and alerting using tools like Azure Monitor, Application Insights, and SolarWinds
  • Develop playbooks and runbooks to guide operations teams and support staff in managing infrastructure efficiently and safely
  • Partner with Digital Cloud Development Operations, Application Development, and Product teams to ensure new systems are designed for reliability and maintainability
  • Work closely with vendors and cloud providers (Azure, AWS, GCP) to optimize infrastructure and troubleshoot escalated issues
Employment Type:
Full-time

Public Cloud Support Engineer

Join us as a "Public Cloud Support Engineer" at Barclays, where you'll spearhea...
Location:
India, Pune
Salary:
Not provided
Barclays
Expiration Date:
Until further notice
Requirements:
  • Must have a bachelor’s degree
  • Must have AWS or Azure
  • Secondary mandatory skill is GCP
  • AWS: Advanced administration of EC2 (Linux/Windows), ELB, RDS (backup/restore), S3 lifecycle policies, VPC peering, Route53 DNS, CloudFormation IaC, CloudWatch custom metrics, IAM policies, ECS/EKS orchestration
  • Azure: Expertise in VM scale sets, Storage Accounts (Blob/File/Table), AZLAN networking, AKS cluster management, Azure CLI scripting, Entra ID integration, Azure Automation runbooks, monitoring/log analytics, and RBAC
  • GCP: Proficiency in Compute Engine instance groups, Cloud Storage bucket lifecycle management, VPC subnetting, IAM roles/service accounts, GKE cluster deployment, Stackdriver custom logging, and Cloud Functions automation
  • Any 2 certifications: AWS: AWS Solution Architect (Associate or Professional) / Azure: Azure Administrator (AZ-104) or equivalent / GCP: GCP Associate Cloud Engineer (ACE) or equivalent
Job Responsibility:
  • Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients
  • Develop the support model and service offering to improve the service to customers and stakeholders
  • Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance
  • Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing
  • Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes
  • Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management
  • Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues
  • Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency
  • Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur
What we offer:
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
Employment Type:
Full-time

IT Workspace & Collaboration Technology Lead

Alter Domus is seeking an IT Workspace & Collaboration Technology Lead to join o...
Location:
United Kingdom, London
Salary:
Not provided
Alter Domus
Expiration Date:
Until further notice
Requirements:
  • Minimum 10+ years of relevant IT experience with 5+ years in management, with strong exposure to modern workplace technologies
  • Excellent verbal and written communication skills
  • Detail-oriented, organized, and comfortable operating in a fast-paced, global environment
  • Strong sense of ownership, accountability, and customer service excellence
  • Collaborative team player with strong interpersonal skills
  • Self-motivated and able to thrive in a project- and outcome-driven environment
  • Advanced engineering expertise in Microsoft 365 collaboration services (Teams, SharePoint Online, OneDrive, Exchange Online)
  • Strong experience implementing and integrating collaboration platforms, including project and work management tools, ITSM and service request platforms, knowledge management systems
  • Proven experience leading technical implementations of large-scale collaboration tooling programs
  • Identity & Access Management (Entra ID, Conditional Access, MFA, SSO)
Job Responsibility:
  • Platform Engineering & Architecture: Drive and deliver the design, implementation, and evolution of the technical architecture for enterprise collaboration and work management platforms
  • Act as a hands-on technical authority, leading complex implementations, troubleshooting, and architectural decisions
  • Set architectural direction and guardrails to engineer scalable, secure, and supportable configurations aligned with enterprise standards
  • Large-Scale Implementations & Migrations: Technically lead enterprise-wide programs such as platform rollouts, tenant or system migrations, tooling consolidation, and decommissioning
  • Define and execute technical rollout strategies, migration approaches, cutover plans, and validation processes
  • Ensure operational readiness, performance, and supportability at scale
  • Automation, Integration & Workflows: Drive the automation and integration strategy by building and maintaining automation and orchestration using PowerShell, Microsoft Graph API, and platform-native tooling
  • Develop and implement integrations across collaboration, project management, ITSM, and knowledge platforms
  • Develop reusable provisioning, governance, and compliance automation
  • Digital Employee Experience & Reliability: Design and implement monitoring and telemetry to measure platform health, adoption, and user experience
What we offer:
  • Support for professional accreditations such as ACCA and study leave
  • Flexible arrangements, generous holidays, plus an additional day off for your birthday
  • Continuous mentoring along your career progression
  • Active sports, events and social committees across our offices
  • 24/7 support available from our Employee Assistance Program
  • The opportunity to invest in our growth and success through our Employee Share Plan
  • Plus additional local benefits depending on your location

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs
Job Responsibility:
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs
What we offer:
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace
Employment Type:
Full-time