Incident Commander, Program Manager Job at Block (Bay Area)

Staff Program Manager, Incident Management - AV Development

Staff Program Manager, Incident Management – Autonomous Vehicle Development At ...

Location

United States , Milford, Michigan; Remote; San Francisco, California

Salary:

134700.00 - 245000.00 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

Deep knowledge of incident response frameworks in AV, mobility, or other complex, safety-critical environments
7+ years in incident management, including experience as an Incident Commander or lead responder
3+ years of people leadership experience
Proven ability to stay calm, clear, and precise during high-pressure escalations
Experience briefing senior leadership in real time
Demonstrated experience training teams and reinforcing safety culture
Ability and willingness to participate in a 24/7 on-call rotation

Job Responsibility

Own the incident response vision, roadmap, playbooks, escalation models, and quality standards for AV operations
Align engineering, operations, safety, legal, and executive partners around clear, disciplined incident response processes
Manage and mentor Incident Response Specialists; drive accountability, growth, and team cohesion
Foster a culture rooted in safety, precision, continuous improvement, and psychological safety
Oversee the response for major incidents affecting the AV fleet to minimize safety and operational impacts
Provide executive-level clarity during fast-moving events while ensuring disciplined, real-time response operations
Serve as an escalation point for critical, fleet-impacting, or reputationally sensitive incidents
Partner closely with technical experts to diagnose issues across software, hardware, and systems
Coordinate response efforts with engineering, operations, safety, legal, communications, and executives

What we offer

Health and wellbeing benefit programs
medical
dental
vision
Health Savings Account
Flexible Spending Accounts
retirement savings plan
sickness and accident benefits
life insurance
paid vacation & holidays

Fulltime

Data Center Incident Program Manager

The Data Center Incident Program Manager is responsible for designing, operating...

Location

United States

Salary:

125600.00 - 228000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

7+ years in mission-critical infrastructure, data center operations, or reliability engineering
Direct experience leading major incidents (P1/P0 equivalent)
Strong familiarity with facilities systems, hardware operations, or network infrastructure
Demonstrated experience running war rooms and executive updates
Experience conducting root cause analysis and corrective action tracking
Ability to remain calm and decisive under high-pressure conditions

Job Responsibility

Define and maintain incident severity levels (SEV definitions), classification criteria, and escalation thresholds
Establish end-to-end incident response standards: protocols, lifecycle stages (declare → stabilize → mitigate → recover → close), and operating cadence
Build and maintain governance artifacts: runbooks, war room formats, reporting templates, and decision/communication standards
Create and operationalize notification trees, stakeholder comms templates (initial, periodic updates, recovery/closure), and executive escalation criteria
Define clear RACI across Facilities, Hardware Ops, Network, Security, and vendor/partner teams, including handoffs and accountability paths
Set and manage SLAs/OLAs for acknowledgment, escalation, containment, mitigation, and reporting
Implement and run incident management tooling (ticketing, paging, logging) and ensure integrations with monitoring and workflow systems
Establish dashboards and program health metrics to track incident performance and readiness
Lead readiness activities: tabletop exercises, cross-functional simulations, IC/Deputy training, and a rotating on-call IC bench with certification standards
Serve as Incident Commander as needed: declare severity, stand up the war room, assign functional leads, and drive structured execution under pressure

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Sr. Incident Commander

The Cloud & AI organization accelerates Microsoft’s mission and bold ambitions t...

Location

United States , Multiple Locations

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Doctorate in Statistics, Mathematics, Computer Science, or related field
OR Master's Degree in Statistics, Mathematics, Computer Science, or related field AND 3+ years experience in software development lifecycle, large-scale computing, threat modeling, cyber security, anomaly detection, Security Operations Center (SOC) detection, threat analytics, security incident and event management (SIEM), information technology (IT), or operations incident response
OR Bachelor's Degree in Statistics, Mathematics, Computer Science, or related field AND 4+ years experience in software development lifecycle, large-scale computing, threat modeling, cyber security, anomaly detection, Security Operations Center (SOC) detection, threat analytics, security incident and event management (SIEM), information technology (IT), or operations incident response
OR equivalent experience
Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Citizenship & Citizenship Verification: This role will require access to information that is controlled for export under export control regulations
Citizenship & Citizenship Verification: This position requires verification of citizenship due to citizenship-based legal restrictions
7+ years of experience in software development lifecycle, large-scale computing, modeling, cyber security, and anomaly detection OR Master's Degree or Doctorate in Statistics, Mathematics, Computer Science or related field
CISSP, CISA, CISM, SANS, GCIA, GCIH, OSCP, and/or Security+ certification
Strong program management skills
5+ years of experience in software development lifecycle, large-scale computing, modeling, cyber security, anomaly detection, Security Operations Center (SOC) detection, threat analytics, security incident and event management (SIEM), information technology (IT), and operations incident response OR Bachelor's Degree in Statistics, Mathematics, Computer Science or related field
5+ years of experience in information security incident handling and/or security operations
5+ years of experience triaging security vulnerabilities and driving product and/or service response

Job Responsibility

Perform cyber defense incident and/or vulnerability triage to determine scope, urgency, and potential risk impact
Make high-stakes decisions that enable expeditious remediation of risk to protect customers and Microsoft
Track and document cyber defense incidents from initial escalation through final resolution
Provide tactical security decisions and coordinate enterprise-wide cyber defenders to resolve incidents
Send timely and clear executive updates explaining the risk to customers and Microsoft
Advise and validate customer notifications and/or authoritative security guidance for customers
Conduct incident analysis and produce reports and briefs that inform threat landscape trends and future investment areas to improve security
Embody our culture and values

Fulltime

Incident Manager

This is an incredible opportunity for a progressive, pragmatic, and service-orie...

Location

Philippines , Manila

Salary:

Not provided

Apex Clearing

Expiration Date

Until further notice

Requirements

5 years of relevant work experience, designing, implementing and executing incident management programs
5 years of experience in partnering with Support/Client Partners/Engineering/Product teams and customers to deliver incident response outcomes
Leadership presence with the ability to command and control highly stressful situations with a calming influence
Ability to effectively communicate multi-functionally with both internal stakeholders and external customers or partners
Evidence of a bias to action with strong attention to detail and data-driven decision making
Ability to make logical, quick decisions to progress investigations
Prior experience in documenting and collecting relevant data for accurate metrics and reporting
Handle majority of IM planning and coordination (PD admin, documentation, training, processes, readiness, proactivity, reporting)
Own incident management as a practice and report into ITSM and Tech-Ops leadership. Oversee mentorship and onboarding of new incident managers
Provide the depth of Incident Management experience developed by working incidents, conducting lessons-learned reviews, coordinating changes, and constantly iterating on the process

Job Responsibility

Deliver results. Use ticket data, client feedback, and experiences to influence and drive improvements in our processes. Produce reports displaying service metrics on key service measures such as response and resolution time
Collaborate with engineering and product teams. As a member of the IT Service Management Team you’ll work closely with other support teams to triage, investigate and restore critical service outages
Focus on continuous improvement. You'll be expected to identify and report on the frequency and severity of technical incidents which negatively impact internal and external customers
Support our world-class client base. Promote a culture of quick and effective response to client-impacting situations
Identify smart and creative ways to solve issues and client challenges
Stay updated on new technologies and tools. You’re in tune at all times with new functionality within our current tool kit as well as opportunities using 3rd party tools to improve our level of service to our clients

What we offer

market-leading salary with an annual bonus
20 days of vacation leave plus regular and special non-working holidays
training and development budget
private health insurance for medical and dental
life insurance
flexible working hours
parental leave
modern city center office
hybrid work schedule
monthly team lunch-outs

Fulltime

Incident Manager

This is an incredible opportunity for a progressive, pragmatic, and service-orie...

Location

United Kingdom , Belfast

Salary:

Not provided

Apex Clearing

Expiration Date

Until further notice

Requirements

5 years of relevant work experience, designing, implementing and executing incident management programs
5 years of experience in partnering with Support/Client Partners/Engineering/Product teams and customers to deliver incident response outcomes
Leadership presence with the ability to command and control highly stressful situations with a calming influence
Ability to effectively communicate multi-functionally with both internal stakeholders and external customers or partners
Evidence of a bias to action with strong attention to detail and data-driven decision making
Ability to make logical, quick decisions to progress investigations
Prior experience in documenting and collecting relevant data for accurate metrics and reporting
Handle majority of IM planning and coordination (PD admin, documentation, training, processes, readiness, proactivity, reporting).
Own incident management as a practice and report into ITSM and Tech-Ops leadership. Oversee mentorship and onboarding of new incident managers.
Provide the depth of Incident Management experience developed by working incidents, conducting lessons-learned reviews, coordinating changes, and constantly iterating on the process.

Job Responsibility

Deliver results. Use ticket data, client feedback, and experiences to influence and drive improvements in our processes. Produce reports displaying service metrics on key service measures such as response and resolution time.
Collaborate with engineering and product teams. As a member of the IT Service Management Team you'll work closely with other support teams to triage, investigate and restore critical service outages.
Focus on continuous improvement. You'll be expected to identify and report on the frequency and severity of technical incidents which negatively impact internal and external customers.
Support our world-class client base. Promote a culture of quick and effective response to client-impacting situations.
Identify smart and creative ways to solve issues and client challenges.
Stay updated on new technologies and tools. You're in tune at all times with new functionality within our current tool kit as well as opportunities using 3rd party tools to improve our level of service to our clients.

What we offer

market-leading salary with an annual bonus
28 days of annual leave plus 10 Northern Ireland national holidays
training and development budget
pension matched up to 7%
private health insurance for medical, dental, and optical care
life insurance
flexible working hours
parental leave
modern city center office
hybrid work schedule

Fulltime

Global Crisis Incident Manager

The Global Crisis Incident Manager for the Command Center in the Microsoft CE&S ...

Location

United States , Redmond

Salary:

96500.00 - 188400.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in technology, business, or related field AND 3+ years technology industry, customer service, or related experience
OR Bachelor's Degree in technology, business, or related field AND 4+ years technology industry, customer service, or related experience
OR 7+ years technology industry, customer service, or related experience
OR equivalent experience
Extensive experience in incident, escalation, or crisis management in a 24x7 operational environment
Experience leading high‑severity, business‑critical incidents
Fluency in English (written and verbal)
Hands‑on experience with Incident, Problem, and Change Management processes
Proven ability to lead post‑incident reviews and drive corrective actions
Proven stakeholder management skills across technical and non‑technical audiences

Job Responsibility

Lead all Severity and Crisis incidents from initiation through stabilization
Ensure incidents are assigned to the correct resolver teams and progress against SLAs
Drive clear, timely stakeholder communications and executive updates
Manage all outstanding actions until an acceptable workaround or resolution is in place
Lead Major Incident Reviews and CSS Live Site Reviews
Own Post Incident Reviews (PIRs) for S500 customers in the assigned time zone
Identify root causes, systemic gaps, and improvement opportunities
Ensure high‑quality documentation and follow‑through on corrective actions
Own CCG (Crisis Command Group) initiative execution across the time zone
Plan and lead crisis drills and program iterations

Fulltime

Manager II, Security Incident Command

Uber’s Incident Command team, part of the Threat Defense and Response (TDR) orga...

Location

United States , New York; Seattle; San Francisco; Sunnyvale

Salary:

232000.00 - 258000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

8+ years of experience in one or more of the following: security incident response; production incident management (e.g., SRE, Ring0, reliability engineering); security or infrastructure operations
Experience leading or coordinating high-severity incidents in a complex, distributed environment
Experience serving as an incident commander, incident lead, or equivalent leadership role during critical incidents
Strong systems thinking: ability to navigate incidents across infrastructure, applications, and services
Excellent communication and stakeholder management skills, especially under pressure
Experience mentoring or managing engineers or operational responders

Job Responsibility

Lead a global team of incident commanders managing Uber’s highest severity security incidents
Drive structured, effective coordination across engineering, security, and business teams during high-impact events
Partner with Security, Legal, and Privacy on sensitive incidents requiring careful judgment and handling
Evolve incident management practices by integrating security IR and SRE/Ring0 disciplines
Own postmortem, premortem, and incident simulation programs to improve resilience and organizational readiness
Translate external incidents and emerging threats into actionable risk reduction across Uber
Build and integrate automation and AI-driven capabilities into incident response, postmortems, premortems, and incident simulations
Translate incident processes into scalable systems, defining safe automation boundaries and human-in-the-loop decision frameworks
Mentor and grow incident commanders in leadership, decision-making, engineering, and operational excellence
Foster an inclusive, high-performing culture grounded in accountability, learning, and continuous improvement

What we offer

Eligible to participate in Uber's bonus program
May be offered an equity award & other types of comp
All full-time employees are eligible to participate in a 401(k) plan
Eligible for various benefits

Fulltime

Software Development Senior Specialist

We are seeking an experienced L3 Support Engineer and proactive individual with ...

Location

Mexico , Guadalajara

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

5+ years of experience in L3 support or similar roles
Strong proficiency in Java programming and application support
Strong troubleshooting skills, including analysis of system and application logs, and working experience in a support function for distributed systems
Proficiency with Linux-based operating systems, commands, and utilities

Job Responsibility

Provide escalated technical support for LeX platform application issues, diagnosing and resolving complex problems through in-depth analysis. Escalate unresolved issues to development teams with detailed documentation
Perform system monitoring, maintenance, and optimization to ensure platform stability and performance. Identify and address performance bottlenecks, software bugs, and system errors
Assist with deployments, release checkouts, and configuration changes such as rate limits and user onboarding
Develop and maintain technical documentation, including troubleshooting guides and standard operating procedures (SOP)
Build and improve runbooks to minimize operational errors and improve operational efficiency
Manage incidents, communicate effectively with users, application owners, and senior stakeholders across all areas
Conduct root cause analysis (RCA) for recurring issues and recommend preventive measures or process improvements
Analyze patterns of failures and issues, and share the related tickets with App Dev for strategic, permanent fixes, thereby improving infrastructure stability and performance

Fulltime
