Senior Incident Manager

Microsoft Corporation

Location:
United States, San Antonio

Contract Type:
Not provided

Salary:

119800.00 - 234700.00 USD / Year

Job Description:

Microsoft Cloud Infrastructure and Operations (CO+I) is the engine that powers Microsoft's cloud services. The group designs, builds, and operates Microsoft's global datacenters; manages the programmatic delivery of critical infrastructure design, equipment procurement, construction delivery, infrastructure innovation, demand planning, and capacity utilization of our unified infrastructure; and carries out all operations needed to run the physical infrastructure. We focus on smart growth with an emphasis on automation, data-driven engineering, cost-effectiveness, and environmental sustainability. We deliver the core infrastructure and foundational technologies for Microsoft's 200+ online businesses, including Azure, Office 365, Bing, Xbox Live, Skype, and OneDrive. Our portfolio is built and managed by a team of subject matter experts working 24x7x365 to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide.

Within CO+I, the Data Center Incident Management (DCIM) team is responsible for 24x7x365 incident management for Microsoft data centers worldwide. We are seeking a highly motivated and experienced Senior Incident Manager to join the DCIM team. If you are a strategic thinker with a passion for driving business success, we encourage you to apply for this exciting opportunity.

Job Responsibility:

  • Shares insights and best practices that can be applied to improve development and operations across related sets of the systems, services, platforms, and/or products
  • Mentors and coaches other engineers to help them identify and propose relevant solutions
  • Collaborates within and across teams by proactively and systematically sharing information with an appropriate level of detail for their audience
  • Overcomes obstacles by resolving conflicts and issues across interdependent teams and engages with partners and stakeholders so issues can be resolved and mutual objectives are met
  • Develops, leverages, and drives sharing of information and knowledge base across teams
  • Leverages advanced technical expertise, judgment, and decision making to coordinate multiple work streams and resources in crisis situations to drive mitigation plan and resolve, reduce, or mitigate the impact of a crisis by engaging necessary teams and escalating to appropriate stakeholders
  • Independently conducts root cause analyses and participates in post-incident reviews based on incidents/crises for the purposes of leading continuous improvement
  • Applies diagnostic expertise
  • Provides guidance to other engineers working to mitigate and resolve issues
  • Communicates customer impact and other relevant information with key stakeholders, leadership, and customers
  • Develops and drives projects and programs to improve crisis response by creating standard practices for consistent response across engineering teams
  • Fosters increased service stability
  • Reduces future noise by participating in optimization of telemetry and alarming
  • Influences key stakeholders to adopt new standards and practices to broadly improve crisis and problem management
  • Creates, monitors, and takes action on telemetry data and influences telemetry analytics to better identify patterns that reveal errors and unexpected problems that are affecting the system's availability, reliability, performance, and/or efficiency
  • Develops scripts and/or automation and leverages an understanding of solutions to define, develop, measure, track, change, and improve the quality of telemetry pipelines that support automated monitoring and incident response
  • Identifies and develops telemetry collaborations that result in better-together services
  • Responds to incidents during regular on-call rotations, including complex incidents with major customer or business impact, by identifying the level of impact, troubleshooting, contributing to difficult decisions based on business impact, deploying appropriate fixes to resolve root cause(s), and implementing automations for prevention of recurring incidents through coordinating resources required for incident resolution
  • Escalates resolution of highly complex, ambiguous, and impactful incidents as needed
  • Contributes to postmortems and shares details related to incidents and their resolution through post-mortem reports and regular review meetings
  • Provides expert incident response assistance to other Service Engineers as needed, and develops incident response and resolution guidance
  • Adheres to and promotes prescriptive guidance for security, privacy, and compliance standards in alignment with direction from the business and technical experts
  • Works with security, privacy, and compliance teams to identify and address issues relevant to their services and resolve them within the service level agreement (SLA)
  • Provides assistance to other service engineers as needed
  • Independently implements reliable, scalable, and high-performance solutions across teams
  • Contributes to design documents
  • Owns implementation and rollback plans
  • Maintains quality checklist and related documentation
  • Quantifies and ensures the health and compliance of a service according to Engineering and industry standards
  • Monitors and maintains security by addressing security vulnerabilities through patches, reconfigurations, and/or settings updates
  • Identifies, prioritizes, and targets solutions to complex security issues that may impact customers and partners, and drives action to promote the adoption of relevant mitigations
  • Drives program and process of mitigation, troubleshoots system issues, and partners closely with internal customers and engineering teams to conduct root cause analyses, share end-to-end expertise in services, and to mitigate and resolve issues
  • Communicates and drives adherence to security policies and procedures
  • Takes ownership of service design by driving efforts within an organization to identify, define, recommend, and build optimal configurations of technology solutions with considerations for cost management, and service health, security, resiliency, and reliability, while taking into account scalability of services
  • Develops end-to-end expertise in service and/or system design, interactions between technology layers and components, functions of infrastructure, and dependencies at scale
  • Independently adjusts configurations and defines infrastructures to improve the availability, reliability, efficiency, observability, and/or performance of supported products and services
  • Drives collaborative reviews with the engineering teams that develop and/or manage services and other stakeholders, identifying opportunities for efficiencies in operations and sharing learnings and recommendations across engineering teams and other stakeholders working on related services within their organization
  • Independently designs a service/system in a manner that allows for robust and scalable measurement of quantifiable metrics for assessing health, quality, and functionality
  • Stays current in knowledge and expertise as technology landscape evolves, maintaining awareness of industry norms
  • Uses knowledge to drive the adoption of new solutions across engineering teams working with related products within an organization
  • Provides guidance to others through sharing, coaching, conferences, and other means to drive improvements across teams

Requirements:

  • Bachelor's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 3+ years technical experience in data center or critical environment space OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Nice to have:

  • Master's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 6+ years technical experience in data center or critical environment space OR equivalent experience OR Bachelor's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 8+ years technical experience in data center or critical environment space OR equivalent experience
  • 3+ years technical experience working with large-scale cloud or distributed systems

Additional Information:

Job Posted:
March 01, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Senior Incident Manager

Senior Product Manager - Incident Response

At Corelight, we believe that the best approach to cybersecurity risk starts wit...
Location:
United States
Salary:
182000.00 - 219000.00 USD / Year
Company:
Corelight
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in cybersecurity, with a strong focus on enterprise security workflows, policy management, or asset intelligence
  • 3+ years in product management or a similar role, driving roadmap and feature execution
  • Experience with security operations (SOC), including detection tuning, policy frameworks, and compliance needs
  • Strong understanding of network security monitoring, intrusion detection, and enterprise security architecture
  • Familiarity with CMDB, CAASM, or asset intelligence tools and their role in security operations
  • Strong knowledge of SOC workflows and security event triage processes
  • Experience working with enterprise IT/security leaders (CISO, SOC Managers, Compliance Teams) to align security policies with operational needs
  • Ability to work cross-functionally with engineering, UX, and customers to deliver scalable solutions
Job Responsibility:
  • Own the policy and asset database roadmap within the Investigator platform, ensuring device groups and policy assignment work seamlessly together
  • Develop tuning mechanisms that make granular tuning of policy quick and easy
  • Develop custom prioritization engines with great defaults but a focus on putting the power in the customer’s hands
  • Build out powerful CMDB/CAASM-like asset management capabilities to improve everything from policy assignment to triage context
  • Work with SOC teams and CISOs to validate policy workflows and ensure the platform meets oversight and compliance needs
  • Collaborate with sales and customers to prioritize features that have the biggest impact on security operations
  • Write detailed product requirements, ensuring engineering has a clear understanding of expectations
  • Work closely with team members to ensure policy workflows support effective detection and investigation processes
  • Drive executive reporting to support SOC leadership in tracking detection effectiveness
What we offer:
  • Equity
  • Additional benefits
Employment Type:
Fulltime

Incident Manager

We are seeking a proactive and detail-oriented Incident Manager to take ownershi...
Location:
United States, Princeton
Salary:
82.35 USD / Hour
Company:
Realign
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Information Technology, or a related field
  • 3–5 years of experience in IT service management or incident management roles
  • Strong understanding of ITIL framework
  • ITIL certification preferred
  • Excellent communication, leadership, and problem-solving skills
  • Ability to perform under pressure in a fast-paced, 24/7 environment
  • Experience with service management tools (e.g., ServiceNow, BMC Remedy, Jira Service Management)
Job Responsibility:
  • Manage and coordinate the response to high-impact incidents, ensuring timely resolution and communication
  • Act as the central point of contact during major incidents, coordinating cross-functional teams and technical resources
  • Drive root cause analysis (RCA) and post-incident reviews to identify corrective and preventive actions
  • Maintain detailed incident logs, timelines, and reports for transparency and compliance
  • Develop and maintain incident management policies, procedures, and workflows
  • Provide regular updates to senior management and stakeholders on incident status and progress
  • Collaborate with Change and Problem Management teams to ensure a seamless ITIL service management approach
  • Lead the continual improvement of incident management processes, tools, and performance metrics

Manager / Senior Manager of EMR Integrations & Interoperability

We are seeking an experienced and hands-on Manager / Senior Manager of EMR Integ...
Location:
United States
Salary:
147841.00 - 195361.00 USD / Year
Company:
BillionToOne
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience in healthcare IT, EMR integration, or clinical interoperability
  • At least 2–3 years in a people or project leadership role
  • Solid technical expertise in major EMR platforms and interoperability standards (e.g., HL7, FHIR, CDA, SMART on FHIR, APIs)
  • Proven track record in delivering complex integration projects on time and within scope
  • Experience managing small-to-mid-sized technical teams
  • Strong communication and stakeholder management skills across technical and non-technical groups
  • Familiarity with agile project management and SDLC best practices
  • Bachelor’s degree in Computer Science, Health Informatics, Biomedical Engineering, or related field preferred
Job Responsibility:
  • Define and execute the enterprise-wide EMR integration strategy, aligning with clinical, commercial, and product goals
  • Develop and own the long-term roadmap for scalable, secure, and interoperable EMR integration infrastructure
  • Serve as a thought leader on EMR interoperability, standards (e.g., HL7, FHIR, SMART), and vendor ecosystems
  • Manage and mentor a team of EMR integration engineers, analysts, and/or project managers
  • Support hiring, onboarding, and development of team members
  • Foster a collaborative, accountable, and high-performance team culture
  • Establish and evolve team processes, performance standards, and professional development frameworks
  • Oversee the full lifecycle of EMR integrations across Epic, Cerner, Athena, and other major platforms—from initial scoping to go-live and long-term support
  • Lead the team in designing, configuring, and optimizing EMR workflows, data exchange protocols, and custom interfaces
  • Set and enforce best practices for security, scalability, and compliance (e.g., HIPAA, HITRUST)
What we offer:
  • Working alongside brilliant, kind, passionate and dedicated colleagues, in an empowering environment, toward a global vision, striving for a future in which transformative molecular diagnostics can help millions of patients
  • Open, transparent culture that includes weekly Town Hall meetings
  • The ability to indirectly or directly change the lives of hundreds of thousands of patients
  • Multiple medical benefit options; employee premiums paid 100% for select plans, dependents covered up to 80%
  • Extremely generous Family Bonding Leave for new parents (16 weeks, paid at 100%)
  • Supplemental fertility benefits coverage
  • Retirement savings program including a 4% Company match
  • Increased paid time off with increased tenure
  • Latest and greatest hardware (laptop, lab equipment, facilities)
Employment Type:
Fulltime

Senior Program Manager, Crisis Management

As a Sr. Program Manager, Crisis Management, People Resilience at Atlassian, you...
Location:
Poland, Gdańsk
Salary:
232000.00 - 278000.00 PLN / Year
Company:
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 10+ years of relevant experience in crisis management, risk management, or business resilience
  • A collaborative, flexible, and self-motivated attitude with a passion for problem-solving
  • Strong communication skills and an inclusive approach to teamwork
  • A “Get S$#@ Done” (GSD) attitude, with a proven track record of delivering results
  • Comfort working in remote and hybrid teams across global time zones
  • Ability to manage multiple work streams and thrive in a dynamic, fast-paced environment
  • Enthusiasm for Atlassian’s mission and values
Job Responsibility:
  • Respond to and support the management of unexpected disruptive incidents affecting Atlassian through the entire crisis management lifecycle and maintain incident tracking
  • Take ownership by consistently reviewing strategies and taking corrective actions to ensure success in preventing, responding to, and recovering from crises affecting people
  • Manage results by skilfully communicating risk assessment goals to teams and prioritizing tasks to ensure high-quality mitigation strategies
  • Contribute to knowledge management by improving training programs based on past experiences and managing knowledge distribution across teams
  • Actively contribute to decisions impacting team resilience and use data to measure the impact of implemented strategies
  • Occasional international travel
What we offer:
  • health and wellbeing resources
  • paid volunteer days
Employment Type:
Fulltime

Senior Program Manager, Emergency Management

As a Manager, People Resilience at Atlassian, you will play a vital role in fost...
Location:
United States, San Francisco
Salary:
116100.00 - 186500.00 USD / Year
Company:
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 10+ years of relevant experience in emergency management, risk assessment, or business resilience
  • Bachelor’s degree or higher preferred
  • A collaborative, flexible, and self-motivated attitude with a passion for problem-solving
  • Strong communication skills and an inclusive approach to teamwork
  • A “Get S$#@ Done” (GSD) attitude, with a proven track record of delivering results
  • Comfort working in remote and hybrid teams across global time zones
  • Ability to manage multiple work streams and thrive in a dynamic, fast-paced environment
  • Enthusiasm for Atlassian’s mission and values, along with a sense of humor and adaptability
Job Responsibility:
  • Respond to and support the management of no-notice disruptive incidents affecting Atlassian through the entire emergency management lifecycle and maintain incident tracking
  • Take ownership by consistently reviewing strategies and taking corrective actions to ensure success in preventing, responding to, and recovering from disruptions to people
  • Manage results by skillfully communicating risk assessment goals to teams and prioritizing tasks to ensure high-quality mitigation strategies
  • Improve compliance management by contributing to cross-team projects to improve compliance processes and communicate findings related to deficiencies
  • Support risk management by leading assessment identification across multiple domains and communicating potential risks, developing comprehensive risk response plans, and anticipating barriers by harnessing data analytics for risk trends to ensure agility in response to new risks
  • Contribute to knowledge management by improving training programs based on past experiences and managing knowledge distribution across teams
  • Actively contribute to decisions impacting team resilience and use data to measure the impact of implemented strategies
  • Develop creative and culturally sensitive solutions to challenges in fostering a culture of personal preparation and resilience
  • Occasional international travel
What we offer:
  • health and wellbeing resources
  • paid volunteer days
Employment Type:
Fulltime

Third Party Risk Management Senior Expert

The Third Party Risk Management Expert manages the run of Third Party Risk Manag...
Location:
Romania, Bucharest
Salary:
Not provided
Company:
Allianz
Expiration Date:
Until further notice
Requirements:
  • University degree (Legal, Business, Economics, Computer Science or similar)
  • 2-5 years of relevant working experience in Compliance, Vendor Management, Risk Management, Audit or Contract Management domains
  • Familiarity with industry frameworks like ISO 27001, Cybersecurity Framework, SOC 2 and overall understanding of regulations such as GDPR, DORA, etc
  • Knowledge of risk assessment methodologies, including inherent risk and residual risk assessments
  • Strong customer service orientation, developed social skills and cross-cultural experience and ability to operate within a global team environment / work within global virtual teams
  • Fluent English with high-quality oral and written communication skills; knowledge of German or other languages is a plus
  • Self-motivated, proactive and customer-centric working style
  • Experience in setting priorities and working to tight deadlines
  • Ability to deliver high-quality results and take ownership of initiatives
Job Responsibility:
  • Manage and oversee efficient and effective implementation of Allianz Third Party Risk Management Standard and Outsourcing Policy across Allianz Operating Entities to ensure compliance related to DORA and other regulatory requirements
  • Perform vendor service classification and evaluate vendor security practice, including cloud security, data protection and incident response
  • Plan and facilitate completion of all Risk and Control Assessments for vendor population
  • Enable operational execution of activities related to vendor risk management and of the overall TPRM process using the internal tools and platforms (RSA Archer, ServiceNow)
  • Collaborate with relevant departments and stakeholders involved in the process
  • Develop and implement a TPRM strategy that aligns with business goals
  • Independently track progress of TPRM actions of operational entities and pro-actively communicate with stakeholders
  • Prepare Third Party Vendor Management related reports / dashboards and report to senior management
  • Support in remediation actions required to ensure compliance with the Digital Operational Resilience Act and other regulatory requirements
What we offer:
  • Fixed salary compensation along with fixed benefits
  • Flexible benefits that can be individually customized
  • Additional vacation days (work tenure, Allianz tenure, special events, Paid day for child medical check-up)
  • Rewards and Recognition Program (Team Excellence Award, Anniversary Awards, Above & Beyond Awards, Thank you for your contribution!)
  • Complete training curricula available (tailored courses): International Certifications (Agile, Lean Six Sigma, Prince, ITIL, IFOA, ACCA, IACCM etc.), Comprehensive Leadership Programs, LinkedIn Learning, German Language Courses for any level
  • All you can read with Bookster
  • Share Purchase Plan
  • Allowances for special events (Birth Allowance, Losing a Family Member)
  • Flexible working environment (work from home, hybrid)
  • Medical services, Private pension, Internal Tourism, Meal Tickets and many other benefits of your choice
Employment Type:
Fulltime

Risk and Compliance Senior Manager

From day one at Unobravo, we’ve been on a mission to make mental health support ...
Location:
Italy, Milan
Salary:
Not provided
Company:
Unobravo
Expiration Date:
Until further notice
Requirements:
  • 5+ years in senior compliance roles, with mandatory experience in a regulated market; healthcare sector (digital and/or physical) experience is a plus
  • Strong knowledge of European regulations, including data protection, healthcare, digital marketing, and consumer protection
  • Ability to anticipate and address evolving AI regulations, ensuring training, compliance, and organisational readiness
  • Global or pan-European experience, with ability to balance local compliance needs with a worldwide strategy
  • Excellent communication skills to translate complex compliance topics into practical solutions for diverse stakeholders
  • Proactive and hands-on, able to balance strategic initiatives with operational needs
  • Fluency in Italian and English, with international experience; presence in Italy is a strong advantage
Job Responsibility:
  • Strategic Compliance Leadership: Define and implement a practical compliance framework across products, marketing, and infrastructure, balancing scale-up needs with risk management
  • Clinical Collaboration: Ensure compliance with healthcare regulations relevant to our role as a medical center
  • Compliance Management: Partner with product, marketing, and security to ensure GDPR, healthcare advertising, and NIS2 compliance. Provide strategic advice on privacy and health regulation, enabling Privacy by Design and Compliance by Design
  • Cross-functional Collaboration: Work closely with legal, IT, finance, HR, clinical, operations, and leadership to integrate compliance into all business decisions
  • Risk Management: Identify and mitigate risks across privacy, data, marketing, and communications. Lead DPIAs, LIAs, and other assessments
  • Global & Local Balance: Develop a compliance strategy that ensures our global product meets local regulatory requirements
  • Policies & Training: Create internal policies, deliver training, and build a culture of compliance and privacy awareness
  • Audit & Incident Response: Lead audits, monitor compliance, manage incidents, and oversee whistleblowing and reporting processes
  • Stakeholder Communication: Represent compliance priorities to leadership and advocate for key initiatives
  • Regulatory Monitoring: Track regulatory changes and best practices, updating company policies as needed
What we offer:
  • Flexibility to work from anywhere within your country of hire
  • Home workstation budget
  • Up to two coworking sessions a month
  • Exclusive discounts on psychotherapy sessions
  • Company retreats, team-building experiences, aperitivo parties
  • Free online language training
  • Birthday day off
  • Additional day off on World Mental Health Day
  • Inclusive parental leave
Employment Type:
Fulltime

Command and Control Senior Manager

The C3 Senior Manager is responsible for the day-to-day management and operation...
Location:
Canada, Toronto
Salary:
Not provided
Company:
Cloud Carib
Expiration Date:
Until further notice
Requirements:
  • 5 to 10+ years of previous operations experience as a Senior Manager with direct experience in a Data Centre, Cloud Computing, Managed Services, or Hosting environment aligned with ITIL and ITSM practices
  • Bachelor’s degree in computer science or relevant area or accumulated on the job experience within the IT/Technology Services industry
  • Demonstrated leadership, communication, and technical writing skills
  • Advanced skills in a wide array of technologies to ensure operational expertise and technologies that are aligned to the Services offered by the company
  • Ability to budget, multi-task, prepare reports, and measure results
  • Must speak and write fluently in the English language
Job Responsibility:
  • The Senior Manager leads the C3 (Command and Control Centre) Service Desk function and is accountable for process mapping between staff and controls in relation to Event Management, Incident Management, and Problem Resolution
  • The Senior Manager is accountable to guarantee both passive and active monitoring tools are in place and fully functional 24x7x365 to maintain 100% compliance for data capture for any change in any Configuration Item (CI) or Service under management
  • The Senior Manager is responsible to ensure that monitoring systems and practices are constantly tuned to guarantee Event management is focused on generating and detecting meaningful notifications about the status of the IT infrastructure and Services
  • The Senior Manager is responsible for generating daily, weekly, monthly, quarterly, and annual compliance reports that Event Management is functioning within a compliant state; where variations occur, clear documentation is present to show remediation timelines and plans for audit purposes
  • The Senior Manager is accountable and responsible for ensuring end-to-end compliance to established Service Level Agreements (SLAs) and Service Level Objectives (SLOs) for all aspects of the service desk function
  • The Senior Manager is accountable for planning, management, and operations of all tools, processes, and people involved in the Incident Management process
  • The Senior Manager is responsible for coordinating all interfaces between Incident Management and other Service Management Processes
  • The Senior Manager is responsible for generating daily, weekly, monthly, quarterly, and annual compliance reports that Incident Management is achieving greater than the 90th percentile in efficiency, cost, and Client satisfaction
  • The Senior Manager is accountable and responsible for the end-to-end management, oversight, escalation (technical and management), and communications for all Major Incidents
Employment Type:
Fulltime