The Problem Management Engineer is responsible for leading and maturing the Problem Management function to prevent incident recurrence, reduce operational risk, and improve service resiliency. This role owns the quality and effectiveness of root cause analysis, ensures permanent fixes are validated, and drives continuous improvement across IT services in alignment with ITIL best practices. This position requires a strong technical background to credibly engage engineering teams, challenge root cause conclusions, and ensure solutions are durable, evidence-based, and measurable.
Job Responsibilities:
Lead and oversee the end-to-end Problem Management lifecycle, including detection, logging, classification, investigation, resolution, validation, and closure
Ensure problems are closed only when defined closure criteria are met, including validated resolution, preventive controls, and monitoring improvements
Prevent premature or superficial closure of problems by enforcing quality and evidence standards
Lead and validate structured Root Cause Analysis (RCA) using methodologies such as 5 Whys, Fishbone, and fault tree analysis
Challenge assumptions and ensure true root causes are identified for major incidents and recurring issues
Review and validate the technical feasibility and effectiveness of permanent fixes
Partner closely with Incident Management, Change Management, Resiliency/Reliability Engineering, and Service Owners
Coordinate permanent fixes through formal change processes
Work with vendors and external partners to track dependencies and ensure accountability
Establish and enforce Problem Management governance and quality standards
Track and report on key metrics, including overdue problems, SLA compliance, recurrence trends, and systemic risks
Provide clear, actionable updates and insights to senior leadership and executive forums
Maintain and improve the Known Error Database (KEDB) and Problem-related Knowledge Articles
Identify opportunities for proactive problem management, automation, and improved monitoring and alerting
Continuously refine Problem Management processes, tools, and standards to increase effectiveness and efficiency
Requirements:
Strong understanding of ITIL Problem Management processes and best practices
Proven experience leading or performing Root Cause Analysis in complex technical environments
Technical background (infrastructure, applications, cloud, or enterprise platforms) sufficient to engage and challenge engineering teams
Hands-on experience with ServiceNow or comparable enterprise ITSM platforms
Strong communication and stakeholder management skills, including executive-level communication
Ability to analyze trends, identify systemic risk, and drive proactive improvements
Nice to have:
ITIL Foundation certification or higher
Experience in large-scale enterprise environments
Experience supporting Major Incident or executive outage review forums
Familiarity with automation, observability, and proactive problem management techniques
Experience working with vendors and external service providers
What we offer:
Career advancement opportunities
Extensive training
Excellent benefits, including paid health and dental premiums for salaried employees