Problem and Incident Manager - IT Infrastructure Maintenance

NTT DATA

Location:
France, Strasbourg

Contract Type:
Not provided

Salary:

Not provided

Job Description:

The Problem and Incident Manager – IT Infrastructure Maintenance is responsible for overseeing the end-to-end management of infrastructure-related incidents and problems. The role ensures the stability, performance, and resilience of core IT infrastructure systems, including servers, networks, storage, and data center operations.

Job Responsibility:

  • Lead the response to and resolution of critical infrastructure incidents (e.g., server outages, network failures, storage disruptions) during normal business hours (with the possibility of extending this activity to out-of-hours coverage)
  • Coordinate response efforts across infrastructure, networking, security, and vendor support teams
  • Monitor incident queues and ensure adherence to SLA response/resolution times
  • Communicate effectively with stakeholders during infrastructure outages, providing regular updates and estimated time to resolution (ETR)
  • Perform and document root cause analyses following high-impact incidents
  • Identify recurring infrastructure failures, performance bottlenecks, or chronic outages
  • Conduct detailed root cause analysis using structured methods (5 Whys, Fishbone Diagram, Fault Tree Analysis)
  • Collaborate with infrastructure engineers and architects to implement permanent corrective actions
  • Develop and maintain the Known Error Database (KEDB) for infrastructure-related issues
  • Analyze incident and problem data to identify trends and drive service improvements
  • Work closely with Change Management to ensure preventive actions are implemented without introducing new risks
  • Contribute to infrastructure reliability initiatives such as monitoring improvements, failover testing, and capacity planning
  • Ensure all processes align with ITIL best practices and are continuously improved

Requirements:

  • More than 2 years of experience in a combined incident/problem management or IT operations role with a focus on infrastructure
  • Technical understanding of infrastructure domains including: Windows/Linux servers, Networking (LAN/WAN, firewalls, routers, switches), Virtualization (VMware, Hyper-V), Storage and backup systems, Data center operations
  • Hands-on experience with ITSM platforms (e.g., ServiceNow, BMC Remedy)
  • ITIL v3/v4 Foundation certification is highly valuable
  • Strong communication skills and the ability to coordinate across technical and non-technical teams
  • Fluent in English
  • Must be eligible for EU Security Clearance (at least 5 years of EU nationality is required)

What we offer:

  • Monthly reimbursement of transportation costs
  • Sustainable mobility allowance
  • Medical & life insurance partially covered
  • Relocation allowance (if applicable)
  • Company phone
  • Meal vouchers
  • Internet allowance
  • 25 days paid annual leave + RTT days
  • Career development
  • Training path and access to learning opportunities
  • Yearly performance reviews
  • Mentorship program
  • Work-life balance and flexibility
  • Casual clothing
  • Decide your working hours
  • Hybrid working model
  • Talent Friends referral bonus
  • Access to a platform with certified psychologists & mental health workshops
  • Online fitness and well-being sessions

Additional Information:

Job Posted:
January 24, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Problem and Incident Manager - IT Infrastructure Maintenance

Major Incident / Problem Manager

The Major Incident / Problem Manager will report to the ITSM Manager. The primar...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements
  • Professional degree with 5+ years related IT experience
  • Hands-on experience in managing major incidents
  • Experience analyzing incident and problem reports to proactively identify potential issues, and proposing and implementing resolutions to reduce incident volume
  • Proficient knowledge of IT infrastructure (hardware, databases, operating systems, network, cloud, virtualization, etc.) and future IT trends
  • ITIL 4 Foundation certification mandatory
  • Has a broad knowledge and understanding of IT concepts and architectures, coupled with proven experience of successfully managing incidents and problems
  • Has general awareness of the nature of business-critical incidents, and of their implications for the business
  • Relevant ITIL knowledge and certifications
  • Experience in managed services preferred
Job Responsibility
  • Ensures post-review of major problems
  • Ensures reactive and proactive management of IT problems and known errors
  • Coordinates efforts of all Problem Analysts, including suppliers and external teams, to ensure timely resolution of problems
  • Closes all problem records
  • Owns the Known Error Database and ensures its maintenance
  • Carries out the Process Manager responsibilities for the Problem Management process
  • Define and maintain the problem management procedure
  • Periodically review effectiveness and efficiency of the problem management process
  • Continuously improve the problem management process
  • Coordinate between various support teams to identify the root cause of a problem and find a workaround or solution
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Service Delivery Manager – Infrastructure Operations

The Service Delivery Manager (SDM) is responsible for end-to-end service deliver...
Location:
United States, Mahwah
Salary:
180000.00 - 190000.00 USD / Year
Tech Mahindra
Expiration Date:
April 13, 2026
Requirements
  • 10+ years in IT Infrastructure Operations or Service Delivery Management
  • Experience in Command center services and Datacenter operations
  • Strong background in Network, Midrange servers, and Mainframe operations
  • Proven experience managing 24x7 global operations
  • Technical Knowledge: High level understanding of Network infrastructure (circuits, routers, switches, APs)
  • Technical Knowledge: High level understanding of Server and midrange platforms
  • Technical Knowledge: High level understanding of Mainframe operations (LPARs, IPL, storage, hardware maintenance)
  • Technical Knowledge: High level understanding of ITSM tools (ServiceNow or equivalent)
  • Technical Knowledge: High level understanding of Incident, Change, and Problem Management frameworks
  • Leadership & Soft Skills: Strong stakeholder and executive communication
Job Responsibility
  • Service Delivery & Operations Management: Own and manage L1/L1.5 operations delivery across Midrange Servers, Network and Mainframe platforms
  • Service Delivery & Operations Management: Ensure adherence to SLAs and KPIs across all supported technology towers
  • Service Delivery & Operations Management: Drive RAG-based service health reporting and execution of continuous improvement plans
  • Service Delivery & Operations Management: Lead daily, weekly, and monthly service reviews with stakeholders
  • Network Operations Control (NOC): Oversee L1/L1.5 support of global network infrastructure across data centers
  • Network Operations Control (NOC): Ensure event monitoring and incident management for: Data circuits, Routers, switches, access points (APs), Internal and GNS-procured network hardware
  • Network Operations Control (NOC): Coordinate incident remediation with applicable internal teams and external providers
  • Network Operations Control (NOC): Manage troubleshooting and dispatch of Technology Support Group (TSG) via Service Orders
  • Network Operations Control (NOC): Ensure timely escalation and restoration for business-critical network events
  • Midrange Operations (MRO): Manage L1/L1.5 support of midrange infrastructure, including: Open systems servers, Critical workstations, Globally deployed services
What we offer
  • medical
  • vision
  • dental
  • life
  • disability insurance
  • paid time off (including holidays, parental leave, and sick leave, as required by law)
  • Fulltime

Information Security Lead

We are offering an exciting opportunity in the Financial Services industry, base...
Location:
United States, Bensalem
Salary:
Not provided
Robert Half
Expiration Date:
Until further notice
Job Responsibility
  • Lead the daily maintenance and automation of the SOC dashboard
  • Monitor and manage daily security alerts and logs, including Central Log, Virus, IPS, DLP, Web Content, Secure Email, and Active Directory Changes
  • Conduct regular security device and configuration reviews
  • Generate monthly security metrics and dashboards
  • Ensure comprehensive and efficient security patching in partnership with the IS team
  • Evaluate and suggest improvements to our SOC and Automation systems
  • Support both external and internal audit processes
  • Document security incidents as part of the CSIRT team
  • Engage outside contractors with proper technical expertise when necessary
  • Manage and monitor security staff to build a reliable, high-performing infrastructure team
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • 401(k) plan
  • Fulltime

L2 Support Engineer & Release Manager

The L2 Support Engineer & Release Manager is responsible for overseeing producti...
Location:
Romania, Bucuresti
Salary:
Not provided
NTT DATA
Expiration Date:
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field
  • At least 4 years of experience as Technical Support Engineer, with at least 2 years working directly as a Release Manager
  • Good knowledge of Oracle database
  • Good knowledge of RHEL (Red Hat Enterprise Linux)
  • Shell scripting knowledge
  • Network connectivity troubleshooting
  • Previous experience in and knowledge of ETL / data sourcing techniques
  • Previous experience with Control-M, Tomcat/WebLogic/Apache/Fabric highly desirable
  • Previous production support experience, with a can-do mindset and a hands-on approach
  • Ability to analyse business requirements and defects and to propose hotfixes
Job Responsibility
  • Drive various technology adoption initiatives (GCP, EXACC, TRC, etc.)
  • Provide support for technical infrastructure components (e.g. databases, middleware and user interfaces)
  • Provide support and remediation for any issues pertaining to the above applications, including detailed code analysis on the applications’ production platform
  • Estimate time required to implement remediation actions which are under direct control of RTB
  • Support and contribute to all relevant documentation following DB internal Standards, Procedures and Guidelines
  • Ensure appropriate vendor interaction in a multi-vendor environment
  • Conduct incident and problem management activities
  • Conduct scheduled Problem Management meetings with infrastructure groups, problem managers, and incident managers (or, in Agile settings, with Scrum Masters and Product Owners) to track progress and highlight issues
  • Perform detailed technology analyses to highlight weaknesses and make recommendations for improvement
  • Perform releases/DR exercises/application maintenance activities over weekends (usually once every 3 weeks there is a weekend activity required)
What we offer
  • Smooth integration and a supportive mentor
  • Pick your working style: choose from Remote, Hybrid or Office work opportunities
  • Projects have different working hours to suit your needs
  • Sponsored certifications, trainings and top e-learning platforms
  • Private Health Insurance
  • Individual coaching sessions or joining our accredited Coaching School
  • Epic parties or themed events
  • Fulltime

Systems Engineer

This role will serve as a subject matter expert with enterprise accountability f...
Location:
United States, Las Vegas
Salary:
Not provided
Beacon Technologies
Expiration Date:
Until further notice
Requirements
  • Four-year degree in Computer Science or related field or equivalent experience
  • 10-15 years of experience required in the following areas: Windows Server 2008 R2 through 2012 R2, 2016, and 2019 support
  • Microsoft Active Directory
  • Managing a VMware environment
  • Windows infrastructure services, including GPO, DFS, File/Print, DNS, WINS, replication, certificate services, and ADFS
  • IP Level experience with VLANs and Subnets
  • PowerShell/Python Scripting/task automation experience
  • MS Exchange/O365
  • Industry Standard Enterprise backup/recovery technology
  • Experience with configuration of Windows servers in a data center environment is required
Job Responsibility
  • Responsible for the design, standardization, and ongoing management of our enterprise Windows server/Active Directory, including Windows system administration, ongoing checks on expected server operations, storage space, event logs, etc.
  • Windows Server 2008-2022 support (on-premises, remote, and cloud-based systems)
  • Implementation and management of enterprise backup solutions
  • PowerShell scripting and task automation
  • Participate in infrastructure management, remediation, and auditing processes that meet the PCI Data Security Standard
  • Ensure standard IT preventative maintenance/management functions are taking place in alignment with enterprise procedures and standards
  • Provide remote assistance as needed to personnel who are outside of the primary work location
  • Planning and coordination of changes in the context of change management
  • Creation and updating of system documentation
  • Develop, publish, and adhere to standards, policies, and procedures
What we offer
  • Career advancement opportunities
  • Extensive training
  • Excellent benefits including paying for health and dental premiums for salaried employees
  • Fulltime

Service Operations Specialist

To assure SITA's competitive strength and business growth through the provision ...
Location:
India, Bengaluru
Salary:
Not provided
SITA
Expiration Date:
Until further notice
Requirements
  • Minimum of 3-5 years of proven experience in network and/or application/system support, in an IT system administrator or application support role, or in a similar infrastructure-focused role
  • Must have dealt directly with external customers delivering to SLAs
  • A background in hybrid IT environments (on-premises and cloud), with practical knowledge of virtualization platforms (e.g., VMware) and cloud services (e.g., AWS)
  • Strong hands-on experience in managing and troubleshooting servers, network infrastructure, enterprise applications, and client systems in complex IT environments
  • Experience in operation and maintenance of airport IT systems, networking and airline-specific applications is highly preferred
  • A background in Airport IATA standards, airline infrastructure/applications, SBD, E-Gates, and airport passenger/baggage (Pax/Bags) systems would be an added advantage
  • Proficiency in Windows and Linux server environments, including installation, configuration, and administration
  • Strong knowledge of networking concepts and protocols such as TCP/IP, DNS, DHCP, and VPN
  • Strong knowledge of hardware such as servers, routers, and switches
  • Knowledge of web servers such as Apache and Tomcat
Job Responsibility
  • Provide Service Operations support to internal and external customers in accordance with the terms of the customer contract and Service Level Agreements (SLAs)
  • Ensure the correct functioning and maintenance of all internal and external systems and products serviced by Service Operations
  • When required, act as the customer SPOC and co-ordinate the scheduling of interventions with the customer's internal resolver groups and the Service Desk, ensuring the highest level of customer service and communication is maintained to resolve the fault or incident within the prescribed SLA
  • Carry out incident and problem management support to the highest standards and co-ordinate the resolution with the appropriate resolver groups
  • Ensure the shortest possible restoration times by initiating timely escalations to specialized resolver groups inside and outside SITA, in line with customer contract SLAs and monitoring requirements
  • Ensure the Service Operations team adheres to the highest working standards for all incidents and problems by providing guidance, support, and direct management
  • Proactively detect problems related to service and infrastructure operations and delivery services, conduct diagnostics, and provide service request ownership to ensure resolution of customer problems
  • Support senior team members with management reporting and co-ordination of day-to-day tasks during the absence of the Lead Engineer
  • Adhere to installation guidelines and industry best practices in order to deliver quality service and infrastructure operations
  • Use the appropriate tools and equipment to perform installations, interventions, and repairs in accordance with Service Operations and Delivery guidelines and instructions where provided
What we offer
  • Flex Week: Work from home up to 2 days/week (depending on your team's needs)
  • Flex Day: Make your workday suit your life and plans
  • Flex-Location: Take up to 30 days a year to work from any location in the world
  • Employee Wellbeing: Employee Assistance Program (EAP), for you and your dependents 24/7, 365 days/year
  • Champion Health - a personalized platform that supports a range of wellbeing needs
  • Professional Development: Level up your skills with our training platforms, including LinkedIn Learning
  • Competitive Benefits: Competitive benefits that make sense with both your local market and employment status
  • Fulltime

Executive Principal, Site Reliability Engineering (SRE) – DevOps

The Executive Principal of Infra Engineering is a senior leader responsible for ...
Location:
United States, Irvine
Salary:
180000.00 - 210000.00 USD / Year
Hyundai AutoEver America
Expiration Date:
Until further notice
Requirements
  • Bachelor's degree in IT/IS or equivalent experience
  • 10 years of infrastructure engineering experience
  • 8+ years of management experience required
  • High availability, fault tolerance, and incident management
  • Automation of infrastructure and operations
  • CI/CD pipeline design and maintenance
  • Monitoring, metrics, and performance tuning
  • Multi-platform expertise (Windows, Linux, VMware, cloud)
  • Security, audit, and identity/access management
  • Change control and risk management
Job Responsibility
  • Guide the Site Reliability Engineering (SRE) function, integrating DevOps principles to drive operational excellence, reliability, and innovation across infrastructure platforms
  • Lead multiple technical teams, including Platform Engineering, Data Center Management, Infrastructure Planning & Architecture and Network & Telecommunications, ensuring 24x7 support and continuous improvement within a complex, hybrid environment
  • Mentor and develop infrastructure managers and SMEs
  • Lead onshore/offshore teams and manage service providers
  • Oversee 24x7 operations, incident response, and problem management
  • Manage OpEx/CapEx, SLAs, KPIs, and OKRs
  • Ensure reliability, disaster recovery, and lifecycle management
  • Champion automation, CI/CD, and Infrastructure as Code
  • Direct monitoring, observability, and performance optimization
  • Align with security and compliance requirements
  • Fulltime

Systems Engineer

The Systems Engineer is responsible for supporting technological infrastructure ...
Location:
United States, Tucker
Salary:
99360.00 - 147000.00 USD / Year
Georgia System Operations
Expiration Date:
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, Engineering, Information Systems, or a related field from an accredited college or university
  • Minimum of 4 years in designing and managing technology infrastructure environments, cloud-based solutions, conducting system assessments, and troubleshooting (for Systems Engineer III)
  • Minimum of 6 years in designing and managing technology infrastructure environments, cloud-based solutions, conducting system assessments, and troubleshooting (for Systems Engineer IV)
  • Extensive experience with Active Directory, Windows Server (2016, 2019, 2022) and Linux distributions (Ubuntu, CentOS, Red Hat)
  • Extensive experience with Microsoft Windows desktop OS and related desktop management systems including MS Intune, WSUS, etc.
  • Proficiency in cloud platforms (AWS, Azure, GCP) and associated services including Microsoft 365
  • Strong knowledge of virtualization technologies (VMware, Hyper-V, Nutanix, Proxmox, etc.)
  • Experience with Microsoft Exchange Online and On-premises
  • Experience with configuration management tools and patch management tools
  • Familiarity with containerization technologies (Docker, Kubernetes)
Job Responsibility
  • System Design and Implementation: Design, deploy, and manage Windows and Linux desktop and server environments and related storage infrastructure
  • Architect and implement cloud-based solutions (e.g., AWS, Azure, Google Cloud Platform, Microsoft 365) to support business needs
  • Ensure system scalability, reliability, and security
  • Infrastructure Management: Maintain and optimize physical and virtual servers and workstation infrastructure
  • Monitor system performance, conduct regular system audits, and troubleshoot issues
  • Implement automation and configuration management
  • Plan and perform system upgrades
  • Perform routine systems maintenance activities like backup/recovery, patching, file management
  • Security and Compliance: Ensure systems are secure and compliant with industry standards and regulations
  • Implement and manage system security measures and configurations
What we offer
  • comprehensive medical, dental, and vision coverage
  • a strong retirement program
  • career development
  • flexible work schedules
  • Fulltime