The Senior Incident Operations & Optimization Specialist for Mainframe & Batch is a specialized technical leadership role requiring deep expertise in mainframe operations, batch job scheduling, and enterprise-scale processing environments. This position is critical to the success of the Incident Reduction Program, delivering solutions that optimize and automate operations workflows. You will be responsible for building automated incident remediation workflows and achieving measurable incident reduction through intelligent alert optimization, correlation, and automation, while preserving the observability required for business-critical mainframe applications and batch processing. This role offers the unique opportunity to modernize event management for legacy systems using cutting-edge AIOps platforms and automation technologies.
Job Responsibilities:
Conduct in-depth analysis of mainframe and batch processing alerts to identify chronic issues, reduce operational noise, and develop strategies to address high-volume incident generators, including recurring job failures
Design and implement domain-specific correlation, de-duplication, and suppression rules on AIOps and event management platforms
Develop correlation logic that accounts for mainframe subsystem relationships and cascading batch job dependencies
Architect and develop automation playbooks for incident data enrichment, automated job restarts, and self-healing capabilities for common mainframe and batch processing failures
Assess monitoring gaps in mainframe and batch environments, proposing enhancements to ensure critical business processes have appropriate alerting coverage and align with enterprise standards
Partner closely with mainframe operations, batch scheduling, and application development teams to validate correlation logic, define automation initiatives, and provide expert guidance on modern event management practices
Continuously validate the effectiveness of implemented rules and automation
Establish feedback loops with operational teams to conduct post-implementation reviews and iterative improvements
Requirements:
Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field
A minimum of 8 years of hands-on experience in mainframe operations, batch processing, or enterprise workload automation
Proven track record in event management, alert tuning, and incident reduction within complex mainframe and batch environments, with quantifiable results
Direct, hands-on experience with modern AIOps and event management platforms is required
Deep understanding of mainframe architecture, operating systems, and subsystems
Expertise in enterprise workload automation, including job design, scheduling, and dependency management
Hands-on experience developing robust automation solutions using relevant scripting languages and modern automation frameworks
Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms
Excellent analytical abilities with a systematic approach to troubleshooting complex batch dependencies and failure propagation scenarios
Exceptional communication skills, with the ability to bridge mainframe/legacy and modern technology teams, foster collaboration, and present technical concepts to diverse audiences
Nice to have:
An advanced degree in a relevant technical field
Relevant industry certifications (e.g., Mainframe, Workload Automation, Automation, ITIL)
Experience with mainframe modernization initiatives, DevOps, and CI/CD pipelines
Familiarity with specialized financial systems
Background in large-scale financial services or other regulated environments, including knowledge of disaster recovery and high-availability patterns