The Senior Incident Operations & Optimization Specialist for Data & Middleware is a specialized technical leadership role requiring deep expertise in database technologies, messaging platforms, and application middleware. This position is essential to the Incident Reduction Program, because database and middleware systems generate a significant number of operational incidents while serving as critical infrastructure for enterprise applications. You will be responsible for building automated incident remediation workflows and achieving measurable incident reduction through intelligent correlation, threshold optimization, and automation, while ensuring that the health and performance of business-critical data and middleware platforms remain visible and protected. This role offers the opportunity to modernize observability and event management for the data layer and integration tier of the enterprise architecture.
Job Responsibilities:
Analyze and optimize monitoring across all database and middleware platforms to address high-volume, low-value alerts, identify patterns in incident generation, and determine root causes
Develop and implement domain-specific correlation, de-duplication, and suppression rules on AIOps and event management platforms
Create logic that understands database cluster relationships, messaging dependencies, and application-to-database connections
Architect and develop automation playbooks for incident data enrichment and automated remediation of common database and middleware issues, such as connection pool resets or service restarts
Identify monitoring gaps across the data and middleware landscape, proposing enhancements to ensure comprehensive health monitoring and address blind spots in transactional flows
Partner closely with Database Administration (DBA), middleware engineering, and application teams to validate correlation logic, build consensus on threshold changes, and provide expert guidance on event management best practices
Continuously validate the effectiveness of implemented rules and automation, ensuring critical health indicators remain highly visible
Lead post-implementation reviews and drive iterative improvements
Requirements:
A minimum of 8 years of hands-on experience in database administration, middleware engineering, or enterprise data platform operations
Proven experience in event management, alert tuning, and incident reduction for data and middleware services, with measurable results
Direct, hands-on experience with modern AIOps and event management platforms is required
Deep knowledge of both relational (e.g., Oracle, SQL Server) and NoSQL (e.g., MongoDB) database technologies, including clustering, replication, and performance tuning
Expertise in middleware platforms, including messaging technologies (e.g., MQ, Kafka) and application servers (e.g., WebSphere, Tomcat)
Hands-on experience developing robust automation solutions using relevant scripting languages (e.g., Python, Shell) and modern automation frameworks
Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms
Excellent analytical abilities with a systematic approach to troubleshooting complex data platform architectures and correlating infrastructure issues with application impact
Exceptional communication skills with the ability to collaborate effectively with DBAs, middleware engineers, and application teams, and to present technical concepts to diverse audiences
Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field
Nice to have:
An advanced degree (Master's) in a relevant technical field
Relevant industry certifications (e.g., Database, Middleware, Cloud, Automation, ITIL)
Experience with Database as a Service (DBaaS) platforms and other database technologies
Knowledge of data governance, security, and compliance requirements in a regulated environment
Background in large-scale financial services environments
Experience with modern observability platforms, distributed tracing, and infrastructure-as-code (IaC) principles
What we offer:
medical, dental & vision coverage
401(k)
life, accident, and disability insurance
wellness programs
paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
discretionary and formulaic incentive and retention awards