Monitoring Engineer / Incident Manager

Netherlands, Amsterdam · Employment contract · Job posted May 29, 2026

Job Description

A team within Engineering under the Platform Excellence pillar exhibits unwavering attention to detail and a deep understanding of the platform-wide monitoring implications for all merchants. In this role, you will be on call to monitor platform performance, coordinate and command incidents, communicate with our customers, work on monitoring frameworks, and provide feedback to product engineering teams to improve the reliability of the platform. You will initiate and lead initiatives across our platform offerings, prioritizing merchant impact, to proactively detect issues, inform merchants quickly, and increase the reliability of our platform.

Job Responsibility

  • Participate in 24/7 on-call monitoring: observe platform and merchant performance and proactively detect issues to mitigate risks in partnership with Engineering teams
  • Coordinate the mitigation, recovery, and resolution of high-impact incidents, ensuring a rapid and effective response across teams
  • Represent the customer perspective during incidents, maintaining a strong customer-centric approach
  • Communicate with merchants in real time during an incident, presenting the most accurate and up-to-date information to keep them informed
  • Escalate critical incidents when needed and provide structured communication to senior management
  • Go beyond reactive incident response by analyzing incident trends to identify recurring issues and systemic weaknesses and partner with engineering and product teams to advocate for long-term fixes
  • Work together with Operations, Product, and Engineering teams to integrate, grow, and continuously improve monitoring strategy and increase reliability
  • Investigate alerts and provide feedback to engineering teams to build effective logging and alerts across the platform architecture
  • Mitigate merchant impact risk by acting on alerts in partnership with Engineering teams, and contribute to the monitoring playbook by documenting learnings
  • Improve operations by leading and project-managing initiatives and by developing tools and automation for effective monitoring
  • Focus on prioritizing, automating, and scaling every aspect of detection capabilities

Requirements

  • At least 5 years of experience with incident management, problem management, incident client communication, and platform monitoring operations
  • Experience with problem management practices - identifying trends across incidents, conducting root cause investigations and driving preventative action
  • Solid communication skills and the ability to develop strong working relationships throughout the organization, able to translate technical situations clearly and concisely to a diverse audience via data-visualizing dashboards and written documents
  • Willing to participate in the on-call rotation and work in a fast-paced, dynamic environment
  • Experience with monitoring and logging tools like Prometheus, Grafana, ELK Stack, etc.
  • Experience with observability platforms like Datadog, Dynatrace, Splunk
  • Excellent analytical and problem-solving skills, with the ability to analyze complex systems and spot the root cause of issues
  • Thrive in an environment where collaboration is crucial and where a global approach is key for successful implementation of processes and projects
  • Passion for defining and standardizing processes to drive strategic improvement, and the ability to translate complex technical concepts with ease for non-technical audiences
  • Natural ability for handling complex situations and multiple responsibilities simultaneously
  • Strong team player who thrives in a dynamic environment
  • Work schedule: shifts run from 9:00 AM to 6:00 PM, with a 6-day workweek at least twice a month (Sunday–Friday or Monday–Saturday)

Similar Jobs for Monitoring Engineer / Incident Manager

8 matching positions

Monitoring / Release & Incident Management Support Engineer

Location: Philippines, Manila
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • Comfortable with both Linux and Windows administration
  • Work in agile teams to build, test, and maintain aspects of the CI/CD pipeline
  • Manage UI visual of license consumption & performance
  • Evangelize with Engineering, Security, and cross functions on Ops Best Practices
  • Firmware release - OTA (over the air)
  • Launch the new mobile app / release new versions of the existing mobile app (App Store / Play Store)
Job Responsibility
  • Release Management of new software via Tools
  • Understand release management SOP = QA -> Load Test -> Stage Environment -> PROD
  • Create and manage monitoring and alerting systems as needed to meet SLAs
  • Fulltime

Monitoring / Release & Incident Management Support Engineer

The Monitoring / Release & Incident Management Support Engineer will oversee sof...
Location: Philippines, Manila
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • Proficiency in Linux and Windows administration
  • Experience in agile methodologies
  • Experience with CI/CD pipelines
  • Strong background in release management
  • Strong background in incident response
Job Responsibility
  • Release Management of new software via Tools
  • Understand release management SOP = QA -> Load Test -> Stage Environment -> PROD
  • Create and manage monitoring and alerting systems as needed to meet SLAs
  • Work in agile teams to build, test, and maintain aspects of the CI/CD pipeline
  • Manage UI visual of license consumption & performance
  • Evangelize with Engineering, Security, and cross functions on Ops Best Practices
  • Firmware release - OTA (over the air)
  • Launch the new mobile app / release new versions of the existing mobile app (App Store / Play Store)
  • Fulltime

Senior Site Reliability Engineer Manager

RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on b...
Location: United Kingdom of Great Britain and Northern Ireland, London
Salary: Not provided
RemoteStar
Expiration Date: Until further notice
Requirements
  • Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
  • Expertise in incident management, including incident response, resolution, and post-mortem analysis.
  • Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog.
  • Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
  • Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
  • Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
  • Demonstrated leadership capabilities, with a passion for mentoring and developing team members.
Job Responsibility
  • Take full ownership of the production estate from both a technical and process perspective.
  • Provide a consistent smooth operation of live systems and drive all on-call support issues.
  • Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
  • Create and maintain high end monitoring and automation tooling.
  • Drive automation initiatives to streamline operational workflows and improve efficiency.
  • Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
  • Build a first class SRE team.
  • Through a combination of leading by example, coaching, and mentoring, mould the team you would want to have around you.
  • Provide leadership and guidance to the SRE team, fostering a culture of collaboration, innovation, and continuous improvement.
What we offer
  • Dynamic working environment in an extremely fast-growing company
  • Work in an international environment
  • Work in a pleasant environment with very little hierarchy
  • Intellectually challenging, play a massive role in client’s success and scalability
  • Flexible working hours
  • Fulltime

Incident Manager - Technical Customer Operations

We're growing our Customer Operations team and looking for an Incident Manager f...
Location: France, Paris
Salary: Not provided
efficy
Expiration Date: Until further notice
Requirements
  • At least 5 years of experience in technical customer support or incident management in a B2B SaaS or enterprise software environment
  • Customer-facing mindset: you're comfortable communicating with clients under pressure and know how to keep them confident
  • Strong coordinator, able to align multiple internal teams quickly and clearly
  • Rigorous and closure-oriented: open issues get resolved, not left open
  • Solid technical understanding, able to engage meaningfully with R&D and Cloud teams without being an engineer
  • Quick to get up to speed on product behaviour and business logic
  • Native level in French
  • Excellent command of English, written and spoken
Job Responsibility
  • Own production incidents from qualification to closure, coordinating all involved teams
  • Be the main point of contact for external clients during active incidents, keeping them informed at every step
  • Deliver structured post-incident reports and follow-up communications to external clients
  • Ensure every incident has a visible owner and clear progress at all times
  • Participate in steering committees and crisis meetings as needed
  • Track incident KPIs including MTTR, SLA compliance, and escalation rates
  • Monitor ticket progress across teams and escalate blockers when needed
What we offer
  • Direct impact on customer satisfaction and service quality
  • High-visibility role connecting Support, R&D, and Cloud teams
  • Career growth opportunities and internal mobility
  • Modern offices in 11 European locations
  • Fun team events & continuous learning
  • Competitive salary with bonus system
  • Hybrid working policy

Incident Engineer

A team within Global Platform Operations under the Monitoring Engineering pillar...
Location: India, Bengaluru
Salary: Not provided
Adyen
Expiration Date: Until further notice
Requirements
  • You have at least 5 to 10 years of experience with incident client communication and platform monitoring operations
  • You're willing to participate in the on-call rotation and work in a fast-paced, dynamic environment
  • You have experience with monitoring and logging tools like Prometheus, Grafana, ELK Stack, etc
  • You have experience with observability platforms like Datadog, Dynatrace, Splunk
  • You have excellent analytical and problem-solving skills, with the ability to analyze complex systems and spot the root cause of issues
  • You thrive in an environment where collaboration is crucial and where a global approach is key for successful implementation of processes and projects
  • You have a passion for defining and standardizing processes to drive strategic improvement and are able to translate complex technical concepts with ease for non-technical audiences
  • You have a natural ability for handling complex situations and multiple responsibilities simultaneously
  • You're a strong team player and thrive in a dynamic environment
Job Responsibility
  • Participate in 24/7 on-call monitoring
  • Observe platform and merchant performance and detect any issues proactively to mitigate risks in partnership with Engineering teams
  • Be an expert in communicating with merchants in real time during an incident, presenting the most accurate and up-to-date information to keep them informed
  • Working together with Operations, Product, Engineering, and reliability teams to integrate, grow, and continuously improve our monitoring strategy and increase our reliability
  • Improve operations by leading and project-managing initiatives and/or developing tools and automation for effective monitoring
  • Investigate alerts and provide feedback to engineering teams to build effective logging and alerts across the platform architecture
  • Mitigate merchant impact risk by acting on alerts in partnership with Engineering teams, and contribute to the monitoring playbook by documenting your learnings
  • Focus on ruthlessly prioritizing, automating, and scaling every aspect of our detection capabilities
  • Fulltime

Monitoring Engineer

The Monitoring Engineer role at NTT DATA involves ensuring the operational integ...
Location: India, Mumbai
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • Entry-level experience with troubleshooting and support in security, network, data centre, systems, or storage within a medium to large ICT organization
  • Basic knowledge of management agents, redundancy concepts, and ITIL processes
  • Highly disciplined in handling tickets on a day-to-day basis
  • Good understanding of using ITSM tools
  • Skill in planning activities and projects in advance and adapting to changing circumstances
  • A client-centric approach
  • Ability to communicate and work across different cultures and social groups
  • Proficiency in active listening techniques
  • A positive outlook at work, even in pressurized environments
  • Willingness to work hard and put in longer hours when necessary
Job Responsibility
  • Monitor client infrastructure and solutions, identifying problems and errors before or as they occur
  • Investigate first-line incidents assigned, pinpointing the root causes
  • Provide telephonic, ITSM ticket or chat support to clients
  • Perform maintenance activities, such as patching and configuration changes
  • Work across two or more technology domains (e.g., Cloud, Security, Networking, Applications, Collaboration)
  • Update existing knowledge articles or create new ones
  • Seek opportunities for work optimization
  • Support project work as needed
  • Contribute to disaster recovery functions and tests
  • Ensure careful handovers during shift changes
  • Fulltime

Platform Monitoring Engineer

A team within Global Platform Operations under the Monitoring Engineering pillar...
Location: India, Bengaluru
Salary: Not provided
Adyen
Expiration Date: Until further notice
Requirements
  • At least 5 to 10 years of experience with incident client communication and platform monitoring operations
  • Willing to participate in the on-call rotation and work in a fast-paced, dynamic environment
  • Experience with monitoring and logging tools like Prometheus, Grafana, ELK Stack, etc.
  • Experience with observability platforms like Datadog, Dynatrace, Splunk
  • Excellent analytical and problem-solving skills, with the ability to analyze complex systems and spot the root cause of issues
  • Thrives in an environment where collaboration is crucial and where a global approach is key for successful implementation of processes and projects
  • Passion for defining and standardizing processes to drive strategic improvement, and the ability to translate complex technical concepts with ease for non-technical audiences
  • Natural ability for handling complex situations and multiple responsibilities simultaneously
  • Strong team player who thrives in a dynamic environment
Job Responsibility
  • Participate in 24/7 on-call monitoring
  • Observe platform and merchant performance and detect any issues proactively to mitigate risks in partnership with Engineering teams
  • Be an expert in communicating with merchants in real time during an incident, presenting the most accurate and up-to-date information to keep them informed
  • Working together with Operations, Product, Engineering, and reliability teams to integrate, grow, and continuously improve our monitoring strategy and increase our reliability
  • Improve operations by leading and project-managing initiatives and/or developing tools and automation for effective monitoring
  • Investigate alerts and provide feedback to engineering teams to build effective logging and alerts across the platform architecture
  • Mitigate merchant impact risk by acting on alerts in partnership with Engineering teams, and contribute to the monitoring playbook by documenting learnings
  • Focus on ruthlessly prioritizing, automating, and scaling every aspect of our detection capabilities
  • Fulltime

Incident Manager - Public Sector

We are looking for an Incident Manager to spearhead incident management operatio...
Location: United States
Salary: 142500.00 - 197000.00 USD / Year
Wiz
Expiration Date: Until further notice
Requirements
  • 7+ years of experience leading crisis management and incident response programs in FedRAMP High, IL5, or NIST 800-53 environments
  • Direct experience in managing and leading major incidents
  • Direct experience working in cloud environments, AWS required (other clouds a plus)
  • Experience working with cloud native technologies like containers and container orchestration platforms like Kubernetes
  • Ability to interpret metrics and logs in observability and security event management tools such as Grafana, Prometheus, DataDog, Splunk, etc.
  • Experience with incident management platforms such as PagerDuty, ServiceNow, or Jira, including experience building automated notification trees and dashboards
  • Strategic thinking and a risk focused mindset on reliability improvements
  • Ability to identify systemic gaps that feed back into program design and operations teams
  • Strong writing and documentation skills to effectively communicate with both technical and business audiences
  • Ability to maintain composure and exercise sound judgement while navigating high-stake decision making during complex and ambiguous incidents
Job Responsibility
  • Serve as the lead incident coordinator for high-severity events, activating playbooks, declaring incident severity, and coordinating with functional leads to drive a structured response
  • Define, operationalize, and document the end-to-end incident response lifecycle that aligns to FedRAMP High, IL5, and NIST 800-53 requirements
  • Drive readiness activities by designing and facilitating cross-functional tabletop exercises, hands-on simulation exercises, incident response team training, and review of playbooks to validate response protocols
  • Facilitate Root Cause Analysis by leading post-incident reviews using structured methodologies and documentation to separate root causes from contributing factors and drive business-wide corrective actions to closure
  • Serve as the primary liaison between technical and business units by translating incident details into business impact assessments that drive informed decision-making for legal, compliance, and operational teams
  • Bridge technical and operational responses by building communication paths between engineering, operations, legal, compliance, and customer facing teams to translate complex incidents into actionable updates for leadership
  • Establish centralized reporting, dashboards, and KPIs to monitor response efficiency, trend analysis, and program maturity
  • Manage and optimize incident response tools like ServiceNow, PagerDuty, and Jira to ensure
What we offer
  • Medical, dental and vision insurance
  • Home Office Setup reimbursement
  • Flexible Spending Accounts
  • Monthly Connectivity reimbursement
  • Employee Assistance Program (EAP)
  • Short- and Long-term Disability Insurance
  • Life & Accident Insurance
  • 401(k) Retirement Savings Plan (with employer match)
  • Flexible paid time off + 11 paid holidays
  • Paid leave programs, including parental, pregnancy health, medical and bereavement leave
  • Fulltime