Monitoring Engineer / Incident Manager Job at Adyen (Amsterdam)

Monitoring / Release & Incident Management Support Engineer

Location

Philippines, Manila

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Comfortable with both Linux and Windows administration
Work in agile teams to build, test, and maintain aspects of the CI/CD pipeline
Manage UI visualizations of license consumption and performance
Evangelize Ops best practices with Engineering, Security, and cross-functional teams
Firmware release - OTA (over the air)
Launch the new mobile app / release new versions of the existing mobile app - App Store / Play Store

Job Responsibility

Release Management of new software via Tools
Understand the release management SOP: QA -> Load Test -> Stage Environment -> PROD
Create and manage monitoring and alerting systems as needed to meet SLAs

Fulltime

Monitoring / Release & Incident Management Support Engineer

The Monitoring / Release & Incident Management Support Engineer will oversee sof...

Location

Philippines, Manila

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Proficiency in Linux and Windows administration
Experience in agile methodologies
Experience with CI/CD pipelines
Strong background in release management
Strong background in incident response

Job Responsibility

Release Management of new software via Tools
Understand the release management SOP: QA -> Load Test -> Stage Environment -> PROD
Create and manage monitoring and alerting systems as needed to meet SLAs
Work in agile teams to build, test, and maintain aspects of the CI/CD pipeline
Manage UI visualizations of license consumption and performance
Evangelize Ops best practices with Engineering, Security, and cross-functional teams
Firmware release - OTA (over the air)
Launch the new mobile app / release new versions of the existing mobile app - App Store / Play Store

Fulltime

Senior Site Reliability Engineer Manager

RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on b...

Location

United Kingdom of Great Britain and Northern Ireland, London

Salary:

Not provided

Remotestar

Expiration Date

Until further notice

Requirements

Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
Expertise in incident management, including incident response, resolution, and post-mortem analysis.
Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog.
Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
Demonstrated leadership capabilities, with a passion for mentoring and developing team members.

Job Responsibility

Take full ownership of the production estate from both a technical and process perspective.
Ensure the consistent, smooth operation of live systems and drive all on-call support issues.
Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
Create and maintain high-end monitoring and automation tooling.
Drive automation initiatives to streamline operational workflows and improve efficiency.
Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
Build a first-class SRE team.
Through a combination of leading by example, coaching, and mentoring, mould the team you would want to have around you.
Provide leadership and guidance to the SRE team, fostering a culture of collaboration, innovation, and continuous improvement.

What we offer

Dynamic working environment in an extremely fast-growing company
Work in an international environment
Work in a pleasant environment with very little hierarchy
Intellectually challenging work in which you play a massive role in the client’s success and scalability
Flexible working hours

Fulltime

Incident Manager - Technical Customer Operations

We're growing our Customer Operations team and looking for an Incident Manager f...

Location

France, Paris

Salary:

Not provided

efficy

Expiration Date

Until further notice

Requirements

At least 5 years of experience in technical customer support or incident management in a B2B SaaS or enterprise software environment
Customer-facing mindset: you're comfortable communicating with clients under pressure and know how to keep them confident
Strong coordinator, able to align multiple internal teams quickly and clearly
Rigorous and closure-oriented: open issues get resolved, not left open
Solid technical understanding, able to engage meaningfully with R&D and Cloud teams without being an engineer
Quick to get up to speed on product behaviour and business logic
Native level in French
Excellent command of English, written and spoken

Job Responsibility

Own production incidents from qualification to closure, coordinating all involved teams
Be the main point of contact for external clients during active incidents, keeping them informed at every step
Deliver structured post-incident reports and follow-up communications to external clients
Ensure every incident has a visible owner and clear progress at all times
Participate in steering committees and crisis meetings as needed
Track incident KPIs including MTTR, SLA compliance, and escalation rates
Monitor ticket progress across teams and escalate blockers when needed

What we offer

Direct impact on customer satisfaction and service quality
High-visibility role connecting Support, R&D, and Cloud teams
Career growth opportunities and internal mobility
Modern offices in 11 European locations
Fun team events & continuous learning
Competitive salary with bonus system
Hybrid working policy

Incident Engineer

A team within Global Platform Operations under the Monitoring Engineering pillar...

Location

India, Bengaluru

Salary:

Not provided

Adyen

Expiration Date

Until further notice

Requirements

You have at least 5 to 10 years of experience with incident client communication and platform monitoring operations
You're willing to participate in the on-call rotation and work in a fast-paced, dynamic environment
You have experience with monitoring and logging tools like Prometheus, Grafana, ELK Stack, etc
You have experience with observability platforms like Datadog, Dynatrace, Splunk
You have excellent analytical and problem-solving skills, with the ability to analyze complex systems and spot the root cause of issues
You thrive in an environment where collaboration is crucial and where a global approach is key to the successful implementation of processes and projects
You have a passion for defining and standardizing processes to drive strategic improvement, and you are able to translate complex technical concepts with ease for non-technical audiences
You have a natural ability for handling complex situations and multiple responsibilities simultaneously
You're a strong team player and thrive in a dynamic environment

Job Responsibility

Participate in 24/7 on-call monitoring
Observe platform and merchant performance and detect any issues proactively to mitigate risks in partnership with Engineering teams
Be an expert in communicating with merchants in real time during an incident, presenting the most accurate and up-to-date information to keep them informed
Work together with Operations, Product, Engineering, and reliability teams to integrate, grow, and continuously improve our monitoring strategy and increase our reliability
Improve operations by leading or project-managing initiatives and/or developing automation tools for effective monitoring
Investigate alerts and provide feedback to engineering teams to build effective logging and alerts across the platform architecture
Mitigate merchant impact risk by acting on alerts in partnership with Engineering teams, and contribute to the monitoring playbook by documenting your learnings
Focus on ruthlessly prioritizing, automating, and scaling every aspect of our detection capabilities

Fulltime