We are looking for someone to join a team within Global Platform Operations, under the Monitoring Engineering pillar, who brings unwavering attention to detail and a deep understanding of how platform-wide monitoring affects all merchants. In this role, you will take part in on-call monitoring of platform performance, communicate with merchants, work on monitoring frameworks, and provide feedback to product engineering teams to improve the reliability of the platform. You will initiate and lead initiatives across our platform offerings, prioritizing merchant impact, to proactively detect issues and inform merchants quickly.
Job Responsibilities:
Participate in 24/7 on-call monitoring
Observe platform and merchant performance and detect any issues proactively to mitigate risks in partnership with Engineering teams
Be an expert in communicating with merchants in real time during an incident, presenting the most accurate and up-to-date information to keep them informed
Work together with Operations, Product, Engineering, and Reliability teams to integrate, grow, and continuously improve our monitoring strategy and increase our reliability
Improve operations by leading and project-managing initiatives and tooling, including developing automation for effective monitoring
Investigate alerts and provide feedback to engineering teams to build effective logging and alerts across the platform architecture
Mitigate the risk of merchant impact by acting on alerts in partnership with Engineering teams, and contribute to the monitoring playbook by documenting your learnings
Focus on ruthlessly prioritizing, automating, and scaling every aspect of our detection capabilities
Requirements:
You have 5 to 10 years of experience with client communication during incidents and with platform monitoring operations
You're willing to participate in the on-call rotation and work in a fast-paced, dynamic environment
You have experience with monitoring and logging tools such as Prometheus, Grafana, and the ELK Stack
You have experience with observability platforms such as Datadog, Dynatrace, and Splunk
You have excellent analytical and problem-solving skills, with the ability to analyze complex systems and spot the root cause of issues
You thrive in an environment where collaboration is crucial and where a global approach is key to successfully implementing processes and projects
You have a passion for defining and standardizing processes to drive strategic improvement, and you can easily translate complex technical concepts for non-technical audiences
You have a natural ability to handle complex situations and multiple responsibilities simultaneously
You're a strong team player and thrive in a dynamic environment