Platform Monitoring Engineer

Adyen

Location:
India, Bengaluru

Contract Type:
Employment contract

Salary:

Not provided

Job Description:

As part of the Monitoring Engineering pillar within Global Platform Operations, this team combines unwavering attention to detail with a deep understanding of what platform-wide monitoring means for all merchants. In this role, you will be on call to monitor platform performance, communicate with merchants, work on monitoring frameworks, and provide feedback to product engineering teams to improve the reliability of the platform. You will start and lead initiatives across our platform offerings, prioritizing merchant impact, to proactively detect issues and inform merchants quickly.

Job Responsibility:

  • Participate in 24/7 on-call monitoring
  • Observe platform and merchant performance and detect any issues proactively to mitigate risks in partnership with Engineering teams
  • Be an expert in communicating with merchants in real time during an incident and present the most accurate, up-to-date information to keep them informed
  • Work together with Operations, Product, Engineering, and reliability teams to integrate, grow, and continuously improve our monitoring strategy and increase our reliability
  • Improve operations by leading and project-managing initiatives and tools, including the development of automation for effective monitoring
  • Investigate alerts and provide feedback to engineering teams to build effective logging and alerts across the platform architecture
  • Mitigate merchant impact risk by acting on alerts in partnership with Engineering teams, and contribute to the monitoring playbook by documenting learnings
  • Focus on ruthlessly prioritizing, automating, and scaling every aspect of our detection capabilities

Requirements:

  • 5 to 10 years of experience with client communication during incidents and with platform monitoring operations
  • Willing to participate in the on-call rotation and work in a fast-paced, dynamic environment
  • Experience with monitoring and logging tools like Prometheus, Grafana, ELK Stack, etc.
  • Experience with observability platforms like Datadog, Dynatrace, Splunk
  • Excellent analytical and problem-solving skills, with the ability to analyze complex systems and spot the root cause of issues
  • Thrives in an environment where collaboration is crucial and where a global approach is key for successful implementation of processes and projects
  • Passion for defining and standardizing processes to drive strategic improvement, and the ability to translate complex technical concepts with ease for non-technical audiences
  • Natural ability for handling complex situations and multiple responsibilities simultaneously
  • Strong team player who thrives in a dynamic environment

Additional Information:

Job Posted:
April 23, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Platform Monitoring Engineer

Senior Distributed Systems Engineer - Platform Engineering

For our Platform Engineering team, we are looking for programmers with strong in...
Location:
Poland
Salary:
Not provided
RTB House
Expiration Date:
Until further notice
Requirements:
  • Excellent understanding of how complex IT systems work - from the hardware level, through software, to algorithms
  • Ability to proactively define requirements, ask appropriate questions and draw conclusions that will combine technical constraints and business needs
  • Ability to lead the design and implementation of a solution
  • Experience in leading project teams
  • Willingness to be involved in topics that go beyond programming and design, such as responsibility for technical areas or communication with other teams
  • Proactive attitude, independence in taking action
  • Extensive experience in programming and readiness to implement key system elements as well as involvement in code reviews
  • Good knowledge of methods of creating concurrent programs and distributed systems
  • Ability to critically analyze created solutions in terms of performance (from estimating the theoretical performance of designed systems to detecting and removing actual performance problems in production)
  • C1 level in English and Polish
Job Responsibility:
  • Plan and then lead, hands-on, further development within a given technical area such as deployment, monitoring, databases, or load balancing, in the context of existing infrastructure within RTB House
  • Coordinate the work of a project team of 3-4 people, also making arrangements with other teams and units within RTB House
  • Ensure the reliability and scalability of the solutions built
What we offer:
  • Attractive compensation
  • Work in a team of enthusiasts who are willing to share their knowledge and experience
  • Flexible cooperation conditions - we do not have core hours, we do not have holiday limits
  • Access to the latest technologies and the possibility of real use of them in a large-scale and highly dynamic project

Senior AWS Data Engineer / Data Platform Engineer

We are seeking a highly experienced Senior AWS Data Engineer to design, build, a...
Location:
United Arab Emirates, Dubai
Salary:
Not provided
NorthBay
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in data engineering and data platform development
  • Strong hands-on experience with: AWS Glue, Amazon EMR (Spark), AWS Lambda, Apache Airflow (MWAA), Amazon EC2, Amazon CloudWatch, Amazon Redshift, Amazon DynamoDB, AWS DataZone
Job Responsibility:
  • Design, develop, and optimize scalable data pipelines using AWS native services
  • Lead the implementation of batch and near-real-time data processing solutions
  • Architect and manage data ingestion, transformation, and storage layers
  • Build and maintain ETL/ELT workflows using AWS Glue and Apache Spark on EMR
  • Orchestrate complex data workflows using Apache Airflow (MWAA)
  • Develop and manage serverless data processing using AWS Lambda
  • Design and optimize data warehouses using Amazon Redshift
  • Implement and manage NoSQL data models using Amazon DynamoDB
  • Utilize AWS DataZone for data governance, cataloging, and access management
  • Monitor, log, and troubleshoot data pipelines using Amazon CloudWatch
Employment Type:
Fulltime

Senior Distributed Systems Engineer - Ad Display Platform Engineering

The Bidding Platform organization is the core of the RTB business, processing ov...
Location:
Poland
Salary:
Not provided
RTB House
Expiration Date:
Until further notice
Requirements:
  • 7+ years of hands-on experience in software engineering
  • Proficiency in programming
  • Excellent understanding of how complex IT systems work (from the hardware level, through software, to algorithmics)
  • Very good knowledge of fundamental Internet protocols and technologies (DNS, HTTP, cookies and others)
  • Good knowledge of basic methods of creating concurrent programs and distributed systems (from thread level to geo-distributed clusters level)
  • Practical ability to observe, monitor and analyse the operation of production systems (and draw valuable conclusions from it)
  • The ability to critically analyze the solutions created in terms of performance (from estimating the theoretical performance of the designed systems to detecting and removing actual performance problems in production)
  • General knowledge of issues (typical problems and methods of solving them) in the areas of 'high scalability' and 'high availability'
  • C1 level in English and Polish
Job Responsibility:
  • Implement and maintain (in all aspects, including setting up the environment, writing configuration code, and monitoring production) high-quality backend services for displaying Ads globally, focusing on extreme performance and scalability
  • Develop tools (deployment, testing platforms, web performance and reliability monitoring), and critical optimizations to drive measurable improvements in critical user performance metrics for ad rendering and display
  • Write, test, and deploy robust, efficient, and well-documented code in Java/Python, ensuring adherence to the highest coding and performance standards
  • Participate in code reviews, knowledge sharing sessions, and help implement technical standards and best practices within the team
What we offer:
  • Projects focused on extreme performance and high code quality – solid code reviews are our standard
  • Collaboration within an interdisciplinary, self-sufficient team (including DevOps, database experts, backend developers, product designers, and QA engineers)
  • Hardware and software tailored to your preferences (e.g., MacBook, AI tool licenses)
  • Flexible working conditions – no core hours, fully remote cooperation possible

Platform Engineer

Motorica is at a breakthrough moment. We’ve built a generative AI animation plat...
Location:
Sweden, Stockholm
Salary:
Not provided
Motorica
Expiration Date:
Until further notice
Requirements:
  • Proven experience in Platform Engineering, SRE, or DevOps, ideally in high-growth or AI/ML-heavy environments
  • Strong grasp of CI/CD systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes)
  • Familiarity with observability, monitoring, and incident response best practices
  • Security mindset with hands-on experience in audits, compliance (ISO 27001, SOC2, etc.), and vulnerability management
  • Strong communication skills: you’ll be interfacing with developers daily and need to translate infrastructure into clarity, not complexity
  • A proactive, solution-oriented mindset: you anticipate friction before others feel it
Job Responsibility:
  • Provide common infrastructure guidance, reusable patterns, and automated tooling to engineering teams
  • Own the “paved road” for developers, reducing friction and cognitive load
  • Champion and implement security best practices across the entire platform
  • Play a key role in achieving ISO 27001 certification through technical implementation and evidence gathering
  • Build and operate a highly reliable and cost-efficient platform, with particular focus on optimizing GPU-heavy AI/ML workloads
  • Manage CI/CD systems (GitHub Actions, GitLab CI) and track key metrics like build times, deployment frequency, and failure rates
  • Oversee cloud environments (AWS, GCP), including health, security, and cost reporting
  • Lead security scans, audits, and vulnerability remediation
  • Maintain observability stack (Prometheus, Grafana, Datadog, GCP Logging), ensuring meaningful dashboards and alerts
  • Act as point-of-contact for ML Research team’s infra requests (GPU access, specialized pipelines)
What we offer:
  • Stock Options program
  • Retirement Plan
  • Health Benefits (5000 SEK/year)
  • Life Insurance / Health Insurance / Injury Insurance
  • Competitive compensation
Employment Type:
Fulltime

Platform Engineering Manager

The Client Environments team is the bridge between SpotOn’s cloud and the physic...
Location:
United States, Detroit
Salary:
170000.00 - 210000.00 USD / Year
MyTennisLessons
Expiration Date:
Until further notice
Requirements:
  • Lead and mentor engineers across Network and Android (Elo) systems
  • Drive GitOps adoption for network and device configuration
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds)
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays)
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns
  • Own service quality metrics — uptime, response time, issue recurrence
Job Responsibility:
  • Lead and mentor engineers across Network and Android (Elo) systems — building a strong culture of ownership and reliability
  • Drive GitOps adoption for network and device configuration, ensuring deployments are consistent, testable, and reversible
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds), ensuring clean provisioning and policy enforcement
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination through system changes, automation, and better visibility
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays) to eliminate variance across restaurant networks
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns through controllers
  • Own service quality metrics — uptime, response time, issue recurrence — and report progress on reliability improvements
What we offer:
  • Medical, Dental and Vision Insurance
  • 401k with company match
  • RSUs
  • Paid vacation, 10 company holidays, sick time, and volunteer time off
  • Employee Resource Groups to build community and inclusion at work
  • Monthly cell phone and internet stipend
  • Tuition reimbursement for up to $2,000 per calendar year to assist with your professional development
Employment Type:
Fulltime

Platform Engineering Manager

The Client Environments team is the bridge between SpotOn’s cloud and the physic...
Location:
United States, Austin
Salary:
170000.00 - 210000.00 USD / Year
MyTennisLessons
Expiration Date:
Until further notice
Requirements:
  • Lead and mentor engineers across Network and Android (Elo) systems
  • Drive GitOps adoption for network and device configuration
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds)
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays)
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns
  • Own service quality metrics — uptime, response time, issue recurrence
Job Responsibility:
  • Lead and mentor engineers across Network and Android (Elo) systems — building a strong culture of ownership and reliability
  • Drive GitOps adoption for network and device configuration, ensuring deployments are consistent, testable, and reversible
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds), ensuring clean provisioning and policy enforcement
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination through system changes, automation, and better visibility
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays) to eliminate variance across restaurant networks
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns through controllers
  • Own service quality metrics — uptime, response time, issue recurrence — and report progress on reliability improvements
What we offer:
  • Medical, Dental and Vision Insurance
  • 401k with company match
  • RSUs
  • Paid vacation, 10 company holidays, sick time, and volunteer time off
  • Employee Resource Groups to build community and inclusion at work
  • Monthly cell phone and internet stipend
  • Tuition reimbursement for up to $2,000 per calendar year to assist with your professional development
Employment Type:
Fulltime

Senior Platform Engineer

Glide is looking for a Senior Platform Engineer to join our Infrastructure team ...
Salary:
Not provided
Glide
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience as a platform engineer/SRE
  • 3+ years experience building and maintaining highly available and scalable distributed data sources
  • Experience with Google Cloud Platform services like Cloud SQL, Cloud Run, AlloyDB, or equivalent
  • Experience orchestrating complex systems with Kubernetes
  • Proficiency in TypeScript development
  • Strong SQL skills: can speak to covering index optimization strategies
  • Experience designing, building and running data-intensive event-driven architectures
  • You are a clear and effective communicator, be it when you write code, write emails, or explain complex technical issues to non-technical co-workers
  • Passionate and self-motivated, with a demonstrated ability to work in a fast-paced and evolving environment
Job Responsibility:
  • Managing our existing infrastructure in GCP
  • Driving our platform evolution as the complexity and sophistication of our product only increases
  • Managing our Github/GH Actions based build pipeline
  • Provide build, test, and runtime infrastructure to service teams
  • Ensure patterns are established (e.g., for database throttling, request rate limiting, etc…) to protect Glide’s uptime
  • Monitor infrastructure costs and coordinate improvements when necessary
  • Drive SRE tooling and best practices around observability and alerting
  • Write, review, and maintain code primarily in TypeScript
  • Write architecture briefs and proposals, carry out code experiments, and build prototypes to learn how we can achieve reliable scale with our systems
  • Provide technical leadership, mentorship, pairing opportunities, and code review to encourage the growth of others
What we offer:
  • competitive salary and benefits package
  • a supportive and dynamic remote work environment
  • opportunities for career growth
Employment Type:
Fulltime

Staff Platform Engineer

Join our dynamic team as a Compute Platform Engineer and play a pivotal role in ...
Location:
United States, Mountain View, California
Salary:
180000.00 - 280000.00 USD / Year
Inworld AI
Expiration Date:
Until further notice
Requirements:
  • 7 years of experience in software engineering
  • 5 years of experience with infrastructure-as-code
  • Proficiency in managing Kubernetes clusters and applications, including creating Kustomize manifests/Helm charts for new applications
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.)
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud)
  • Proficient in at least one backend programming or scripting language such as Golang, Python, or Bash
  • Candidates must be based in the SF Bay Area or willing to relocate (you will be working on-site in our South Bay office a few days a week)
Job Responsibility:
  • Work closely with backend and ML engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI engine and Studio
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment
  • Identify and implement opportunities to enhance engineering speed and efficiency
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence
  • Develop and share best practices to improve automation and efficiency across our engineering teams
What we offer:
  • equity and benefits
Employment Type:
Fulltime