Platform Monitoring Engineer Job at Adyen (Bengaluru)

New

Platform Monitoring & Incident Engineer

A team within Engineering under the Platform Excellence pillar exhibits an unwav...

Location

United States, San Francisco

Salary:

100000.00 - 154500.00 USD / Year

Adyen

Expiration Date

Until further notice

Requirements

At least 5 years of experience with incident management, problem management, incident client communication, and platform monitoring operations
Experience with problem management practices - identifying trends across incidents, conducting root cause investigations and driving preventative action
Solid communication skills and the ability to develop strong working relationships throughout the organization; able to explain technical situations clearly and concisely to a diverse audience through data-visualization dashboards and written documents
Willing to participate in the on-call rotation and work in a fast-paced, dynamic environment
Experience with monitoring and logging tools like Prometheus, Grafana, ELK Stack, etc.
Experience with observability platforms like Datadog, Dynatrace, Splunk
Excellent analytical and problem-solving skills, with the ability to analyze complex systems and spot the root cause of issues
Thrives in an environment where collaboration is crucial and where a global approach is key for successful implementation of processes and projects
Passion for defining and standardizing processes to drive strategic improvement, and the ability to explain complex technical concepts clearly to non-technical audiences
Natural ability for handling complex situations and multiple responsibilities simultaneously

Job Responsibility

Monitoring platform performance while on call
Coordinating and commanding incidents
Communicating with our customers
Working on monitoring frameworks
Providing feedback to product engineering teams to improve the reliability of the platform
Initiating and leading initiatives across our platform offerings, prioritizing merchant impact, to proactively detect issues, inform merchants quickly, and increase the reliability of our platform
Incident Management: Coordinate the mitigation, recovery, and resolution of high-impact incidents
Represent the customer perspective during incidents
Communication: Be an expert in communicating with merchants in real time during an incident and present the most accurate and up-to-date information
Escalate critical incidents when needed and provide structured communication to senior management

Fulltime

Sr. Platform Engineer - Design (Platform Engineering Team)

Our Client’s Platform engineering team is looking for a Senior Platform Engineer...

Location

United States, Greenwich

Salary:

175000.00 - 220000.00 USD / Year

Solomon Page

Expiration Date

Until further notice

Requirements

Programming and Scripting
Proficiency in languages such as Python or Ruby, or in automation tooling such as Ansible
Scripting skills in Bash, Perl, or similar languages
Experience with continuous integration/continuous deployment tools like Jenkins, GitLab
Experience with tools like Terraform, Ansible, Puppet, or Chef
Strong knowledge of containerization technologies like Docker and orchestration tools like Kubernetes
Familiarity with cloud platforms such as AWS, Google Cloud Platform (GCP), or Azure
Strong knowledge of Linux/Unix system administration
Experience with system performance tuning and monitoring
Understanding of TCP/IP, DNS, HTTP/HTTPS, and other network protocols

Job Responsibility

Design automated solutions leveraging cutting edge tools
Provide direct support to technical and non-technical entities to define requirements and deliver solutions
Implement and utilize asset inventory, configuration management database, capacity management, performance management, resource optimization, and security
Define requirements, perform research, evaluate vendors/solutions, design/implement solutions, and provide ongoing support
Drive consistent standardized solutions across IB for all hardware, software, configurations, and processes
Implement tools and processes for efficient and effective operational management of the environment
Schedule and provide after-hours or weekend support when necessary
Participate in defining and executing on a roadmap of projects
Interact with internal teams to provide solutions and resolve problems
Communicate complex technical concepts to individuals of varying technical ability

Fulltime

Senior ML Platform Engineer, AI Platform

We are seeking a skilled and passionate ML Platform Engineer to join our team an...

Location

Singapore, Singapore

Salary:

Not provided

Airwallex

Expiration Date

Until further notice

Requirements

5+ years in backend software development
At least 2 years focused on AI/ML platform or MLOps infrastructure
Deep expertise in MLOps practices, including automated deployment pipelines, model optimization, and production lifecycle management
Proven experience designing and implementing low-latency model serving solutions
Proficiency in Python
Skill in writing high-quality, maintainable code
Experience designing and developing large-scale distributed systems with high concurrency, low-latency inference, and high availability
Excellent communication and mentoring abilities
A relevant degree in Computer Science, Mathematics, or a related field

Job Responsibility

Platform Development: Design, build, and maintain the end-to-end MLOps platform using Kubernetes and Cloud Services
Infrastructure as Code (IaC): Use Terraform or similar tools to manage, provision, and scale all ML-related infrastructure securely and efficiently
Pipeline Automation: Implement and optimize CI/CD/CT (Continuous Integration, Delivery, Training) pipelines to automate model training, testing, packaging, and deployment using tools like Argo and Kubeflow Pipelines
Serving Infrastructure: Build highly available, low-latency, and high-throughput model serving infrastructure
Observability: Implement robust monitoring, alerting, and logging solutions to track infrastructure health, model performance, and data/model drift
Tooling & Support: Evaluate, integrate, and support ML tools such as Feature Stores and distributed model training pipelines
Security & Compliance: Ensure platform security, implement RBAC (Role-Based Access Control), and manage secrets for sensitive data and production environments
Collaboration: Work closely with Data Scientists and ML Engineers to understand their needs and provide technical guidance on best practices for scaling their models

Fulltime

Staff Platform Engineer – Taikun Platform Automation

At Cloudera, we empower people to transform complex data into clear and actionab...

Location

Czech Republic, Prague

Salary:

Not provided

Cloudera

Expiration Date

Until further notice

Requirements

6+ years proven experience with Kubernetes and related tooling
Strong knowledge of Helm for application packaging and deployment
Experience with Terraform and Infrastructure-as-Code practices
Familiarity with GitOps workflows, especially Flux
Solid Linux system administration skills
Ability to write clear, maintainable automation code
Good collaboration and communication skills
BSc/MSc in a related field or equivalent experience

Job Responsibility

Develop and maintain Helm charts for deploying and managing Taikun components
Automate platform installation, configuration, and upgrades using Flux
Create Kubernetes manifests, scripts, and tooling to simplify platform management
Improve and document installation and operational workflows for Taikun
Collaborate with product and engineering teams to ensure smooth integration with backend services
Build and maintain Tekton pipelines for CI/CD and automation workflows
Improve monitoring / alerting of Taikun components
Optimize automation for speed, security, and maintainability
Stay up to date with cloud-native automation trends and best practices
Work on CI/CD pipelines in collaboration with other engineers

What we offer

Generous PTO Policy
Support for work-life balance with Unplugged Days
Flexible WFH Policy
Mental & Physical Wellness programs
Phone and Internet Reimbursement program
Access to Continued Career Development
Comprehensive Benefits and Competitive Packages
Paid Volunteer Time
Employee Resource Groups

Fulltime

New

Junior Platform Engineer

Location

United Kingdom, London

Salary:

30000.00 - 40000.00 GBP / Year

ITV

Expiration Date

July 16, 2026

Requirements

Knowledge and proven experience of secure cloud practices, and building highly performant and scalable cloud infrastructure (ideally AWS)
Knowledge and proven experience of Infrastructure as Code technologies (ideally Terraform) and using and implementing version control strategies with these technologies
Knowledge and proven experience of containerisation and deploying container workloads, ideally on Kubernetes
Knowledge and proven experience with Continuous Delivery / Continuous Integration patterns and platforms, e.g. Jenkins, GitHub Actions, etc.
Knowledge and proven experience with monitoring and alerting tools
Knowledge and proven experience responding to production incidents to diagnose and resolve issues related to the infrastructure supporting the services

Job Responsibility

Implementing IaC with Terraform and utilising functions of the Common Platform
Working within Core Platform to prioritise work for upgrades and business delivery and supporting engineers with deployment and production issues
Reviewing and approving pull requests from engineers across ITV to support deployment of new infrastructure, permission changes and access
Maintaining and supporting infrastructure for the Core Platform Team including the Common Platform Infrastructure e.g. Jenkins
Managing upgrades of the Kubernetes deployments in AWS
Defining and designing the infrastructure for the future - whether this be for a new application or re-design of an existing application
Sharing best practice within the Common Platform team and contributing back ideas, improvements and designs

What we offer

Flexible working with a range of options
Generous holiday allowance, plus you can buy more
Annual bonus opportunity
Competitive pension contribution
Save as you earn - with an opportunity to buy ITV shares
Wellbeing and volunteering days plus a wide range of opportunities to help you live a balanced and healthy life

Fulltime

New

Azure Data Science Platform Engineer (AI)

WFH flexibility! Up to 4 days/week! Global Environment! Competitive salary! We a...

Location

Japan, Tokyo 23 wards

Salary:

7000000.00 - 12000000.00 JPY / Year

Randstad

Expiration Date

February 29, 2028

Requirements

Bilingual proficiency in Japanese and English is preferred (English is a MUST)
Day-to-day communication will primarily be in English, with occasional interaction with Japanese-speaking stakeholders
6+ years of experience in data science, machine learning, advanced analytics, or applied AI, with demonstrated business results
Strong experience taking solutions from development into production and supporting them in live environments
Strong Python programming skills and solid engineering discipline
Experience with GenAI / LLM use cases or solution delivery
Hands-on experience with Azure for deploying, supporting, or operating production workloads
Strong experience with Terraform and Infrastructure as Code in enterprise cloud environments
Experience with CI/CD, GitHub Actions, deployment automation, and DevOps practices
Experience with SRE, MLOps, or LLMOps, particularly in monitoring, incident handling, reliability, and operational support

Job Responsibility

We are looking for a candidate who combines strong data science delivery capability with practical production and operational ownership
This is not a pure research role and not a pure platform role. It is intended for someone who can build business-facing AI/ML solutions and also help ensure those solutions are deployable, stable, and maintainable
The ideal candidate is proactive, technically hands-on, comfortable working independently, and effective in cross-functional enterprise environments.

What we offer

WFH flexibility! Up to 4 days/week!
Global Environment!
Competitive salary!
Health insurance
Employees' pension insurance
Employment insurance
Saturday
Sunday
Public holidays

Fulltime

New

Junior Platform Engineer

As a DevOps / Platform Engineer in the Core Team you are part of our Group Techn...

Location

United Kingdom, London

Salary:

30000.00 - 40000.00 GBP / Year

Influx Search

Expiration Date

July 16, 2026

Requirements

Knowledge and proven experience of secure cloud practices, and building highly performant and scalable cloud infrastructure (ideally AWS)
Knowledge and proven experience of Infrastructure as Code technologies (ideally Terraform) and using and implementing version control strategies with these technologies
Knowledge and proven experience of containerisation and deploying container workloads, ideally on Kubernetes
Knowledge and proven experience with Continuous Delivery / Continuous Integration patterns and platforms, e.g. Jenkins, GitHub Actions, etc.
Knowledge and proven experience with monitoring and alerting tools
Knowledge and proven experience responding to production incidents to diagnose and resolve issues related to the infrastructure supporting the services

Job Responsibility

Implementing IaC with Terraform and utilising functions of the Common Platform
Working within Core Platform to prioritise work for upgrades and business delivery and supporting engineers with deployment and production issues
Reviewing and approving pull requests from engineers across ITV to support deployment of new infrastructure, permission changes and access
Maintaining and supporting infrastructure for the Core Platform Team including the Common Platform Infrastructure e.g. Jenkins
Managing upgrades of the Kubernetes deployments in AWS
Defining and designing the infrastructure for the future - whether this be for a new application or re-design of an existing application
Sharing best practice within the Common Platform team and contributing back ideas, improvements and designs

What we offer

Flexible working with a range of options
Generous holiday allowance, plus you can buy more
Annual bonus opportunity
Competitive pension contribution
Save as you earn - with an opportunity to buy ITV shares
Wellbeing and volunteering days plus a wide range of opportunities to help you live a balanced and healthy life

Fulltime

New

Machine Learning Platform Engineer I

We are looking for a Machine Learning Platform Engineer to join Mollie's Machine...

Location

Portugal, Lisbon

Salary:

Not provided

Mollie

Expiration Date

Until further notice

Requirements

1+ year of experience deploying and maintaining ML models in production
Good understanding of MLOps principles, including experiment tracking, reproducibility, pipeline automation, model versioning, and monitoring in production
Strong hands-on Python programming skills, with proficiency across common ML and data libraries such as scikit-learn, pandas, NumPy, XGBoost, LightGBM, and MLflow
Familiarity with a major cloud platform, preferably GCP
Experience with containerization (Docker); familiarity with container orchestration tools such as Kubernetes and Kubeflow preferred
Strong context-switching ability with sharp attention to detail
Familiarity with infrastructure-as-code (IaC) tools such as Terraform preferred
Experience building and maintaining CI/CD pipelines for ML workflows

Job Responsibility

Collaborate closely with ML Platform Engineers, Machine Learning Scientists, and engineers across Mollie's domain teams to deliver scalable Machine Learning solutions
Deploy and operationalize ML models to production in partnership with Machine Learning Scientists
Enhance and maintain our cloud-based ML Platform on GCP, writing production-grade Python and Terraform daily
Build and maintain CI/CD pipelines for ML model training and inference
Deploy, manage, and scale model serving endpoints on Kubernetes
Assist in extending, developing, and hosting custom and open-source AI tooling
Champion MLOps best practices
Ensure platform reliability by setting up observability, monitoring, and alerting
Maintain and enhance open-source AI tooling hosted at Mollie

What we offer

Noise cancelling headphones
MacBook
Birthday off
Complimentary baby days
20 days working from abroad
22 holiday days
Commute allowance
Work from home budget
Bike lease plan
Internet allowance

Fulltime
