Monitoring / Release & Incident Management Support Engineer

NTT DATA

Location:
Philippines, Manila

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The Monitoring / Release & Incident Management Support Engineer will oversee software releases and manage monitoring systems to ensure compliance with SLAs. Proficiency in Linux and Windows administration is essential, along with experience in agile methodologies and CI/CD pipelines. The role involves collaboration with engineering and security teams to promote best practices in operations. Candidates should have a strong background in release management and incident response.

Job Responsibility:

  • Release management of new software via tools
  • Understand the release management SOP: QA -> Load Test -> Stage Environment -> PROD
  • Create and manage monitoring and alerting systems as needed to meet SLAs
  • Working in agile teams, build, test, and maintain aspects of the CI/CD pipeline
  • Manage UI visualizations of license consumption and performance
  • Evangelize Ops best practices with Engineering, Security, and cross-functional teams
  • Firmware release - OTA (over the air)
  • Launch the new mobile app / release new versions of the existing mobile app - App Store / Play Store

Requirements:

  • Proficiency in Linux and Windows administration
  • Experience in agile methodologies
  • Experience with CI/CD pipelines
  • Strong background in release management
  • Strong background in incident response

Additional Information:

Job Posted:
January 30, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Monitoring / Release & Incident Management Support Engineer

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location:
Ireland, Dublin
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Solid SRE process experience
  • 5+ years leading a high-performance, 24x7 DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience with ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility
  • Taking end-to-end Ownership of Application Support for Production Systems Issues resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with an SRE mindset
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and/or associated production systems diagnostics, and network health monitoring
  • Coordinating and/or deploying hands-on fixes, patches, and software updates at the application level and, as appropriate, at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within the organization, along with observations from investigative and diagnostic assessments
  • Coordinating the investigation of recurring technical issues affecting user systems and seeing them through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
  • Fulltime

Technical Program Manager, Quality and Release

As a Quality and Release Technical Program Manager (TPM), you’ll drive the coord...
Location:
United States, Detroit
Salary:
76905.00 - 106100.00 USD / Year
Canopy
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Engineering, Computer Science, Business Administration, or a related technical discipline
  • 2–5 years of experience in Technical Program Management, Release Management, or a related technical operations/support coordination role
  • Strong organizational and communication skills, with a bias for action and an ability to drive clarity during fast-moving release cycles and support incidents
  • Solid expertise with program and release management methodologies, including managing cross-functional readiness, risk, and quality gates
  • Proven ability to manage multiple concurrent releases and operational workflows in a high-paced engineering environment
  • Familiarity with Agile development practices, Jira, CI/CD pipelines, and standard software deployment and support processes
  • Comfort with data-driven decision-making, including analyzing release metrics, incident trends, and support data to identify risks and improvement opportunities
Job Responsibility
  • Lead the end-to-end software release lifecycle across firmware, app, cloud, platform services, and external partner systems ensuring compatibility, reliability, high-quality, and well-communicated rollouts
  • Drive product quality and release reliability by defining and enforcing release criteria, gates, approval processes, and coordinating cross-team testing and readiness
  • Create, track, and analyze KPIs, including pre- and post-deployment metrics, to monitor release performance, identify trends, and inform continuous improvement
  • Manage on-call and incident processes, including escalations, SLAs, and post-incident reviews, while coordinating with all Customer Care tiers for effective resolution
  • Monitor release health in real time and lead rapid triage and rollback/mitigation decisions when necessary
  • Maintain transparent communication on release status, risks, and outcomes to leadership and stakeholders
  • Continuously improve release, support, and incident processes through documentation, change logs, release notes, release process automation, and operational excellence initiatives
What we offer
  • Comprehensive medical benefits coverage, dental plans and vision coverage
  • Health care and dependent care spending accounts
  • Employee and Family Assistance Program (EAP)
  • Employee discount programs
  • Retirement plan with a generous company match
  • Generous Paid Time Off, Sick, and Holidays
  • Family Leave (Maternity, Paternity)
  • Short- and long-term disability
  • Life insurance and accidental death & dismemberment insurance
  • Fulltime

L2 Support Engineer & Release Manager

The L2 Support Engineer & Release Manager is responsible for overseeing producti...
Location:
Romania, Bucuresti
Salary:
Not provided
NTT DATA
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field
  • At least 4 years of experience as Technical Support Engineer, with at least 2 years working directly as a Release Manager
  • Good knowledge of Oracle database
  • Good knowledge of RHEL / Linux OS
  • Shell scripting knowledge
  • Network connectivity troubleshooting
  • Previous experience in and knowledge of ETL / data sourcing techniques
  • Previous experience with Control-M, Tomcat / WebLogic / Apache / Fabric highly desirable
  • Previous production support experience, with a can-do mindset and attitude and a hands-on approach
  • Ability to analyse business requirements and defects and propose hotfixes
Job Responsibility
  • Drive various tech adoption initiatives (GCP, EXACC, TRC, etc.)
  • Provide support for technical infrastructure components (e.g. databases, middleware and user interfaces)
  • Provide Support and remediation on any issues pertaining to the above applications by providing detailed code analysis of applications’ production platform
  • Estimate time required to implement remediation actions which are under direct control of RTB
  • Support and contribute to all relevant documentation following DB internal Standards, Procedures and Guidelines
  • Ensure appropriate vendor interaction in a multi-vendor environment
  • Conduct incident and problem management activities
  • Conduct scheduled Problem Management meetings with infrastructure groups, problem managers and incident managers OR from Agile world with SMs, POs to track progress and highlight issues
  • Perform detailed technology analyses to highlight weaknesses and make recommendations for improvement
  • Perform releases/DR exercises/application maintenance activities over weekends (usually once every 3 weeks there is a weekend activity required)
What we offer
  • Smooth integration and a supportive mentor
  • Pick your working style: choose from Remote, Hybrid or Office work opportunities
  • Projects have different working hours to suit your needs
  • Sponsored certifications, trainings and top e-learning platforms
  • Private Health Insurance
  • Individual coaching sessions or joining our accredited Coaching School
  • Epic parties or themed events
  • Fulltime

System Admin (OS Admin)- L3

Act as the SME for Ubuntu/Linux operations. Provide deep troubleshooting, RCA, p...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 8+ years of experience in Ubuntu/Linux operations
  • Strong troubleshooting of Ubuntu with KVM
  • Proficiency in KVM advanced ops (networking, storage, migration, tuning)
  • Experience with Bash/Python scripting
  • ITIL (Incident, Problem, Change Management)
  • Bachelor’s degree in engineering (or Equivalent)
  • Linux certifications (RHCSA / RHCE)
  • Strong knowledge of Linux – Ubuntu
  • Strong understanding of virtualization platforms – KVM
  • Good knowledge of Linux lifecycle management, release management, and vulnerability assessment and mitigation
Job Responsibility
  • Lead the delivery outcomes for designated Accounts
  • Lead, own, and drive all major incidents end-to-end and provide the RCAs
  • Conduct / participate in regular customer meetings and provide regular status updates
  • Prepare / maintain the technical documentation (inventory, runbook, KB articles, KEDB, and SOPs) for all required activities and work for the account
  • Continuously review and improve existing processes to ensure operational efficiency and customer satisfaction
  • Act as the primary point of contact for all Linux technology inquiries and engagements
  • Collaborate with Cross-Functional teams for solutions that meet clients' Linux technology needs
  • Maintain service level agreements (SLAs) and other key performance indicators (KPIs) for Managed Services customers for the MS Linux Practice
  • Develop and maintain a thorough understanding of emerging technologies, industry trends, and best practices in Linux technology
  • Identify and lead the execution of process improvement initiatives
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Microsoft Azure cloud System Support

The Microsoft Azure Cloud System Support role involves managing and supporting A...
Location:
Malaysia, Kuala Lumpur
Salary:
Not provided
NTT DATA
Expiration Date
Until further notice
Requirements
  • 3-5 years’ experience in Microsoft Azure cloud System Support
  • Understand the Microsoft Azure cloud - ideally Azure Fundamentals certified OR Computer Science/Information Systems Management degree
  • Perform L1.5 activities such as monitoring, deployment, rollback
  • Monitor the efficiency of the Azure cloud systems to prevent outages, and initiate an Incident Management bridge in case of an outage
  • Troubleshoot Azure resources, escalate to Level 3 (software development team)
  • Familiar with PaaS and IaaS - VMs, Storage, EventHub, Service Fabric Cluster (SFC), Azure Kubernetes Service (AKS), Cosmos DB, SQL Server, IoT Hub, Databricks, Key Vault, Data Lake
  • Understand the concept of Internet of Things (IoT) - telemetry, ingestion, processing, data storage, reporting
  • Understand the concept of tools - Octopus, Bamboo, Terraform, Azure DevOps, Jenkins, GitHub, Ansible
  • Understand the concept of container orchestration platforms (e.g. Kubernetes)
  • Understand the concept of scripts: PowerShell, Python
Job Responsibility
  • Release management of new software via tools
  • Understand the release management SOP: QA -> Load Test -> Stage Environment -> PROD
  • Create and manage monitoring and alerting systems as needed to meet SLAs
  • Comfortable with both Linux and Windows administration
  • Working in agile teams, build, test, and maintain aspects of the CI/CD pipeline
  • Manage UI visualizations of license consumption and performance
  • Evangelize Ops best practices with Engineering, Security, and cross-functional teams
  • Firmware release - OTA (over the air)
  • Launch the new mobile app / release new versions of the existing mobile app - App Store / Play Store
  • Participate in RCCAs when needed

Sr. Specialist - Site Reliability Engineer

The Production Support SRE Engineer is responsible for ensuring the reliability,...
Location:
United States, Southlake; Austin
Salary:
115000.00 - 131000.00 USD / Year
Charles Schwab
Expiration Date
March 23, 2026
Requirements
  • 2+ years of experience in production support, incident management, and real‑time troubleshooting for high‑availability systems
  • Solid understanding of SRE principles, including SLIs, SLOs, error budgets, and incident response frameworks
  • Hands-on experience with observability and monitoring tools such as Splunk, Grafana, Moogsoft, or xMatters
  • Proficiency with structured logging, log analysis, and alert tuning
  • Ability to create and maintain runbooks, operational guides, and incident playbooks
  • Familiarity with automation concepts and ability to identify and reduce operational toil through scripts, tooling, or process improvements
  • Strong communication skills with the ability to translate complex technical issues into clear, business-friendly language
  • Ability to partner with product, engineering, and delivery teams to embed reliability into the development lifecycle
  • Experience participating in on-call rotations, including market‑hours support and after‑hours escalations
  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or equivalent practical experience
Job Responsibility
  • Serve as the primary production support engineer for assigned Workplace Services applications, ensuring high availability, rapid incident response, and effective participation in both market‑hour and after‑hours on‑call rotations
  • Lead root‑cause analysis, support SLO breach investigations, and partner with product and delivery teams to restore and maintain service health
  • Champion Schwab’s SRE principles by improving observability, structured ELI logging, meaningful alerting, automation, and standardized dashboard/reporting patterns
  • Ensure new features, releases, and operational changes meet reliability, monitoring, and readiness expectations
  • Develop and maintain runbooks, operational guides, incident playbooks, and service documentation
  • Identify sources of operational toil, drive automation efforts, rationalize alerts, and deliver data‑driven insights and trends to product and engineering teams for proactive reliability improvements
  • Act as the embedded SRE partner for your service area—attending key ceremonies, advising teams on operational risks, and promoting best practices in reliability engineering
  • Foster a culture of blameless postmortems, continuous learning, and cross‑team enablement
What we offer
  • 401(k) with company match and Employee stock purchase plan
  • Paid time for vacation, volunteering, and 28-day sabbatical after every 5 years of service for eligible positions
  • Paid parental leave and family building benefits
  • Tuition reimbursement
  • Health, dental, and vision insurance
  • Fulltime

SHE Compliance Support Manager

Lead to ensure that Legal Compliance Management & Tracking System and Manage & C...
Location:
Thailand, Bangkok
Salary:
Not provided
Unilever
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Occupational Safety, Safety Engineering, Occupational Health, Engineering, Environment, or related fields
  • Safety Professional license
  • At least 5 years’ experience in SHE (Manufacturing or FMCG preferred)
  • Technical expertise in analyzing numerical and claims data and industry performance measures such as Lost Time Injury Rates, Frequency Rates, and ISO14001
  • Expertise in OH&S systems and industry “best practices”
  • Fluent in English, both spoken and written
  • Knowledge of digital and innovation tools
  • Strong leadership, people management, influencing, and communication skills
  • Able to work under extreme pressure in a multicultural environment
Job Responsibility
  • Lead the Legal Compliance Management & Tracking System, and manage and coordinate all legal compliance tasks, including implementation in all areas
  • Lead efforts to ensure safety compliance with Unilever standards/programs in all aspects, such as the UMS (Unilever Management System), within the scope of Safety, Health and Environment
  • Establish a structure for compliance risk assessment to support UMS on the Safety Pillar (OHS, PSM) and Environment Pillar
  • Develop and implement processes to perform periodic internal and external audits: SHEPAR, PCC, PSM, GRC and others
  • Support the SHE manager to ensure that SHE PAR (Positive Assurance Review), PCC (Program Compliance Check), DCA (Deep Compliance Audit and Checking), etc. are completed effectively
  • Assist and support the site SHE managers in completing any compliance-related documents and reports accurately and on time
  • Establish an effective process to manage the contractor safety management system, and design a process to ensure contractor work complies with the agreed safe method statements and rules
  • Implement and maintain the Safety, Health & Environmental (SHE) program to ensure compliance with Unilever SHE standards, Thai laws and regulations, and other related requirements
  • Be the Site Coordinator for PSM (Process Safety Management) to ensure that PSM processes comply with local regulations and Unilever standards
  • Fulltime

Principal Group Engineering Manager

Microsoft Specialized Clouds combines the power of edge platforms, devices, and ...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • 15+ years of professional software engineering experience, including designing, building, and operating distributed, cloud-scale services
  • 5+ years of engineering leadership experience, including managing managers and leading multi-team engineering organizations (M2+)
  • Deep experience with network device platforms — specifically Arista (EOS, eAPI, CloudVision) and/or Cisco (NX-OS, DCNM/NDFC) — including device programming, configuration management, and automation
  • Strong background in device programming and network automation — building systems that programmatically configure, validate, and manage network device state at scale
  • Experience with Azure Resource Provider (RP) engineering — ARM resource modeling, deployment pipelines, control-plane architecture, and resource lifecycle management
  • Solid understanding of L2/L3 networking fundamentals: spine-leaf architecture, VXLAN, overlay/underlay networking, BGP, and data center network design
  • Proven ability to set technical direction and architectural strategy for complex platforms spanning multiple components and partner teams
  • Demonstrated success owning end-to-end delivery of customer-critical services, including design, development, release, and live-site operations
  • Strong experience driving operational excellence, including reliability, incident management, automation, and cost optimization for production services
  • Proven track record of leading organizational transformation — such as quality resets, reliability turnarounds, code yellow resolution, or engineering culture change across an engineering org
Job Responsibility
  • Lead engineering teams through the design, architecture, development, testing, and operations of the Network Fabric platform — the cloud-managed networking layer for Azure Operator Nexus and Azure Local
  • Drive execution excellence across the full software lifecycle: semester planning, feature delivery, release management, and live-site operations
  • Own engineering commitments across multiple workstreams including network device programming, Azure Resource Provider development, fabric orchestration, and network configuration management
  • Ensure services meet Microsoft standards for quality, reliability, security, and operational readiness
  • Establish and enforce engineering best practices — including test-driven development, automated validation, secure development lifecycle (SDL/SFI), and continuous integration
  • Continue and accelerate the ongoing engineering transformation: driving quality resets, improving release predictability, and reducing customer-impacting incidents
  • Own the resolution of code yellow and equivalent quality escalations, driving root cause analysis and systemic remediation across the engineering organization
  • Champion a culture of engineering fundamentals — ensuring that quality, security, and operational maturity are embedded into every sprint, not treated as afterthoughts
  • Drive measurable reduction in support costs through automation, improved test coverage, and process optimization
  • Provide technical leadership across device programming (Arista EOS, Cisco NX-OS), network fabric orchestration, and Azure Resource Provider engineering
  • Fulltime