Job Description:
- Monitor critical applications and infrastructure using tools such as Kibana, Firebase Crashlytics, APM tools, and CloudWatch, and proactively identify anomalies or service degradation.
- Manage the end-to-end incident lifecycle: log, track, and resolve tickets, and provide RCA (Root Cause Analysis) and summary reports to stakeholders.
- Diagnose and troubleshoot software, hardware, networking, and system-level issues.
- Handle on-ground incidents and coordinate with field engineers for device installation, calibration, and replacement.
- Work with IoT hardware and embedded systems, including ANPR cameras, sensors, and edge controllers.
- Configure, validate, and monitor device health, firmware, and connectivity.
- Troubleshoot API-based integrations between edge devices, backend systems, and third-party platforms.
- Ensure proper event flow sequencing and identify processing gaps.
- Analyze logs and telemetry data for device-level and API-level issues.
- Develop troubleshooting guides, SOPs, and diagnostic tools.
- Drive Service Improvement Plans (SIPs) focused on system reliability and uptime.
- Collaborate with Incident Resolution Teams, SDMs, and stakeholders for faster service restoration.
- Participate in weekly and monthly review meetings with internal and external stakeholders.
- Follow defined escalation paths and ensure SLA adherence.
- Communicate effectively across teams, including engineers and senior management.
- Be flexible to work in rotational shifts.