Responsible for ensuring the high availability, reliability, and performance of the Azure-based AI cloud platform
Lead proactive monitoring, outage detection, and incident response to minimize downtime and operational risk
Design and maintain disaster recovery and business continuity processes to safeguard critical AI workloads
Oversee cybersecurity operations, including vulnerability management, audits, and compliance with security standards for the AI platform
Collaborate closely with MLOps, LLMOps, and engineering teams to integrate automation, observability, and security best practices into platform operations
Requirements:
Bachelor’s degree in Computer Science or equivalent
Minimum 5 years of experience in cloud administration and/or operations
Deep expertise in Azure operations and monitoring services, including Azure Monitor, Log Analytics, and Application Insights
Strong background in incident management, SRE practices, and disaster recovery design