The Site Reliability Engineer (SRE) at Skyhigh Security is responsible for monitoring, maintaining, and troubleshooting operational issues in a high-availability production environment. The SRE also acts as a bridge between the Operations, Engineering, and Product Management teams and represents the customer's point of view to keep driving improvements to our products and uptime. SREs manage and improve the operational aspects of systems, such as monitoring, alerting, incident response, and vendor interactions.
Job Responsibilities:
Perform Incident Management and Change Management to maintain the continuous availability of all Cloud Infrastructure services
Ensure all SRE and operating procedures are maintained and executed
Maintain a 24×7 production environment with a high level of service availability, perform quality reviews, and manage operational issues
Perform root cause analysis for major incidents and drive the process by involving required stakeholders
Perform problem management by analyzing metrics, alarms, and dashboards to troubleshoot problem areas, and report issues to assist in performance tuning and fault finding
Implement proactive monitoring, alerting, trend analysis, and self-healing solutions
Explore new technologies, features, and tools to improve the platform, and automate operational tasks using Bash, Python, or another programming language
Manage and maintain runbooks and standard operating procedures
Manage, coordinate, and document all types of maintenance activities and outages
Perform patching and upgrades for vulnerability management
Work closely with other teams to develop new ideas into internal tools
Understand the existing architecture and work with various Engineering teams to develop and execute strategies to provide a high-quality production service
Work a flexible schedule in a 24×7 environment with rotational shifts
Requirements:
Bachelor’s degree in computer science, electrical engineering or a related area, with 7+ years of SRE experience in a large enterprise organization
System admin experience on Linux environments
Experience with end-to-end monitoring setup for infrastructure and applications
Experience with Prometheus, Grafana, ELK, OpenSearch, CloudWatch, PagerDuty, and other monitoring tools
Solid experience with Cloud Technologies such as AWS and OCI
Good experience with tools for containerized workloads, such as Kubernetes
Network knowledge (TCP/IP, UDP, DNS, load balancing) and prior network administration experience are required
Experience with BGP, iBGP, NAT, TCP/IP, proxies, and cross-connects
Experience with L2/L3 switching, knowledge of Juniper and Cisco routing devices
Experience understanding and managing web servers (Apache, Tomcat, Nginx)
Ability to script or program in one or more high-level languages, such as Python or Go
Experience with configuration management tools such as Salt, Puppet, or Ansible
Experience with source control tools such as GitHub and SVN
Experience with deployment tools such as Jenkins and Harness
Experience with SQL and NoSQL databases such as Redis, Crate, and Elasticsearch
Experience in performing and writing Root Cause Analysis documents
Strong communication and analytical/problem-solving skills
Systematic approach and the ability to drive problems to resolution
Only US Citizens are eligible
Nice to have:
Experience with or knowledge of GCP and Azure
Experience in the security domain will be an added advantage
Experience with open-source technologies like Kafka, Hadoop, HBase, Zookeeper, Oozie will be an added advantage