The CTS Enterprise Analytics Services (EAS) organization is actively recruiting a strong Platform Engineer to work on a broad spectrum of engineering initiatives. The EAS organization drives the enterprise-wide strategy for engineering and managing best-in-class data and analytics services, including Big Data platforms, Spark services, and AI & ML services. The role requires a thought leader who can perform hands-on work in partnership with key stakeholders, architects, engineers, data scientists, and DevOps teams to engineer and deliver highly resilient solutions.
Job Responsibilities:
Assess the current landscape and book of work, and partner with various teams to identify key areas for infrastructure automation, configuration management, monitoring, alerting, etc.
Continuously design and improve processes for detecting and responding to production service outages, and build preventive solutions
Act as the subject matter expert in Site Reliability Engineering to help drive the engineering vision set by EAS stakeholders
Produce availability and performance metrics for services and deliver processes to improve on major KPIs
Operationalize highly available services deployed across multi-region and multi-data center environments
Handle outages, perform root cause analysis, and provide architectural and engineering recommendations
Build an internal knowledge base to educate partners and support teams
Requirements:
Proven track record of system design experience with highly available platforms and services supporting various types of workloads
Experience in designing fail-over processes and solutions
Strong scripting skills – shell scripts, Python, Perl, etc.
Experience with virtualization, containerization, and cloud technologies – Docker, Kubernetes, and cloud service providers (e.g., GCP, AWS)
Analytical thinker able to assess various aspects to methodically arrive at a solution
Hands-on experience in gathering performance metrics, troubleshooting, tuning, monitoring, etc.
Experience with monitoring and logging solutions and frameworks (e.g., OTEL, Grafana, Prometheus, Kibana, Splunk)
Hands-on experience installing, configuring, and troubleshooting Linux-based environments
Experience in IaC and CI/CD tooling (e.g., Terraform, Jenkins, Harness)
Strong knowledge of configuration management tools (e.g., Ansible and/or Chef)
Familiarity with GPU management in virtualized enterprise environments
Good understanding of security concepts and best practices
Excellent written and verbal communication skills
Good team player who shares knowledge, cross-trains other team members, and shows interest in learning new technologies and products
Ability to work in a matrixed environment and follow procedures, processes and policies
Experience managing vendor interactions for troubleshooting sessions, enhancement requests, and guiding vendor roadmaps to meet Citi standards and functional requirements
Self-starter who works with minimal supervision and can work in a team with diverse skills and across geographies