As a Sr. Manager – SRE & Operations in Client Data Technology (CDT), you will lead System Availability Engineering (SAvE) teams for CAT applications, playing a critical role in ensuring the availability of CDT ecosystems and guiding the development, automation, tooling and realization of SRE best practices.
Job Responsibility:
Identifying tactical and strategic opportunities to improve service health, performance, reliability, and telemetry across the CDT platform
Leading the team with a data-driven mindset, focusing on key performance metrics such as MTTD, MTTR, and availability, in close collaboration with Trading development and IT Operations teams
Leading the design, architecture and implementation of an availability and resiliency roadmap that delivers modernized tooling and metrics
Working closely with the development team to define a sustainable operating model for CDT applications and their databases, focusing on platform scale, availability, fault tolerance and performance
Leading automation and Infrastructure as Code (IaaC) practices so that teams follow patterns that ensure repeatability, consistency and portability
Identifying toil and technical debt, developing a comprehensive plan, and leading the team through its execution
Driving a shift-left mindset and influencing architectural decisions to ensure resiliency and scale at the outset of the software development process
Being a hands-on technical leader who leads the team from the front and inspires thought leadership within it
Requirements:
10+ years of software development and site reliability engineering experience supporting production applications on-prem and in any public cloud environment, PCF and IaaS
6+ years in DevOps engineering leadership focusing on complementing production operations with automation and tooling initiatives
6+ years of technical leadership, supporting highly technical individuals and their development, and driving efficiencies
5+ years of experience defining, driving and implementing operational best practices (SLOs, SLIs, error budgets, error monitoring, capacity planning, blameless postmortems and toil management)
5+ years of experience with CI/CD tools, logging, observability and telemetry solutions (GitHub, Jenkins, Datadog, Splunk, Prometheus, Grafana, etc.)
6+ months of Schwab technology domain experience gained as a current or recent contractor or employee
Proficient in programming languages to automate repeatable processes and build IaaC solutions (Python, CloudFormation, Terraform)
Knowledge of databases (SQL, Aerospike, Postgres preferred)
Knowledge of IBM MQ, RabbitMQ and Kafka
What we offer:
401(k) with company match and employee stock purchase plan
Paid time for vacation and volunteering, and a 28-day sabbatical after every 5 years of service for eligible positions