At Schwab, you're empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us 'challenge the status quo' and transform the finance industry together. Schwab Technology Services enables the future of how clients manage their money by providing innovative and reliable technology products and services as part of our ongoing commitment to democratize access to investing and financial planning.
The ITO Observability team helps manage risk to the firm by monitoring the real-time health and performance of software applications running across thousands of servers in a complex, multi-location network environment. This team-based role requires strong attention to detail, sound judgment, and the ability to respond quickly when potential instability or anomalies are identified. When an issue is detected, this role leverages a broad information base, established procedures, and cross-functional collaboration to support timely restoration and minimize business impact.
Job Responsibilities
Monitor Schwab systems for anomalies, indicators of instability, and potential service-impacting events, and escalate to the appropriate support teams to drive timely restoration
Identify opportunities to improve, automate, and streamline monitoring, alerting, and escalation workflows
Analyze operational data and trends to identify workflow improvements, reduce noise, and strengthen service reliability
Draft clear, concise communications for business partners regarding system instability, potential impact, and recommended mitigation steps
Present the enterprise-wide status of Schwab systems during recurring operational readiness and status meetings
Maintain high service standards and support business Service Level Agreements during incidents and service-impacting events
Provide hands-on troubleshooting and technical support where applicable
Create and maintain documentation for operational processes, procedures, knowledge articles, and training guides
Apply Copilot and other AI-enabled tools to support problem solving, improve efficiency, and explore agent-based solutions that streamline operational workflows
Requirements
Bachelor's degree in Computer Science or equivalent professional experience in IT Operations
4+ years of experience in a large-scale, 24x7, high-availability data center or enterprise technology environment
Foundational understanding of distributed and server operating systems, server administration, network components, virtualization, and cloud computing environments gained through professional experience or formal technical training
Experience with event management platforms, such as Moogsoft
Experience with monitoring and visualization platforms, such as Grafana
Familiarity with Confluence, Jira, and Remedy/SmartIT service management tools
Strong attention to detail, with the ability to execute tactical responsibilities accurately in a fast-paced environment
Demonstrated ability to act as a change agent by bringing forward innovative ideas, solving problems, and delivering practical solutions
Ability to influence and collaborate effectively with others through constructive communication and positive reinforcement
What we offer
401(k) with company match and employee stock purchase plan
Paid time for vacation, volunteering, and a 28-day sabbatical after every 5 years of service for eligible positions
Paid parental leave and adoption/family building benefits
Tuition reimbursement to keep developing your career
Medical, dental, and vision insurance