Senior AI Site Reliability Engineer

Charles Schwab

Location:
United States, San Francisco

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

190000.00 - 270000.00 USD / Year

Job Description:

At Schwab, you will build a rewarding career while making a difference in the lives of our millions of clients. Here, innovative thinking meets creative problem solving as we work together to challenge the status quo. We believe in the power of collaboration and value being together in the office, which is why this role is based on-site in our San Francisco office. Joining Schwab means joining a company committed to transforming the financial industry and putting clients at the center of everything we do.

Schwab’s AI Strategy & Transformation team, known as AI.x, is the central hub for Artificial Intelligence at Schwab. We are an integrated product, engineering, strategy and risk team, all based in San Francisco. We help set the enterprise vision for AI, invest in the most promising opportunities, and accelerate delivery across the company. We also build the core platform that powers AI at scale and explore next-generation GenAI efforts that will redefine how we serve our clients.

As a Senior AI Site Reliability Engineer on AI.x, you will play a key role in ensuring our AI solutions are reliable, scalable, and resilient, enabling us to deliver innovative experiences to millions of clients. This role is more than a reliability engineering position. It is an opportunity to join a high-profile team shaping Schwab’s future with AI, to build and maintain solutions that matter to millions of clients, and to grow your career in one of the most exciting areas of technology today.

Job Responsibility:

  • Design, implement, and manage the reliability and operational excellence of GenAI applications and platforms
  • Work closely with architects, engineers, and business leaders to align reliability practices with Schwab’s enterprise strategy
  • Mentor and coach junior engineers, helping to build strong operational practices and foster a culture of continuous improvement
  • Lead by example in solving complex reliability challenges, advancing SRE standards, and driving rapid iteration from concept to production

Requirements:

  • 8+ years of software development or reliability engineering experience, with 4+ years as a hands-on senior engineer in startups and/or large organizations
  • Bachelor’s degree in Computer Science or related field
  • 5+ years of experience building and operating complex products from scratch and running them in production
  • 3+ years of experience supporting applications that use Artificial Intelligence (AI) models to deliver real business impact
  • 3+ years of experience building and maintaining data pipelines and infrastructure for large datasets
  • 3+ years of experience with containers and cloud-native applications, and the ability to operationalize them in the public cloud with infrastructure as code
  • Experience implementing monitoring, alerting, and incident response for large-scale distributed systems
  • Proven track record in driving reliability, scalability, and performance improvements for production AI systems

Nice to have:

  • Strong computer science fundamentals and experience working across different parts of the tech stack
  • Experience working with proprietary or open-source LLMs (Gemini, Claude, OpenAI or other models) and supporting LLM-powered applications in production
  • Focus on quality and reliability in everything you do. Continue to raise the bar and drive others to deliver high-quality, resilient products, with experience writing tests and implementing automated reliability checks
  • Experience writing and running evaluations to ensure quality and monitor consistency in LLM-generated responses and actions
  • Strong communication skills – you balance written and verbal communication to clearly share your perspective with others on the team
  • Experience mentoring junior engineers and helping them grow their technical and operational skills through clear feedback and code reviews
  • Demonstrated mindset of continuous learning and improvement
  • Ability to solve complex problems with ambiguous or incomplete data in highly distributed systems
  • Demonstrated business domain knowledge related to all products you have worked on
  • Curiosity about new technologies and processes – you always seek to improve yourself and everyone around you and proactively seek and share knowledge with others on your team
  • Experience with Python and front-end development preferred but not required
  • Master’s or advanced degrees in Computer Science or related fields

What we offer:

  • 401(k) with company match and employee stock purchase plan
  • Paid time for vacation, volunteering, and a 28-day sabbatical after every 5 years of service for eligible positions
  • Paid parental leave and family building benefits
  • Tuition reimbursement
  • Health, dental, and vision insurance
  • Bonus or incentive opportunities

Additional Information:

Job Posted:
December 24, 2025

Expiration:
January 20, 2026

Employment Type:
Full-time

Work Type:
On-site work

Similar Jobs for Senior AI Site Reliability Engineer

Senior Software Engineer, Site Reliability

Babylist is looking for a Senior Software Engineer, Site Reliability to join our...
Location:
United States; Canada
Salary:
186818.00 - 224183.00 USD; CAD / Year
Babylist
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience as a Site Reliability Engineer or similar role
  • Experience supporting high-traffic consumer-facing websites
  • Proficiency with Terraform
  • Strong experience working with AWS cloud-based infrastructure and services
  • Proficiency with Docker and Kubernetes
  • Solid understanding of cloud-native systems design
  • Troubleshooting and debugging skills
  • Experience designing and supporting CI systems
  • Familiar with monitoring and alerting best practices
  • Proven experience in on-call management best practices
Job Responsibility:
  • Manage and build our AWS infrastructure using Infrastructure as Code (IaC) tools like Terraform
  • Improve the speed and reliability of our Continuous Integration (CI) systems
  • Provide support to developers in troubleshooting issues
  • Establish, communicate, and support best practices for monitoring and alerting
What we offer:
  • Company-paid medical, dental, and vision insurance
  • Retirement savings plan with company matching and flexible spending accounts
  • Generous paid parental leave and PTO
  • Remote work stipend
  • Perks for physical, mental, and emotional health, parenting, childcare, and financial planning
Employment Type:
Full-time

Senior Site Reliability Engineer

We are seeking an experienced Senior Site Reliability Engineer (L3) to join our ...
Location:
India, Chennai
Salary:
Not provided
Arcadia
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
  • 8–10+ years of experience in SRE/DevOps/Cloud Engineering, with deep hands-on exposure to AWS and Kubernetes
  • Strong hands-on experience with:
      • Terraform & Infrastructure as Code
      • AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
      • Jenkins + Groovy, GitHub Actions, ArgoCD, FluxCD
      • Kubernetes troubleshooting and operations
      • Prometheus/Grafana/Datadog observability stacks
  • Proven ability to operate in high-scale, high-uptime, multi-environment production systems
  • Experience building automation via Python/Bash and reducing operational toil
  • Strong understanding of incident management, root cause analysis, and reliability engineering principles
Job Responsibility:
  • Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
  • Lead all aspects of Kubernetes operations including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
  • Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
  • Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
  • Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch—ensuring actionable alerting, dashboards, and metrics alignment with SLO/SLIs
  • Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
  • Manage database operations across MySQL and PostgreSQL including backups, performance tuning, replication, and operational runbooks
  • Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
  • Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
  • Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems
What we offer:
  • Competitive compensation and employee stock options
  • Hybrid/remote-first working model (India-based role, with global collaboration)
  • Flexible leave policy
  • Comprehensive medical insurance (self + family members)
  • Annual performance cycle + quarterly recognition awards
  • A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation
Employment Type:
Full-time

Senior AI Engineer

We are seeking an innovative AI Engineer to join a brand new team focused on pro...
Location:
United Kingdom, London
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Proven experience as an AI Engineer with a significant delivery history
  • Strong expertise in multiple programming languages & frameworks
  • Proven experience using quantitative testing practices in AI/ML to support actionable Go/No-Go decisions on delivering software to production
  • Demonstrated expertise in developing on a range of architectures, ideally up to and including container-based micro-services with a focus on scalability, reliability, maintainability, and high performance
  • Good understanding of SQL and NoSQL databases
  • Excellent communication and collaboration skills
  • A growth mindset and willingness to learn and adapt in a fast-paced environment
  • Passion for site reliability engineering and its impact on product development
  • Staying connected to the latest technologies, such as Generative AI, and being keen to put them into practice
Job Responsibility:
  • Understand the landscape, tooling and procedures used by developers at Citi and look for opportunities to reduce toil and aid simplification using Gen AI based solutions
  • Apply classic AI and novel Gen AI evaluation methodology to raise the quality and reliability bar for the software that you will deliver, as well as to manage and mitigate risks that are specific or inherent to this field
  • Advise on evaluation metrics, devise and implement quantitative testing plans, and help evolve the existing approaches to AI evaluation
  • Work with a wide variety of Citi technology teams and help them drive towards everything-as-code and a codified controls environment
  • Collaborate with product and engineering teams to design, build and maintain scalable and reliable web applications and services
  • Be hands-on with coding and software design to ensure adherence to high quality standards and best practices
  • Mentor and nurture other engineers to help them grow their skills and expertise
  • Support and drive cultural change, including instigating critical thinking about controls and processes and encouraging a culture of continuous improvement.
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources.
Employment Type:
Full-time

Senior Software Engineer, Backend

As a Senior Software Engineer, Backend specializing in database architecture and...
Location:
United States, San Francisco
Salary:
150000.00 - 240000.00 USD / Year
Chef Robotics
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 7+ years of professional experience in backend development roles with demonstrated leadership experience
  • Expert knowledge of relational databases (MySQL, PostgreSQL) including schema design, optimization, and administration
  • Strong proficiency with Python and JavaScript/TypeScript with advanced software engineering skills
  • Extensive experience leading projects with at least two web frameworks: Flask, FastAPI, Django, Node.js, or Next.js
  • Proven experience designing and implementing RESTful and GraphQL APIs at scale
  • Advanced understanding of containerization (Docker) and orchestration (Kubernetes) technologies
  • Experience with cloud infrastructure and deployment (AWS, GCP, or Azure) in production environments
  • Proven experience leading complex backend projects and mentoring junior engineers
  • Understanding of data requirements for robotics or automation systems
Job Responsibility:
  • Lead the design, implementation, and optimization of database schemas to support robot operations, telemetry, recipe management, and system analytics
  • Develop robust data migration strategies and version control for database schema evolution
  • Implement efficient query optimization and indexing strategies to support high-throughput robot operations
  • Establish data integrity protocols and backup systems to ensure operational continuity across customer deployments
  • Create scalable data access layers that balance security, performance, and maintainability
  • Mentor team members on database design patterns and optimization techniques
  • Lead the development and maintenance of scalable APIs to serve robot control systems, dashboards, and monitoring tools
  • Design and implement secure authentication and authorization mechanisms across backend services
  • Develop robust middleware for processing and validating data between robotics subsystems
  • Create service interfaces that enable efficient communication between robotics components and cloud services
What we offer:
  • medical, dental, and vision insurance
  • commuter benefits
  • flexible paid time off (PTO)
  • catered lunch
  • 401(k) matching
  • early-stage equity
Employment Type:
Full-time

Executive Director – AI and Machine Learning

At CVS Health, we’re building a world of health around every consumer and surrou...
Location:
United States, Work At Home, New Jersey
Salary:
175100.00 - 334750.00 USD / Year
CVS Health
Expiration Date:
December 31, 2025
Requirements:
  • PhD or Master's degree in AI/ML, Computer Science, Statistics, Engineering, or equivalent experience
  • 15+ years leading Enterprise Machine Learning, Infrastructure, Data Science, and/or SRE practices
  • 5+ years applying Machine Learning to optimize technology operations (AIOps)
  • 10+ years at a leadership level or above, within a Fortune 500 company with significant scale
  • Proven experience leading AI governance, establishing and maintaining robust ML Ops environments, leading development of large-scale AI and ML platforms and solutions, and developing strategic partnerships with internal clients, industry experts, and vendors
  • Ability to develop and implement a comprehensive AI/ML strategy that aligns with the organization's business goals
  • Deep understanding of AI/ML technologies, including model development, deployment, MLOps, GenAIOps, and LLMOps practices
  • Demonstrated knowledge of and significant experience building and operating on-premise AI processor (e.g., GPU clusters) and platform architectures for the deployment and management of enterprise AI workloads
  • Experience with and commitment to ensuring AI/ML solutions are developed and deployed ethically, with a focus on fairness, transparency, and accountability
  • Familiarity with industry standards and regulations related to AI and Machine Learning
Job Responsibility:
  • Develop, implement, and enhance governance frameworks and policies to ensure effective oversight of operational and security-focused AI and ML solutions
  • Establish and enforce standards for the build, management, governance, and utilization of AI models and model execution platforms
  • Establish and socialize a framework for the documentation, proposal, evaluation, build, delivery, and ongoing value assessment of scalable operations and security-focused AI/ML solutions
  • Evaluate and certify foundational models for use within CVS Health, ensuring alignment with organizational goals and security requirements
  • Regularly assess and enhance the governance model and associated standards to address emerging challenges and opportunities
  • Establish and maintain robust MLOps, GenAIOps, and LLMOps practices
  • Build and manage pipelines to enable teams to design AI-powered applications, develop and experiment with models, and deploy, monitor, and maintain them in production
  • Drive delivery of AI and ML solutions that provide deep insights and reporting on operations and security data
  • Develop proactive AI-driven solutions to measurably reduce time to detect security and operational issues, provide adaptive recommendations, and automate remediation
  • Deliver solutions to enable users to interact with operational data driving measurable improvements in productivity, performance, and innovation
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
Employment Type:
Full-time

Sr Wlan Software Engineer

Belden is seeking a highly skilled wireless embedded software professional in ou...
Location:
United States, Santa Clara; Chicago
Salary:
158600.00 - 220300.00 USD / Year
Belden, Inc
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Software Engineering, Computer Engineering or equivalent
  • 15 years of experience with embedded C/C++ software development on Linux
  • 10 years of experience in wireless 802.11 driver and product development
  • 10 years of experience in wireless network debugging
  • 10 years of experience interfacing with low-level WiFi drivers
  • In-depth knowledge of WiFi standards and architecture
  • Proven ability to manage / work on multiple projects in parallel
Job Responsibility:
  • Participate in troubleshooting and triaging of issues to drive towards root cause identification and resolution
  • Implement code changes, document, unit test, and participate in code reviews
  • Design and create unit test conditions and scripts to address business and technical use cases
  • Develop new product features
  • Collaborate with the engineering team to propose solutions
  • Innovate and use existing tools and techniques to perform wireless network debugging
  • Document, track and communicate issue status and resolution as appropriate, using Jira
  • Experience with wireless test environments and equipment
  • Evaluate documentation associated with the wireless communication industry
  • Evaluate development activities (own and others) to provide guidance and process improvements
What we offer:
  • health/dental/vision
  • long term/short term disability
  • life insurance
  • HSA/FSA
  • matching retirement plans
  • paid vacation
  • parental leave
  • employee stock purchase plan
  • paid leave for volunteer work in your community
  • training opportunities
Employment Type:
Full-time

Director, Strategy - Technology Research

As the Technology Landscape Researcher Leader, you will lead a team of strategic...
Location:
United States, Santa Clara
Salary:
160000.00 - 200000.00 USD / Year
Belden, Inc
Expiration Date:
Until further notice
Requirements:
  • Proven experience leading a strategic technology scouting or corporate innovation research function
  • Expertise in identifying, qualifying, and assessing the business impact of emerging Horizon 3 technologies
  • A strong background in analyzing technology trends and their relevance to specific industry verticals
  • Demonstrated ability to facilitate large-scale, cross-functional innovation workshops or events
  • Excellent leadership skills with the ability to guide a team in translating abstract technological signals into concrete business opportunities
Job Responsibility:
  • Lead and mentor a team of researchers to identify and qualify Horizon 3 technologies from external sources like startups, academia, and competitors
  • Oversee the analysis and assessment of the relevance and potential impact of new technologies on Belden’s priority business verticals
  • Facilitate the Global Technology Radar Event, ensuring all Belden functions have an opportunity to contribute to our technology roadmap
  • Drive the initial contextualization of new technologies to build the pipeline of potential Horizon 3 opportunities
  • Collaborate with cross-functional teams to support technology communications, patent applications, and market readiness assessments
What we offer:
  • health/dental/vision
  • long term/short term disability
  • life insurance
  • HSA/FSA
  • matching retirement plans
  • paid vacation
  • parental leave
  • employee stock purchase plan
  • paid leave for volunteer work in your community
  • training opportunities
Employment Type:
Full-time

Sales Engineer

The role of a Sales Engineer at AMS is to deeply understand our highly technical...
Location:
United States, Knoxville
Salary:
Not provided
Analysis and Measurement Services
Expiration Date:
Until further notice
Requirements:
  • Holds a bachelor’s degree from an accredited institution, ideally in engineering, another technical discipline, or a closely related field
  • Demonstrates the ability to learn and communicate complex technical concepts with clarity and credibility to both internal teams and external clients
  • Brings a track record of reliability, integrity, and professionalism, with evidence of long-term commitment and progression in prior roles
  • Is willing and able to travel 6–8 times per year (approximately four days per trip) for field services testing at nuclear power plants and for business development activities
  • Complies with applicable company policies, quality and safety standards, and all administrative directives
  • Shows a strong work ethic, a genuine interest in helping customers solve problems, a collaborative spirit toward colleagues and partners, and a desire to contribute to the long-term success and reputation of the company
  • U.S. citizenship is required to meet federal compliance and security-access regulations
Job Responsibility:
  • Develop deep knowledge of AMS technologies and services through hands-on field testing and plant trips
  • Serve as the technical bridge between customers and internal teams, helping define project scopes, gather requirements, and translate customer needs into clear proposals
  • Own day-to-day customer relationships through calls, emails, follow-ups, and proactive account management
  • Develop and deliver compelling sales and technical presentations that clearly communicate AMS capabilities and demonstrate solution value
  • Collaborate with engineering and operations teams to develop proposals and quotes, reviewing procedures and ensuring accuracy in scope, assumptions, and pricing
  • Advance sales opportunities through the pipeline by identifying risks and gaps, coordinating next steps, and partnering with the Sales and Business Development team on new business
What we offer:
  • Generous employer contributions toward comprehensive health, dental, vision, life, and disability insurance
  • 401(k) with generous employer matching contributions
  • Four weeks annual vacation plus at least 10 paid holidays per year
  • Professional Development – AMS is committed to your growth technically and as a sales professional
  • Career Growth – Annual salary reviews, raises, and advancement opportunities
Employment Type:
Full-time