Lead Systems Ops Engineer - GCP

Wells Fargo

Location:
United States, Charlotte

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Wells Fargo is seeking a highly skilled and motivated Lead Systems Operations Engineer focused on Google Cloud Platform (GCP) L2 support, with specialized expertise in Google Kubernetes Engine (GKE) or Kubernetes, to serve as our Technical Lead. This role is critical to providing advanced technical support, leading the technical team, resolving complex issues, and ensuring the smooth operation of our GCP environments. The ideal candidate will possess deep knowledge of GCP services, particularly GKE/Kubernetes, demonstrate strong problem-solving skills, and have proven leadership abilities.

Job Responsibility:

  • Lead complex, broad-impact initiatives, including providing high-level systems consultation to technology teams
  • Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
  • Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
  • Make decisions on technical changes and enhancements
  • Consult with the engineering team on change designs that require a solid understanding of the technical process controls or standards that influence and drive new initiatives
  • Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
  • Provide L2 support for GCP environments, focusing on GKE/Kubernetes
  • Provide technical guidance and support to team members, ensuring best practices are followed
  • Conduct root cause analysis and implement corrective actions to prevent recurrence
  • Participate in change management processes to ensure minimal disruption to services
  • Develop and maintain scripts and automation tools to streamline support processes
  • Communicate effectively with stakeholders, providing updates on support activities and project progress

Requirements:

  • 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 3+ years of hands-on experience with GCP services, including GKE/Kubernetes
  • 3+ years of experience using Terraform to provision and manage complex cloud infrastructure

Nice to have:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience)
  • Proven experience in a leadership or technical lead role
  • Familiarity with containerization and orchestration tools (Docker, Helm, Istio)
  • Experience with monitoring and logging tools (Stackdriver, Prometheus, Grafana)
  • Proficiency in scripting languages (Python, Bash, etc.) and automation tools (Terraform, Ansible)
  • Ability to mentor and develop junior team members
  • Excellent problem-solving and analytical skills
  • Strong communication and interpersonal skills
  • Ability to work independently and as part of a team
  • Certifications (Preferred): Google Cloud Professional Cloud Architect, Google Cloud Professional DevOps Engineer, or Certified Kubernetes Administrator (CKA)

What we offer:

Relocation assistance is not available for this position.

Additional Information:

Job Posted:
March 05, 2026

Expiration:
March 06, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Lead Systems Ops Engineer - GCP

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location: United States, Pleasanton, California
Salary: 251,000 - 314,500 USD / year
Company: BlackLine
Expiration Date: Until further notice

Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads

Job Responsibility:
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics

What we offer:
  • Short-term and long-term incentive programs
  • Robust offering of benefit and wellness plans

Employment Type: Full-time

Principal Engineering Manager, Core Platform & AI Systems

We are looking for a Principal Engineering Manager, Core Platform & AI Systems t...
Location: United States, Seattle
Salary: 208,000 - 313,000 USD / year
Company: Highspot
Expiration Date: Until further notice

Requirements:
  • 10+ years of software engineering experience with 4+ years in engineering leadership roles
  • Experience managing senior/principal engineers, ideally across multiple functional areas
  • Strong technical background in cloud-native distributed systems, platform engineering, or AI/ML infrastructure
  • Proven track record of scaling SaaS platforms and leading teams responsible for mission-critical backend systems
  • Experience working closely with cross-functional teams such as Product, Infrastructure, AI/ML, and Security
  • Deep understanding of reliability, operational excellence, and cost optimization in cloud environments (AWS, Azure, GCP)
  • Excellent communication, collaboration, and executive stakeholder management skills
  • Passion for developing people and building strong, healthy engineering teams

Job Responsibility:
  • Lead and grow the Core Platform & AI Systems team
  • Drive the technical roadmap for the platform, ensuring scalability, performance, availability, and cost-efficiency
  • Partner closely with product engineering teams to deliver platform capabilities that unlock business features while simplifying the developer experience
  • Collaborate with Data Science, ML, and AI teams to provide robust ML Ops and AI infrastructure that enables rapid experimentation and production-grade AI deployments
  • Own platform-wide reliability and operational health, continuously investing in observability, incident management, and system resilience
  • Contribute to architectural decisions that shape the long-term direction of Highspot’s SaaS platform
  • Attract, retain, and develop top engineering talent, building a high-performing and inclusive team culture
  • Communicate effectively with senior leadership, providing visibility into roadmap progress, technical trade-offs, and organizational needs

What we offer:
  • Comprehensive medical, dental, vision, disability, and life benefits
  • Health Savings Account (HSA) with employer contribution
  • 401(k) Matching with immediate vesting on employer match
  • Flexible PTO
  • 8 paid holidays and 5 paid days for Annual Holiday Week
  • Quarterly Recharge Fridays (paid days off for mental health recharge)
  • 18 weeks paid parental leave
  • Access to Coaches and Therapists through Modern Health
  • 2 volunteer days per year
  • Commuting benefits

Employment Type: Full-time

Senior Engineering Manager

As Senior Engineering Manager on our team, you will work with our Engineering Le...
Location: Canada, Toronto
Salary: 200,000 CAD / year
Company: Flywheel Digital
Expiration Date: Until further notice

Requirements:
  • Progressive years of software engineering experience
  • 3+ years leading teams of Engineering talent from co-op to Staff
  • Strong experience working with Python, Django, Flask, ReactJS, Airflow
  • Strong software engineering fundamentals including systems architecture, algorithms, problem solving
  • Experience building web-based SaaS products
  • Understanding of how the web and the cloud work
  • Experience working in GCP, AWS or Azure
  • Experience scaling systems that ingest terabytes of data daily
  • Customer-obsessed
  • Able to build relationships with a diverse set of internal and external stakeholders

Job Responsibility:
  • Lead, mentor, and manage a team of Engineers focused on advertising bidding algorithms
  • Build and execute development plans, create and deploy best-in-class processes, and proactively identify and resolve issues
  • Use depth and breadth of technical expertise to ensure platforms being built are scalable, maintainable and extensible
  • Work closely with peers in product, design, production, and QA to ensure seamless execution
  • Actively participate in design and code reviews for the team
  • Work towards building a cohesive team united by best-in-class engineering principles
  • Help drive a cohesive strategy to take our team and products to the next phase of growth and scale
  • Attract, develop, and retain the next generation of Engineering leaders and top talent in North America

What we offer:
  • Flexible vacation time
  • Great learning and development opportunities
  • Benefits that help you live your best life
  • Parental leave and benefits
  • Volunteering opportunities
  • Employee Resource Groups (ERGs)
  • Competitive rewards package
  • Unparalleled career growth opportunities
  • Supportive, fun and engaging culture

Employment Type: Full-time

Senior Systems Operations Engineer

Transform traditional operations into a modern SRE model—building reliability by...
Location: India, Bengaluru
Salary: Not provided
Company: Wells Fargo
Expiration Date: March 04, 2026

Requirements:
  • 4+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Strong experience in large-scale distributed systems
  • 5+ years hands-on SRE/DevOps/Platform Engineering
  • Cloud: One or more of AWS, Azure, or GCP (certifications a plus)
  • IaC & Automation: Terraform, Ansible/Chef; solid Git practices (GitOps)
  • Observability: Prometheus, Grafana, OpenTelemetry, ThousandEyes, AppDynamics, Aternity
  • CI/CD: Azure DevOps, GitHub Actions, Jenkins, or GitLab CI; artifact management and environment promotions
  • Programming: One of Python, Go, or Java (scripting + API integrations)

Job Responsibility:
  • Lead or participate in managing all installed systems and infrastructure within the Systems Operations functional area
  • Contribute to increasing system efficiency and reducing human intervention time on related tasks
  • Review and analyze moderately complex operational support systems, application software, and system management tools to ensure the highest levels of systems and infrastructure availability
  • Work with vendors and other technical personnel for problem resolution
  • Lead team to meet technical deliverables while leveraging solid understanding of technical process controls or standards
  • Collaborate with vendors and other technical personnel to resolve technical issues and achieve highest levels of systems and infrastructure availability
  • Define and implement SLIs/SLOs and error budgets for critical services; drive SLO adoption across teams
  • Build and tune observability (metrics/logs/traces) with golden signals (latency, traffic, errors, saturation)
  • Partner with Performance Engineering to run load/stress/soak tests and remove performance bottlenecks

Employment Type: Full-time

Principal Full Stack Cybersecurity Engineer

The Principal Full Stack Cybersecurity Engineer will work with software engineer...
Location: United States, Bellevue
Salary: 129,400 - 233,400 USD / year
Company: T-Mobile
Expiration Date: Until further notice

Requirements:
  • Bachelor's degree in Computer Science or Engineering
  • 7-10 years of full-stack development experience, including front end and back end
  • 7-10 years designing database schemas, writing SQL
  • 3+ years DevOps experience with infrastructure as code
  • 4-7 years using cloud services from AWS, Azure or GCP
  • 7-10 years technical engineering experience
  • 1+ years coaching and mentoring team members
  • Expected to be able to set up a completely new full-stack environment from scratch, including build steps and backend infrastructure
  • Deep knowledge of at least one structured and one scripting language
  • Understanding of web protocols, how full-stack applications operate, and how data flows

Job Responsibility:
  • Design new infrastructure and monitor existing systems to ensure security compliance
  • Work with engineers to develop full-stack software solutions with a focus on security
  • Advise engineering teams on security, compliance, and risk assessments
  • Interface with groups including Cybersecurity, application support, engineering ops, privacy
  • Perform security analysis of existing and new technologies and form recommendations on their use
  • Propose and implement improvements to enhance existing systems and processes
  • Lead the identification of security needs and recommend plans/resolutions
  • Implement, test, and monitor information security improvements
  • Lead information security reviews of Engineering projects and proposals
  • Execute security projects driven by groups both internal and external to Engineering teams

What we offer:
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off
  • Up to 12 paid holidays
  • Paid parental and family leave

Employment Type: Full-time

AI/ML Technical Lead

We build breakthrough software products that power digital businesses. We are an...
Location: India, Noida
Salary: Not provided
Company: 3Pillar Global
Expiration Date: Until further notice

Requirements:
  • 7+ years of experience in AI/ML development, including leading technical teams
  • Strong expertise in agentic AI concepts, multi-agent orchestration, and autonomous tool-using AI systems
  • Hands-on experience with LangChain (chains, agents, custom tools, memory, LLM orchestration)
  • Experience in Agentic AI
  • Experience in Computer Vision
  • Proficiency with modern LLMs (OpenAI, Anthropic, Llama, Mistral, etc.) and fine-tuning methods
  • Deep knowledge of Python and ML frameworks (PyTorch, TensorFlow, Hugging Face)
  • Experience building and deploying RAG systems using vector databases (Pinecone, Chroma, Weaviate, etc.)
  • Strong understanding of ML Ops, CI/CD, containerization (Docker), and cloud platforms (AWS, GCP, Azure)

Job Responsibility:
  • Lead the design, development, and deployment of advanced AI/ML solutions, including LLM-powered applications
  • Architect and implement agentic AI workflows, multi-agent systems, and autonomous reasoning pipelines
  • Build scalable applications using LangChain, retrieval-augmented generation (RAG), vector databases, and tool integrations
  • Mentor and lead a team of ML engineers, data scientists, and AI developers
  • Collaborate with cross-functional teams (Product, Engineering, Data) to define AI strategies and roadmaps
  • Optimize model performance, latency, reliability, and cost efficiency
  • Evaluate new AI frameworks, libraries, and models for integration into the stack
  • Ensure best practices for code quality, ML Ops, versioning, testing, and monitoring

Employment Type: Full-time

Senior Detection Engineer

This is a detection engineering role that leverages knowledge of monitoring, ana...
Location: Singapore, Singapore
Salary: Not provided
Company: Marriott Bonvoy
Expiration Date: Until further notice

Requirements:
  • Bachelor’s degree in Computer Science or a related field, or equivalent experience/certification
  • 3+ years of combined experience in Splunk SIEM (Splunk Enterprise Security) threat detection use case development or UEBA (Exabeam) insider threat use case development
  • 5+ years of experience in security functions such as SOC, CIRT, security engineering, risk management, vulnerability management, or technical infrastructure operations, administration, or systems engineering
  • Experience with scripting or programming languages, including Python
  • Current information security certification such as Certified Information Security Manager (CISM) or Certified Information Systems Security Professional (CISSP) preferred
  • Offensive and defensive security certifications such as CEH, GIAC Cyber Defense, OSCP, or other related certifications preferred
  • Splunk Certification, including Splunk Enterprise Security Certified Admin, preferred
  • Use case development experience on the Exabeam platform preferred
  • Working knowledge of the NIST Cybersecurity Framework and ISO/IEC 27001:2022 preferred
  • Working knowledge of the MITRE ATT&CK Framework preferred

Job Responsibility:
  • Lead collaboration sessions within the cyber security tower and other business units to devise security monitoring use cases
  • Engage and collaborate with other security engineers and architects as needed to keep pace with the evolution of corporate infrastructure and applications, and share that knowledge with peers as appropriate
  • Document prospective security monitoring use cases with MITRE ATT&CK mappings using standard templates and methodologies
  • Inform and consult other cyber ops teams on the data onboarding and integrations required for use case development
  • Develop analytics, correlation searches, dashboards, reports, and alerts within the SIEM and UEBA platforms
  • Solicit feedback on pre-production security monitoring content through the peer review process and user acceptance testing for tuning
  • Document developed security monitoring content in a documentation registry using department-standard templates and methodologies
  • Manage field mapping and transmission of security monitoring alerts to the security incident response platform for SOC analyst consumption, as outlined in process documentation
  • Provide governance support for the content development function, including content development standards compliance, change management approvals for SIEM or UEBA content, and lifecycle management of developed security monitoring content
  • Service queued operational requests such as analytics content performance tuning, filtering, search refinement, and parsing issues

Employment Type: Full-time

Senior Principal, Machine Learning & Artificial Intelligence

Xometry is seeking a Senior Principal, Machine Learning & Artificial Intelligenc...
Location: United States, North Bethesda
Salary: 150,000 - 196,000 USD / year
Company: Cherry Ventures
Expiration Date: Until further notice

Requirements:
  • Master’s or PhD in Computer Science, Machine Learning, Applied Mathematics, Electrical Engineering or related field (PhD preferred for deep generative/3D modeling emphasis)
  • 12+ years of professional experience in machine learning, artificial intelligence, or data science roles, with several years in a senior or principal capacity leading major programs
  • Demonstrated experience architecting and delivering large-scale ML/AI solutions end-to-end, from data ingestion, feature engineering, model training, and evaluation through deployment, monitoring, and operations
  • Deep expertise in machine learning frameworks (TensorFlow, PyTorch), data engineering, model infrastructure, MLOps, cloud platforms (AWS, GCP, Azure), and scalable production systems
  • Strong exposure to generative AI techniques (large language models, multimodal models, diffusion, GANs) and translating them into business use-cases
  • Excellent cross-functional collaboration skills: you can partner with product, engineering, ops, manufacturing, design, business leadership and translate technical concepts into business language
  • Proven ability to influence without direct authority and drive change across organizations
  • Strong communication and presentation skills: you can articulate technical vision, roadmap, trade-offs, and outcomes to senior leadership
  • Track record of identifying and delivering measurable business impact via ML/AI, e.g., revenue growth, cost savings, improved efficiency

Job Responsibility:
  • Serve as the technical leader of multiple large, cross-functional ML/AI solutions with significant, lasting impact across Xometry’s business
  • Define and drive the 18-24-month ML/AI technical roadmap, balancing breakthrough innovation (e.g., generative 3D, foundation models, large-scale vision/3D pipelines) with reliable business value delivery (e.g., quoting accuracy, lead-time reduction, defect detection, cost optimization)
  • Influence partner roadmaps across engineering, product, operations, and business teams: align priorities, advise on resourcing, and champion ML/AI best practices
  • Proactively identify and remove roadblocks for teams and projects, whether technical, operational, data-related, or resource constraints
  • Mentor individuals and technical teams
  • Act as a trusted SME with strong cross-functional partnerships: your insights and guidance will shape ML/AI infrastructure, data, model, and tooling decisions
  • Play a leadership role in identifying areas of opportunity, e.g., using ML/AI to unlock new revenue streams (rapid quoting for new manufacturing modalities, generative design for customers), reduce cost (automated quality inspection), or optimize efficiency (3D-geometry classification, defect detection, generating manufacturing-ready models)
  • Address problems adjacent to your sphere of immediate influence: proactively tackle challenges outside your direct scope and champion holistic solutions
  • Stay ahead of industry developments in ML, AI, generative AI, 2D/3D modeling, and manufacturing tech, and translate insights into improvements to internal best practices, tooling, frameworks, model governance, data pipelines, and operationalization

What we offer:
  • Annual bonus
  • 401(k) match
  • Medical, dental and vision insurance
  • Life and disability insurance
  • Generous paid time off, including vacation, sick leave, floating and fixed holidays, and maternity and bonding leave
  • EAP and other wellbeing resources

Employment Type: Full-time