Lead Infrastructure Platform Support Engineer

Wells Fargo

Location:
United States, Charlotte

Contract Type:
Not provided

Salary:
119000.00 - 224000.00 USD / Year

Job Description:

Wells Fargo is seeking a Lead Infrastructure Engineer to join our AI Platforms and Model Support Group as part of Digital Technology and Innovations. Learn more about the career areas and business divisions at wellsfargojobs.com. The Lead Infrastructure Engineer is responsible for designing, building, and operating highly scalable, resilient production infrastructure platforms that support enterprise Generative AI and Predictive AI workloads. This role provides technical leadership across GPU-accelerated environments, OpenShift/Kubernetes platforms, and advanced AI infrastructure patterns, including large, AI-factory-scale GPU compute architectures. The engineer partners closely with platform, application, and vendor teams to ensure secure, performant, and production-grade AI solutions.

Job Responsibility:

  • Lead complex initiatives to develop infrastructure to provide solutions for business applications
  • Participate in various projects intended to continually improve or upgrade the infrastructure
  • Evaluate internal and external software solutions which could be leveraged to meet target state architecture goals
  • Review and analyze high impact outages to ensure the proper processes and procedures are in place to avoid problems in the future
  • Design, build, deploy and maintain infrastructure solutions through collaborative efforts with the team and third party vendors
  • Design, code, test, debug and document programs using Agile development practices
  • Make decisions on technical designs and implementation plans, and identify project risks and resource requirements
  • Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
  • Recommend courses of action to maintain cost effectiveness and achieve results
  • Collaborate and consult with peers, colleagues and managers to resolve issues and achieve goals
  • Interact with customers and vendors

Requirements:

  • 5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years troubleshooting complex end-to-end architectures (including CI/CD pipelines)
  • 5+ years Linux systems experience
  • 4+ years supporting AI/ML platforms
  • 4+ years of Kubernetes / container platform experience including production support

Nice to have:

  • Experience with Generative AI and Predictive AI platforms
  • Hands-on GPU platform operations including scheduling, quota, and performance tuning
  • Experience with OpenShift in GPU-enabled, multi-tenant environments
  • Experience designing or operating GPU SuperPods
  • Deep experience with observability using Grafana, Splunk, and custom telemetry pipelines
  • Experience building AI- or agent-driven automation tooling (AIOps)
  • Hands-on experience supporting AI/ML workloads on GCP and Azure, including GPU-backed services and managed AI infrastructure
  • Experience operating hybrid or multi-cloud AI platforms, with an understanding of cloud-native services, networking, identity, and cost optimization for Generative and Predictive AI
  • Strong experience monitoring AI signals such as inference latency and GPU utilization
  • Experience with BCP/DR, resiliency, and highly available architectures

What we offer:

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

Additional Information:

Job Posted:
May 06, 2026

Expiration:
May 21, 2026

Employment Type:
Full-time

Work Type:
Hybrid work

Similar Jobs for Lead Infrastructure Platform Support Engineer

Director of Engineering, Platform Engineering

In your role as ‘Director of Engineering, Platform Engineering’ you will guide t...
Location:
United States, Oakland, California
Salary:
241000.00 - 305000.00 USD / Year
Everlaw
Expiration Date
Until further notice
Requirements
  • At least 4 years of experience managing and leading senior engineers, including technical workstream management and execution support
  • At least 2 years of experience managing and leading managers, coaching them on talent management, strategic planning, and execution, with a focus on platform engineering teams
  • At least 5 years of experience as a senior engineer building one or more of the following: developer productivity tools or highly available platform services (e.g., storage systems, pub-sub systems, search systems, caching solutions, observability solutions), and/or expertise and experience with infrastructure and/or cloud technologies (such as Ansible, Terraform, Kubernetes, Docker, etc.)
  • You have a good dynamic range that you apply to different situations - you can step back and empower, while also diving deep into the code to understand the details
  • You can communicate at the right altitude with both technical and non-technical stakeholders
  • You have experience working with stakeholder teams (internal and/or external) in setting and collaborating on technical roadmaps
  • You have experience explaining to customers how the platform handles reliability, security, and compliance matters
  • You have a BS/MS or PhD in Computer Science (or equivalent)
  • You have a sound foundational understanding of a wide range of computer science topics and concerns relating to system and software design
  • You are authorized to work in the United States
Job Responsibility
  • Inspire and empower your managers to cultivate high-performing teams, fostering a culture of continuous feedback and professional growth to ensure successful project delivery and career development
  • Use your technical knowledge to align stakeholders across Engineering and Product on the ideal path forward on complex technical decisions and roadmap decisions
  • Strategize, prioritize, resource, and execute against our Engineering roadmap
  • Work with Engineering Operations, cross-functional teams, team members and managers to improve various processes that affect infrastructure growth, support, alignment, collaboration, and accountability
  • Critically observe and understand Everlaw’s platform, tooling, and processes
What we offer
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Work in Downtown Oakland, just steps from the BART line and dozens of restaurants
Employment Type:
Full-time

Bigdata Support Lead Engineer

The Lead Data Analytics analyst is responsible for managing, maintaining, and op...
Location:
India, Bengaluru; Chennai
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • 10+ years of total IT experience
  • 5+ years of experience in supporting Hadoop (Cloudera)/big data technologies
  • 5+ years of experience in public cloud infrastructure (AWS or GCP)
  • Experience with Kubernetes and cloud-native technologies
  • Experience with all aspects of DevOps (source control, continuous integration, deployments, etc.)
  • Advanced knowledge of the Hadoop ecosystem and Big Data technologies
  • Hands-on experience with the Hadoop eco-system (HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, Solr)
  • Knowledge of troubleshooting techniques for Hive, Spark, YARN, Kafka, and HDFS
  • Advanced Linux system administration and scripting skills (Shell, Python)
  • Experience designing and developing data pipelines for data ingestion or transformation using Spark with Java, Scala, or Python
Job Responsibility
  • Lead day to day operation and support for Cloudera Hadoop ecosystem components (HDFS, YARN, Hive, Impala, Spark, HBase, etc)
  • Troubleshoot issues related to data ingestion, job failures, performance degradation and service unavailability
  • Monitor cluster health using Cloudera Manager and respond to alerts, logs, and metrics
  • Collaborate with engineering teams to analyze root causes and implement preventive measures
  • Collaborate on patching, service restarts, failovers, and rolling restarts for cluster maintenance
  • Assist with user onboarding, access control, and issues accessing cluster services
  • Contribute to documentation for knowledge base
  • Work on data recovery, replication, and backup support tasks
  • Responsible for moving all legacy workloads to cloud platform
  • Research and assess open-source technologies and public cloud (AWS/GCP) tech stack components to recommend and integrate into the design and implementation
Employment Type:
Full-time

Managed Airflow Platform (MAP) Support Engineer

Location:
Salary:
Not provided
Kloud9
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • 3+ years of experience in large-scale production-grade platform support, including participation in on-call rotations
  • 3+ years of hands-on experience with cloud platforms like AWS, Azure, or GCP
  • 2+ years of experience developing and supporting data pipelines using Apache Airflow including DAG lifecycle management and scheduling best practices
  • Experience troubleshooting task failures, scheduler issues, and performance bottlenecks, and managing error handling
  • Strong programming proficiency in Python, especially for developing and troubleshooting RESTful APIs
  • 1+ years of experience in observability using the ELK stack (Elasticsearch, Logstash, Kibana) or Grafana Stack
  • 2+ years of experience with DevOps and Infrastructure-as-Code tools such as GitHub, Jenkins, Docker, and Terraform
  • 2+ years of hands-on experience with Kubernetes, including managing and debugging cluster resources and workloads within Amazon EKS
  • Exposure to Agile and test-driven development a plus
Job Responsibility
  • Evangelize and cultivate adoption of Global Platforms, open-source software and agile principles within the organization
  • Ensure solutions are designed and developed using a scalable, highly resilient cloud native architecture
  • Ensure the operational stability, performance, and scalability of cloud-native platforms through proactive monitoring and timely issue resolution
  • Diagnose infrastructure and system issues across cloud environments and Kubernetes clusters, and lead efforts in troubleshooting and remediation
  • Collaborate with engineering and infrastructure teams to manage configurations, resource tuning, and platform upgrades without disrupting business operations
  • Maintain clear, accurate runbooks, support documentation, and platform knowledge bases to enable faster onboarding and incident response
  • Support observability initiatives by improving logging, metrics, dashboards, and alerting frameworks
  • Advocate for operational excellence and drive continuous improvement in system reliability, cost-efficiency, and maintainability
  • Work with product management to support product / service scoping activities
  • Work with leadership to define delivery schedules of key features through an agile framework
What we offer
  • Kloud9 provides a robust compensation package and a forward-looking opportunity for growth in emerging fields

Morpheus Cloud Support Engineer

As a Morpheus Cloud Support Engineer, you will provide technical assistance and ...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • At least 5 years of proven experience as a Cloud Support Engineer or in a similar position
  • At least 5 years of experience in Morpheus Cloud Management Platform
  • Bachelor’s degree in Computer Science, Information Technology, or a related field
  • Strong understanding of cloud systems, including VMware, KVM, AWS, and Azure
  • Experience with cloud infrastructure as code (IaC) technologies such as Terraform or CloudFormation
  • Experience with containerization and orchestration systems such as Docker and Kubernetes
  • Excellent problem-solving and troubleshooting abilities
  • Strong communication skills, with the ability to clearly convey technical information to both technical and non-technical stakeholders
  • Hands-on experience in Morpheus Cloud Management Platform
  • Proficiency with Windows Server, Ubuntu, RHEL, HPE VME, CentOS
Job Responsibility
  • Provide technical assistance with cloud infrastructure and services in Morpheus CMP
  • Monitor and maintain infrastructure systems to guarantee their availability and performance
  • Troubleshoot and address issues with cloud infrastructure
  • Work with the development and operations teams to optimize cloud solutions
  • Assist with the deployment and setup of cloud resources
  • Develop and maintain comprehensive documentation for cloud systems, including architecture diagrams, operational procedures, and troubleshooting guides
  • Analyze cloud system performance metrics and logs to identify trends, forecast needs, and recommend improvements or upgrades
  • Collaborate with Product Managers, Developers, and Operations to understand requirements and use cases and transform them into tests
  • Handle P1 situations in Cloud Infra
  • Provide technical and architectural leadership for the infrastructure Engineering teams and Operations roles
What we offer
  • Comprehensive suite of benefits for physical, financial, and emotional wellbeing
  • Career development programs
  • Inclusive work culture
Employment Type:
Full-time

Senior Director of Platform Engineering

Lead the Future of Platform Engineering at Modus Create. As Senior Director of P...
Location:
United States of America
Salary:
Not provided
Modus Create
Expiration Date
Until further notice
Requirements
  • 15+ years in Platform Engineering/DevOps
  • 7+ years in senior engineering leadership, ideally in consulting or high-growth tech environments
  • A clear point of view on modern architecture, engineering best practices, and agile delivery
  • Proven experience scaling distributed global teams and platform engineering operations
  • Strong pre-sales and delivery experience, with the ability to shape winning proposals and roadmaps
  • A customer-first mindset and passion for solving complex problems with elegant, scalable solutions
  • Excellent communication and collaboration skills in cross-functional and cross-cultural environments
  • A history of growing leaders and fostering high-trust, high-performance teams
Job Responsibility
  • Lead and scale a high-performing, distributed platform engineering team through strong mentorship and inclusive leadership
  • Define what great looks like through reusable runbooks, technical standards, and by nurturing a culture grounded in quality, belonging, and continuous learning
  • Help clients modernize platforms, launch new infrastructure, and make better innovation investment decisions
  • Ensure every solution is aligned with client goals and drives measurable value
  • Own and evolve our delivery frameworks, platform engineering standards, and team operations
  • Champion cloud-native development, DevOps and SRE best practices, and scalable architecture
  • Partner with Sales, Partnerships, and Client Executives to shape and win new opportunities
  • Translate client needs into technical solutions, delivery plans, and estimates
  • Lead development of proposals, estimation, and pre-sales architecture discussions
  • Develop reusable solution assets, infrastructure templates, and case studies for future engagements
What we offer
  • Remote work with flexible working hours
  • Modus Global Office Programme: on-demand access to private offices, meeting rooms, coworking spaces and business lounges in locations in over 120 countries
  • Employee Referral Program
  • Client Referral Program
  • Travel according to client or team needs
  • The chance to work side-by-side with thought leaders in emerging tech
  • Access to more than 12,000 courses with a licensed Coursera account
  • Possibility to obtain paid certification/courses if they align with company goals and are relevant to the employee's role
Employment Type:
Full-time

Engineering Manager - Product & Platform

Arize AX is the AI & Agent Engineering Platform – one place to develop, evaluate...
Location:
United States
Salary:
180000.00 - 250000.00 USD / Year
Arize
Expiration Date
Until further notice
Requirements
  • You’ve been both an IC and a manager – you know what great engineering looks like at the system and code level, and you know how to build and lead teams
  • Strength across both product/fullstack engineering and backend/infrastructure – comfortable moving between customer-facing workflows and the systems that power them
  • Experience balancing people leadership, project execution, and technical depth
  • Clear, direct communicator who builds trust across teams
Job Responsibility
  • Contribute as an engineer on complex product and infrastructure challenges – building features that customers touch and scaling the systems behind them
  • Lead a team of engineers and tech leads – hiring, mentoring, and creating the conditions for them to do the best work of their careers
  • Drive projects end-to-end – ensuring scope is clear, trade-offs are well understood, and delivery is predictable
  • Work cross-functionally with Product and Design to set direction, and with Solutions and Support to make sure we’re solving the real problems our customers face in production
What we offer
  • Medical, dental, and vision
  • 401(k) plan
  • Unlimited paid time off
  • Generous parental leave plan
  • Other benefits for mental and wellness support
  • Competitive equity package
  • Monthly WFH stipend to pay for co-working spaces
Employment Type:
Full-time

Engineering Manager, Infrastructure

As an Engineering Manager for the Infrastructure team, you’ll lead the engineers...
Location:
Canada; United States
Salary:
195000.00 - 285000.00 USD / Year
Apollo.io
Expiration Date
Until further notice
Requirements
  • 5+ years of hands-on software or infrastructure engineering experience
  • 2+ years of experience leading teams of senior and staff-level engineers in platform, SRE, or infrastructure domains
  • Proven ability to design and operate large-scale distributed systems in cloud environments (preferably GCP or AWS)
  • Expertise with Kubernetes, Docker, Terraform, Ubuntu, and CI/CD pipelines
  • Familiarity with observability tools (Grafana, Prometheus, ELK, Datadog, New Relic) and performance tuning
  • Strong grounding in networking, security, and reliability principles
  • Experience managing infrastructure costs, availability SLAs, and high-throughput systems at scale
Job Responsibility
  • Lead, coach, and grow a distributed team of high-impact Infrastructure Engineers
  • Partner with senior engineering leadership on strategic initiatives such as cloud migration, infrastructure scaling, platform reliability, and cost efficiency
  • Define and implement modern operational excellence practices, including SLOs, error budgets, incident reviews, and performance monitoring
  • Guide technical decision-making across key areas like Kubernetes, GCP, observability, networking, CI/CD, and IaC (Terraform, Ansible)
  • Collaborate with AI, Data, and Product Engineering teams to ensure infrastructure scalability for ML and AI-native workloads
  • Run effective 1:1s, career development conversations, and quarterly performance reviews
  • Support recruiting efforts to attract top engineering talent across time zones
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
Employment Type:
Full-time

Platform Engineer

Motorica is at a breakthrough moment. We’ve built a generative AI animation plat...
Location:
Sweden, Stockholm
Salary:
Not provided
Motorica
Expiration Date
Until further notice
Requirements
  • Proven experience in Platform Engineering, SRE, or DevOps, ideally in high-growth or AI/ML-heavy environments
  • Strong grasp of CI/CD systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes)
  • Familiarity with observability, monitoring, and incident response best practices
  • Security mindset with hands-on experience in audits, compliance (ISO 27001, SOC2, etc.), and vulnerability management
  • Strong communication skills: you’ll be interfacing with developers daily and need to translate infrastructure into clarity, not complexity
  • A proactive, solution-oriented mindset: you anticipate friction before others feel it
Job Responsibility
  • Provide common infrastructure guidance, reusable patterns, and automated tooling to engineering teams
  • Own the “paved road” for developers, reducing friction and cognitive load
  • Champion and implement security best practices across the entire platform
  • Play a key role in achieving ISO 27001 certification through technical implementation and evidence gathering
  • Build and operate a highly reliable and cost-efficient platform, with particular focus on optimizing GPU-heavy AI/ML workloads
  • Manage CI/CD systems (GitHub Actions, GitLab CI) and track key metrics like build times, deployment frequency, and failure rates
  • Oversee cloud environments (AWS, GCP), including health, security, and cost reporting
  • Lead security scans, audits, and vulnerability remediation
  • Maintain observability stack (Prometheus, Grafana, Datadog, GCP Logging), ensuring meaningful dashboards and alerts
  • Act as point-of-contact for ML Research team’s infra requests (GPU access, specialized pipelines)
What we offer
  • Stock Options program
  • Retirement Plan
  • Health Benefits (5000 SEK/year)
  • Life Insurance / Health Insurance / Injury Insurance
  • Competitive compensation
Employment Type:
Full-time