Wells Fargo is seeking a Principal Engineer - Generative AI GPU Infrastructure Capabilities.
Job Responsibilities:
Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
Requirements:
7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Design GPU cluster topologies (H100/H200, NVLink/NVSwitch), networking, and storage paths for high‑throughput inferencing; document sizing and performance baselines.
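The sizing exercise this item describes can be sketched with back-of-envelope KV-cache arithmetic. The model-shape numbers in the test values (80 layers, 8 grouped-query KV heads, head dimension 128, FP16 cache) are illustrative assumptions, not a mandated baseline:

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int) -> int:
    """KV-cache bytes per cached token: K and V, per layer, per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def max_cached_tokens(hbm_gb: float, weight_gb: float, per_token_bytes: int,
                      runtime_overhead_gb: float = 4.0) -> int:
    """Tokens of KV cache that fit in HBM after weights and a runtime allowance."""
    free_bytes = (hbm_gb - weight_gb - runtime_overhead_gb) * 1e9
    return int(free_bytes // per_token_bytes)
```

A first document like this per cluster topology gives a defensible starting point before measured baselines replace the estimates.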
Implement Run:ai constructs (Collections/Departments/Projects/workloads) for MDEV/MDEP/UCEP/MRM; codify quota, priority, and fair‑share policies.
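One way to codify the fair-share piece is largest-remainder apportionment of a GPU pool across project weights. This is a minimal sketch, not Run:ai's actual scheduler; the project names in the test are the acronyms from this posting, used illustratively:

```python
def fair_share(total_gpus: int, weights: dict) -> dict:
    """Largest-remainder apportionment of a GPU pool across projects by weight."""
    total_w = sum(weights.values())
    alloc = {p: total_gpus * w // total_w for p, w in weights.items()}
    leftover = total_gpus - sum(alloc.values())
    # Hand leftover GPUs to the largest fractional remainders (name as tie-break).
    by_remainder = sorted(weights, key=lambda p: (-(total_gpus * weights[p] % total_w), p))
    for p in by_remainder[:leftover]:
        alloc[p] += 1
    return alloc
```

Encoding the policy as a function like this makes quota changes reviewable in version control rather than hand-edited in a console.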
POC & benchmark disaggregated inferencing (prefill/decode) with vLLM/TensorRT‑LLM; publish guidance for H100/H200 tuning (FP8/INT8/AWQ) and KV‑transfer behavior over NVLink.
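The core argument for disaggregated serving is that prefill is compute-bound and decode is memory-bandwidth-bound, so the two pools can be sized independently. A toy capacity model under assumed per-GPU throughputs (all rates in the test are hypothetical, not measured H100/H200 figures):

```python
import math

def pool_sizes(req_per_s: float, avg_prompt_toks: int, avg_output_toks: int,
               prefill_toks_per_gpu_s: float, decode_toks_per_gpu_s: float):
    """GPUs needed per pool when prefill and decode run on separate fleets."""
    prefill_load = req_per_s * avg_prompt_toks   # tokens/s to ingest
    decode_load = req_per_s * avg_output_toks    # tokens/s to generate
    return (math.ceil(prefill_load / prefill_toks_per_gpu_s),
            math.ceil(decode_load / decode_toks_per_gpu_s))
```

Feeding benchmark numbers from the vLLM/TensorRT‑LLM POCs into a model like this is one way to turn the measurements into published sizing guidance.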
Operationalize OpenShift AI parity for GPU scheduling, time slicing/MIG profiles, and preemption; validate upgrade paths and helm/kustomize packaging.
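For the MIG side, a small helper that maps a workload's memory footprint to the smallest slice that fits can keep profile choices consistent across clusters. The table below lists a subset of the profiles NVIDIA documents for an 80 GB GPU, included here as an illustrative assumption:

```python
# Illustrative subset of MIG profiles for an 80 GB GPU (see NVIDIA MIG docs).
MIG_PROFILES = {"1g.10gb": 10, "2g.20gb": 20, "3g.40gb": 40, "7g.80gb": 80}

def smallest_profile(model_mem_gb: float) -> str:
    """Pick the smallest MIG slice whose memory covers the model footprint."""
    for name, gb in sorted(MIG_PROFILES.items(), key=lambda kv: kv[1]):
        if gb >= model_mem_gb:
            return name
    raise ValueError("model does not fit on a single GPU slice")
```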
Integrate Triton Inference Server for multi‑model serving; standardize model repository structure, batching, dynamic shapes, and telemetry hooks.
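Standardizing the repository structure could start from a generator like the sketch below, which lays out Triton's expected `<model>/<version>/` directories plus a minimal `config.pbtxt` with dynamic batching enabled. The model name and backend choice are placeholders:

```python
import os
import tempfile

# Minimal config.pbtxt; doubled braces survive str.format().
CONFIG_TMPL = """name: "{name}"
backend: "python"
max_batch_size: 8
dynamic_batching {{
  max_queue_delay_microseconds: 100
}}
"""

def build_repo(root: str, model_name: str, version: str = "1") -> str:
    """Create a minimal Triton model-repository skeleton; return the config path."""
    model_dir = os.path.join(root, model_name)
    os.makedirs(os.path.join(model_dir, version), exist_ok=True)
    cfg_path = os.path.join(model_dir, "config.pbtxt")
    with open(cfg_path, "w") as f:
        f.write(CONFIG_TMPL.format(name=model_name))
    return cfg_path

repo_root = tempfile.mkdtemp()
cfg = build_repo(repo_root, "summarizer")
```

Generating repositories from one template is what makes batching and telemetry conventions enforceable rather than advisory.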
Harden NGDC environments with AVI/GSLB patterns (Prod1/Prod2) and BCP; execute DR failover runbooks and steady‑state capacity planning.
Own endpoint productionization via Apigee (AI Gateway): authN/Z, rate limiting, API SLAs, versioning/deprecation and SDK generation for internal consumers.
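The rate-limiting piece of the gateway responsibility is commonly a token bucket per consumer. A minimal in-process sketch of the mechanism (Apigee applies this at the gateway tier; the rates here are arbitrary):

```python
import time

class TokenBucket:
    """Per-consumer token bucket: refills at `rate` tokens/s up to `capacity` burst."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For LLM endpoints the same shape is often applied to tokens-per-minute as well as requests-per-second.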
Embed observability/evaluations with Overwatch + Arize: prompt/agent/tool tracing, SLO dashboards, alerting, and data‑retention/export workflows.
Automate CI/CD for infra and model artifacts: image scanning (JFrog remote repo), chart releases, canaries, and rollback plans across OCP/GKE.
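A canary rollout needs an explicit promotion gate; the sketch below encodes one plausible policy (an absolute error-rate ceiling plus a relative comparison against the stable baseline). The thresholds are illustrative assumptions, not a stated Wells Fargo policy:

```python
def promote_canary(canary_err_rate: float, baseline_err_rate: float,
                   max_ratio: float = 1.5, abs_ceiling: float = 0.02) -> bool:
    """Promote only if the canary is under an absolute error ceiling AND
    within max_ratio of the baseline error rate; otherwise roll back."""
    if canary_err_rate > abs_ceiling:
        return False
    return canary_err_rate <= max_ratio * max(baseline_err_rate, 1e-6)
```

Making the gate a pure function lets the same decision run identically in the OCP and GKE pipelines.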
LLM/SLM Runtimes: Work with vLLM, TensorRT‑LLM, and Triton; apply FP8/INT4 quantization; tune KV‑cache strategies. Build POCs for disaggregated prefill/decode, standardize Triton repos, and optimize batching.
Orchestration: Use Run:ai structures (Collections/Departments/Projects), manage OCP/GKE environments. Implement GPU allocation patterns, enforce quotas, preemption, fair‑share scheduling.
OpenShift AI: Configure RHOAI GPU scheduling and time slicing, use helm/kustomize, validate upgrades. Achieve platform parity, certify charts and policies, ensure admission controls function reliably.
API & Gateway: Apply Apigee authN/Z, manage quotas, rate limits, OpenAPI specs, SDK generation, SLA operations. Productionize model endpoints, manage versioning and deprecation, enforce gateway‑level SLAs.
Observability & Evaluation: Use Overwatch + Arize for tracing and evals, define SLOs, alerts, retention/export processes. Trace prompts/tools/agents, enforce data retention, publish standardized dashboards.
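The "define SLOs, alerts" part typically reduces to error-budget burn-rate math. A sketch in the style of the multiwindow alerting described in Google's SRE workbook; the 14.4 fast-burn threshold is that book's example value, used here as an assumption:

```python
def burn_rate(errors: int, total: int, slo_target: float = 0.999) -> float:
    """Observed error rate divided by the SLO's error budget (1 - target)."""
    budget = 1.0 - slo_target
    return (errors / total) / budget

def should_page(fast_window_burn: float, slow_window_burn: float,
                threshold: float = 14.4) -> bool:
    """Multiwindow alert: page only when both the fast and slow windows burn hot,
    which suppresses pages for short blips that self-heal."""
    return fast_window_burn >= threshold and slow_window_burn >= threshold
```

Traces from Overwatch/Arize supply the error and request counts; the SLO layer above them is what turns traces into pages.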