
Technical Staff Storage Architect

United States, Santa Clara · 266000.00 - 342000.00 USD / Year · Job Posted March 20, 2026

Job Description

From applied research to advanced engineering, the ISG CTO Engineering Technologist team has the expertise to shape ground-breaking products, materials, and processes. It’s a fascinating field of work. We’re involved in assessing the competition, developing technology and product strategies, and generating intellectual property. We lead technology investigations, analyze industry capabilities, and recommend potential acquisitions or vendor partner opportunities. Our insights influence product architecture and definitions. And we work with colleagues across the business to ensure our products always lead the way.

Job Responsibility

  • Serve as the technical leader interfacing with enterprise customers to define architectures, understand workload requirements, and translate them into comprehensive solution designs
  • Lead investigations into new fabric technologies and storage platforms to support large‑scale AI deployments
  • Develop and propose solution configurations, BOMs, and SKUs based on customer needs in collaboration with Engineering
  • Architect AI-optimized storage and data management solutions that integrate scale-out file, object, and parallel filesystem software-defined storage (SDS) technologies
  • Influence end-to-end AI infrastructure stacks—from software frameworks through compute, storage, networking, data security, and datacenter deployment (L10–L12)
  • Engage with strategic technology partners to co-innovate and build differentiated solutions, and influence multi-year technology and product roadmaps with input from engineering, manufacturing, procurement, customers, and the broader industry
  • Serve as a technical escalation point for complex development challenges, represent Dell in internal and external architecture forums, and generate intellectual property, including invention disclosures and technology strategy insights

Requirements

  • 10+ years of experience in storage architecture, data platforms, or large-scale distributed systems, including expertise in scale-out file/object storage, parallel filesystems, and key storage concepts (NFS/S3, erasure coding, deduplication, encryption, tiering, backup, versioning)
  • Deep understanding of AI/ML workflows, data pipelines, training/inference patterns, and enterprise cluster deployments, along with a proven ability to collaborate with ISVs on data management and AI workflow integration
  • Strong ability to define storage configurations that meet production-grade performance and SLA requirements
  • Excellent communication skills, with experience presenting to cross-functional teams and senior leaders
  • Demonstrated experience working with industry vendors to influence next-generation technology design
  • Ability to translate business outcomes into practical, scalable technical architectures
  • Willingness to travel 20–30%

Nice to have

  • Bachelor’s, Master’s, or PhD degree, or equivalent experience
  • Knowledge of AI/ML or HPC performance engineering, including benchmarking and scripting (e.g., Python, Bash) on Linux
  • Hands-on experience with datacenter technologies, including fabrics, cooling, telemetry, or cluster orchestration

What we offer

  • Comprehensive Healthcare Programs
  • Award Winning Financial Wellness Tools and Resources
  • Generous Leave of Absence for New Parents and Caregivers
  • Industry Leading Wellness Platform
  • Employee Assistance Program

Similar Jobs for Technical Staff Storage Architect

8 matching positions

Member of Technical Staff, Software Engineer

Help build the infrastructure that powers training, evaluation, and data platfor...
Location: Switzerland, Zürich
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Strong software engineering background building reliable, scalable production systems (Python preferred)
  • Hands‑on experience supporting large‑scale ML / LLM training, evaluation, or experimentation infrastructure
  • Operating GPU‑heavy workloads in cloud environments using Docker and Kubernetes (scheduling, utilization, isolation)
  • Designing and running data / compute pipelines and orchestration (e.g., Airflow, Argo) with object storage (Azure Blob / S3)
  • Platform reliability and operability: observability, metrics, logging, tracing, alerting (Prometheus, Grafana, OpenTelemetry)
Job Responsibility
  • Design and build core platform services for scalable training and evaluation, including cluster orchestration, job scheduling, data and compute pipelines, and artifact management
  • Standardize containerized workflows by maintaining Docker images, CI/CD, and runtime configurations
  • Advocate for best practices in security, reproducibility, and cost efficiency
  • Implement end-to-end observability and operations through metrics, tracing, logging, dashboard development, monitoring, and automated alerts for model training and platform health (using Prometheus, Grafana, OpenTelemetry)
  • Architect and operate services on Azure cloud platforms, managing infrastructure-as-code (Terraform/Helm), secrets, networking, and storage
  • Enhance developer experience by creating tools, CLIs, and portals that simplify job submission, metrics analysis, and experiment management for generalist software engineering and research teams
  • Enforce security and compliance policies for data access, container hardening, and supply-chain integrity, and partner with security and privacy teams to maintain robust practices in multi-tenant environments and secret management
  • Collaborate cross-functionally with data, model, and product teams to align infrastructure roadmaps with training needs, evaluation protocols, and Copilot product goals
Employment type: Full-time

Member of Technical Staff - Data Platform

If you are excited by the challenge of designing distributed systems that proces...
Location: United States, Mountain View; Redmond
Salary: 119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 3+ years experience in business analytics, data science, software development, data modeling, or data engineering OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, data modeling, or data engineering OR equivalent experience
  • Proficiency in Python, Scala, Java, or Go
  • Deep Distributed Systems Knowledge: Demonstrated technical understanding of massive-scale compute engines (e.g., Apache Spark, Flink, Ray, Trino, or Snowflake)
  • Experience architecting Lakehouse environments at scale (using Delta Lake, Iceberg, or Hudi)
  • Experience building internal developer platforms or "Data-as-a-Service" APIs
  • Strong background in streaming technologies (Kafka, Azure EventHubs, Pulsar) and stateful stream processing
  • Experience with container orchestration (Kubernetes) for deploying data applications
  • Experience enabling AI/ML workloads (Feature Stores, Vector Databases)
Job Responsibility
  • Core Platform Engineering: Design and build the underlying frameworks (based on Spark/Databricks) that allow internal teams to process massive datasets efficiently
  • Distributed Systems Architecture: Modernize our data stack by moving from batch-heavy patterns to event-driven architectures
  • Unstructured AI Data Pipelines: Architect high-throughput pipelines capable of processing complex, non-tabular data (documents, code repositories, chat logs) into datasets for LLM pre-training, fine-tuning, and evaluation
  • AI Feedback Loops: Engineer the high-throughput telemetry systems that capture user interactions with Copilot
  • Infrastructure as Code: Treat the data platform as software. Define and deploy all storage, compute, and networking resources using IaC (Bicep/Terraform)
  • Data Reliability Engineering: Move beyond simple "validation checks" to build automated governance and observability systems
  • Compute Optimization: Deep-dive into query execution plans and cluster performance. Optimize shuffle operations, partition strategies, and resource allocation
Employment type: Full-time

Member of Technical Staff, Infrastructure Data & Analytics

We are seeking experienced Infrastructure Data & Analytics Engineers to join our...
Location: United States, Multiple Locations; Mountain View; San Francisco Bay Area; New York City metropolitan area
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science or a related technical field AND 8+ years of technical engineering experience with data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 6+ years of experience with distributed data processing frameworks and large-scale data systems, OR equivalent experience
  • Master's degree in Computer Science or a related technical field AND 12+ years of technical engineering experience with data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 10+ years of experience with distributed data processing frameworks and large-scale data systems, OR equivalent experience
  • Proven technical leadership in data engineering, analytics platforms, or large-scale telemetry systems
  • Hands-on experience with ETL orchestration frameworks such as Airflow, Dagster, or similar
  • Strong communication skills, including the ability to explain complex systems clearly to senior leaders
Job Responsibility
  • Act as the technical lead and owner for infrastructure analytics across compute, storage, and networking
  • Design and build durable, scalable data pipelines that ingest telemetry from clusters, schedulers, health systems, and capacity trackers into the data warehouse
  • Define and standardize core metrics and semantics (e.g., utilization, occupancy, MFU, goodput, capacity readiness, delivery-to-production)
  • Architect and maintain self-service dashboards and APIs for fleet, cluster, and squad-level visibility
  • Partner closely with stakeholders across Supercomputing Infra, Researchers, Strategy and Executives to ensure metrics reflect operational and business reality
  • Implement robust and fault-tolerant systems for data ingestion and processing
  • Lead data architecture and engineering decisions, applying strong technical judgment to proactively shape executive-level discussions and decisions
  • Identify data gaps and instrumentation issues, and drive fixes by influencing upstream engineering teams
  • Establish data quality, validation, documentation, and governance so metrics are trusted and repeatable
Employment type: Full-time

HPC Software Technical Architect

HPC Software Technical Architect role focused on delivering innovative solutions...
Location: India, Bangalore
Salary: Not provided
Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science or equivalent
  • Typically 12+ years of experience
  • Excellent analytical and problem solving skills
  • Strong systems knowledge, including hardware, firmware, and operating systems
  • Linux systems knowledge with Python and other languages
  • Good understanding of network boot technologies (PXE, gPXE/Etherboot, etc.)
  • Storage-specific knowledge: LVM, RAID, iSCSI, disk partitioning (GPT, MBR)
  • Exposure to the open-source community and software
  • Knowledge of shared storage such as CFS, and of high-availability solutions
  • Using, evaluating, and developing appropriate engineering design tools and software packages
Job Responsibility
  • Develops organization-wide architectures and methodologies for design and development of complex products, platforms, systems, software, and technology across multiple engineering disciplines
  • Identifies and evaluates new technologies, innovations, and outsourced development partner relationships for alignment with technology roadmap and business value
  • Reviews and evaluates designs and project activities for compliance with technology and development guidelines and standards
  • Leverages recognized domain expertise to influence decisions of executive business leadership, outsourced development partners, and industry standards groups
  • Provides guidance and mentoring to less-experienced staff members
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
Employment type: Full-time

Member of Technical Staff, AI Training Infrastructure

As a Training Infrastructure Engineer, you'll design, build, and optimize the in...
Location: United States, San Mateo
Salary: 175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
  • 3+ years of experience with distributed systems and ML infrastructure
  • Experience with PyTorch
  • Proficiency in cloud platforms (AWS, GCP, Azure)
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Job Responsibility
  • Design and implement scalable infrastructure for large-scale model training workloads
  • Develop and maintain distributed training pipelines for LLMs and multimodal models
  • Optimize training performance across multiple GPUs, nodes, and data centers
  • Implement monitoring, logging, and debugging tools for training operations
  • Architect and maintain data storage solutions for large-scale training datasets
  • Automate infrastructure provisioning, scaling, and orchestration for model training
  • Collaborate with researchers to implement and optimize training methodologies
  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
  • Troubleshoot complex performance issues in distributed training environments
What we offer
  • Meaningful equity in a fast-growing startup
  • Comprehensive benefits package
Employment type: Full-time

Member of Technical Staff, Cloud Infrastructure

As a Software Engineer on our Cloud Infrastructure team, you'll be at the forefr...
Location: United States, New York, NY; San Mateo, CA; Redwood City, CA
Salary: 175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure)
  • Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes, etc.)
  • Strong software development skills in languages such as Python or C++
  • Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization
Job Responsibility
  • Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines
  • Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure
  • Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency
  • Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning
  • Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions
  • Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Ray, Kubeflow, MLFlow) to enhance our platform’s capabilities and reliability
  • Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence
What we offer
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment type: Full-time

Staff Software Engineer: Storage, Search, & Data Platforms

The Storage, Search, and Data (SSD) group is the custodian of Uber's digital int...
Location: United States, Seattle; San Francisco; Sunnyvale
Salary: 232000.00 - 258000.00 USD / Year
Uber
Expiration Date: Until further notice
Requirements
  • 12+ years of software engineering experience, with a proven history of designing and operating massive-scale distributed data systems
  • Elite engineering skills in Go, Java, C++, or Rust. You are comfortable deep-diving into database internals, kernel-level optimizations, and complex distributed consensus protocols
  • Proven experience leading technical strategy across multiple teams or organizations, turning high-level business goals into concrete technical realities
  • Extensive experience managing Tier-0, mission-critical systems with 99.99% availability and global blast-radius constraints
Job Responsibility
  • Define and execute the multi-year roadmap to transition Uber from Data Storage to a Cloud-Native Data Provider, solving for cross-region latency, global metadata consistency, and exabyte-scale cost efficiency
  • Partner with Uber's AI/ML leadership to architect the Data-to-GPU pipeline. You will design the one-stop storage APIs that allow researchers to leverage high-performance data access across multi-cloud regions and vendors seamlessly
  • Drive the next generation of our core engines: Docstore (NoSQL), Vitess (Sharded MySQL), Apache Pinot (Real-time Analytics), and OpenSearch (Discovery)
  • Represent Uber in the global community as a leader in key open-source technologies, including Apache Hudi, Iceberg, and many others
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible to participate in a 401(k) plan
  • Various benefits
Employment type: Full-time

Technical Support Engineer Staff

Solves technical issues across a broad range of technologies (Servers, Storage, ...
Location: United States, San Jose
Salary: 89400.00 - 206500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • Bachelor's or master's degree in a related area of study, typically with 7–10 years of experience
  • Proficiency in designing, integrating, and troubleshooting cloud services hosted on hybrid cloud platforms such as HPE GreenLake and Azure Stack HCI
  • Experience with public clouds, e.g., Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)
  • Understanding of cloud architecture and microservices communication, and comprehensive working knowledge of cloud building blocks spanning compute, storage, networking, and databases
  • Strong knowledge of both Linux and Windows operating systems
  • Strong knowledge of virtualization and container platforms
  • Understanding of networking components such as DNS, TCP/IP, VPNs, firewalls, and network security products at the design and implementation levels
  • Experience with DevOps frameworks and toolsets from prominent cloud providers, as well as from the open-source world
  • Experience with automation tools and frameworks, such as Ansible, Chef, or Terraform
  • Design-level knowledge of relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, DynamoDB)
Job Responsibility
  • Provide consultative technical support for cloud services and infrastructure
  • Address complex customer inquiries, troubleshoot issues involving multiple cloud services, and resolve technical problems
  • Escalate and report issues to relevant teams in a timely manner when necessary
  • Monitor cloud resources and services to ensure optimal performance and availability
  • Collaborate with cloud architects and developers to implement new cloud solutions, report new issues, and gain insights on any potential underlying issues
  • Communicate effectively with customers, partners, and internal stakeholders to drive issue resolution
  • Document customer interactions, technical issues, and key learnings in support tickets or knowledge bases
  • Manage and monitor the health status of cloud platform services and advise customers and support teams on performing regular maintenance and updates on cloud solutions
  • Analyze security patch levels of various cloud services and ensure cloud environments are secure and compliant with industry standards
  • Identify and implement improvements to enhance system reliability and performance
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment type: Full-time