Staff Engineer, Distributed Storage and HPC & AI Infrastructure Job at Together AI (San Francisco)

Staff Engineer, Distributed Storage, HPC & AI Infrastructure

In this role, you will design and deliver multi-petabyte storage systems purpose...

Location

Netherlands, Amsterdam

Salary:

Not provided

Together AI

Expiration Date

Until further notice

Requirements

8+ years in storage engineering with 3+ years managing distributed storage at multi-petabyte scale
Proven track record deploying and operating high-performance storage for GPU/HPC clusters
Deep Kubernetes and cloud-native storage experience in production environments
Strong coding skills in Go and Python with demonstrated ability to build production-grade tools
BS/MS in Computer Science, Engineering, or equivalent practical experience
History of technical leadership: designing systems that significantly improved performance (>3x), reliability (99.9%+ uptime), or cost efficiency
Distributed Storage Systems: Deep expertise in WekaFS, Lustre, GPFS, BeeGFS, or similar parallel filesystems at multi-petabyte scale
Object Storage: Production experience with S3, MinIO, Ceph, or R2 including performance optimization and cost management
Kubernetes Storage: CSI drivers, StatefulSets, PersistentVolumes, storage operators, and custom controllers
Storage optimization for GPU workloads, RDMA/InfiniBand networking, parallel filesystem optimization (100+ GB/s aggregate cluster throughput)

Job Responsibility

Design multi-petabyte AI/ML storage systems
Integrate WekaFS, Ceph, etc.
Lead capacity planning and cost optimization (30-50% savings via tiering, lifecycle policies, right-sizing)
Design/optimize RDMA, InfiniBand, 400GbE networks
Tune for maximum throughput and minimum latency
Implement NVMe-oF/iSCSI
Troubleshoot bottlenecks
Optimize TCP/IP for storage
Build Kubernetes storage operators/controllers
Enable automated provisioning, self-service abstractions, multi-tenant isolation, and quotas

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...

Location

United States, Mountain View

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Strong background in one or more of the following areas: AI accelerator or GPU architectures
Distributed systems and large-scale AI training/inference
High-performance computing (HPC) and collective communications
ML systems, runtimes, or compilers
Performance modeling, benchmarking, and systems analysis
Hardware–software co-design for AI workloads
Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.

Job Responsibility

Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.

Fulltime

Summer 2026 Network Engineering Internship

This is an 11-week paid learning experience during which you’ll be able to conne...

Location

United States, Bellevue

Salary:

26.00 - 47.00 USD / Hour

T-Mobile

Expiration Date

Until further notice

Requirements

At least 18 years of age
Legally authorized to work in the United States
Must be actively enrolled in a Bachelor's or Graduate degree program
Employees of T-Mobile or Metro by T-Mobile are ineligible for Internships
Interest in Systems Architecture and Cybersecurity
Passionate about protecting customers

Job Responsibility

Gain an understanding of the T-Mobile Consumer Identity Architecture
Collaborate with cross-functional teams to understand business requirements as they pertain to Consumer Identity
Collaborate with the existing team to design end-to-end Consumer Identity solutions to integrate T-Mobile’s new products and services
Create requirements to develop new features in the Ericsson IAM product
Assist the team with testing of new Ericsson IAM features in the NQE environment
Drive the implementation of new configurations as needed by IAM clients
Interpret transaction logs to detect fraudulent activities

What we offer

Hands-on experience
Training
Networking with other interns and leaders
Mentorship
Hands-on projects
Chance to create an immediate impact
Relocation assistance may be provided to program participants who reside more than 50 miles from the internship location

Fulltime

Maintenance Electrician

Estates Management is seeking to recruit suitably qualified and experienced elec...

Location

United Kingdom, Wolverhampton

Salary:

29588.00 - 32080.00 GBP / Year

University of Wolverhampton

Expiration Date

March 02, 2026

Requirements

Served a registered apprenticeship or equivalent training in the electrical installation trade
Experience ideally in an industrial/commercial environment
Must hold an NVQ Level 3 or approved equivalent in Electrical Installation
Must hold the City and Guilds 2360 Electrical Installation Theory Part 2 Course or approved equivalent
Must have completed a recognised course on BS 7671: 18th Edition of the I.E.T. Regulations, up to and including all current amendments
Post-apprenticeship/training experience required, with evidence of appropriate competency
Must hold a current driving licence

Job Responsibility

Undertake electrical work including fault finding, repairs and alterations to the various building services throughout the University
Participate in an "on-call" rota
Travel throughout the University estate as well as in and around the West Midlands

What we offer

Market supplement up to £1,800

Fulltime

Telehealth Nurse Practitioner

Visana Health is an innovative virtual women's health clinic offering comprehens...

Salary:

50.00 - 65.00 USD / Hour

Visana Health

Expiration Date

Until further notice

Requirements

Experience as a Nurse Practitioner in women’s health (experience and/or certification); licensed with prescriptive authority/independent practice, in good standing, and without history of discipline or sanctions
Minimum of 1-2+ state licenses (additional licenses welcome)
Board certification in a related specialty (adult, family, midwife, women’s health, etc.)
DEA License
Confident in learning new technology and expanding clinical knowledge
Ability to learn and adapt to a treatment philosophy where we educate and support patients
Ability to appear on camera via video conferencing tools to see patients

Job Responsibility

Direct patient care in virtual synchronous clinic visits during clinic hours, which are Monday to Saturday, 7 am to 10 pm EST
Timely patient follow-up work including, but not limited to, medical leadership case consults, responding to patient messages or making phone calls to patients when required
Reviewing test results and ensuring proper patient notification in compliance with practice-standard timelines
Chart note documentation and completed billing within 24 hours of the patient visit
Participation in weekly clinical meetings
Participation in a group-practice-style environment, offering guidance and support as your expertise allows

What we offer

100% remote telehealth visits that emphasize ample time to address the patients' needs
Flexible schedule, with evening and weekend hours desired
Weekly clinical meetings to provide medical training and support the collaborative practice environment

Fulltime

SAP Service Delivery Manager

We are looking for an SAP Service Delivery Manager to ensure stable, compliant, a...

Location

Colombia, Medellín

Salary:

85000.00 - 90000.00 COP / Year

Algoteque

Expiration Date

Until further notice

Requirements

Proven experience in SAP S/4HANA service delivery or operations
Strong understanding of S/4HANA architecture, processes, and modules
Experience in international or regulated SAP environments
ITIL-based service management knowledge
Excellent stakeholder management, communication, and leadership skills

Job Responsibility

Own end-to-end service delivery for SAP S/4HANA systems
Ensure availability, performance, and stability of the S/4HANA landscape
Manage SLAs, KPIs, service reviews, and continuous improvement initiatives
Act as escalation point for S/4HANA-related incidents, risks, and operational issues
Coordinate internal IT teams, AMS providers, and business stakeholders
Ensure compliance with security, audit, and governance standards
Support transition from project delivery to S/4HANA run and hypercare
Drive standardization, automation, and optimization of service processes

Fulltime

Senior SAP CPI / Integration Consultant

We are looking for a Senior SAP S/4HANA CPI / Integration Consultant to implemen...

Location

Colombia, Medellín

Salary:

85000.00 - 90000.00 COP / Year

Algoteque

Expiration Date

Until further notice

Requirements

Hands-on experience with SAP CPI / Integration Suite in S/4HANA environments
Knowledge of REST, SOAP, IDocs, and event-based messaging
Experience integrating S/4HANA with cloud and on-premise systems
Strong analytical, problem-solving, and communication skills

Job Responsibility

Design, develop, and maintain SAP S/4HANA integrations using SAP CPI / Integration Suite
Implement APIs, message mappings, and event-based integrations
Ensure secure, scalable, and reliable integration architecture
Support integration testing, troubleshooting, and issue resolution
Collaborate with SAP and non-SAP teams to align integration solutions
Contribute to integration standards, governance, and best practices

Fulltime

Specialist, Brand and Social Marketing

This position plays a key role in helping to plan, execute and track strategic m...

Location

United States, Fort Myers

Salary:

Not provided

Chico's FAS, Inc.

Expiration Date

Until further notice

Requirements

High School Diploma required, AA or bachelor’s preferred
BA/BS degree in marketing or related field required
1-3 years’ experience within marketing and social media
Retail experience a plus
Highly organized with strong attention to detail
Strong written and verbal communications
Ability to think strategically and independently
Exceptional attention to detail and ability to effectively multi-task in a deadline driven atmosphere
Solid interpersonal and communication skills
Ability to interact with a diverse team of people, including all levels of leadership, remote teams and agencies

Job Responsibility

Assist in execution of seasonal marketing requirements, including briefing and tracking of creative assets to support owned channels
Partner with internal teams including, but not limited to, PR/Influencer, Creative, Ecommerce, Merchandising, and Store Ops to ensure alignment and execution of key marketing strategies
Schedule all organic social media posts via the Sprinklr platform and collaborate closely with the Social Manager and creative team on execution
Assist with brand marketing and social media insights to inform go-forward strategies that drive the business
Oversee mall marketing initiatives, including store openings and closings
Support the goals of the Marketing team by performing ad hoc duties as assigned
Monitor Chico’s social pages for customer sentiment and relay findings to cross-functional teams as needed
Assist Marketing leadership with department presentations, meeting preparation, channel reporting, and recaps

Fulltime

Staff Engineer, Distributed Storage and HPC & AI Infrastructure

Together AI

Location:
United States, San Francisco

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
February 18, 2026

Similar Jobs for Staff Engineer, Distributed Storage and HPC & AI Infrastructure

Staff Engineer, Distributed Storage, HPC & AI Infrastructure

Member of Technical Staff, Software Co-Design AI HPC Systems