Member of Technical Staff, GPU Optimization Job at Runway

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...

Location

United States , San Mateo

Salary:

175000.00 - 220000.00 USD / Year

Fireworks AI

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
5+ years of experience working on performance optimization or high-performance computing systems
Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
Familiarity with PyTorch and performance-critical model execution
Experience with distributed system debugging and optimization in multi-GPU environments
Deep understanding of GPU architecture, parallel programming models, and compute kernels

Job Responsibility

Optimize system and GPU performance for high-throughput AI workloads across training and inference
Analyze and improve latency, throughput, memory usage, and compute efficiency
Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
Implement low-level optimizations using CUDA, Triton, and other performance tooling
Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
Improve support for mixed precision, quantization, and model graph optimization
Build and maintain performance benchmarking and monitoring infrastructure
Scale inference and training systems across multi-GPU, multi-node environments
Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

What we offer

Meaningful equity in a fast-growing startup
Competitive salary
Comprehensive benefits package

Fulltime

Member of Technical Staff - GPU Infrastructure

Prime Intellect is building the open superintelligence stack - from frontier age...

Location

United States , San Francisco

Salary:

Not provided

Prime Intellect

Expiration Date

Until further notice

Requirements

3+ years hands-on experience with GPU clusters and HPC environments
Deep expertise with SLURM and Kubernetes in production GPU settings
Proven experience with InfiniBand configuration and troubleshooting
Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack
Experience with infrastructure automation tools (Ansible, Terraform)
Proficiency in Python, Bash, and systems programming
Track record of customer-facing technical leadership
NVIDIA driver installation and troubleshooting (CUDA, Fabric Manager, DCGM)
Container runtime configuration for GPUs (Docker, Containerd, Enroot)
Linux kernel tuning and performance optimization

Job Responsibility

Partner with clients to understand workload requirements and design optimal GPU cluster architectures
Create technical proposals and capacity planning for clusters ranging from 100 to 10,000+ GPUs
Develop deployment strategies for LLM training, inference, and HPC workloads
Present architectural recommendations to technical and executive stakeholders
Deploy and configure orchestration systems including SLURM and Kubernetes for distributed workloads
Implement high-performance networking with InfiniBand, RoCE, and NVLink interconnects
Optimize GPU utilization, memory management, and inter-node communication
Configure parallel filesystems (Lustre, BeeGFS, GPFS) for optimal I/O performance
Tune system performance from kernel parameters to CUDA configurations
Serve as primary technical escalation point for customer infrastructure issues

Fulltime

Member of Technical Staff - GPU Performance Engineer

Our models and workflows require performance work that generic frameworks don’t ...

Location

United States , San Francisco; Boston

Salary:

Not provided

Liquid AI

Expiration Date

Until further notice

Requirements

Authored custom CUDA kernels (not only calling cuDNN/cuBLAS)
Strong understanding of GPU architecture and performance: memory hierarchy, warps, shared memory/register pressure, bandwidth vs compute limits
Proficiency with low-level profiling (Nsight Systems/Compute) and performance methodology
Strong C/C++ skills

Job Responsibility

Write high-performance GPU kernels for our novel model architectures
Integrate kernels into PyTorch pipelines (custom ops, extensions, dispatch, benchmarking)
Profile and optimize training and inference workflows to eliminate bottlenecks
Build correctness tests and numerics checks
Build/maintain performance benchmarks and guardrails to prevent regressions
Collaborate closely with researchers to turn promising ideas into shipped speedups

What we offer

Competitive base salary with equity in a unicorn-stage company
We pay 100% of medical, dental, and vision premiums for employees and dependents
401(k) matching up to 4% of base pay
Unlimited PTO plus company-wide Refill Days throughout the year

Fulltime

Member of Technical Staff, Pre-Training Infrastructure

Microsoft AI is looking for a Member of Technical Staff, Pre-Training Infrastruc...

Location

United States , Mountain View

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience
Experience in distributed computing and large-scale systems
Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
Proven ability to profile, benchmark, and optimize performance-critical systems
Experience in leading technical projects and supporting architectural decisions with data
Experience building infrastructure for large-scale machine learning or generative AI workloads
Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
Track record of contributing to high-performance computing or large-scale AI infrastructure projects

Job Responsibility

Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters
Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems
Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies
Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, AMD, and beyond)
Gather data and insights to develop the pretraining compute roadmap
Care deeply about conversational AI and its deployment
Actively contribute to the development of AI models powering our innovative products
Find solutions to overcome roadblocks and deliver your work to users quickly and iteratively
Enjoy working in a fast-paced, design-driven product development cycle
Embody our Culture and Values

Fulltime

Member of Technical Staff, Site Reliability Engineer (HPC)

As Microsoft continues to push the boundaries of AI, we are on the lookout for p...

Location

United States , Mountain View

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
OR equivalent experience
Strong proficiency in Kubernetes, Docker, and container orchestration
Knowledge of CI/CD pipelines for Inference and ML model deployment
Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code
Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.)
Strong programming/scripting skills in Python, Go, or Bash
Solid knowledge of distributed systems, networking, and storage
Experience running large-scale GPU clusters for ML/AI workloads (preferred)

Job Responsibility

Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of HPC clusters powering MAI model training and inference
Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into all aspects of HPC systems including GPU, clusters, storage and networking
Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in CPU+GPU environments
Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements
Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments
Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows

What we offer

Competitive compensation, equity options, and comprehensive benefits

Fulltime

Member of Technical Staff, LLM Inference - MAI Superintelligence Team

Our Inference team is responsible for building and maintaining the tools and sys...

Location

United States , Mountain View

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience
Experience with generative AI
Experience with distributed computing
Python and Python ecosystem (eg. uv, pybind/nanobind, FastAPI) expertise
Experience with large scale production inference
Experience with GPU kernel programming
Experience benchmarking, profiling, and optimizing PyTorch generative AI models
Experience with open source inference frameworks like vLLM and SGLang
Working experience and conversant with the material in the JAX scaling book

Job Responsibility

Work alongside researchers and engineers to implement frontier AI research ideas
Introduce new systems, tools, and techniques to improve model inference performance
Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues
Build tools and establish processes to enhance the team’s collective productivity
Find ways to overcome roadblocks and deliver your work to users quickly and iteratively
Enjoy working in a fast-paced, design-driven product development cycle
Embody our Culture and Values

Fulltime

Member of Technical Staff - Edge Inference Engineer

Our Edge Inference team compiles Liquid Foundation Models into optimized machine...

Location

United States , San Francisco; Boston

Salary:

Not provided

Liquid AI

Expiration Date

Until further notice

Requirements

5+ years of experience in systems programming with strong C++ proficiency
Embedded software engineering experience or work on resource-constrained systems
Understanding of ML fundamentals at the linear algebra level (how matrix operations, attention, and quantization work)
Experience with hardware architecture concepts: cache hierarchies, memory bandwidth, SIMD/vectorization

Job Responsibility

Implement and optimize inference kernels for CPU, NPU, and GPU architectures across diverse edge hardware
Develop quantization strategies (INT4, INT8, FP8) that maximize compression while preserving model quality under strict memory budgets
Contribute to llama.cpp and other open-source inference frameworks, including new model architectures (audio, vision)
Profile and optimize end-to-end inference pipelines to achieve sub-100ms time-to-first-token on target devices
Collaborate with ML researchers to understand model architectures and identify optimization opportunities specific to Liquid Foundation Models

What we offer

Competitive base salary with equity in a unicorn-stage company
100% of medical, dental, and vision premiums for employees and dependents
401(k) matching up to 4% of base pay
Unlimited PTO plus company-wide Refill Days throughout the year

Fulltime

Member of Technical Staff, Capacity & Efficiency Infrastructure

Microsoft AI is looking for a Member of Technical Staff – Capacity & Efficiency ...

Location

United States , Mountain View

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor’s Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Deep understanding of the fundamentals of GPU architectures and DL/LLM architectures
Deep experience in profiling and analyzing performance in large-scale distributed computing systems
Deep experience in profiling and analyzing performance in ML models especially GenAI models
Experience with low-level GPU programming (CUDA, Triton, NCCL) and frameworks such as PyTorch or JAX
Experience in leading technical projects and supporting architectural decisions with data
Experience building infrastructure for large-scale machine learning or generative AI workloads
Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
Track record of contributing to high-performance computing or large-scale AI infrastructure projects

Job Responsibility

Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters
Build and evolve telemetry systems to provide visibility into infrastructure & ML model performance, utilization, and cost related metrics
Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems
Drive architectural improvements across various ML services which deliver measurable efficiency improvements
Build and evolve tools to automatically provide insights and recommendations to improve fleet-wide efficiency
Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies
Partner with ML researchers and infrastructure engineers to understand their plans and future needs and develop plans to balance growth with efficiency
Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, MAIA, and beyond)
Embody our Culture and Values

Fulltime

Member of Technical Staff, GPU Optimization

Runway

Location:
United States

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:
December 11, 2025

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Member of Technical Staff, GPU Optimization

Member of Technical Staff, Performance Optimization

Member of Technical Staff - GPU Infrastructure

Member of Technical Staff - GPU Performance Engineer

Member of Technical Staff, Pre-Training Infrastructure

Member of Technical Staff, Site Reliability Engineer (HPC)

Member of Technical Staff, LLM Inference - MAI Superintelligence Team

Member of Technical Staff - Edge Inference Engineer

Member of Technical Staff, Capacity & Efficiency Infrastructure

Our AI answers in your language

Member of Technical Staff, GPU Optimization

Runway

Location:United States

Category:IT - Software Development

Contract Type:Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:December 11, 2025

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Member of Technical Staff, GPU Optimization

Member of Technical Staff, Performance Optimization

Member of Technical Staff - GPU Infrastructure

Member of Technical Staff - GPU Performance Engineer

Member of Technical Staff, Pre-Training Infrastructure

Member of Technical Staff, Site Reliability Engineer (HPC)

Member of Technical Staff, LLM Inference - MAI Superintelligence Team

Member of Technical Staff - Edge Inference Engineer

Member of Technical Staff, Capacity & Efficiency Infrastructure

Location:
United States

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
December 11, 2025