Member of Technical Staff - GPU Performance Engineer Job at Liquid AI (San Francisco)

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...

Location

United States , San Mateo

Salary:

175000.00 - 220000.00 USD / Year

Fireworks AI

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
5+ years of experience working on performance optimization or high-performance computing systems
Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
Familiarity with PyTorch and performance-critical model execution
Experience with distributed system debugging and optimization in multi-GPU environments
Deep understanding of GPU architecture, parallel programming models, and compute kernels

Job Responsibility

Optimize system and GPU performance for high-throughput AI workloads across training and inference
Analyze and improve latency, throughput, memory usage, and compute efficiency
Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
Implement low-level optimizations using CUDA, Triton, and other performance tooling
Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
Improve support for mixed precision, quantization, and model graph optimization
Build and maintain performance benchmarking and monitoring infrastructure
Scale inference and training systems across multi-GPU, multi-node environments
Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

What we offer

Meaningful equity in a fast-growing startup
Competitive salary
Comprehensive benefits package

Fulltime

Member of Technical Staff, GPU Optimization

We are building AI to simulate the world through merging art and science. We bel...

Location

United States

Salary:

260000.00 - 325000.00 USD / Year

Runway

Expiration Date

Until further notice

Requirements

5+ years of relevant engineering or research experience in machine learning, computer vision and/or graphics
Experience with CUDA, C++ and systems level performance optimizations
Solid knowledge of at least one machine learning framework (e.g. PyTorch, Tensorflow)
Very strong programming skills and ability to write clean and maintainable research code
Deep interest in building human-in-the-loop systems for creativity
Ability to rapidly prototype solutions and iterate on them with tight product deadlines
Strong communication, collaboration, and documentation skills

Job Responsibility

Develop innovative research projects in computer vision, focusing on generative models for image and video
Work with a world-class engineering team pushing the boundaries of content creation on the browser
Collaborate closely with the rest of the product organization to bring cutting-edge machine learning models to production

Fulltime

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...

Location

United States , Mountain View

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Strong background in one or more of the following areas: AI accelerator or GPU architectures
Distributed systems and large-scale AI training/inference
High-performance computing (HPC) and collective communications
ML systems, runtimes, or compilers
Performance modeling, benchmarking, and systems analysis
Hardware–software co-design for AI workloads
Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.

Job Responsibility

Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.

Fulltime

New

Member of Technical Staff - Distributed Training Engineer

Our Training Infrastructure team is building the distributed systems that power ...

Location

United States , San Francisco; Boston

Salary:

Not provided

Liquid AI

Expiration Date

Until further notice

Requirements

Hands-on experience building distributed training infrastructure (PyTorch Distributed DDP/FSDP, DeepSpeed ZeRO, Megatron-LM TP/PP)
Experience diagnosing performance bottlenecks and failure modes (profiling, NCCL/collectives issues, hangs, OOMs, stragglers)
Understanding of hardware accelerators and networking topologies
Experience optimizing data pipelines for ML workloads

Job Responsibility

Design and build core systems that make large training runs fast and reliable
Build scalable distributed training infrastructure for GPU clusters
Implement and tune parallelism/sharding strategies for evolving architectures
Optimize distributed efficiency (topology-aware collectives, comm/compute overlap, straggler mitigation)
Build data loading systems that eliminate I/O bottlenecks for multimodal datasets
Develop checkpointing mechanisms balancing memory constraints with recovery needs
Create monitoring, profiling, and debugging tools for training stability and performance

What we offer

Competitive base salary with equity in a unicorn-stage company
We pay 100% of medical, dental, and vision premiums for employees and dependents
401(k) matching up to 4% of base pay
Unlimited PTO plus company-wide Refill Days throughout the year

Fulltime

New

Member of Technical Staff, Synthetic Data

As a Machine Learning Engineer specializing in synthetic data, you will play a p...

Location

Salary:

Not provided

Cohere

Expiration Date

Until further notice

Requirements

Strong software engineering skills, with proficiency in Python and experience building data pipelines
Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
Experience working with LLMs through work projects, open-source contributions or personal experimentation
Familiarity with LLM inference frameworks such as vLLM and TensorRT
Experience working with large-scale datasets, including web data, code data, and multilingual corpora
A passion for bridging research and engineering to solve complex data-related challenges in AI model training

Job Responsibility

Design and build scalable inference pipelines that run on large GPU clusters
Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
Research and implement innovative synthetic data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models

What we offer

An open and inclusive culture and work environment
Work closely with a team on the cutting edge of AI research
Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% Parental Leave top-up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
6 weeks of vacation (30 working days!)

Fulltime

New

Member of Technical Staff - Inference

Prime Intellect is building the open superintelligence stack - from frontier age...

Location

United States , San Francisco

Salary:

Not provided

Prime Intellect

Expiration Date

Until further notice

Requirements

3+ years building and running large‑scale ML/LLM services with clear latency/availability SLOs
Hands‑on with at least one of vLLM, SGLang, TensorRT‑LLM
Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo
Deep understanding of prefill vs. decode, KV‑cache behavior, batching, sampling, speculative decoding, parallelism strategies
Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end‑to‑end
Python: Systems tooling and backend services
PyTorch: LLM Inference engine development and integration, deployment readiness
AWS/GCP service experience, cloud deployment patterns
Running infrastructure at scale with containers on Kubernetes
Architecture, CUDA runtime, NCCL, InfiniBand

Job Responsibility

Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets
Design placement and scheduling algorithms for heterogeneous accelerators
Implement multi‑region/zone failover and traffic shifting for resilience and cost control
Build autoscaling, routing, and load balancing to meet throughput/latency SLOs
Optimize model distribution and cold-start times across clusters
Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT‑LLM
Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance
Profile kernels, memory bandwidth and transport
apply techniques such as quantization and speculative decoding
Develop reproducible performance suites (latency, throughput, context length, batch size, precision)

What we offer

Competitive compensation with significant equity incentives
Flexible work arrangement (remote or San Francisco office)
Full visa sponsorship and relocation support
Professional development budget
Regular team off-sites and conference attendance
Opportunity to shape decentralized AI and RL at Prime Intellect

Fulltime

New

AI Research Engineer

Build the future of offensive security with XBOW. Attackers are already using AI...

Location

Salary:

150000.00 - 350000.00 USD / Year

Xbow

Expiration Date

Until further notice

Requirements

Strong experience with building software around LLMs: prompting, agentic orchestration, fault-tolerance, and integration of LLM parts with hard-coded logic
Strong software engineering skills: architecting and building production-grade software that runs reliably and can be maintained
Experience with TypeScript or proven ability to learn a new programming language quickly
Strong skills in structured and independently-driven problem-solving. Able to work with incomplete information and rapidly testing hypotheses
Comfortable with an energetic environment that mixes the fast-paced agile prioritisation of a startup with the curiosity mentality of a research lab
Eager to own projects and jump into the deep end, learning as you go. Curious, adaptable and collaborative
MSc or equivalent or higher in computer science, math, physics or machine learning

Job Responsibility

Build LLM-powered software that actually works, by designing prompt flows and orchestrations that ensures great performance with no false positives
Architect and build an AI-powered software stack that is production-grade, testable and maintainable
Design and build experiments and evaluation frameworks for performance testing of the system at scale. Conduct data analysis to draw conclusions
Collaborate with the rest of the AI team, with security experts, and both frontend and backend developers to create end-to-end systems that work and customers love
Own projects end-to-end: from basic ideation and experimentation to deployment and production monitoring
Continuously conduct research on how to harness the advancements in LLMs to make our system better and faster

What we offer

Competitive salary and a generous equity package, making you a true owner of the company
Shape your role, lead the function, and grow with the company as we redefine cybersecurity
You will tackle technically complex challenges and play a pivotal role in the growth of our business, working alongside an amazing team and some of the world’s experts to shape how AI transforms cybersecurity

Fulltime

New

Backend Engineer

We’re looking for a Backend Engineer to join Team Events and help us build and e...

Location

United States

Salary:

143900.00 - 215900.00 USD / Year

Zapier

Expiration Date

March 31, 2026

Requirements

4+ years with software development in either Python, Go, Typescript
At least 2 years focused on building event / streaming systems at scale
Experience working with event architectures and services based on technologies like Kafka (MSK) and Avro
Supported event-system infrastructure to ensure resiliency and uptime
Participated in the design or maintenance of highly available, cloud-based infrastructure in AWS or another cloud provider
Understand how to leverage infrastructure-as-code tools (Terraform) and have learned best practices for reliability and observability
Strong experience with AWS services, cloud computing technologies, and distributed data stores
Experience with languages like Python or Go to create automated tools
Believe in hands-off deployments and infrastructure as code

Job Responsibility

Work with AWS services like MSK, SQS, Redis, S3, Lambda and Aurora to build scalable solutions that process billions of events per day
Use Terraform to maintain and build our infrastructure
Build toolkits, libraries, and scripts to ease challenges faced by other teams at Zapier when they wish to emit to and consume from the Events system as well as other queue solutions we are currently working on building
Contribute to data governance practices across Zapier
Influence proper data structure and data hygiene
Refactor or improve existing code as languages, frameworks, or techniques evolve
Help the team pick appropriate tools to solve new problems as they arise
Provide feedback on tools, processes, and documentation in place to help us become a better, more effective organization
Work with your colleagues to develop new skills, through code review, discussions and mentoring
Participate in on-call rotations to ensure the reliability and availability of our systems, providing timely and effective support when issues arise

What we offer

Offers Equity
Offers Bonus

Fulltime

Member of Technical Staff - GPU Performance Engineer

Liquid AI

Location:
United States , San Francisco ▼
Boston

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
February 21, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Member of Technical Staff - GPU Performance Engineer

Member of Technical Staff, Performance Optimization

Member of Technical Staff, GPU Optimization

Member of Technical Staff, Software Co-Design AI HPC Systems

Member of Technical Staff - Distributed Training Engineer

Member of Technical Staff, Synthetic Data

Member of Technical Staff - Inference

AI Research Engineer

Backend Engineer

Member of Technical Staff - GPU Performance Engineer

Liquid AI

Location:United States , San Francisco ▼Boston

Category:IT - Software Development

Contract Type:Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:February 21, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Member of Technical Staff - GPU Performance Engineer

Member of Technical Staff, Performance Optimization

Member of Technical Staff, GPU Optimization

Member of Technical Staff, Software Co-Design AI HPC Systems

Member of Technical Staff - Distributed Training Engineer

Member of Technical Staff, Synthetic Data

Member of Technical Staff - Inference

AI Research Engineer

Backend Engineer

Location:
United States , San Francisco ▼
Boston

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
February 21, 2026