
Member of Technical Staff - GPU Performance Engineer

Liquid AI

Location:
United States, San Francisco


Contract Type:
Not provided

Salary:

Not provided

Job Description:

Our models and workflows require performance work that generic frameworks don’t solve. You’ll design and ship custom CUDA kernels, profile at the hardware level, and integrate research ideas into production code that delivers measurable speedups in real pipelines (training, post-training, and inference). Our team is small, fast-moving, and high-ownership. We're looking for someone who finds joy in memory hierarchies, tensor cores, and profiler output.

Job Responsibility:

  • Write high-performance GPU kernels for our novel model architectures
  • Integrate kernels into PyTorch pipelines (custom ops, extensions, dispatch, benchmarking)
  • Profile and optimize training and inference workflows to eliminate bottlenecks
  • Build correctness tests and numerics checks
  • Build/maintain performance benchmarks and guardrails to prevent regressions
  • Collaborate closely with researchers to turn promising ideas into shipped speedups
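
As a hedged illustration of the benchmarking-and-guardrails responsibilities above, here is a minimal CPU-side sketch of a performance-regression check that compares a candidate implementation against a baseline. The function names and the 10% threshold are assumptions for illustration, not part of the posting; timing real GPU kernels would additionally require synchronizing the device (e.g. `torch.cuda.synchronize()` or CUDA events) around the timed region.

```python
import statistics
import time


def measure(fn, iters=50, warmup=5):
    """Return the median wall-clock seconds per call of fn.

    Median (not mean) so one noisy sample can't mask or fake a regression.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


def check_no_regression(candidate, baseline, max_slowdown=1.10):
    """Guardrail: fail if the candidate is more than 10% slower than baseline.

    Returns (passed, slowdown_ratio) so CI can log the ratio either way.
    """
    t_candidate = measure(candidate)
    t_baseline = measure(baseline)
    slowdown = t_candidate / t_baseline
    return slowdown <= max_slowdown, slowdown
```

In a real pipeline a check like this would run per-kernel in CI, with the baseline timing pinned from a known-good commit rather than re-measured each run.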

Requirements:

  • Authored custom CUDA kernels (not only calling cuDNN/cuBLAS)
  • Strong understanding of GPU architecture and performance: memory hierarchy, warps, shared memory/register pressure, bandwidth vs compute limits
  • Proficiency with low-level profiling (Nsight Systems/Compute) and performance methodology
  • Strong C/C++ skills
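
The "bandwidth vs compute limits" requirement above is commonly reasoned about with the roofline model; a minimal sketch follows (the H100-like peak numbers in the comments are assumed for illustration and are not from the posting).

```python
def attainable_gflops(arith_intensity, peak_gflops, peak_bw_gbs):
    """Roofline model: attainable throughput is the lesser of the compute
    roof and the memory roof (bandwidth x arithmetic intensity, FLOP/byte).
    """
    return min(peak_gflops, peak_bw_gbs * arith_intensity)


# Assumed, illustrative peaks (roughly H100 SXM, dense FP16):
# ~989,000 GFLOP/s compute, ~3350 GB/s HBM bandwidth.
# A kernel at 1 FLOP/byte is memory-bound: capped near 3350 GFLOP/s,
# far below the compute roof, so optimization effort should target data movement.
```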

Nice to have:

  • CUTLASS experience and tensor core utilization strategies
  • Triton kernel experience and/or PyTorch custom op integration
  • Experience building benchmark harnesses and perf regression tests

What we offer:
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Member of Technical Staff - GPU Performance Engineer

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility:
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type: Full-time

Member of Technical Staff, GPU Optimization

We are building AI to simulate the world through merging art and science. We bel...
Location:
United States
Salary:
260000.00 - 325000.00 USD / Year
Runway
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant engineering or research experience in machine learning, computer vision and/or graphics
  • Experience with CUDA, C++ and systems level performance optimizations
  • Solid knowledge of at least one machine learning framework (e.g. PyTorch, Tensorflow)
  • Very strong programming skills and ability to write clean and maintainable research code
  • Deep interest in building human-in-the-loop systems for creativity
  • Ability to rapidly prototype solutions and iterate on them with tight product deadlines
  • Strong communication, collaboration, and documentation skills
Job Responsibility:
  • Develop innovative research projects in computer vision, focusing on generative models for image and video
  • Work with a world-class engineering team pushing the boundaries of content creation on the browser
  • Collaborate closely with the rest of the product organization to bring cutting-edge machine learning models to production
Employment Type: Full-time

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Strong background in one or more of the following areas:
      • AI accelerator or GPU architectures
      • Distributed systems and large-scale AI training/inference
      • High-performance computing (HPC) and collective communications
      • ML systems, runtimes, or compilers
      • Performance modeling, benchmarking, and systems analysis
      • Hardware–software co-design for AI workloads
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility:
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
Employment Type: Full-time

Member of Technical Staff - Distributed Training Engineer

Our Training Infrastructure team is building the distributed systems that power ...
Location:
United States, San Francisco; Boston
Salary:
Not provided
Liquid AI
Expiration Date:
Until further notice
Requirements:
  • Hands-on experience building distributed training infrastructure (PyTorch Distributed DDP/FSDP, DeepSpeed ZeRO, Megatron-LM TP/PP)
  • Experience diagnosing performance bottlenecks and failure modes (profiling, NCCL/collectives issues, hangs, OOMs, stragglers)
  • Understanding of hardware accelerators and networking topologies
  • Experience optimizing data pipelines for ML workloads
Job Responsibility:
  • Design and build core systems that make large training runs fast and reliable
  • Build scalable distributed training infrastructure for GPU clusters
  • Implement and tune parallelism/sharding strategies for evolving architectures
  • Optimize distributed efficiency (topology-aware collectives, comm/compute overlap, straggler mitigation)
  • Build data loading systems that eliminate I/O bottlenecks for multimodal datasets
  • Develop checkpointing mechanisms balancing memory constraints with recovery needs
  • Create monitoring, profiling, and debugging tools for training stability and performance
What we offer:
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
Employment Type: Full-time

Member of Technical Staff, Synthetic Data

As a Machine Learning Engineer specializing in synthetic data, you will play a p...
Location:
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with LLMs through work projects, open-source contributions or personal experimentation
  • Familiarity with LLM inference frameworks such as vLLM and TensorRT
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
Job Responsibility:
  • Design and build scalable inference pipelines that run on large GPU clusters
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Research and implement innovative synthetic data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment Type: Full-time

Member of Technical Staff - Inference

Prime Intellect is building the open superintelligence stack - from frontier age...
Location:
United States, San Francisco
Salary:
Not provided
Prime Intellect
Expiration Date:
Until further notice
Requirements:
  • 3+ years building and running large‑scale ML/LLM services with clear latency/availability SLOs
  • Hands‑on with at least one of vLLM, SGLang, TensorRT‑LLM
  • Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo
  • Deep understanding of prefill vs. decode, KV‑cache behavior, batching, sampling, speculative decoding, parallelism strategies
  • Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end‑to‑end
  • Python: Systems tooling and backend services
  • PyTorch: LLM Inference engine development and integration, deployment readiness
  • AWS/GCP service experience, cloud deployment patterns
  • Running infrastructure at scale with containers on Kubernetes
  • Architecture, CUDA runtime, NCCL, InfiniBand
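
The KV-cache understanding called for above often comes down to simple sizing arithmetic; here is a sketch (the model shapes in the comments are assumed, roughly Llama-3-8B-like, and are not from the posting).

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Estimate KV-cache footprint: 2 tensors (K and V) per layer,
    each of shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Assumed example shapes: 32 layers, 8 KV heads (GQA), head_dim 128,
# an 8192-token context, batch 1, fp16 (2 bytes/element) -> exactly 1 GiB,
# which is why cache capacity, not compute, often bounds achievable batch size.
```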
Job Responsibility:
  • Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets
  • Design placement and scheduling algorithms for heterogeneous accelerators
  • Implement multi‑region/zone failover and traffic shifting for resilience and cost control
  • Build autoscaling, routing, and load balancing to meet throughput/latency SLOs
  • Optimize model distribution and cold-start times across clusters
  • Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT‑LLM
  • Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance
  • Profile kernels, memory bandwidth, and transport; apply techniques such as quantization and speculative decoding
  • Develop reproducible performance suites (latency, throughput, context length, batch size, precision)
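
A reproducible performance suite of the kind the last bullet describes usually reports tail latency percentiles, not just averages; a minimal stdlib sketch follows (the function name and percentile choices are illustrative assumptions).

```python
import statistics


def latency_summary(samples_ms):
    """Summarize request latencies (ms) into the percentiles SLOs usually target.

    statistics.quantiles with n=100 returns 99 cut points; index i-1 is the
    i-th percentile under the "inclusive" interpolation method.
    """
    qs = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": qs[49],   # median latency
        "p95": qs[94],
        "p99": qs[98],   # tail latency, typically the SLO-critical number
        "mean": statistics.fmean(samples_ms),
    }
```

Sweeping this over batch size, context length, and precision, and persisting the results per commit, gives the reproducible suite the bullet describes.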
What we offer:
  • Competitive compensation with significant equity incentives
  • Flexible work arrangement (remote or San Francisco office)
  • Full visa sponsorship and relocation support
  • Professional development budget
  • Regular team off-sites and conference attendance
  • Opportunity to shape decentralized AI and RL at Prime Intellect
Employment Type: Full-time

AI Research Engineer

Build the future of offensive security with XBOW. Attackers are already using AI...
Location:
Salary:
150000.00 - 350000.00 USD / Year
Xbow
Expiration Date:
Until further notice
Requirements:
  • Strong experience with building software around LLMs: prompting, agentic orchestration, fault-tolerance, and integration of LLM parts with hard-coded logic
  • Strong software engineering skills: architecting and building production-grade software that runs reliably and can be maintained
  • Experience with TypeScript or proven ability to learn a new programming language quickly
  • Strong skills in structured and independently-driven problem-solving. Able to work with incomplete information and rapidly testing hypotheses
  • Comfortable with an energetic environment that mixes the fast-paced agile prioritisation of a startup with the curiosity mentality of a research lab
  • Eager to own projects and jump into the deep end, learning as you go. Curious, adaptable and collaborative
  • MSc or equivalent or higher in computer science, math, physics or machine learning
Job Responsibility:
  • Build LLM-powered software that actually works, by designing prompt flows and orchestrations that ensure great performance with no false positives
  • Architect and build an AI-powered software stack that is production-grade, testable and maintainable
  • Design and build experiments and evaluation frameworks for performance testing of the system at scale. Conduct data analysis to draw conclusions
  • Collaborate with the rest of the AI team, with security experts, and both frontend and backend developers to create end-to-end systems that work and customers love
  • Own projects end-to-end: from basic ideation and experimentation to deployment and production monitoring
  • Continuously conduct research on how to harness the advancements in LLMs to make our system better and faster
What we offer:
  • Competitive salary and a generous equity package, making you a true owner of the company
  • Shape your role, lead the function, and grow with the company as we redefine cybersecurity
  • You will tackle technically complex challenges and play a pivotal role in the growth of our business, working alongside an amazing team and some of the world’s experts to shape how AI transforms cybersecurity
Employment Type: Full-time

Backend Engineer

We’re looking for a Backend Engineer to join Team Events and help us build and e...
Location:
United States
Salary:
143900.00 - 215900.00 USD / Year
Zapier
Expiration Date:
March 31, 2026
Requirements:
  • 4+ years of software development experience in Python, Go, or TypeScript
  • At least 2 years focused on building event / streaming systems at scale
  • Experience working with event architectures and services based on technologies like Kafka (MSK) and Avro
  • Supported event-system infrastructure to ensure resiliency and uptime
  • Participated in the design or maintenance of highly available, cloud-based infrastructure in AWS or another cloud provider
  • Understand how to leverage infrastructure-as-code tools (Terraform) and have learned best practices for reliability and observability
  • Strong experience with AWS services, cloud computing technologies, and distributed data stores
  • Experience with languages like Python or Go to create automated tools
  • Believe in hands-off deployments and infrastructure as code
Job Responsibility:
  • Work with AWS services like MSK, SQS, Redis, S3, Lambda and Aurora to build scalable solutions that process billions of events per day
  • Use Terraform to maintain and build our infrastructure
  • Build toolkits, libraries, and scripts to ease challenges faced by other teams at Zapier when they wish to emit to and consume from the Events system as well as other queue solutions we are currently working on building
  • Contribute to data governance practices across Zapier
  • Influence proper data structure and data hygiene
  • Refactor or improve existing code as languages, frameworks, or techniques evolve
  • Help the team pick appropriate tools to solve new problems as they arise
  • Provide feedback on tools, processes, and documentation in place to help us become a better, more effective organization
  • Work with your colleagues to develop new skills, through code review, discussions and mentoring
  • Participate in on-call rotations to ensure the reliability and availability of our systems, providing timely and effective support when issues arise
What we offer
What we offer
  • Offers Equity
  • Offers Bonus
Employment Type: Full-time