Head of Inference Kernels

Etched

Location:
United States, San Jose

Contract Type:
Not provided

Salary:

200000.00 - 300000.00 USD / Year

Job Description:

As a core member of the team, you will lead a high-performing team that builds a suite of optimized kernels and implements highly optimized inference stacks for a variety of state-of-the-art transformer models (e.g., Llama-3, Llama-4, DeepSeek-R1, Qwen-3, Stable Diffusion-3). You will be responsible for managing and scaling this team to pioneer novel model mapping strategies while co-designing inference-time algorithms (e.g., speculative and parallel decoding, prefill-decode disaggregation).

Job Responsibility:

  • Architect Best-in-Class Inference Performance on Sohu: Deliver continuous batching throughput exceeding B200 by ≥10x on priority workloads
  • Develop Best-in-Performance Inference Mega Kernels: Develop complex, fused kernels that increase chip utilization and reduce inference latency, and validate these optimizations through benchmarking and regression testing in production pipelines
  • Architect Model Mapping Strategies: Develop system-level optimizations using a mix of techniques such as tensor parallelism and expert parallelism for optimal performance
  • Hardware-Software Co-design of Inference-time Algorithmic Innovation: Develop and deploy production-ready inference-time algorithmic improvements (e.g., speculative decoding, prefill-decode disaggregation, KV cache offloading)
  • Build Scalable Team and Roadmap: Grow and retain a team of high-performing inference optimization engineers
  • Cross-Functional Performance Alignment: Ensure inference stack and performance goals are aligned with the software infrastructure teams, GTM and hardware teams for future generations of our hardware

Requirements:

  • Experience in designing and optimizing deep learning kernels for GPUs using CUDA and assembly (ASM)
  • Experience with low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Deep fluency with transformer inference architecture, optimization levers, and full-stack systems (e.g., vLLM, custom runtimes)
  • History of delivering tangible perf wins on GPU hardware or custom AI accelerators
  • Solid understanding of roofline models of compute throughput, memory bandwidth and interconnect performance
  • Experienced in running large-scale workloads on heterogeneous compute clusters, optimizing for efficiency and scalability of AI workloads
  • Scopes projects crisply, sets aggressive but realistic milestones, and drives technical decision-making across the team
  • Anticipates blockers and shifts resources proactively

Nice to have:

  • Experience with implementation of state-of-the-art reasoning and chain-of-thought models at production scale
  • Experience with implementation of newer AI compute operations on hardware (e.g., flash attention, long-context attention variants and alternatives)
  • Analyzed and implemented strategies such as KV-cache offloading for efficient compute resource management
  • Familiarity with linear algebra (e.g., matrix decomposition, alternative bases for vector spaces, matrix rank and its implications)
  • Managed lean, high-performing engineering teams and drove execution on timelines with high quality outcomes

What we offer:

  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Significant equity package

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Head of Inference Kernels

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location:
United States, San Francisco
Salary:
300000.00 - 385000.00 USD / Year
Perplexity
Expiration Date
Until further notice
Requirements
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
Job Responsibility
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier with building inference systems to support sparse attention, disaggregated pre-fill/decoding serving, etc.
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
Employment Type:
Full-time

Maintenance Operative - Student Accommodation

Location:
United Kingdom, Stoke-on-Trent
Salary:
13.69 GBP / Hour
360 Resourcing Solutions
Expiration Date
Until further notice
Requirements
  • Previous experience of Property maintenance within a similar environment, or experienced and qualified in a trade with the ability to work across a number of trade disciplines to a reasonable standard
  • Knowledge of safe working methods, including COSHH, Manual Handling, working at height etc.
  • Awareness of Health and Safety issues and legal requirements
  • Excellent customer care skills with the ability to report outstanding actions and to keep individuals informed of progress
  • Excellent organisation, communication, and interpersonal skills
  • Experience of prioritising workload to meet competing deadlines without close supervision
  • To be proactive in approach, with the ability to use initiative and resolve issues or problems quickly and effectively
  • Ability to work in a team and have a flexible approach to work
Job Responsibility
  • Carry out planned and reactive maintenance within the Property
  • Ensure that statutory compliance is maintained at all times
  • Maintain the condition of the property and ensure that any issues are correctly reported and responded to
  • Liaise with a range of internal and external contacts including suppliers and contractors
  • Drive a high-quality customer experience in relation to the speed and quality of the service provided to reported faults and the general condition, state, and repair of the property
What we offer
  • Generous holiday package of 25 days, plus bank holidays, to recharge and enjoy life outside of work (pro rata for our part time colleagues)
  • Access to a range of exclusive retail discounts to make your money go further
  • Take your special day off! Enjoy your birthday with a well-deserved break from work
  • Stay active and eco-friendly with our cycle-to-work scheme
  • Make a difference in the community with 2 charity days per annum
  • Opportunity to work towards a nationally recognised qualification
Employment Type:
Part-time

Senior Operations Manager

Are you an experienced Healthcare FM professional ready to lead operations withi...
Location:
United Kingdom, City of London
Salary:
65000.00 - 75000.00 GBP / Year
Boden Group
Expiration Date
Until further notice
Requirements
  • Strong Healthcare FM or critical environment experience
  • Proven operational leadership within hard FM or engineering services
  • Solid understanding of M&E systems, compliance, PPMs, and reactive maintenance
  • HNC, HND, NVQ Level 4, or equivalent technical qualification
  • Excellent stakeholder management and team leadership skills
Job Responsibility
  • Lead day-to-day hard FM operations across the hospital estate
  • Manage Operations Managers, engineering teams, and subcontractors to achieve SLA and KPI targets
  • Oversee PPM and reactive maintenance while ensuring NHS and health & safety compliance
  • Chair operational meetings and produce monthly performance and compliance reports
  • Drive service improvements and challenge underperformance across the contract
What we offer
  • Car allowance
  • Bonus scheme
  • Career progression with a leading FM provider
Employment Type:
Full-time

Casual Sales Assistant

As a Casual Sales Assistant, you’ll play a key role in delivering a high-energy,...
Location:
United Kingdom, Hull
Salary:
Not provided
Flannels
Expiration Date
Until further notice
Requirements
  • Customer-focused with a passion for retail
  • Confident, friendly, and a strong communicator
  • Flexible and adaptable to business needs
  • Driven to achieve goals and contribute to team success
  • Proud to represent the Sports Direct brand and values
  • Available to work a range of shifts, including evenings, weekends, and holidays
Job Responsibility
  • Engage with every customer to deliver outstanding service
  • Use product knowledge to provide tailored recommendations
  • Actively contribute to achieving and exceeding store targets
  • Maintain store presentation through stock replenishment and organisation
  • Support visual merchandising standards in line with the Sports Direct brand
  • Assist with deliveries and stock processing
  • Ensure pricing is accurate and up to date
  • Support stock counts and inventory accuracy
What we offer
  • Competitive hourly rate + sales commission
  • Flexible working hours to fit around your lifestyle
  • Monthly group rewards and recognition
  • Uniform discount and 20% discount across Frasers Group brands
  • Discounted gym membership
  • Career development opportunities, including nationally recognised qualifications and internal training programmes
  • A fast-paced, supportive team environment
Employment Type:
Part-time

Tactical Sales Representative

WE’RE HIRING: Sales Representatives! Earn up to £17.40 per visit Are you ready t...
Location:
United Kingdom, Cardiff
Salary:
12.50 - 17.40 GBP / Hour
360 Resourcing Solutions
Expiration Date
Until further notice
Requirements
  • Do you have experience selling into convenience or on-trade outlets?
  • Do you know someone who is looking for this type of work?
Job Responsibility
  • Represent market-leading clients, driving growth and securing category dominance across convenience and on-trade outlets
What we offer
  • Additional performance bonuses and incentives
Employment Type:
Full-time

Casual Sales Assistant

We've come a long way since opening our first shop on Kennington Road, London in...
Location:
United Kingdom, St Helens
Salary:
Not provided
Flannels
Expiration Date
Until further notice
Requirements
  • Customer-focused with strong communication skills
  • Enthusiastic about cycling (no need to be an expert – we'll train you!)
  • A team player who thrives in a fast-paced retail environment
  • Reliable, flexible, and eager to learn.
Job Responsibility
  • Welcoming and engaging customers, offering friendly and professional advice
  • Developing strong product knowledge across bikes, accessories, and clothing
  • Supporting sales through upselling and cross-selling to meet customer needs
  • Assisting with stock management and keeping the shop floor organised
  • Handling payments accurately and efficiently at the till
  • Maintaining high standards of merchandising and store presentation
  • Promoting safe cycling practices and helping customers choose the right safety gear.
What we offer
  • Great career opportunities across our retail network
Employment Type:
Part-time

Front of House

We are looking for an enthusiastic and customer-focused individual to join our t...
Location:
United Kingdom, Exeter
Salary:
Not provided
Flannels
Expiration Date
Until further notice
Requirements
  • Previous experience in a customer-facing role, ideally within the sports, fitness, or leisure industry
  • A passion for providing excellent customer service and creating a welcoming environment
  • Strong communication and interpersonal skills, with the ability to engage positively with members and staff
  • Highly organised with the ability to multitask and manage time effectively
  • A proactive and problem-solving mindset, with the ability to handle situations calmly and efficiently
  • Basic administrative skills, including familiarity with booking systems and handling payments
  • A team player with a positive attitude and a willingness to help wherever needed
  • Flexibility to work varied shifts, including evenings and weekends, to meet the needs of the club
Job Responsibility
  • Greet members and guests with a friendly and professional demeanour, ensuring they feel welcomed and valued
  • Handle member check-ins, bookings, and inquiries efficiently and accurately
  • Provide information about the club’s facilities, programs, and events, assisting members with any queries or needs
  • Maintain the reception area and lounge spaces, ensuring they are clean, organised, and inviting
  • Assist with membership sign-ups and renewals, providing information and guidance on membership options
  • Support the management team with administrative tasks, including scheduling, record-keeping, and reporting
  • Coordinate with other departments to ensure smooth operations and a seamless experience for members
  • Handle member feedback and complaints with professionalism, escalating issues as needed to ensure prompt resolution
  • Assist in promoting club events, classes, and services to enhance member engagement and participation
  • Ensure compliance with all health and safety regulations, maintaining a safe environment for all
Employment Type:
Part-time

Consultant, IT-Data Platform Management

This is where we value your strategic mindset, technical expertise and passion f...
Location:
Mexico, Guadalajara
Salary:
Not provided
Baxter
Expiration Date
Until further notice
Requirements
  • 5+ years of IaC, Data Platform, Data Engineering, AWS experience
  • 7+ years of IT industry experience
  • 5+ years of experience & proficiency with Python, Shell Scripting
  • Experience in AWS services like IAM, S3, KMS, EC2, EMR, Cloudformation, EKS
  • Experience in Terraform, git, JIRA, Jenkins, Airflow, Control-M
  • Strong knowledge of SQL, understanding of subqueries, able to write SQL code sufficient for most business requirements for pulling data from sources, applying rules to the data, and stocking target data
  • Proven track record in fixing data pipelines and addressing production issues such as performance tuning and permission issues
  • Can build clear and concise documentation and communications
  • Can detail technical specs from business communications
  • Ability to coordinate and aggressively follow up on incidents and problems, perform diagnosis, and provide resolution to minimize service interruption
Job Responsibility
  • Development, Improvement and Support of AWS infrastructure supporting Data Platform
  • Can explain technical solutions and resolutions with internal customers and communicate feedback
  • Perform technical code reviews for peers moving code into production
  • Perform and review integration testing before production migrations
  • Provide high level of technical support and perform root cause analysis for problems experienced within area of functional responsibility
  • Can document technical specs from business communications
  • Serves as SME for various AWS cloud technologies
What we offer
  • Paid Time Off
  • Employee Health & Well-Being Benefits
  • Continuing Education/ Professional Development
  • Support for Parents
  • Employee Assistance Program
Employment Type:
Full-time