AI Inference Intern

Perplexity

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Perplexity is excited to announce its Internship Program for exceptional Master’s or PhD students studying Computer Science or Engineering in the UK and enrolled in the 2025-2026 academic year. In this intensive program, you will work directly with our AI Inference team and gain valuable experience at a rapidly growing AI startup. Outstanding performers may be offered a full-time position at the end of the program.

Our AI Inference team is responsible for running the models behind Perplexity's products. The team maintains the inference engine and the deployments for models ranging from single-node embedding models to distributed sparse Mixture-of-Experts models, and operates large GPU clusters. With a keen focus on latency and throughput, the team owns the entire serving stack, from GPU kernels to networking and monitoring infrastructure.

Job Responsibility:

  • Work with the inference team to improve serving latency and throughput
  • Bring up support for new models and state-of-the-art inference optimizations or quantization schemes
  • Optimize inference across the entire stack, from GPU kernels to serving endpoints

Requirements:

  • Strong engineering track record with proven knowledge of programming languages and systems fundamentals (multi-threaded programming, networking, compilation, systems programming, etc.)
  • Pursuing a Master's or PhD in Computer Science with a focus on performance-related subjects (HPC, Compilers, Distributed Systems)
  • Experience with ML frameworks (Torch, JAX)
  • Experience with GPU programming (CUDA, Triton)
  • Experience with High-Performance Computing (OpenMPI)

What we offer:

Outstanding performers may be offered a full-time position at the end of the program

Additional Information:

Job Posted:
February 21, 2026

Work Type:
Hybrid work

Similar Jobs for AI Inference Intern

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location:
United States, San Francisco
Salary:
300,000 - 385,000 USD / year
Perplexity
Expiration Date:
Until further notice
Requirements:
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
Job Responsibility:
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and more
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
  • Full-time

Research Intern - AI Inference Architecture

Research Internships at Microsoft provide a dynamic environment for research car...
Location:
United States, Redmond
Salary:
6,710 - 13,270 USD / month
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Currently enrolled in a PhD program in Computer Science, Computer Engineering or a related STEM field
  • At least 1 year of experience working with LLM inference software stack and systems
  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship
  • Submit a minimum of two reference letters for this position, as well as a cover letter and any relevant work or research samples
Job Responsibility:
  • Research Interns put inquiry and theory into practice
  • Learn, collaborate, and network for life
  • Contribute to exciting research and development strides
  • Are paired with mentors and are expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community
  • Full-time

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United States, San Francisco; Palo Alto; New York City
Salary:
210,000 - 385,000 USD / year
Perplexity
Expiration Date:
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching and quantization)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
  • Full-time

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United Kingdom, London
Salary:
Not provided
Perplexity
Expiration Date:
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching and quantization)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • Equity may be part of the total compensation package
  • Full-time

Principal GPU/NPU AI System Architect

The AI Architect will define and drive end‑to‑end AI system architecture for emb...
Location:
United States, Austin; San Jose
Salary:
200,000 - 300,000 USD / year
AMD
Expiration Date:
Until further notice
Requirements:
  • Deep expertise in GPU and/or NPU architecture and execution models
  • Strong hands-on experience with AI models and inference pipelines
  • Proven background in embedded / edge AI systems
  • Strong understanding of hardware-aware model optimization techniques
  • Experience in robotics, automotive, or industrial AI domains
  • Ability to translate customer problems into scalable architectural solutions
  • Motivating leader with good interpersonal skills and cross-functional and external leadership experience
  • Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Job Responsibility:
  • Develop deep architectural understanding of GPU, NPU, and heterogeneous SoC designs
  • Guide HW–SW co‑optimization strategies for AI workloads
  • Influence silicon and platform roadmaps using model‑driven architectural insights
  • Collaborate across silicon, system engineering, software, thermal/mechanical, security, and product teams
  • Technically lead internal AI engineers and work closely with partners, ISVs, and customers
  • Act as a technical authority and mentor
  • Architect AI solutions with strong understanding of model internals
  • Evaluate and map model characteristics onto GPU/NPU execution
  • Drive model optimization strategies
  • Define and optimize AI software stacks

GPU Kernel Performance Engineer

AMD is looking for an influential software engineer who is passionate about impr...
Location:
China, Beijing
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Strong expertise in GPU, NPU, and FPGA architectures, with a deep understanding of accelerator micro‑architecture and computation pipelines
  • Solid knowledge of AI inference, including operator/kernel development, AI compilers, and inference frameworks such as PyTorch and ONNX Runtime
  • Extensive experience in GPU kernel development, with strong proficiency in CUDA and/or HIP programming models
  • Strong object-oriented programming background; proficiency in C/C++ is highly preferred
  • Proven ability to write high‑quality, efficient, and maintainable code, with strong attention to detail and robustness
  • Excellent communication skills and strong analytical/problem‑solving capabilities
  • Doctoral or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Job Responsibility:
  • Design and deliver high‑performance computing solutions, providing competitive architectures and implementations for customers
  • Develop high‑performance operators across GPU/NPU platforms, including GEMM, MHA, and CONV
  • Build and optimize inference frameworks and inference compilers
  • Conduct performance evaluation and benchmarking of models and operators
  • Track and study cutting‑edge research papers, reproduce key methodologies, and integrate them into production solutions
  • Document technical work, summarize team achievements, and contribute to patents and publications
  • Build and maintain strong technical relationships with internal teams, industry peers, and ecosystem partners

AI Application Engineer

WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great prod...
Location:
China, Shanghai; Shenzhen; Beijing
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • Developer enablement with leading open-source communities and AI frameworks, including PyTorch, vLLM, SGLang, Unsloth, PaddlePaddle, Mooncake, TileLang, LangChain, VERL, and LLaMA-Factory, across both training and inference workflows
  • Strong experience with LLMs and Generative AI, including transformer architectures, attention mechanisms, MoE models, and end-to-end AI pipelines
  • Solid understanding of GPU-accelerated computing; familiarity with the ROCm AI software stack is strongly preferred
  • Proven ability to collaborate effectively with open-source software communities to drive developer enablement and ecosystem activities
  • Excellent communication and presentation skills, with the ability to clearly articulate architectural proposals, technical trade-offs, and value propositions to diverse stakeholders
  • Bachelor's degree required
  • Master's degree preferred
Job Responsibility:
  • Capture and prioritize developer and customer requirements to shape AMD's AI software feature planning and solutions roadmap
  • Lead and contribute to collaboration with AI open-source projects, strengthening the developer community and broader ecosystem
  • Partner with internal AI software engineering teams to drive developer enablement through performance optimization, OSS contributions, Discord/GitHub support, AI Academy initiatives, solutions, reference designs, blogs, tutorials, and user guides
  • Work closely with internal AI software teams to ensure the success of AI developers, communities, and customer proof-of-concepts (PoCs)
  • Provide actionable feedback and requirements for AI software across cloud, client, and edge deployments
  • Full-time

Senior Software Engineer - AI Frameworks

The AI Frameworks team at Microsoft accelerates and optimizes large language mod...
Location:
United States, Redmond
Salary:
119,800 - 234,700 USD / year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python OR equivalent experience.
  • Experience with PyTorch internals, custom operators, hardware backend, or torch.compile/Dynamo-based optimization flows.
  • Experience with AI inference stacks such as vLLM, SGLang, or similar large-scale model serving systems.
  • Experience with NPU or GPU kernel development and optimization (e.g., CUDA, Triton, or accelerator-specific toolchains).
  • Familiarity with common LLM concepts such as attention mechanisms, KV caching, quantization (PTQ/QAT), and distributed parallelism strategies (TP, PP, DP).
Job Responsibility:
  • Architect and implement efficient tensor computation primitives and software abstractions for custom AI accelerators
  • Develop and extend PyTorch features for model onboarding, optimization, and execution on custom AI accelerators
  • Contribute to and improve AI inference stacks such as vLLM and SGLang, including scheduling, KV cache management, and serving pipelines
  • Design, develop, profile, and optimize high-performance kernels for NPUs (MAIA) and GPUs to accelerate LLM inference and training workloads
  • Collaborate across disciplines to define requirements and deliver practical solutions to new technical challenges
  • Full-time