AI Inference Intern

Perplexity

Location:
United Kingdom, London


Contract Type:
Not provided

Salary:

Not provided

Job Description:

Perplexity is excited to announce its Internship Program for exceptional Master's or PhD students studying Computer Science or Engineering in the UK, enrolled in the 2025-2026 academic year. This is an intensive program in which you will work directly with our AI Inference team, offering a unique opportunity to gain valuable experience at a rapidly growing AI startup. Outstanding performers may be offered a full-time position at the end of the program.

Our AI Inference team is responsible for running the models behind the Perplexity products. The team maintains the inference engine and the deployments behind models ranging from single-node embedding models to distributed sparse Mixture-of-Experts models, and operates large GPU clusters. With a keen focus on latency and throughput, the team owns the entire serving stack, from GPU kernels to networking and monitoring infrastructure.
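The latency/throughput tension the description mentions can be made concrete with a toy cost model. This is a sketch under assumed numbers: the linear cost model and both constants are illustrative, not measurements from any real serving stack.

```python
# Toy batched-inference cost model: assume one forward pass over a batch of
# size B costs FIXED_OVERHEAD + PER_ITEM * B seconds. Both constants are
# made-up illustrative values, not measured figures.
FIXED_OVERHEAD = 0.010  # seconds: kernel launches, scheduling, etc.
PER_ITEM = 0.002        # marginal seconds of work per request in the batch

def batch_latency(batch_size: int) -> float:
    """Wall-clock time for one forward pass over `batch_size` requests."""
    return FIXED_OVERHEAD + PER_ITEM * batch_size

def throughput(batch_size: int) -> float:
    """Requests completed per second when serving at this batch size."""
    return batch_size / batch_latency(batch_size)

# Bigger batches amortize the fixed overhead, so throughput improves --
# but every request in the batch also waits longer.
assert throughput(8) > throughput(1)
assert batch_latency(8) > batch_latency(1)
```

Serving systems pick an operating point on this curve; techniques such as continuous batching aim for the throughput of large batches without the worst-case per-request latency.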

Job Responsibility:

  • Work with the inference team to improve serving latency and throughput
  • Bring up support for new models and state-of-the-art inference optimizations or quantization schemes
  • Optimize inference across the entire stack, from GPU kernels to serving endpoints
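As a minimal flavor of the quantization schemes named above, here is a pure-Python sketch of symmetric per-tensor int8 weight quantization. The helper names are invented for the example, and real inference engines do this on GPU tensors rather than Python lists.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:  # all-zero tensor: any scale works
        scale = 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [qi * scale for qi in q]

weights = [0.5, -1.0, 0.25, 0.9]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Round-off error is bounded by half the quantization step (scale / 2).
assert all(abs(w - r) <= scale / 2 + 1e-12 for w, r in zip(weights, recovered))
```

Production schemes refine this basic idea with per-channel or per-group scales, asymmetric zero-points, and lower bit widths.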

Requirements:

  • Strong engineering track record with proven knowledge of fundamentals and programming languages (multi-threaded programming, networking, compilation, systems programming, etc.)
  • Pursuing a Master's or PhD in Computer Science with a focus on performance-related subjects (HPC, Compilers, Distributed Systems)
  • Experience with ML frameworks (Torch, JAX)
  • Experience with GPU programming (CUDA, Triton)
  • Experience with High-Performance Computing (OpenMPI)

What we offer:

Outstanding performers may be offered a full-time position at the end of the program

Additional Information:

Job Posted:
February 21, 2026

Work Type:
Hybrid work

Similar Jobs for AI Inference Intern

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location:
United States, San Francisco
Salary:
300000.00 - 385000.00 USD / Year
Perplexity
Expiration Date
Until further notice
Requirements
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
Job Responsibility
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier with building inference systems to support sparse attention, disaggregated pre-fill/decoding serving, etc.
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
  • Full-time

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United States, San Francisco; Palo Alto; New York City
Salary:
210000.00 - 385000.00 USD / Year
Perplexity
Expiration Date
Until further notice
Requirements
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
  • Full-time

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United Kingdom, London
Salary:
Not provided
Perplexity
Expiration Date
Until further notice
Requirements
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer
  • Equity may be part of the total compensation package
  • Full-time

Principal GPU/NPU AI System Architect

The AI Architect will define and drive end‑to‑end AI system architecture for emb...
Location:
United States, Austin; San Jose
Salary:
200000.00 - 300000.00 USD / Year
AMD
Expiration Date
Until further notice
Requirements
  • Deep expertise in GPU and/or NPU architecture and execution models
  • Strong hands-on experience with AI models and inference pipelines
  • Proven background in embedded / edge AI systems
  • Strong understanding of hardware-aware model optimization techniques
  • Experience in robotics, automotive, or industrial AI domains
  • Ability to translate customer problems into scalable architectural solutions
  • Motivating leader with good interpersonal skills
  • Cross-functional and external leadership
  • Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Job Responsibility
  • Develop deep architectural understanding of GPU, NPU, and heterogeneous SoC designs
  • Guide HW–SW co‑optimization strategies for AI workloads
  • Influence silicon and platform roadmaps using model‑driven architectural insights
  • Collaborate across silicon, system engineering, software, thermal/mechanical, security, and product teams
  • Technically lead internal AI engineers and work closely with partners, ISVs, and customers
  • Act as a technical authority and mentor
  • Architect AI solutions with strong understanding of model internals
  • Evaluate and map model characteristics onto GPU/NPU execution
  • Drive model optimization strategies
  • Define and optimize AI software stacks

GPU Kernel Performance Engineer

AMD is looking for an influential software engineer who is passionate about impr...
Location:
China, Beijing
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements
  • Strong expertise in GPU, NPU, and FPGA architectures, with a deep understanding of accelerator micro‑architecture and computation pipelines
  • Solid knowledge of AI inference, including operator/kernel development, AI compilers, and inference frameworks such as PyTorch and ONNX Runtime
  • Extensive experience in GPU kernel development, with strong proficiency in CUDA and/or HIP programming models
  • Strong object-oriented programming background; proficiency in C/C++ is highly preferred
  • Proven ability to write high‑quality, efficient, and maintainable code, with strong attention to detail and robustness
  • Excellent communication skills and strong analytical/problem‑solving capabilities
  • Doctoral or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Job Responsibility
  • Design and deliver high‑performance computing solutions, providing competitive architectures and implementations for customers
  • Develop high‑performance operators across GPU/NPU platforms, including GEMM, MHA, and CONV
  • Build and optimize inference frameworks and inference compilers
  • Conduct performance evaluation and benchmarking of models and operators
  • Track and study cutting‑edge research papers, reproduce key methodologies, and integrate them into production solutions
  • Document technical work, summarize team achievements, and contribute to patents and publications
  • Build and maintain strong technical relationships with internal teams, industry peers, and ecosystem partners

AI Solutions Engineer Specialist

The AI Solutions Engineer will play a crucial role in designing and deploying en...
Location:
Malaysia, Selangor Darul Ehsan
Salary:
Not provided
NTT DATA
Expiration Date
Until further notice
Requirements
  • Minimum 5–10 years of relevant experience
  • Practical experience with:
    • Python development (essential)
    • AI/ML frameworks or LLM integrations
    • APIs, backend development, or automation scripting
  • Familiarity with:
    • Vector databases or semantic search concepts
    • Data processing and document parsing workflows
    • Containerization (Docker) or deployment environments
  • Understanding of Generative AI concepts (LLMs, embeddings, RAG basics)
  • Basic database knowledge (SQL/NoSQL)
  • Familiarity with Git-based development workflows
Job Responsibility
  • Build and maintain AI applications including:
    • RAG pipelines and knowledge assistants
    • LLM integrations and prompt workflows
    • Agentic AI bots
  • Develop APIs, backend services, and integrations supporting AI solutions
  • Assist in optimizing model performance, inference latency, and system reliability
  • Prepare datasets for AI use cases including cleaning, structuring, and preprocessing
  • Manage vector databases, embeddings, and retrieval optimization
  • Support automation of data ingestion workflows
  • Assist with deploying AI solutions across development, staging, and production environments
  • Monitor performance, troubleshoot issues, and optimize resource utilization
  • Full-time

AI Platform Architect

EverOps partners with enterprise engineering organizations to solve their hardes...
Location:
Not provided
Salary:
Not provided
EverOps
Expiration Date
Until further notice
Requirements
  • 8+ years in Cloud, Platform, SRE, or Infrastructure Engineering roles
  • Proven experience operating at an Architect level
  • Strong client-facing and consultative experience
  • Deep hands-on experience with AWS, including multi-account architectures and governance
  • Strong knowledge of infrastructure as code (Terraform preferred)
  • Experience designing secure, scalable platforms in AWS Organizations environments
  • Practical experience with AI/ML platforms, preferably AWS-native (Bedrock, SageMaker, Glue, Athena, OpenSearch)
  • Experience with GenAI architectures (RAG, embeddings, vector stores, agent frameworks)
  • Familiarity with model evaluation, prompt engineering, and inference optimization
  • Understanding of AI cost drivers and scaling considerations
Job Responsibility
  • Lead technical workshops to identify, refine, and prioritize high-impact AI and GenAI use cases aligned with business objectives
  • Translate business problems into system design requirements and AI workflows
  • Assess existing data platforms, pipelines, governance, and accessibility for AI workloads
  • Evaluate data quality, lineage, security, and suitability for training, RAG, and inference patterns
  • Design AI architectures that comply with enterprise security, privacy, and regulatory constraints (PII, PHI, internal policies)
  • Evaluate and design integrations across APIs, event streams, and existing systems
  • Evaluate and recommend foundation models and AI services, including Amazon Bedrock, Amazon Nova, and open-source models
  • Analyze tradeoffs across cost, latency, accuracy, and scalability
  • Design GenAI patterns such as RAG, agent workflows, and inference pipelines
  • Produce high-level and detailed AWS reference architectures for prioritized AI use cases
  • Full-time

Field Application Engineer - AI

We are seeking an AI Application Engineer to join the HPC Centre of Excellence (...
Location:
Taiwan, Taipei City
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements
  • Track record working within AI
  • Demonstrable hands-on expertise working with popular AI frameworks
  • Strong positive can-do attitude
  • Skilled in independently prioritizing opportunities
  • Excellent verbal and written communication skills
  • Open to travel both domestic and international, approximately 10-20% over a year
  • Good English
  • Demonstrated experience with training and inference workloads on GPUs
  • Executing applications in common frameworks (PyTorch, TensorFlow, JAX)
  • Customer-facing experience
Job Responsibility
  • Support winning new AI business
  • Liaise with and advise customers and partners through Proofs of Concept (POCs), presentations, and training
  • Engineering: execute popular and customer-driven AI inference and training workloads
  • Run training and inference performance investigations
  • Build a body of documentation for internal and external dissemination
  • Proactive engagement across AMD teams
  • Assist in creating Total Cost of Ownership models
  • Technically owning and resolving customer and partner issues
  • Full-time