AI Inference Intern

Perplexity

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Perplexity is excited to announce its Internship Program for exceptional Master's or PhD students in Computer Science or Engineering in the UK, enrolled in the 2025-2026 academic year. This is an intensive program in which you will work directly with our AI Inference team, and it offers a unique opportunity to gain valuable experience at a rapidly growing AI startup. Outstanding performers may be offered a full-time position at the end of the program. Our AI Inference team is responsible for running the models behind Perplexity's products: it maintains the inference engine and deployments for models ranging from single-node embeddings to distributed sparse Mixture-of-Experts models, and operates large GPU clusters. With a keen focus on latency and throughput, the team owns the entire serving stack, from GPU kernels to networking and monitoring infrastructure.
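
To give a concrete sense of the latency/throughput trade-off the team optimizes, here is a minimal, illustrative PyTorch micro-benchmark (not code from the posting); the model, batch sizes, and sequence length are placeholder assumptions.

```python
# Illustrative micro-benchmark: how batch size trades per-request latency
# for aggregate throughput when serving a model. All sizes are placeholders.
import time
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a served model: a single transformer encoder layer.
model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).to(device).eval()

seq_len, iters = 128, 20
for batch_size in (1, 8, 32):
    x = torch.randn(batch_size, seq_len, 512, device=device)
    with torch.no_grad():
        for _ in range(3):                      # warm-up so one-time costs are not measured
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1e3
    tokens_per_s = batch_size * seq_len * iters / elapsed
    print(f"batch={batch_size:3d}  latency={latency_ms:8.2f} ms  throughput={tokens_per_s:10.0f} tok/s")
```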

Job Responsibility:

  • Work with the inference team to improve serving latency and throughput
  • Bring up support for new models and state-of-the-art inference optimizations or quantization schemes
  • Optimize inference across the entire stack, from GPU kernels to serving endpoints

Requirements:

  • Strong engineering track record with proven knowledge of fundamentals and programming languages (multi-threaded programming, networking, compilation, systems programming, etc.)
  • Pursuing a Master's or PhD in Computer Science with a focus on performance-related subjects (HPC, Compilers, Distributed Systems)
  • Experience with ML frameworks (Torch, JAX)
  • Experience with GPU programming (CUDA, Triton); see the illustrative kernel sketch after this list
  • Experience with High-Performance Computing (OpenMPI)
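
As a rough illustration of the kind of GPU programming referenced above, the following is a minimal Triton kernel sketch (element-wise vector addition). It assumes the triton package and a CUDA-capable GPU, and is not code from Perplexity.

```python
# Illustrative Triton kernel: element-wise vector addition.
# Assumes the `triton` package and a CUDA-capable GPU are available.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # each program handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

a = torch.randn(100_000, device="cuda")
b = torch.randn(100_000, device="cuda")
assert torch.allclose(add(a, b), a + b)
```
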
What we offer:

Outstanding performers may be offered a full-time position at the end of the program

Additional Information:

Job Posted:
February 21, 2026

Work Type:
Hybrid work

Similar Jobs for AI Inference Intern

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location:
United States, San Francisco
Salary:
300000.00 - 385000.00 USD / Year
Perplexity
Expiration Date:
Until further notice
Requirements:
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers (see the illustrative sketch after this list)
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
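
For context on the grouped-query attention mentioned in the requirements, a minimal sketch is shown below: each key/value head is shared by a group of query heads, shrinking the KV cache relative to standard multi-head attention. The head counts and shapes are arbitrary assumptions for illustration, not details from the listing.

```python
# Illustrative grouped-query attention: n_q_heads query heads share n_kv_heads
# key/value heads, shrinking the KV cache. Shapes below are arbitrary.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    group = q.shape[1] // k.shape[1]                  # query heads per KV head
    k = k.repeat_interleave(group, dim=1)             # broadcast each KV head to its group
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

batch, seq, head_dim = 2, 16, 64
q = torch.randn(batch, 8, seq, head_dim)              # 8 query heads
k = torch.randn(batch, 2, seq, head_dim)              # 2 shared key/value heads
v = torch.randn(batch, 2, seq, head_dim)
out = grouped_query_attention(q, k, v)                # -> (2, 8, 16, 64)
```
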
Job Responsibility:
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier by building inference systems to support sparse attention, disaggregated prefill/decode serving, and similar techniques
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts
  • Fulltime

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United States, San Francisco; Palo Alto; New York City
Salary:
210000.00 - 385000.00 USD / Year
Perplexity
Expiration Date:
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • equity
  • health
  • dental
  • vision
  • retirement
  • fitness
  • commuter and dependent care accounts
  • Fulltime

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United Kingdom, London
Salary:
Not provided
Perplexity
Expiration Date:
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • Equity may be part of the total compensation package
  • Fulltime

AI Platform Architect

EverOps partners with enterprise engineering organizations to solve their hardes...
Location:
Not provided
Salary:
Not provided
EverOps
Expiration Date:
Until further notice
Requirements:
  • 8+ years in Cloud, Platform, SRE, or Infrastructure Engineering roles
  • Proven experience operating at an Architect level
  • Strong client-facing and consultative experience
  • Deep hands-on experience with AWS, including multi-account architectures and governance
  • Strong knowledge of infrastructure as code (Terraform preferred)
  • Experience designing secure, scalable platforms in AWS Organizations environments
  • Practical experience with AI/ML platforms, preferably AWS-native (Bedrock, SageMaker, Glue, Athena, OpenSearch)
  • Experience with GenAI architectures (RAG, embeddings, vector stores, agent frameworks)
  • Familiarity with model evaluation, prompt engineering, and inference optimization
  • Understanding of AI cost drivers and scaling considerations
Job Responsibility:
  • Lead technical workshops to identify, refine, and prioritize high-impact AI and GenAI use cases aligned with business objectives
  • Translate business problems into system design requirements and AI workflows
  • Assess existing data platforms, pipelines, governance, and accessibility for AI workloads
  • Evaluate data quality, lineage, security, and suitability for training, RAG, and inference patterns
  • Design AI architectures that comply with enterprise security, privacy, and regulatory constraints (PII, PHI, internal policies)
  • Evaluate and design integrations across APIs, event streams, and existing systems
  • Evaluate and recommend foundation models and AI services, including Amazon Bedrock, Amazon Nova, and open-source models
  • Analyze tradeoffs across cost, latency, accuracy, and scalability
  • Design GenAI patterns such as RAG, agent workflows, and inference pipelines
  • Produce high-level and detailed AWS reference architectures for prioritized AI use cases
  • Fulltime

Distinguished Engineer- AI Agentics Engineering

At CVS Health, we’re building a world of health around every consumer and surrou...
Location:
United States, Woonsocket
Salary:
175100.00 - 334750.00 USD / Year
CVS Health
Expiration Date:
March 31, 2026
Requirements:
  • 15+ years of Software Engineering experience required
  • 7+ years in AI/ML engineering with 3+ years specifically in agentic AI or autonomous systems
  • Proven experience building multi-agent systems from scratch (not just fine-tuning existing models)
  • Deep expertise in: Multi-agent system architectures: Actor model frameworks, distributed consensus protocols, agent communication standards (FIPA-ACL, KQML, MCP, A2A), and coordination patterns (hierarchical, peer-to-peer, marketplace-based)
  • LLM Integration Platforms: OpenAI API, Anthropic Claude API, Azure OpenAI Service, Google Vertex AI, and on-premises LLM deployment (vLLM, TensorRT-LLM, Ollama)
  • Agentic Frameworks: LangChain, LangGraph, AutoGen, CrewAI, Semantic Kernel, and custom agent runtime environments
  • Tool-using AI Systems: Function calling implementations, API integration patterns, IDE (Cursor, Windsurf), Notebooks (Jupyter), tool selection algorithms, and sandbox execution environments for safe tool usage
  • Agent Orchestration Platforms: Kubernetes-based agent deployment, Apache Airflow for agent workflows, Temporal for durable agent executions, Agentspace, and event-driven architectures (Apache Kafka, RabbitMQ)
  • Vector Databases & Knowledge Systems: Pinecone, Weaviate, Chroma, Qdrant for agent memory systems, and knowledge graph technologies (Neo4j, Amazon Neptune, Apache Jena)
  • Real-time Inference Infrastructure: NVIDIA Triton Inference Server, Ray Serve, TorchServe, and streaming architectures for sub-100ms agent response times
Job Responsibility:
  • Strategic Agentic Architecture & Design: Drive the end-to-end architecture for highly scalable, multi-agent systems that can operate autonomously across complex enterprise workflows
  • Partner with other Principal Engineers, AI Architects, and executive leadership to shape the long-term agentic roadmap
  • Champion best practices for agent reliability, interpretability, safety, and performance optimization
  • Agent Platform Development & Orchestration: Oversee the design and development of new AI agent platforms from the ground up
  • Implement robust agent lifecycle management, including spawning, monitoring, termination, and inter-agent communication protocols
  • Foster an engineering culture that values agent autonomy, emergent intelligence, and continuous learning capabilities
  • Multi-Agent Systems & Emerging AI Technologies: Provide thought leadership on how multi-agent systems, large language models, and reinforcement learning create unique demands on infrastructure
  • Understand how to move AI agents from proof-of-concept to production-ready autonomous systems
  • Evaluate and recommend emerging agentic technologies and guide their integration into the broader technology stack
  • Cross-Functional Leadership & AI Mentoring: Serve as a key technical advisor for C-level executives and product leaders
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
  • Fulltime

Applied AI Engineer

The Applied AI team at Ramp is at the forefront of leveraging AI to drive innova...
Location:
United States, New York, NY (HQ), San Francisco, CA
Salary:
155000.00 - 339500.00 USD / Year
Ramp
Expiration Date:
Until further notice
Requirements:
  • Proficiency in full-stack development, with a strong understanding of web frameworks, backend systems, and cloud infrastructure
  • A track record of working on full-stack AI projects, particularly those involving production use cases of LLMs
  • Experience building backend systems and infrastructure that can support AI-driven products
Job Responsibility:
  • Ship full-stack AI projects end to end
  • Build and integrate components for AI infrastructure, supporting production-level inference and fine-tuning
  • Develop and improve engineering processes, tools, and systems to scale AI solutions across Ramp
  • Create tools and internal platforms to enhance the productivity and capabilities of Ramp's AI and engineering teams
What we offer:
  • 100% medical, dental & vision insurance coverage for you; partially covered for your dependents
  • One Medical annual membership
  • 401k (including employer match on contributions made while employed by Ramp)
  • Flexible PTO
  • Fertility HRA (up to $10,000 per year)
  • Parental Leave
  • Unlimited AI token usage
  • Pet insurance
  • Centralized home-office equipment ordering for all employees
  • Fulltime

AI Engineer - Platform/MLOps

At hyperexponential, we’re building the AI-powered platform that enables the wor...
Location:
Poland, Warsaw
Salary:
Not provided
hyperexponential
Expiration Date:
Until further notice
Requirements:
  • Built and deployed production AI infrastructure that scaled to support enterprise-grade reliability and observability
  • Delivered self-service tools or APIs that enabled multiple product teams to accelerate their AI/ML development cycles
  • Implemented evaluation frameworks, A/B testing infrastructure, or monitoring solutions that measured and improved model performance, latency, cost, and quality in production
  • Led initiatives to reduce AI compute costs through optimisation strategies such as intelligent routing or caching
  • Successfully migrated teams from bespoke AI solutions to shared platform services, driving measurable adoption
  • Prioritised and improved developer experience through documentation, support, or workflow enhancements
Job Responsibility:
  • Designing and operating scalable AI infrastructure for LLM inference, prompt management, and evaluation pipelines, supporting billions in premium flow
  • Building self-service tools, SDKs, and APIs that empower product teams to move from prototype to production 30% faster
  • Instrumenting production AI/ML workloads with standardised logging, tracing, and evaluation metrics, increasing observability coverage to 100% of deployed models
  • Implementing intelligent routing, caching, and provider optimisation via the LLM gateway, reducing AI compute costs by up to 25%
  • Driving adoption of shared platform services (LLM gateway, evaluation frameworks, monitoring) to replace bespoke solutions, increasing platform adoption across new AI features
  • Championing developer experience by delivering comprehensive documentation and responsive support, resulting in higher internal customer satisfaction
What we offer:
  • Share Options at a highly successful Series B company
  • 25 days off + Polish bank holidays (B2B) / 26 days of holiday + Polish bank holidays (UoP)
  • £5,000 GBP budget for Learning & Development
  • Mental Health Support and Therapy via Spectrum Life
  • Optional: access to Private Healthcare via Luxmed + Multisport (self-funded as a B2B contractor)
  • Top-spec laptop (macOS or Windows)
  • Company pension (UoP)
  • Company Sick Pay for 10 days at 100% salary (UoP)
  • Monthly Wellbeing allowance via Juno (UoP)
  • Private Healthcare Insurance via Luxmed (UoP)
  • Fulltime

AI Engineer - Platform/MLOps

At hyperexponential, we’re building the AI-powered platform that enables the wor...
Location:
United Kingdom, London
Salary:
Not provided
hyperexponential
Expiration Date:
Until further notice
Requirements:
  • Built and deployed production AI infrastructure that scaled to support enterprise-grade reliability and observability
  • Delivered self-service tools or APIs that enabled multiple product teams to accelerate their AI/ML development cycles
  • Implemented evaluation frameworks, A/B testing infrastructure, or monitoring solutions that measured and improved model performance, latency, cost, and quality in production
  • Led initiatives to reduce AI compute costs through optimisation strategies such as intelligent routing or caching
  • Successfully migrated teams from bespoke AI solutions to shared platform services, driving measurable adoption
  • Prioritised and improved developer experience through documentation, support, or workflow enhancements
Job Responsibility:
  • Designing and operating scalable AI infrastructure for LLM inference, prompt management, and evaluation pipelines, supporting billions in premium flow
  • Building self-service tools, SDKs, and APIs that empower product teams to move from prototype to production 30% faster
  • Instrumenting production AI/ML workloads with standardised logging, tracing, and evaluation metrics, increasing observability coverage to 100% of deployed models
  • Implementing intelligent routing, caching, and provider optimisation via the LLM gateway, reducing AI compute costs by up to 25%
  • Driving adoption of shared platform services (LLM gateway, evaluation frameworks, monitoring) to replace bespoke solutions, increasing platform adoption across new AI features
  • Championing developer experience by delivering comprehensive documentation and responsive support, resulting in higher internal customer satisfaction
What we offer:
  • £5,000 training and conference budget for individual and group development
  • 25 days of holiday plus 8 bank holidays (33 days total)
  • Company pension scheme via Penfold
  • Mental health support and therapy via Spectrum.life
  • Individual wellbeing allowance via Juno
  • Private healthcare insurance through AXA
  • Income protection and Life Insurance
  • Cycle to Work Scheme
  • Top-spec equipment (laptop, screens, adjustable desks, etc.)
  • Regular remote and in-person hackathons, lunch and learns, socials, and game nights
  • Fulltime