AI Inference Intern Job at Perplexity (London)

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...

Location

United States, San Francisco

Salary:

300,000 - 385,000 USD / Year

Perplexity

Expiration Date

Until further notice

Requirements

5+ years of engineering experience with 2+ years in a technical leadership or management role
Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
Strong understanding of LLM architecture: Multi-Head Attention, Multi-Query/Grouped-Query Attention, and common layers
Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
Familiarity with GPU characteristics, roofline models, and performance analysis
Experience deploying reliable, distributed, real-time systems at scale
Track record of building and leading high-performing engineering teams
Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
Strong technical communication and cross-functional collaboration skills

Job Responsibility

Lead and grow a high-performing team of AI inference engineers
Develop APIs for AI inference used by both internal and external customers
Architect and scale our inference infrastructure for reliability and efficiency
Benchmark and eliminate bottlenecks throughout our inference stack
Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and more
Improve the reliability and observability of our systems and lead incident response
Own technical decisions around batching, throughput, latency, and GPU utilization
Partner with ML research teams on model optimization and deployment
Recruit, mentor, and develop engineering talent

What we offer

Equity
Health
Dental
Vision
Retirement
Fitness
Commuter and dependent care accounts

Full-time

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...

Location

United States, San Francisco; Palo Alto; New York City

Salary:

210,000 - 385,000 USD / Year

Perplexity

Expiration Date

Until further notice

Requirements

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching, quantization)
Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Job Responsibility

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations

What we offer

Equity
Health
Dental
Vision
Retirement
Fitness
Commuter and dependent care accounts

Full-time

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...

Location

United Kingdom, London

Salary:

Not provided

Perplexity

Expiration Date

Until further notice

Requirements

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching, quantization)
Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Job Responsibility

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations

What we offer

Equity may be part of the total compensation package

Full-time

AI Platform Architect

EverOps partners with enterprise engineering organizations to solve their hardes...

Location

Salary:

Not provided

EverOps

Expiration Date

Until further notice

Requirements

8+ years in Cloud, Platform, SRE, or Infrastructure Engineering roles
Proven experience operating at an Architect level
Strong client-facing and consultative experience
Deep hands-on experience with AWS, including multi-account architectures and governance
Strong knowledge of infrastructure as code (Terraform preferred)
Experience designing secure, scalable platforms in AWS Organizations environments
Practical experience with AI/ML platforms, preferably AWS-native (Bedrock, SageMaker, Glue, Athena, OpenSearch)
Experience with GenAI architectures (RAG, embeddings, vector stores, agent frameworks)
Familiarity with model evaluation, prompt engineering, and inference optimization
Understanding of AI cost drivers and scaling considerations

Job Responsibility

Lead technical workshops to identify, refine, and prioritize high-impact AI and GenAI use cases aligned with business objectives
Translate business problems into system design requirements and AI workflows
Assess existing data platforms, pipelines, governance, and accessibility for AI workloads
Evaluate data quality, lineage, security, and suitability for training, RAG, and inference patterns
Design AI architectures that comply with enterprise security, privacy, and regulatory constraints (PII, PHI, internal policies)
Evaluate and design integrations across APIs, event streams, and existing systems
Evaluate and recommend foundation models and AI services, including Amazon Bedrock, Amazon Nova, and open-source models
Analyze tradeoffs across cost, latency, accuracy, and scalability
Design GenAI patterns such as RAG, agent workflows, and inference pipelines
Produce high-level and detailed AWS reference architectures for prioritized AI use cases

Full-time

Distinguished Engineer - AI Agentics Engineering

At CVS Health, we’re building a world of health around every consumer and surrou...

Location

United States, Woonsocket

Salary:

175,100 - 334,750 USD / Year

CVS Health

Expiration Date

March 31, 2026

Requirements

15+ years of Software Engineering experience required
7+ years in AI/ML engineering with 3+ years specifically in agentic AI or autonomous systems
Proven experience building multi-agent systems from scratch (not just fine-tuning existing models)
Deep expertise in:
Multi-Agent System Architectures: Actor model frameworks, distributed consensus protocols, agent communication standards (FIPA-ACL, KQML, MCP, A2A), and coordination patterns (hierarchical, peer-to-peer, marketplace-based)
LLM Integration Platforms: OpenAI API, Anthropic Claude API, Azure OpenAI Service, Google Vertex AI, and on-premises LLM deployment (vLLM, TensorRT-LLM, Ollama)
Agentic Frameworks: LangChain, LangGraph, AutoGen, CrewAI, Semantic Kernel, and custom agent runtime environments
Tool-using AI Systems: Function calling implementations, API integration patterns, IDEs (Cursor, Windsurf), notebooks (Jupyter), tool selection algorithms, and sandbox execution environments for safe tool usage
Agent Orchestration Platforms: Kubernetes-based agent deployment, Apache Airflow for agent workflows, Temporal for durable agent executions, Agentspace, and event-driven architectures (Apache Kafka, RabbitMQ)
Vector Databases & Knowledge Systems: Pinecone, Weaviate, Chroma, Qdrant for agent memory systems, and knowledge graph technologies (Neo4j, Amazon Neptune, Apache Jena)
Real-time Inference Infrastructure: NVIDIA Triton Inference Server, Ray Serve, TorchServe, and streaming architectures for sub-100ms agent response times

Job Responsibility

Strategic Agentic Architecture & Design: Drive the end-to-end architecture for highly scalable, multi-agent systems that can operate autonomously across complex enterprise workflows
Partner with other Principal Engineers, AI Architects, and executive leadership to shape the long-term agentic roadmap
Champion best practices for agent reliability, interpretability, safety, and performance optimization
Agent Platform Development & Orchestration: Oversee the design and development of new AI agent platforms from the ground up
Implement robust agent lifecycle management, including spawning, monitoring, termination, and inter-agent communication protocols
Foster an engineering culture that values agent autonomy, emergent intelligence, and continuous learning capabilities
Multi-Agent Systems & Emerging AI Technologies: Provide thought leadership on how multi-agent systems, large language models, and reinforcement learning create unique demands on infrastructure
Understand how to move AI agents from proof-of-concept to production-ready autonomous systems
Evaluate and recommend emerging agentic technologies and guide their integration into the broader technology stack
Cross-Functional Leadership & AI Mentoring: Serve as a key technical advisor for C-level executives and product leaders

What we offer

Affordable medical plan options
401(k) plan (including matching company contributions)
Employee stock purchase plan
No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
Paid time off
Flexible work schedules
Family leave
Dependent care resources
Colleague assistance programs
Tuition assistance

Full-time

Applied AI Engineer

The Applied AI team at Ramp is at the forefront of leveraging AI to drive innova...

Location

United States, New York, NY (HQ); San Francisco, CA

Salary:

155,000 - 339,500 USD / Year

Ramp

Expiration Date

Until further notice

Requirements

Proficiency in full-stack development, with a strong understanding of web frameworks, backend systems, and cloud infrastructure
A track record of working on full-stack AI projects, particularly those involving production use cases of LLMs
Experience building backend systems and infrastructure that can support AI-driven products

Job Responsibility

Ship full-stack AI projects end to end
Build and integrate components for AI infrastructure, supporting production-level inference and fine-tuning
Develop and improve engineering processes, tools, and systems to scale AI solutions across Ramp
Create tools and internal platforms to enhance the productivity and capabilities of Ramp's AI and engineering teams

What we offer

100% medical, dental & vision insurance coverage for you
Partially covered for your dependents
One Medical annual membership
401k (including employer match on contributions made while employed by Ramp)
Flexible PTO
Fertility HRA (up to $10,000 per year)
Parental Leave
Unlimited AI token usage
Pet insurance
Centralized home-office equipment ordering for all employees

Full-time

AI Engineer - Platform/MLOps

At hyperexponential, we’re building the AI-powered platform that enables the wor...

Location

Poland, Warsaw

Salary:

Not provided

hyperexponential

Expiration Date

Until further notice

Requirements

Built and deployed production AI infrastructure that scaled to support enterprise-grade reliability and observability
Delivered self-service tools or APIs that enabled multiple product teams to accelerate their AI/ML development cycles
Implemented evaluation frameworks, A/B testing infrastructure, or monitoring solutions that measured and improved model performance, latency, cost, and quality in production
Led initiatives to reduce AI compute costs through optimisation strategies such as intelligent routing or caching
Successfully migrated teams from bespoke AI solutions to shared platform services, driving measurable adoption
Prioritised and improved developer experience through documentation, support, or workflow enhancements

Job Responsibility

Designing and operating scalable AI infrastructure for LLM inference, prompt management, and evaluation pipelines, supporting billions in premium flow
Building self-service tools, SDKs, and APIs that empower product teams to move from prototype to production 30% faster
Instrumenting production AI/ML workloads with standardised logging, tracing, and evaluation metrics, increasing observability coverage to 100% of deployed models
Implementing intelligent routing, caching, and provider optimisation via the LLM gateway, reducing AI compute costs by up to 25%
Driving adoption of shared platform services (LLM gateway, evaluation frameworks, monitoring) to replace bespoke solutions, increasing platform adoption across new AI features
Championing developer experience by delivering comprehensive documentation and responsive support, resulting in higher internal customer satisfaction

What we offer

Share Options at a highly successful Series B company
25 non-working days + Polish bank holidays (B2B) / 26 days of holiday + Polish bank holidays (UoP)
£5,000 budget for Learning & Development
Mental Health Support and Therapy via Spectrum Life
Optional: access to Private Healthcare via Luxmed + Multisport (self-funded for B2B contractors)
Top-spec laptop (MacOS or Windows)
Company pension (UoP)
Company Sick Pay for 10 days at 100% salary (UoP)
Monthly Wellbeing allowance via Juno (UoP)
Private Healthcare Insurance via Luxmed (UoP)

Full-time

AI Engineer - Platform/MLOps

At hyperexponential, we’re building the AI-powered platform that enables the wor...

Location

United Kingdom, London

Salary:

Not provided

hyperexponential

Expiration Date

Until further notice

Requirements

Built and deployed production AI infrastructure that scaled to support enterprise-grade reliability and observability
Delivered self-service tools or APIs that enabled multiple product teams to accelerate their AI/ML development cycles
Implemented evaluation frameworks, A/B testing infrastructure, or monitoring solutions that measured and improved model performance, latency, cost, and quality in production
Led initiatives to reduce AI compute costs through optimisation strategies such as intelligent routing or caching
Successfully migrated teams from bespoke AI solutions to shared platform services, driving measurable adoption
Prioritised and improved developer experience through documentation, support, or workflow enhancements

Job Responsibility

Designing and operating scalable AI infrastructure for LLM inference, prompt management, and evaluation pipelines, supporting billions in premium flow
Building self-service tools, SDKs, and APIs that empower product teams to move from prototype to production 30% faster
Instrumenting production AI/ML workloads with standardised logging, tracing, and evaluation metrics, increasing observability coverage to 100% of deployed models
Implementing intelligent routing, caching, and provider optimisation via the LLM gateway, reducing AI compute costs by up to 25%
Driving adoption of shared platform services (LLM gateway, evaluation frameworks, monitoring) to replace bespoke solutions, increasing platform adoption across new AI features
Championing developer experience by delivering comprehensive documentation and responsive support, resulting in higher internal customer satisfaction

What we offer

£5,000 training and conference budget for individual and group development
25 days of holiday plus 8 bank holidays (33 days total)
Company pension scheme via Penfold
Mental health support and therapy via Spectrum.life
Individual wellbeing allowance via Juno
Private healthcare insurance through AXA
Income protection and Life Insurance
Cycle to Work Scheme
Top-spec equipment (laptop, screens, adjustable desks, etc.)
Regular remote and in-person hackathons, lunch and learns, socials, and game nights

Full-time

AI Inference Intern

Perplexity

Location:
United Kingdom, London

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:
February 21, 2026

Similar Jobs for AI Inference Intern

Engineering Manager - Inference

AI Inference Engineer

AI Inference Engineer

AI Platform Architect

Distinguished Engineer - AI Agentics Engineering

Applied AI Engineer

AI Engineer - Platform/MLOps

AI Engineer - Platform/MLOps
