AI Inference Engineer Job at Perplexity (San Francisco)

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...

Location

United Kingdom, London

Salary:

Not provided

Perplexity

Expiration Date

Until further notice

Requirements

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Job Responsibility

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations

What we offer

Equity may be part of the total compensation package

Fulltime

Sr. Lead AI Engineer (Inference Optimization, FM hosting, AI Platform)

At Capital One, we are creating responsible and reliable AI systems, changing ba...

Location

United States, San Jose, California; San Francisco, California; New York, New York; Cambridge, Massachusetts; McLean, Virginia

Salary:

229900.00 - 286200.00 USD / Year

Capital One

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 6 years of experience developing AI and ML algorithms or technologies, or a Master's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 4 years of experience developing AI and ML algorithms or technologies
At least 6 years of experience programming with Python, Go, Scala, or Java

Job Responsibility

Partner with a cross-functional team of engineers, research scientists, technical program managers, and product managers to deliver AI-powered products that change how our associates work and how our customers interact with Capital One
Design, develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc.
Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more
Invent and introduce state-of-the-art LLM optimization techniques to improve the performance — scalability, cost, latency, throughput — of large scale production AI systems
Contribute to the technical vision and the long term roadmap of foundational AI systems at Capital One

What we offer

Cash bonus(es)
Long term incentives (LTI)
Comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being

Fulltime

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...

Location

Canada, Markham

Salary:

106400.00 - 159600.00 CAD / Year

AMD

Expiration Date

Until further notice

Requirements

Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
Bachelor’s, Master’s, or Ph.D. degree in Computer/Software Engineering, Computer Science, or a related technical discipline

Job Responsibility

Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.

Fulltime

Principal Engineer, AI Inference Reliability

We’re looking for a hands-on Reliability Tech Lead (IC) to own the mission of ma...

Location

United States; Canada, Sunnyvale; Toronto

Salary:

Not provided

Cerebras Systems

Expiration Date

Until further notice

Requirements

Bachelor's or master's degree in computer science or related field
7+ years of experience in backend, infrastructure, or reliability engineering for large-scale distributed systems
Strong programming skills in at least one popular backend programming language such as Python, C++, Go, or Rust
Deep and hard-earned experience of reliability principles: SLO/SLI/SLA design, incident response, and postmortem culture
Excellent communication and cross-functional leadership skills

Job Responsibility

Define and drive reliability strategy: establish SLOs and ensure alignment across engineering
Design and implement reliability mechanisms: build and evolve systems for fault detection, graceful degradation, failover, throttling, and recovery across multiple regions and data centers
Lead large-scale incident management: own postmortems, root-cause analysis, and prevention loops for reliability-related incidents
Architect for reliability and observability: influence system design for redundancy, durability, and debuggability
Develop reliability tooling: create internal tools and frameworks for chaos testing, load simulation, and distributed fault injection
Collaborate broadly: work across software, infrastructure, and hardware teams to ensure reliability is embedded into every layer of our inference service
Monitor and communicate reliability metrics: build dashboards and alerts that measure service health and provide actionable insights
Mentor and influence: guide engineers and set best practices for designing, testing, and operating reliable large-scale systems

What we offer

Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs

Fulltime

Sr. Deployment Engineer, AI Inference

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. ...

Location

United States; Canada, Sunnyvale; Toronto

Salary:

Not provided

Cerebras Systems

Expiration Date

Until further notice

Requirements

5-7 years of experience in operating on-prem compute infrastructure (ideally in Machine Learning or High-Performance Compute) or in developing and managing complex AWS plane infrastructure for hybrid deployments
Strong proficiency in Python for automation, orchestration, and deployment tooling
Solid understanding of Linux-based systems and command-line tools
Extensive knowledge of Docker containers and container orchestration platforms like K8S
Familiarity with spine-leaf (Clos) networking architecture
Proficiency with telemetry and observability stacks such as Prometheus, InfluxDB and Grafana
Strong ownership mindset and accountability for complex deployments
Ability to work effectively in a fast-paced environment

Job Responsibility

Deploy AI inference replicas and cluster software across multiple datacenters
Operate across heterogeneous datacenter environments undergoing rapid 10x growth
Maximize capacity allocation and optimize replica placement using constraint-solver algorithms
Operate bare-metal inference infrastructure while supporting transition to K8S-based platform
Develop and extend telemetry, observability and alerting solutions to ensure deployment reliability at scale
Develop and extend a fully automated deployment pipeline to support fast software updates and capacity reallocation at scale
Translate technical and customer needs into actionable requirements for the Dev Infra, Cluster, Platform and Core teams
Stay up to date with the latest advancements in AI compute infrastructure and related technologies

What we offer

Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs

Fulltime

Deployment Engineer, AI Inference

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. ...

Location

United States; Canada, Sunnyvale; Toronto

Salary:

Not provided

Cerebras Systems

Expiration Date

Until further notice

Requirements

2-5 years of experience in operating on-prem compute infrastructure (ideally in Machine Learning or High-Performance Compute) or in developing and managing complex AWS plane infrastructure for hybrid deployments
Strong proficiency in Python for automation, orchestration, and deployment tooling
Solid understanding of Linux-based systems and command-line tools
Extensive knowledge of Docker containers and container orchestration platforms like K8S
Familiarity with spine-leaf (Clos) networking architecture
Proficiency with telemetry and observability stacks such as Prometheus, InfluxDB and Grafana
Strong ownership mindset and accountability for complex deployments
Ability to work effectively in a fast-paced environment

Job Responsibility

Deploy AI inference replicas and cluster software across multiple datacenters
Operate across heterogeneous datacenter environments undergoing rapid 10x growth
Maximize capacity allocation and optimize replica placement using constraint-solver algorithms
Operate bare-metal inference infrastructure while supporting transition to K8S-based platform
Develop and extend telemetry, observability and alerting solutions to ensure deployment reliability at scale
Develop and extend a fully automated deployment pipeline to support fast software updates and capacity reallocation at scale
Translate technical and customer needs into actionable requirements for the Dev Infra, Cluster, Platform and Core teams
Stay up to date with the latest advancements in AI compute infrastructure and related technologies

What we offer

Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs

Fulltime

Senior Lead AI Engineer (MLX, Agentic AI, Gen AI Platform Services)

Location

United States, San Francisco; New York; San Jose; Cambridge; McLean

Salary:

229900.00 - 286200.00 USD / Year

Capital One

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 6 years of experience developing AI and ML algorithms or technologies, or a Master's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 4 years of experience developing AI and ML algorithms or technologies
At least 6 years of experience programming with Python, Go, Scala, or Java

Job Responsibility

Partner with a cross-functional team of engineers, research scientists, technical program managers, and product managers to deliver AI-powered products that change how our associates work and how our customers interact with Capital One
Design, develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc.
Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more
Invent and introduce state-of-the-art LLM optimization techniques to improve the performance — scalability, cost, latency, throughput — of large scale production AI systems
Contribute to the technical vision and the long term roadmap of foundational AI systems at Capital One

What we offer

Performance-based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
Comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being

Fulltime

Lead AI Engineer (MLX, Agentic AI, Gen AI Platform Services)

Location

United States, New York; San Francisco; San Jose; Cambridge; McLean

Salary:

197300.00 - 245600.00 USD / Year

Capital One

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 4 years of experience developing AI and ML algorithms or technologies, or a Master's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 2 years of experience developing AI and ML algorithms or technologies
At least 4 years of experience programming with Python, Go, Scala, or Java

Job Responsibility

Partner with a cross-functional team of engineers, research scientists, technical program managers, and product managers to deliver AI-powered products that change how our associates work and how our customers interact with Capital One
Design, develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc.
Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more
Invent and introduce state-of-the-art LLM optimization techniques to improve the performance — scalability, cost, latency, throughput — of large scale production AI systems
Contribute to the technical vision and the long term roadmap of foundational AI systems at Capital One

What we offer

Performance-based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
Comprehensive, competitive, and inclusive set of health, financial and other benefits

Fulltime
