Engineering Manager - Inference

Perplexity

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

300000.00 - 385000.00 USD / Year

Job Description:

We are looking for an Inference Engineering Manager to lead our AI Inference team. This is a unique opportunity to build and scale the infrastructure that powers Perplexity's products and APIs, serving millions of users with state-of-the-art AI capabilities. You will own the technical direction and execution of our inference systems while building and leading a world-class team of inference engineers. Our current stack includes Python, PyTorch, Rust, C++, and Kubernetes. You will help architect and scale the large-scale deployment of machine learning models behind Perplexity's Comet, Sonar, Search, and Deep Research products.

Job Responsibility:

  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and similar techniques
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
  • Establish team processes, engineering standards, and operational excellence

Requirements:

  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills

Nice to have:

  • Experience with CUDA, Triton, or custom kernel development
  • Background in training infrastructure and RL workloads
  • Experience with Kubernetes and container orchestration at scale
  • Published work or contributions to inference optimization research

What we offer:

  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Similar Jobs for Engineering Manager - Inference

Engineering Manager - Machine Learning

We’re looking for an experienced Engineering Manager to lead the ML Soundtrack t...
Location:
Sweden, Stockholm
Salary:
Not provided
Company:
Epidemic Sound
Expiration Date:
Until further notice
Requirements:
  • Deep ML engineering background with hands-on experience in generative diffusion models for audio/music (including PyTorch and modern training stacks)
  • Proven experience deploying ML systems into production at scale, with a focus on latency, stability, and cost
  • Strong ML system design and architecture skills across the full machine learning lifecycle
  • Track record of managing engineering teams
  • Demonstrated ability to set clear goals, manage performance, and grow engineers through mentorship and feedback
Job Responsibility:
  • Own the technical roadmap and model strategy for generative music, including diffusion and transformer-based approaches
  • Lead the full lifecycle from research to production, championing training, evaluation, and deployment for real-time inference
  • Drive the productionisation of inference through model optimisation (distillation, quantisation), caching, and cost controls
  • Build and maintain team health through effective rituals, 1:1s, and fostering a psychologically safe, high-ownership culture
  • Manage cross-team dependencies and delivery with data, MLOps, and product engineering teams
Employment Type:
Full-time

Engineering Manager - Machine Learning Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
241200.00 - 400000.00 USD / Year
Company:
Plaid
Expiration Date:
Until further notice
Requirements:
  • 8–10 years of experience in ML infrastructure, including direct hands-on expertise as an engineer, IC/TL
  • 2+ years of experience managing infrastructure or ML platform engineers
  • Proven experience delivering and operating ML or AI infrastructure at scale
  • Solid technical depth across ML/AI infrastructure domains (e.g., feature stores, pipelines, deployment, inference, observability)
  • Demonstrated ability to drive execution on complex technical projects with cross-team stakeholders
  • Strong communication and stakeholder management skills
Job Responsibility:
  • Lead and support the ML Infra team, driving project execution and ensuring delivery on key commitments
  • Build and launch Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Define and drive adoption of an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines, deployment tooling, and inference systems
  • Partner with ML product teams to understand requirements and deliver solutions that accelerate model development and iteration
  • Recruit, mentor, and develop engineers, fostering a collaborative and high-performing team culture
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
Employment Type:
Full-time

Senior Machine Learning Engineering Manager, Gen AI

We're seeking a Senior Machine Learning Manager (M60) to lead a cross-functional...
Location:
United States
Salary:
193500.00 - 303150.00 USD / Year
Company:
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 8+ years in ML, search, or backend engineering roles, with 3+ years leading teams
  • Strong track record of shipping ML-powered or LLM-integrated user-facing products
  • Experience with RAG systems (vector search, hybrid retrieval, LLM orchestration)
  • Deep experience in either modeling (e.g., LLMs, search, NLP) or engineering (e.g., backend infra, full-stack), with the ability to lead end-to-end
  • Deep understanding of LLM ecosystems (OpenAI, Claude, Mistral, OSS), orchestration frameworks (LangChain, LlamaIndex), and vector databases (Weaviate, Pinecone, FAISS, etc.)
  • Strong product intuition and ability to translate complex tech into valuable user features
  • Familiarity with GenAI evaluation methods: hallucination detection, groundedness scoring, and human-in-the-loop feedback loops
  • Master’s or PhD in Computer Science, Machine Learning, or related field preferred—or equivalent practical experience
Job Responsibility:
  • Lead the vision, design, and execution of LLM-powered AI products, leveraging advanced AI modeling (e.g., SLM post-training/fine-tuning), RAG architectures, and hybrid ranking systems
  • Define system architecture across retrievers, rankers, orchestration layers, prompt templates, and feedback mechanisms
  • Work closely with product and design teams to ensure delightful, fast, and grounded user experiences
  • Build and manage a cross-disciplinary team including ML engineers, backend/frontend engineers, and applied scientists
  • Foster a culture of E2E ownership — empowering the team to move from prototype to production quickly and iteratively
  • Mentor individuals to grow in both technical depth and product acumen
  • Shape the technical roadmap and long-term strategy for GenAI search across Atlassian’s product suite
  • Partner with platform and infra teams to scale inference, evaluate performance, and integrate usage signals for continuous improvement
  • Champion data quality, grounding, and responsible AI practices in all deployed features
What we offer:
  • health and wellbeing resources
  • paid volunteer days
Employment Type:
Full-time

Staff Product Manager, Managed Inference

As a core member of the Crusoe Managed AI Services team, you will own the comple...
Location:
United States, San Francisco; Sunnyvale
Salary:
204000.00 - 247000.00 USD / Year
Company:
Crusoe
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience in technical product management or engineering roles with product responsibilities
  • Experience building and launching cloud infrastructure, platform, or AI/ML services used in production
  • Strong understanding of cloud infrastructure (e.g., AWS, GCP, Azure) and modern compute architectures
  • Familiarity with the machine learning lifecycle, particularly model deployment, inference, and monitoring
  • Strong communication and collaboration skills, with experience working across engineering, product, and business teams
  • Demonstrated ability to operate independently with strong product judgment and a bias for action
  • Bachelor’s degree in Computer Science or a related technical field (or equivalent experience)
Job Responsibility:
  • Own the end-to-end product lifecycle for Crusoe’s Managed Inference services, including roadmap definition, execution, and iteration
  • Translate customer needs, market signals, and technical constraints into clear product requirements and prioritization
  • Partner closely with Engineering, Infrastructure, and Platform teams to deliver scalable, reliable inference services
  • Drive product decisions across performance, reliability, cost efficiency, and developer experience
  • Define and track success metrics for inference services in production environments
  • Collaborate with go-to-market teams to support product launches, positioning, and customer adoption
  • Communicate product strategy and tradeoffs clearly to cross-functional partners and leadership
What we offer:
  • Restricted Stock Units
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment Type:
Full-time

Staff Product Manager, Managed Inference

As a core member of the Crusoe Managed AI Services team, you will own the comple...
Location:
United States, San Francisco; Sunnyvale; New York
Salary:
204000.00 - 247000.00 USD / Year
Company:
Crusoe
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience in technical product management or engineering roles with product responsibilities
  • Experience building and launching cloud infrastructure, platform, or AI/ML services used in production
  • Strong understanding of cloud infrastructure (e.g., AWS, GCP, Azure) and modern compute architectures
  • Familiarity with the machine learning lifecycle, particularly model deployment, inference, and monitoring
  • Strong communication and collaboration skills, with experience working across engineering, product, and business teams
  • Demonstrated ability to operate independently with strong product judgment and a bias for action
  • Bachelor’s degree in Computer Science or a related technical field (or equivalent experience)
Job Responsibility:
  • Own the end-to-end product lifecycle for Crusoe’s Managed Inference services, including roadmap definition, execution, and iteration
  • Translate customer needs, market signals, and technical constraints into clear product requirements and prioritization
  • Partner closely with Engineering, Infrastructure, and Platform teams to deliver scalable, reliable inference services
  • Drive product decisions across performance, reliability, cost efficiency, and developer experience
  • Define and track success metrics for inference services in production environments
  • Collaborate with go-to-market teams to support product launches, positioning, and customer adoption
  • Communicate product strategy and tradeoffs clearly to cross-functional partners and leadership
What we offer:
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment Type:
Full-time

Director of Engineering, Cloud Availability

As the Director of Engineering, Cloud Availability, you will lead our engineerin...
Location:
Ireland, Dublin
Salary:
Not provided
Company:
Crusoe
Expiration Date:
Until further notice
Requirements:
  • 10+ years of engineering leadership experience with a proven track record of managing high-performing technical teams
  • Deep technical knowledge of public cloud infrastructure and experience building or operating large-scale platforms (Public, Private, or Hybrid)
  • Expert-level understanding of availability, observability, SLIs/SLOs, and modern incident management frameworks
  • Proven ability to lead remote teams and successfully collaborate with US-based engineering organizations
  • Demonstrated success navigating and leading within a matrix organizational structure
  • Strong familiarity with virtual and managed Kubernetes platforms, such as EKS, GKE, or AKS
  • The ability to balance long-term organizational strategy with the immediate tactical needs of a fast-growing engineering site
Job Responsibility:
  • Organizational Leadership: Partner closely with Data Center, Network, and SRE teams to build and scale a world-class engineering organization in Dublin
  • Site Leadership & Culture: Serve as the primary point of contact and face of Crusoe leadership in Dublin, proactively managing office sentiment and ensuring the team remains focused on high-impact objectives
  • Global Strategic Alignment: Build high-trust partnerships with US-based leadership to ensure local priorities are perfectly synchronized with the global business roadmap
  • Operational Excellence: Implement and refine "follow-the-sun" protocols to enable smooth hand-offs between time zones, ensuring zero customer disruption and 24/7 reliability
  • Unified Team Vision: Foster a "one-team" mindset across geographic boundaries, breaking down silos and promoting deep collaboration between Dublin and US offices
  • Talent Development: Level up the Dublin engineering team by identifying individual strengths and establishing a culture of mentorship to grow the next generation of Engineering Leads and ICs
  • Reliability Initiatives: Lead the development of SRE functions for IaaS and managed services, including Inference, SLURM, and automated cluster management
What we offer:
  • pension contributions
  • private health and dental insurance
  • income protection
  • life assurance
Employment Type:
Full-time

ML Ops Engineer

We are hiring an ML Ops Engineer for our GCC client, Europe’s top retail brands....
Location:
India, Bangalore
Salary:
Not provided
Company:
SRKay Consulting Group
Expiration Date:
Until further notice
Requirements:
  • Workflow Management: Experience in managing Apache Airflow and Composer to support the Data Engineering components of grounded AI solutions
  • MLflow: Deep knowledge of MLflow Tracking, Projects, and Registry. Experience migrating MLflow backends between cloud providers
  • Workflow Tools: Familiarity with Vertex AI Pipelines and Azure DevOps for automation
  • GCP AI Services: Practical experience with Vertex AI (Workbench, Model Garden, Feature Store) and BigQuery ML
  • Containerization: Expert-level Docker and Kubernetes (GKE/AKS) skills. Must understand K8s operators and resource management for ML workloads
  • Infrastructure as Code (IaC): Proficiency in Terraform to manage reproducible cloud environments
  • Programming: Advanced Python skills with a focus on software engineering best practices (unit testing, modular design)
  • Data Engineering: Experience with Change Data Capture (CDC), Spark/PySpark, and optimizing data flow from BigQuery to training nodes
  • Access Control: Knowledge of IAM roles, VPC Service Controls, and securing ML endpoints
  • Experience with LLMOps (managing large-scale foundation models, prompt versioning, and vector database scaling)
Job Responsibility:
  • Pipeline Orchestration: Design, develop, and maintain complex ML workflows using Apache Airflow (Cloud Composer) to automate data ingestion, preprocessing, and model training
  • Lifecycle Management: Administer and scale MLflow for experiment tracking, model packaging, and maintaining a centralized Model Registry across the organization
  • Cloud & Hybrid Ops: Create and optimize training environments for custom ML/LLM models
  • Model Serving & Scaling: Architect high-performance inference endpoints and serve models via FastAPI/Flask with API Gateway
  • Infrastructure Management: Manage auto-scaling CUDA clusters on Google Kubernetes Engine (GKE)
  • CI/CD: Manage end-to-end delivery with Continuous Integration & Continuous Delivery (CI/CD)
  • Observability & Monitoring: Build dashboards to track model health, latency, and data drift
Employment Type:
Full-time