CrawlJobs Logo

AI Inference Engineer

perplexity.ai Logo

Perplexity

Location Icon

Location:
United States , San Francisco

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

210000.00 - 385000.00 USD / Year

Job Description:

We are looking for an AI Inference engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Job Responsibility:

  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations

Requirements:

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
What we offer:
  • equity
  • health
  • dental
  • vision
  • retirement
  • fitness
  • commuter and dependent care accounts

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Fulltime
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for AI Inference Engineer

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...
Location
Location
Canada; United States
Salary
Salary:
300000.00 - 450000.00 USD / Year
apollo.io Logo
Apollo.io
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
  • Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
  • Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering
  • Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
  • Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
  • Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
  • Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
  • Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
  • Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
  • Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency
Job Responsibility
Job Responsibility
  • Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
  • Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
  • Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
  • Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
  • Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
  • Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
  • Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
  • Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
  • Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
  • Ensure robust observability, monitoring, and safety guardrails for all AI systems in production
What we offer
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA
  • Fulltime
Read More
Arrow Right

Data Engineer – AI Insights

We are looking for an experienced Data Engineer with AI Insights to design and d...
Location
Location
United States
Salary
Salary:
Not provided
thirdeyedata.ai Logo
Thirdeye Data
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of Data Engineering experience with exposure to AI/ML workflows
  • Advanced expertise in Python programming and SQL
  • Hands-on experience with Snowflake (data warehousing, schema design, performance tuning)
  • Experience building scalable ETL/ELT pipelines and integrating structured/unstructured data
  • Familiarity with LLM and RAG workflows, and how data supports these AI applications
  • Experience with reporting/visualization tools (Tableau)
  • Strong problem-solving, communication, and cross-functional collaboration skills
Job Responsibility
Job Responsibility
  • Develop and optimize ETL/ELT pipelines using Python, SQL, and Snowflake to ensure high-quality data for analytics, AI, and LLM workflows
  • Build and manage Snowflake data models and warehouses, focusing on performance, scalability, and security
  • Collaborate with AI/ML teams to prepare datasets for model training, inference, and LLM/RAG-based solutions
  • Automate data workflows, validation, and monitoring for reliable AI/ML execution
  • Support RAG pipelines and LLM data integration, enabling AI-driven insights and knowledge retrieval
  • Partner with business and analytics teams to transform raw data into actionable AI-powered insights
  • Contribute to dashboarding and reporting using Tableau, Power BI, or equivalent tools
  • Fulltime
Read More
Arrow Right

AI Software Engineer

Join Qargo as an AI Software Engineer and help build intelligent, user-centric A...
Location
Location
Belgium , Ghent
Salary
Salary:
Not provided
qargo.com Logo
Qargo
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Min. 2 years of experience in software engineering, applied AI, or similar technical roles
  • Strong programming skills (preferably Python and/or modern backend languages)
  • Experience with AI/ML tools and frameworks such as PyTorch, Hugging Face, LangChain/LangGraph, vector databases, and inference tooling
  • Proven experience deploying and operating AI/ML systems in a production environment
  • Ability to experiment quickly, iterate fast, and validate assumptions
  • Strong problem-solving skills and the ability to work autonomously in a fast-paced environment
  • Clear communication skills and the ability to collaborate with engineers, product managers, and domain experts
Job Responsibility
Job Responsibility
  • Evaluate and prototype with new AI models and techniques to solve document, workflow, and conversational tasks
  • Bring AI prototypes to production, ensuring quality, scalability, and observability
  • Monitor and maintain AI systems running in production, optimising cost, latency, and reliability
  • Collaborate with cross-functional teams to define clear AI tasks (e.g., document classification, summarisation, task prediction)
  • Develop and enhance AI-driven features such as document extraction, matching flows, quality checks, chatbots, and automated bookings
  • Stay up to date with advancements in AI and identify opportunities to improve the product
What we offer
What we offer
  • Real impact and ownership in a growing international scale-up
  • A supportive and collaborative team culture
  • Hybrid working setup with flexibility and trust
  • Opportunities to learn, grow, and expand your technical knowledge
  • Competitive salary and benefits package
Read More
Arrow Right

Senior Software Engineer – AI

NStarX is seeking a highly skilled Senior Software Engineer – AI with a strong f...
Location
Location
India , Hyderabad
Salary
Salary:
Not provided
nstarxinc.com Logo
NStarX
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field (PhD is a plus)
  • 9+ years of experience in AI/ML engineering or related roles
  • 3+ years of experience in Generative AI with team leadership responsibilities
  • Proven track record of production-grade ML and GenAI model development and deployment
  • Programming: Python (preferred)
  • GenAI Frameworks: Hugging Face Transformers, Diffusers, LangChain, TGI
  • Serving & Inference: FastAPI, gRPC, NVIDIA Triton, TorchServe
  • Cloud Platforms: AWS (SageMaker, EKS), GCP (Vertex AI, GKE), Azure (Azure ML, AKS)
  • MLOps & DevOps: Kubeflow, MLflow, GitHub Actions, Jenkins, Helm, Terraform
  • Optimization Techniques: Model quantization, distillation, pipeline and tensor parallelism
Job Responsibility
Job Responsibility
  • Design, develop, and deploy machine learning models and AI algorithms to address complex business challenges
  • Lead and mentor a team of AI/ML engineers, ensuring quality and scalability in solution design and implementation
  • Collaborate closely with cross-functional teams including data scientists, software engineers, product managers, and UX designers
  • Lead the development and deployment of Generative AI applications across text, code, image, and audio modalities using state-of-the-art LLMs
  • Design and implement CI/CD pipelines for the GenAI model lifecycle including training, validation, packaging, and deployment
  • Apply best practices for model performance tuning, cost optimization, and scalable deployment in cloud and hybrid environments
  • Develop prompt engineering, fine-tuning strategies (LoRA, QLoRA, PEFT), and evaluation protocols tailored to business use cases
  • Stay current with emerging trends in AI, ML, and Generative AI and drive adoption across teams
  • Document processes, model architectures, and deployment strategies for traceability and knowledge sharing
  • Work closely with cross-functional teams to gather requirements and deliver high-quality solutions
What we offer
What we offer
  • Competitive salary aligned with market standards
  • Opportunities for professional development and skill enhancement
  • A collaborative and innovative work environment
  • Fulltime
Read More
Arrow Right

AI Software Engineer III

Planet DDS is a leading provider of a platform of cloud-based solutions that emp...
Location
Location
United Kingdom , Glasgow
Salary
Salary:
Not provided
planetdds.com Logo
Planet DDS
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5-7 years of professional software engineering experience
  • At least 4 years in AI/ML-focused roles
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Artificial Intelligence, or related field
  • Experience working in a SaaS or enterprise software environment
  • Publications or contributions to open-source AI/ML projects
  • Exposure to reinforcement learning, generative AI (LLMs, diffusion models), or real-time inference systems
Job Responsibility
Job Responsibility
  • Design, develop, and deploy AI and machine learning models in production environments
  • Architect scalable solutions that integrate AI capabilities into our products and services
  • Collaborate with data scientists, product managers, and backend/front-end engineers to translate prototypes into reliable, maintainable code
  • Own end-to-end development of AI systems, including data ingestion, model training, evaluation, and deployment
  • Implement best practices in model versioning, monitoring, and continuous improvement
  • Contribute to the evolution of our AI/ML infrastructure, including CI/CD pipelines and MLOps tools
  • Stay current on advancements in AI, ML, and deep learning and assess their applicability to business needs
  • Ensure AI solutions are ethical, interpretable, and aligned with regulatory requirements
  • Fulltime
Read More
Arrow Right

Principal AI Engineer

We are looking for a Principal AI Engineer to lead the design and deployment of ...
Location
Location
United States
Salary
Salary:
200000.00 - 300000.00 USD / Year
apollo.io Logo
Apollo.io
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years of software engineering experience
  • at least 3 years in applied LLM or agentic AI systems (2023–present)
  • proven success in deploying LLM-powered products used by real users at scale
  • deep backend & systems engineering expertise with Python, distributed systems, and scalable APIs
  • familiarity with LangChain, LlamaIndex, or similar orchestration frameworks
  • experience with RAG pipelines, vector DBs, embedding models, and semantic search tuning
  • experience managing performance across cloud providers (e.g., AWS Bedrock, OpenAI, Anthropic, etc.)
  • demonstrated experience building multi-step agents, planning workflows, chaining reasoning steps, and integrating APIs with agent memory/state
  • comfort with advanced prompting strategies, few-shot and chain-of-thought reasoning, and embedding retrieval setups
  • strong understanding of AI system evaluation, human ratings, A/B experimentation, and feedback loop pipelines
Job Responsibility
Job Responsibility
  • Architect and lead the development of multi-agent systems capable of long-horizon planning, reasoning, and API orchestration
  • build reusable agentic components that integrate deeply into sales and marketing processes
  • own and evolve our in-house platform for scalable, low-latency, and cost-efficient LLM and agent deployments
  • lead design of interfaces powered by natural language understanding and retrieval-augmented generation (RAG)
  • build embedding-based, intent-aware search and personalization systems tuned to business user needs
  • drive innovation in personalized outreach generation using context-aware generation pipelines
  • tune inference pipelines, caching layers, and model selection logic for high-scale, cost-aware performance
  • define and drive robust offline and online testing methodologies (A/B, sandboxing, human evals) across agents and LLM flows
  • architect human-in-the-loop systems and telemetry to improve accuracy, UX, and explainability over time
What we offer
What we offer
  • equity
  • company bonus or sales commissions/bonuses
  • 401(k) plan
  • at least 10 paid holidays per year
  • flex PTO
  • parental leave
  • employee assistance program
  • wellbeing benefits
  • global travel coverage
  • life/AD&D/STD/LTD insurance
  • Fulltime
Read More
Arrow Right

Senior Devops & AI Engineer

This role presents a unique opportunity to contribute to the future of impactful...
Location
Location
India , Hyderabad
Salary
Salary:
Not provided
fissionlabs.com Logo
Fission Labs
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field
  • 6+ years of experience in Infrastructure Mgmt. roles, with a focus on cloud platforms (Azure and AWS Preferred)
  • Hands-on experience with operations (DevSecOps) principles and best practices
  • Proficiency in scripting languages such as Python, PowerShell, or Bash
  • Excellent communication and collaboration skills
  • In-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration
  • Hands-on experience with a wide range of AWS and Azure services
  • Develop and maintain Infrastructure as Code (IAC) templates using tools such as Terraform or AWS CloudFormation
  • Experience setting up cloud infrastructure stack, databases, service endpoints, GPU as well as CPU resource scaling, optimization etc.
  • Should have worked AIOps/MLOP
Job Responsibility
Job Responsibility
  • Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration
  • Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness
  • Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services
  • Establish version control practices for IAC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes
What we offer
What we offer
  • Opportunity to work on impactful technical challenges with global reach
  • Vast opportunities for self-development, including online university access and knowledge sharing opportunities
  • Sponsored Tech Talks & Hackathons to foster innovation and learning
  • Generous benefits packages including health insurance, retirement benefits, flexible work hours, and more
  • Supportive work environment with forums to explore passions beyond work
  • Fulltime
Read More
Arrow Right

Artificial (AI) Engineer

VELOX is hiring an AI Developer to help design and implement intelligent systems...
Location
Location
United States , Boise
Salary
Salary:
Not provided
veloxmedia.com Logo
VELOX Media
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong proficiency in Python (Pandas, NumPy, scikit-learn, etc.)
  • Experience with deep learning frameworks such as TensorFlow or PyTorch
  • Hands-on experience with natural language processing, retrieval-augmented generation (RAG), or LLMs (e.g., OpenAI, Claude, Mistral)
  • Understanding of data pipelines, model deployment, and performance monitoring
  • Experience working with APIs and integrating ML models into production systems
  • Familiarity with vector databases (e.g., Pinecone, Weaviate, FAISS) and embedding generation
  • Comfort working in cloud environments (GCP, AWS, or Azure)
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field
  • 3+ years of experience in applied AI/ML roles
  • Track record of launching AI tools or systems into production
Job Responsibility
Job Responsibility
  • Research, design, and deploy AI/ML models that drive value across client-facing and internal applications
  • Build tools that support predictive analytics, natural language querying, and campaign automation
  • Collaborate with product and engineering teams to integrate AI functionality into web platforms
  • Integrate AI solutions with our PHP/Laravel backend and MySQL databases via REST APIs or microservices
  • Write clean, scalable code for inference pipelines, model training, and testing environments
  • Monitor model performance and retrain or refine when necessary
  • Stay ahead of LLMs, vector DBs, and open-source innovations to enhance our AI roadmap
  • Contribute to a long-term AI strategy that makes VELOX more automated, intelligent, and insightful
What we offer
What we offer
  • Competitive compensation and performance bonuses
  • Health insurance & 401k options
  • Paid vacation and holidays
  • Casual dress and regular team events
  • On-site gym and personal trainer access
  • Kombucha on tap
  • Fulltime
Read More
Arrow Right