LLM Inference Frameworks and Optimization Engineer

Together AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

160000.00 - 230000.00 USD / Year

Job Description:

At Together.ai, we are building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs). Our mission is to optimize inference frameworks, algorithms, and infrastructure, pushing the boundaries of performance, scalability, and cost-efficiency. We are seeking an Inference Frameworks and Optimization Engineer to design, develop, and optimize distributed inference engines that support multimodal and language models at scale. This role will focus on low-latency, high-throughput inference, GPU/accelerator optimizations, and software-hardware co-design, ensuring efficient large-scale deployment of LLMs and vision models.

Job Responsibility:

  • Design and develop a fault-tolerant, high-concurrency distributed inference engine for text, image, and multimodal generation models
  • Implement and optimize distributed inference strategies, including Mixture of Experts (MoE) parallelism, tensor parallelism, and pipeline parallelism, for high-performance serving
  • Apply CUDA graph optimizations, TensorRT/TRT-LLM graph optimizations, PyTorch compilation (torch.compile), and speculative decoding to enhance efficiency and scalability (a minimal sketch follows this list)
  • Collaborate with hardware teams on performance bottleneck analysis and co-optimize inference performance for GPUs, TPUs, or custom accelerators
  • Work closely with AI researchers and infrastructure engineers to develop efficient model execution plans and optimize end-to-end (E2E) model serving pipelines
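
As one concrete illustration (a minimal sketch under stated assumptions, not Together AI's actual stack): the snippet below wraps a single decode step in torch.compile with mode="reduce-overhead", which lets PyTorch capture CUDA graphs and cut per-step kernel-launch overhead in latency-bound autoregressive decoding. TinyDecoderStep is a hypothetical stand-in for a real transformer layer.

    import torch
    import torch.nn as nn

    class TinyDecoderStep(nn.Module):
        """Hypothetical stand-in for one decode step of a transformer."""
        def __init__(self, d_model: int = 256, vocab_size: int = 32000):
            super().__init__()
            self.proj = nn.Linear(d_model, d_model)
            self.lm_head = nn.Linear(d_model, vocab_size)

        def forward(self, hidden: torch.Tensor) -> torch.Tensor:
            return self.lm_head(torch.nn.functional.gelu(self.proj(hidden)))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    step = TinyDecoderStep().to(device).eval()

    # mode="reduce-overhead" asks the compiler to capture CUDA graphs where it
    # can, which matters most for small, launch-bound decode kernels.
    compiled_step = torch.compile(step, mode="reduce-overhead")

    with torch.inference_mode():
        hidden = torch.randn(8, 256, device=device)  # 8 in-flight requests
        logits = compiled_step(hidden)               # first call compiles; later calls reuse the graph
        next_tokens = logits.argmax(dim=-1)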

Requirements:

  • 3+ years of experience in deep learning inference frameworks, distributed systems, or high-performance computing
  • Familiar with at least one LLM inference framework (e.g., TensorRT-LLM, vLLM, SGLang, TGI (Text Generation Inference))
  • Background knowledge and experience in at least one of the following: GPU programming (CUDA/Triton/TensorRT), compilers, model quantization, or GPU cluster scheduling
  • Deep understanding of KV cache systems like Mooncake, PagedAttention, or custom in-house variants (a toy block-table sketch follows this list)
  • Proficient in Python and C++/CUDA for high-performance deep learning inference
  • Deep understanding of Transformer architectures and LLM/VLM/Diffusion model optimization
  • Knowledge of inference optimizations such as workload scheduling, CUDA graphs, compilation, and efficient kernels
  • Strong analytical problem-solving skills with a performance-driven mindset
  • Excellent collaboration and communication skills across teams
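
As a toy illustration of the block-table idea behind PagedAttention-style KV caching (a simplification for this posting, not vLLM's or Mooncake's actual implementation): each sequence's KV cache lives in fixed-size blocks drawn from a shared pool, so memory grows with the tokens actually generated instead of being pre-reserved at maximum sequence length.

    from dataclasses import dataclass, field

    BLOCK_SIZE = 16  # tokens per KV block (hypothetical value)

    @dataclass
    class BlockTable:
        """Logical-to-physical block mapping for one sequence's KV cache."""
        blocks: list = field(default_factory=list)  # physical block ids
        num_tokens: int = 0

    class KVBlockAllocator:
        """Shared pool of fixed-size KV cache blocks."""
        def __init__(self, num_blocks: int):
            self.free = list(range(num_blocks))

        def append_token(self, table: BlockTable) -> None:
            # A new physical block is needed only when the last one is full.
            if table.num_tokens % BLOCK_SIZE == 0:
                if not self.free:
                    raise RuntimeError("out of KV blocks: preempt, swap, or evict")
                table.blocks.append(self.free.pop())
            table.num_tokens += 1

        def release(self, table: BlockTable) -> None:
            self.free.extend(table.blocks)
            table.blocks.clear()
            table.num_tokens = 0

    # Two sequences of very different lengths share one physical pool.
    alloc = KVBlockAllocator(num_blocks=64)
    long_seq, short_seq = BlockTable(), BlockTable()
    for _ in range(40):
        alloc.append_token(long_seq)   # 40 tokens -> 3 blocks
    for _ in range(5):
        alloc.append_token(short_seq)  # 5 tokens -> 1 block
    alloc.release(short_seq)           # freed blocks return to the pool immediately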

Nice to have:

  • Experience in developing software systems for large-scale data center networks with RDMA/RoCE
  • Familiar with distributed filesystems (e.g., 3FS, HDFS, Ceph)
  • Familiar with open source distributed scheduling/orchestration frameworks, such as Kubernetes (K8S)
  • Contributions to open-source deep learning inference projects

What we offer:

  • Competitive compensation
  • Startup equity
  • Health insurance
  • Other competitive benefits

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Fulltime

Similar Jobs for LLM Inference Frameworks and Optimization Engineer

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...
Location:
Canada; United States
Salary:
300000.00 - 450000.00 USD / Year
Apollo.io
Expiration Date: Until further notice

Requirements:
  • 10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
  • Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
  • Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering (a minimal retrieval sketch follows this list)
  • Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
  • Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
  • Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
  • Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
  • Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
  • Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
  • Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency
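
Below is a minimal, framework-free retrieval sketch (an illustration only; it does not use LangChain or LlamaIndex, and embed() is a hypothetical stub): documents are embedded, stored in memory, and ranked by cosine similarity, which is the core loop behind RAG and vector search.

    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        """Hypothetical embedding stub: pseudo-random vectors, consistent within
        one process. A real pipeline would call an embedding model here."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(dim)
        return v / np.linalg.norm(v)

    class TinyVectorStore:
        """In-memory vector search using cosine similarity (no ANN index)."""
        def __init__(self):
            self.texts, self.vectors = [], []

        def add(self, text: str) -> None:
            self.texts.append(text)
            self.vectors.append(embed(text))

        def search(self, query: str, k: int = 3):
            q = embed(query)
            scores = np.array(self.vectors) @ q  # unit vectors -> cosine similarity
            top = np.argsort(-scores)[:k]
            return [(self.texts[i], float(scores[i])) for i in top]

    # Retrieved chunks would then be placed into the generation prompt.
    store = TinyVectorStore()
    for doc in ["pricing tiers overview", "onboarding guide", "API rate limits"]:
        store.add(doc)
    context = store.search("how many API calls per minute?", k=2)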

Job Responsibility:
  • Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
  • Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
  • Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
  • Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
  • Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
  • Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
  • Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
  • Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
  • Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
  • Ensure robust observability, monitoring, and safety guardrails for all AI systems in production

What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA

Employment Type: Fulltime

Senior Product Manager, AI Agents

This role owns AI research, messaging, and context—spanning both the user experi...
Location:
United States
Salary:
187000.00 - 250000.00 USD / Year
Apollo.io
Expiration Date: Until further notice

Requirements:
  • 5+ years in product management
  • 2+ years of experience launching new AI/ML products and scaling existing ones
  • Track record of shipping AI features that drove measurable business outcomes
  • Experience with LLM-powered applications, prompt engineering, evaluation frameworks, and model selection tradeoffs
  • Comfortable working in Python/SQL to analyze data, prototype prompts, and evaluate outputs
  • Understanding of LLM architectures, RAG pipelines, agent frameworks, and inference optimization
  • Obsession with quality over speed
  • GTM or sales tech experience (strongly preferred)
  • Familiarity with sales workflows, prospecting tools, or CRM systems
  • Understanding of why sales teams are skeptical of AI tools and what it takes to earn their trust

Job Responsibility:
  • Develop and execute a strategic roadmap for AI research, messaging, and context capabilities
  • Enhance Apollo's AI research agents to surface actionable insights from the web
  • Define how AI understands each user's business
  • Own AI-powered messaging tools that create personalized, context-aware emails at scale
  • Build and scale evaluation infrastructure across accuracy, relevance, clarity, and tone
  • Partner with engineering, design, prompt writers, and sales to deliver cohesive AI experiences

What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits

Employment Type: Fulltime

AI Software Engineer - NLP/LLM

At Moody's, we unite the brightest minds to turn today’s risks into tomorrow’s o...
Location:
United States, New York
Salary:
159300.00 - 230850.00 USD / Year
Moody's
Expiration Date: Until further notice

Requirements:
  • 5+ years of demonstrated experience building production-grade machine learning systems with measurable impacts
  • Expertise in NLP and search and recommendation systems is preferred
  • Hands-on experience with large language model (LLM) applications and AI agents, including retrieval-augmented generation, prompt optimization, fine-tuning, agent design, and evaluation methodologies
  • Familiarity with prompt optimization frameworks like DSPy is preferred
  • Deep expertise in machine learning models and systems design, including classic models (e.g., XGBoost), modern deep learning and graph machine learning architectures (e.g., transformers-based models, graph neural networks (GNN)), and reinforcement learning systems
  • Proven ability to take models and agents from research to production, including optimization for latency and cost, implementation of monitoring and tracing, and development of reusable platforms or frameworks
  • Strong technical leadership and mentorship skills, with a track record of growing engineers, improving team velocity through automation, documentation, and tooling, and influencing architectural decisions without direct authority
  • Excellent communication and strategic thinking abilities, capable of aligning technical decisions with business outcomes, navigating ambiguity, and driving cross-functional collaboration
  • Bachelor’s degree or higher in Computer Science, Engineering, or a related field

Job Responsibility:
  • Design and deploy end-to-end AI and machine learning solutions including machine learning and graph-based models, natural language processing (NLP) models, and large language model (LLM) based AI agents
  • Build robust pipelines for data ingestion, feature engineering, model training, validation, and real-time or batch inference
  • Develop and integrate large language model (LLM) applications using techniques such as fine-tuning, retrieval-augmented generation, and reinforcement learning
  • Build autonomous agents capable of multi-step reasoning and tool use in production environments
  • Lead the full model and agent development lifecycle, from problem definition and data exploration through experimentation, implementation, deployment, and monitoring
  • Ensure solutions are scalable, reliable, and aligned with business goals
  • Advocate and implement machine learning operations (MLOps) best practices including data monitoring and tracing, error analysis, automated retraining, model and prompt versioning, business metrics monitoring, and incident response
  • Collaborate across disciplines and provide technical leadership, working with product managers, engineers, and researchers to deliver impactful solutions
  • Mentor team members, lead design reviews, and promote best practices in AI and machine learning systems development

What we offer:
  • medical
  • dental
  • vision
  • parental leave
  • paid time off
  • a 401(k) plan with employee and company contribution opportunities
  • life, disability, and accident insurance
  • a discounted employee stock purchase plan
  • tuition reimbursement

Employment Type: Fulltime

Technical Lead

At Spectro Cloud, we are in search of a talented individual to become an integra...
Location:
United States, San Jose
Salary:
Not provided
Spectro Cloud
Expiration Date: Until further notice

Requirements:
  • Bachelor's degree in Computer Science or related technical field
  • 8+ years of software development experience (or 6+ years with a Master's degree)
  • Strong LLM/GenAI fundamentals: Solid understanding of large language models, prompt engineering, embeddings, vector search, RAG systems, and lightweight fine-tuning (LoRA/PEFT preferred)
  • Python expertise: Proficiency in Python and hands-on experience with AI/ML libraries such as Hugging Face, PyTorch, LangChain, LangGraph, FastAPI, or similar frameworks
  • LLM deployment experience: Familiarity with Kubernetes-based inference stacks including vLLM, llm-d, TensorRT, PyTorch Serve, or comparable deployment frameworks
  • Proficiency in at least one modern programming language such as Go, Java, or equivalent
  • Solid understanding of containerization and orchestration concepts, including Kubernetes
  • Deep understanding of microservices architecture and REST API design principles
  • Experience designing and building scalable, cloud-native applications
  • Analytical problem-solving: Ability to debug model outputs, improve retrieval accuracy, optimize latency, and iterate quickly through experiments

Job Responsibility:
  • Building production-grade AI systems - designing, implementing, and maintaining LLM-powered applications, agentic AI workflows, and RAG pipelines across multiple product use-cases
  • Actively participate in guided technical labs covering prompt engineering, vector databases, LLM deployment tooling, multi-agent orchestration, fine-tuning strategies, and evaluation techniques
  • Develop, refine, and operationalize LLM solutions, including prompt design, retrieval strategies, embedding pipelines, LangChain/LangGraph workflows, and API integrations using Python, Hugging Face, FastAPI, and similar frameworks
  • Ensuring the seamless operation of our platform through a combination of automation, scripting, and rigorous testing
  • Stay ahead of emerging AI trends - small models, efficient inference (vLLM/TensorRT), multimodal systems, on-device LLMs - and recommend tools, frameworks, or integrations that enhance our platform
  • Work closely with cross-functional teams to create scalable, dependable, and secure solutions that push boundaries
  • Stay current with industry trends and emerging technologies, thereby ensuring that our solutions remain innovative and ahead of the curve

AI / ML Engineer, Software Engineering

iCapital is seeking an experienced and forward-thinking AI/ML Engineer Vice Pres...
Location:
United States, New York
Salary:
180000.00 - 220000.00 USD / Year
iCapital Network
Expiration Date: Until further notice

Requirements:
  • 8+ years of experience in software engineering, with at least 2+ years focused on AI/ML systems
  • Proven experience in building and deploying ML models in production environments
  • Hands-on experience with AI agent frameworks (e.g., LangChain, Semantic Kernel, AutoGen, or custom-built systems)
  • Strong understanding of the ML lifecycle, including data pipelines, model training, evaluation, deployment, and monitoring
  • Familiar with MLOps tools such as MLflow, Kubeflow, or SageMaker
  • Deep understanding of LLM orchestration, prompt engineering, tool use, and memory architectures
  • Familiar with various LLM inference engines such as vLLM or SGLang (a brief usage sketch follows this list)
  • Experience in integrating agents with APIs, databases, and external systems
  • Familiar with retrieval-augmented generation (RAG), vector databases, and knowledge graphs
  • Experience deploying AI systems in cloud environments (AWS, GCP, Azure) and utilizing containerization tools (Docker, Kubernetes)
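
For the inference-engine requirement above, a brief usage sketch of vLLM's offline API (the model name is only an example, and exact parameter names can vary between vLLM versions):

    from vllm import LLM, SamplingParams

    # Load a small model and generate with nucleus sampling.
    llm = LLM(model="facebook/opt-125m")
    sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["Summarize the ML lifecycle in one sentence."], sampling)
    for out in outputs:
        print(out.outputs[0].text)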

Job Responsibility:
  • Design, build, and optimize scalable AI/ML infrastructure and services powering intelligent features across our platform
  • Lead the development of AI agents capable of autonomous decision-making, task execution, and multi-step reasoning across internal and customer-facing applications
  • Architect and implement modular agent frameworks by integrating tools, APIs, and memory systems for dynamic and context-aware behavior
  • Collaborate with product, data, and infrastructure teams to embed AI capabilities into production systems
  • Drive the architecture and development of ML pipelines, model serving frameworks, and real-time inference systems
  • Evaluate and integrate state-of-the-art AI tools and frameworks to accelerate development and deployment
  • Provide technical mentorship and guidance to engineers, contributing to team growth and best practices
  • Partner with Data Science teams to operationalize models, ensuring a smooth transition from experimentation to production
  • Contribute to technical roadmaps and help define long-term AI/ML platform and agent strategy
  • Optimize agent performance for latency, reliability, and safety in production environments

What we offer:
  • Equity for all full-time employees
  • Annual performance bonus
  • Employer matched retirement plan
  • Generously subsidized healthcare with 100% employer paid dental, vision, telemedicine, and virtual mental health counseling
  • Parental leave
  • Unlimited paid time off (PTO)

Employment Type: Fulltime

Principal Engineer, Data Analytics Engineering

As a GenAI Solution Architect, you will design and implement enterprise-grade Ge...
Location:
India, Bengaluru
Salary:
Not provided
Sandisk
Expiration Date: Until further notice

Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Data Sciences, or related fields
  • 8–12+ years in technology roles, with at least 3–5 years in AI/ML solution architecture or enterprise AI implementation
  • Certifications in Cloud Architecture
  • Experience with Agentic frameworks
  • Excellent communication and stakeholder management skills

Job Responsibility:
  • Design and implement GenAI workflows for enterprise use cases
  • Develop prompt engineering strategies and feedback loops for LLM optimization
  • Capture and normalize LLM interactions into reusable Knowledge Artifacts
  • Integrate GenAI systems into enterprise apps (APIs, microservices, workflow engines)
  • Architect and maintain Lakehouse environments for structured and unstructured data
  • Implement pipelines for document parsing, chunking, and vectorization
  • Maintain knowledge stores, indexing, and metadata governance
  • Enable semantic search and retrieval using embeddings and vector databases
  • Build and maintain domain-specific ontologies and taxonomies
  • Establish taxonomy governance and versioning

Employment Type: Fulltime

Applied AI Engineer

Soliton is a high-technology software company working with global customers acro...
Location:
India, Bangalore; Coimbatore
Salary:
Not provided
Soliton
Expiration Date: Until further notice

Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • Proven experience in AI/ML engineering and related technologies
  • 3+ years of experience building applications using Python and asynchronous programming
  • Experience working with SQL and NoSQL databases
  • Strong problem-solving skills and ability to work in a fast-paced environment
  • Excellent communication and teamwork skills
  • Experience building Generative AI applications using Python and FastAPI
  • Hands-on knowledge of LLM frameworks such as LangChain or LlamaIndex
  • Ability to work with unstructured data (PDFs, documents, chunking, search) and structured data (a simple chunking sketch follows this list)
  • Experience designing RAG-based systems, including prompt engineering and retrieval optimization
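
For the unstructured-data requirement above, a simple sketch of fixed-size chunking with overlap (sizes are hypothetical; production pipelines often chunk by tokens or sentences instead of characters):

    def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50):
        """Split text into fixed-size character chunks with overlap, so content
        cut at a boundary still appears intact in a neighboring chunk."""
        if overlap >= chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")
        chunks, start = [], 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            start += chunk_size - overlap
        return chunks

    # Each chunk would then be embedded and written to a vector database.
    document = "parsed PDF text goes here " * 200  # placeholder document
    pieces = chunk_text(document, chunk_size=500, overlap=50)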

Job Responsibility:
  • Design, implement, and optimize Generative AI applications using Python and frameworks such as FastAPI
  • Build AI solutions using LLM frameworks like LlamaIndex and LangChain
  • Implement containerized deployments using Docker
  • Develop and optimize Retrieval-Augmented Generation (RAG) pipelines for improved information retrieval
  • Work with self-hosted and cloud-based vector databases for efficient search and retrieval
  • Design and manage knowledge graphs and graph-based RAG systems
  • Implement re-ranking models and retrieval optimization techniques
  • Apply prompt engineering and context engineering to enhance model performance
  • Establish guardrails to ensure safe, ethical, and compliant AI deployments
  • Build data preprocessing and transformation pipelines for structured and unstructured data

What we offer:
  • Flexible work hours
  • Special support for mothers
  • Profit sharing starting from the second year
  • Health insurance for employees and families
  • Gym and cycle allowance

Employment Type: Fulltime

AI Systems Engineer - Agentic Autonomy

We are seeking an AI Systems Engineer with deep expertise in large language mode...
Location:
United States, Greater Boston
Salary:
140000.00 - 180000.00 USD / Year
HavocAI
Expiration Date: Until further notice

Requirements:
  • Bachelor’s, Master’s, or PhD in Computer Science, Machine Learning, Robotics, or a related field
  • Deep hands-on experience building with LLMs and multi-agent/agentic AI frameworks
  • Strong software engineering background in modern ML frameworks, cloud orchestration, and API development
  • Experience integrating AI systems into larger software architectures or robotics/autonomy workflows
  • Understanding of RAG pipelines, tool-use frameworks, LLM function-calling, memory systems, and agent orchestration (a minimal tool-dispatch sketch follows this list)
  • Experience with safety evaluation, model alignment, or mission-critical AI system validation
  • Ability to lead system-level design discussions and coordinate across multiple engineering disciplines
  • Must be a U.S. Citizen and eligible to obtain a Secret Clearance
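
For the tool-use and function-calling item above, a minimal dispatch sketch (the tool names and the JSON "model output" are hypothetical; the model call itself is stubbed out):

    import json

    # Registry of callable tools the agent layer is allowed to execute.
    TOOLS = {
        "get_vehicle_status": lambda vehicle_id: {"vehicle_id": vehicle_id, "battery_pct": 87},
        "set_waypoint": lambda lat, lon: {"ack": True, "lat": lat, "lon": lon},
    }

    def dispatch(tool_call_json: str):
        """Validate and execute a JSON tool call produced by an LLM."""
        call = json.loads(tool_call_json)
        name, args = call["name"], call.get("arguments", {})
        if name not in TOOLS:
            # Guardrail: never execute a tool the registry does not know about.
            return {"error": f"unknown tool: {name}"}
        return TOOLS[name](**args)

    # Example "model output" (in a real system this comes from the LLM):
    model_output = '{"name": "set_waypoint", "arguments": {"lat": 42.36, "lon": -71.06}}'
    print(dispatch(model_output))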

Job Responsibility:
  • Lead the design and development of LLM-powered software modules for mission reasoning, planning, operator interaction, and autonomous decision support
  • Integrate LLMs and agentic systems into HavocAI’s autonomy architecture, including ROS/ROS2 systems, planning engines, and mission software
  • Build multi-agent, tool-using AI systems that interact with perception data, mission databases, simulation systems, and operator inputs
  • Develop APIs, wrappers, and orchestration layers enabling LLMs to interface safely with embedded, cloud, and edge compute environments
  • Optimize LLM inference pipelines for performance, latency, and reliability in field-deployed systems
  • Evaluate model behavior, perform safety testing, and develop guardrails for mission-critical use cases
  • Collaborate with autonomy, embedded, simulation, and full-stack teams to define requirements and ensure robust system-level integration
  • Guide strategic decisions on model selection, fine-tuning approaches, safety frameworks, and long-term AI architecture
  • Contribute to field testing, operator evaluations, and iterative deployment cycles for AI-augmented autonomy systems

What we offer:
  • 100% Employer-paid Health, Dental, and Vision Insurance for you and your family
  • Life Insurance (Employer Paid)
  • Ability to participate in the company's 401(k) program (Matching)
  • Unlimited PTO policy with an enforced 2-week minimum
  • Equity Package
  • Work / Home Office Stipend
  • Global Entry
  • 16 Week Paid Parental Leave
  • Monthly Health and Wellness Stipend

Employment Type: Fulltime