LLM Inference Frameworks and Optimization Engineer Job at Together AI (San Francisco)

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...

Location

Canada; United States

Salary:

300000.00 - 450000.00 USD / Year

Apollo.io

Expiration Date

Until further notice

Requirements

10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering
Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency

Job Responsibility

Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
Ensure robust observability, monitoring, and safety guardrails for all AI systems in production

What we offer

Equity
Company bonus or sales commissions/bonuses
401(k) plan
At least 10 paid holidays per year
Flex PTO
Parental leave
Employee assistance program and wellbeing benefits
Global travel coverage
Life/AD&D/STD/LTD insurance
FSA/HSA

Fulltime

Senior Product Manager, AI Agents

This role owns AI research, messaging, and context—spanning both the user experi...

Location

United States

Salary:

187000.00 - 250000.00 USD / Year

Apollo.io

Expiration Date

Until further notice

Requirements

5+ years in product management
2+ years experience launching AI/ML new products and scaling existing products
Track record of shipping AI features that drove measurable business outcomes
Experience with LLM-powered applications, prompt engineering, evaluation frameworks, and model selection tradeoffs
Comfortable working in Python/SQL to analyze data, prototype prompts, and evaluate outputs
Understanding of LLM architectures, RAG pipelines, agent frameworks, and inference optimization
Obsession with quality over speed
GTM or sales tech experience (strongly preferred)
Familiarity with sales workflows, prospecting tools, or CRM systems
Understanding of why sales teams are skeptical of AI tools and what it takes to earn their trust

Job Responsibility

Develop and execute a strategic roadmap for AI research, messaging, and context capabilities
Enhance Apollo's AI research agents to surface actionable insights from the web
Define how AI understands each user's business
Own AI-powered messaging tools that create personalized, context-aware emails at scale
Build and scale evaluation infrastructure across accuracy, relevance, clarity, and tone
Partner with engineering, design, prompt writers, and sales to deliver cohesive AI experiences

What we offer

Equity
Company bonus or sales commissions/bonuses
401(k) plan
At least 10 paid holidays per year
Flex PTO
Parental leave
Employee assistance program and wellbeing benefits
Global travel coverage
Life/AD&D/STD/LTD insurance
FSA/HSA and medical, dental, and vision benefits

Fulltime

AI Software Engineer - NLP/LLM

At Moody's, we unite the brightest minds to turn today’s risks into tomorrow’s o...

Location

United States , New York

Salary:

159300.00 - 230850.00 USD / Year

Moody's

Expiration Date

Until further notice

Requirements

5+ years of demonstrated experience building production-grade machine learning systems with measurable impacts
expertise in NLP and search and recommendation systems is preferred
Hands-on experience with large language model (LLM) applications and AI agents, including retrieval-augmented generation, prompt optimization, fine-tuning, agent design, and evaluation methodologies
familiarity with prompt optimization frameworks like DSPy is preferred
Deep expertise in machine learning models and systems design, including classic models (e.g., XGBoost), modern deep learning and graph machine learning architectures (e.g., transformers-based models, graph neural networks (GNN)), and reinforcement learning systems
Proven ability to take models and agents from research to production, including optimization for latency and cost, implementation of monitoring and tracing, and development of reusable platforms or frameworks
Strong technical leadership and mentorship skills, with a track record of growing engineers, improving team velocity through automation, documentation, and tooling, and influencing architectural decisions without direct authority
Excellent communication and strategic thinking abilities, capable of aligning technical decisions with business outcomes, navigating ambiguity, and driving cross-functional collaboration
Bachelor’s degree or higher in Computer Science, Engineering, or a related field

Job Responsibility

Design and deploy end to end AI and machine learning solutions including machine learning and graph-based models, natural language processing (NLP) models, and large language model (LLM) based AI agents
Build robust pipelines for data ingestion, feature engineering, model training, validation, and real-time or batch inference
Develop and integrate large language model (LLM) applications using techniques such as fine-tuning, retrieval-augmented generation, and reinforcement learning
Build autonomous agents capable of multi-step reasoning and tool use in production environments
Lead the full model and agent development lifecycle, from problem definition and data exploration through experimentation, implementation, deployment, and monitoring
Ensure solutions are scalable, reliable, and aligned with business goals
Advocate and implement machine learning operations (MLOps) best practices including data monitoring and tracing, error analysis, automated retraining, model and prompt versioning, business metrics monitoring, and incident response
Collaborate across disciplines and provide technical leadership, working with product managers, engineers, and researchers to deliver impactful solutions
Mentor team members, lead design reviews, and promote best practices in AI and machine learning systems development

What we offer

medical
dental
vision
parental leave
paid time off
a 401(k) plan with employee and company contribution opportunities
life, disability, and accident insurance
a discounted employee stock purchase plan
tuition reimbursement

Fulltime

Technical Lead

At Spectro Cloud, we are in search of a talented individual to become an integra...

Location

United States , San Jose

Salary:

Not provided

Spectro Cloud

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science or related technical field
8+ years of software development experience (or 6+ years with a Master's degree)
Strong LLM/GenAI fundamentals: Solid understanding of large language models, prompt engineering, embeddings, vector search, RAG systems, and lightweight fine-tuning (LoRA/PEFT preferred)
Python expertise: Proficiency in Python and hands-on experience with AI/ML libraries such as Hugging Face, PyTorch, LangChain, LangGraph, FastAPI, or similar frameworks
LLM deployment experience: Familiarity with Kubernetes-based inference stacks including vLLM, llm-d, TensorRT, PyTorch Serve, or comparable deployment frameworks
Proficiency in at least one modern programming language such as Go, Java, or equivalent
Solid understanding of containerization and orchestration concepts, including Kubernetes
Deep understanding of microservices architecture and REST API design principles
Experience designing and building scalable, cloud-native applications
Analytical problem-solving: Ability to debug model outputs, improve retrieval accuracy, optimize latency, and iterate quickly through experiments

Job Responsibility

Building production-grade AI systems - designing, implementing, and maintaining LLM-powered applications, agentic AI workflows, and RAG pipelines across multiple product use-cases
Actively participate in guided technical labs covering prompt engineering, vector databases, LLM deployment tooling, multi-agent orchestration, fine-tuning strategies, and evaluation techniques
Develop, refine, and operationalize LLM solutions, including prompt design, retrieval strategies, embedding pipelines, LangChain/LangGraph workflows, and API integrations using Python, Hugging Face, FastAPI, and similar frameworks
Ensuring the seamless operation of our platform through a combination of automation, scripting, and rigorous testing
Stay ahead of emerging AI trends - small models, efficient inference (vLLM/TensorRT), multimodal systems, on-device LLMs - and recommend tools, frameworks, or integrations that enhance our platform
Work closely with cross-functional teams to create scalable, dependable, and secure solutions that push boundaries
Stay current with industry trends and emerging technologies, thereby ensuring that our solutions remain innovative and ahead of the curve

AI / ML Engineer, Software Engineering

iCapital is seeking an experienced and forward-thinking AI/ML Engineer Vice Pres...

Location

United States , New York

Salary:

180000.00 - 220000.00 USD / Year

iCapital Network

Expiration Date

Until further notice

Requirements

8+ years of experience in software engineering, with at least 2+ years focused on AI/ML systems
Proven experience in building and deploying ML models in production environments
Hands-on experience with AI agent frameworks (e.g., LangChain, Semantic Kernel, AutoGen, or custom-built systems)
Strong understanding of the ML lifecycle, including data pipelines, model training, evaluation, deployment, and monitoring
Familiar with MLOps tools such as MLflow, Kubeflow, or SageMaker
Deep understanding of LLM orchestration, prompt engineering, tool use, and memory architectures
Familiar with various LLM inference engines such as vLLM or SGLang
Experience in integrating agents with APIs, databases, and external systems
Familiar with retrieval-augmented generation (RAG), vector databases, and knowledge graphs
Experience deploying AI systems in cloud environments (AWS, GCP, Azure) and utilizing containerization tools (Docker, Kubernetes)

Job Responsibility

Design, build, and optimize scalable AI/ML infrastructure and services powering intelligent features across our platform
Lead the development of AI agents capable of autonomous decision-making, task execution, and multi-step reasoning across internal and customer-facing applications
Architect and implement modular agent frameworks by integrating tools, APIs, and memory systems for dynamic and context-aware behavior
Collaborate with product, data, and infrastructure teams to embed AI capabilities into production systems
Drive the architecture and development of ML pipelines, model serving frameworks, and real-time inference systems
Evaluate and integrate state-of-the-art AI tools and frameworks to accelerate development and deployment
Provide technical mentorship and guidance to engineers, contributing to team growth and best practices
Partner with Data Science teams to operationalize models, ensuring a smooth transition from experimentation to production
Contribute to technical roadmaps and help define long-term AI/ML platform and agent strategy
Optimize agent performance for latency, reliability, and safety in production environments

What we offer

Equity for all full-time employees
Annual performance bonus
Employer matched retirement plan
Generously subsidized healthcare with 100% employer paid dental, vision, telemedicine, and virtual mental health counseling
Parental leave
Unlimited paid time off (PTO)

Fulltime

Principal Engineer, Data Analytics Engineering

As a GenAI Solution Architect, you will design and implement enterprise-grade Ge...

Location

India , Bengaluru

Salary:

Not provided

Sandisk

Expiration Date

Until further notice

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Sciences, or related fields
8–12+ years in technology roles, with at least 3–5 years in AI/ML solution architecture or enterprise AI implementation
Certifications in Cloud Architecture
Experience with Agentic frameworks
Excellent communication and stakeholder management skills

Job Responsibility

Design and implement GenAI workflows for enterprise use cases
Develop prompt engineering strategies and feedback loops for LLM optimization
Capture and normalize LLM interactions into reusable Knowledge Artifacts
Integrate GenAI systems into enterprise apps (APIs, microservices, workflow engines)
Architect and maintain Lakehouse environments for structured and unstructured data
Implement pipelines for document parsing, chunking, and vectorization
Maintain knowledge stores, indexing, metadata governance
Enable semantic search and retrieval using embeddings and vector databases
Build and maintain domain-specific ontologies and taxonomies
Establish taxonomy governance and versioning

Fulltime

Applied AI Engineer

Soliton is a high-technology software company working with global customers acro...

Location

India , Bangalore; Coimbatore

Salary:

Not provided

Soliton

Expiration Date

Until further notice

Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
Proven experience in AI/ML engineering and related technologies
3+ years of experience building applications using Python and asynchronous programming
Experience working with SQL and NoSQL databases
Strong problem-solving skills and ability to work in a fast-paced environment
Excellent communication and teamwork skills
Experience building Generative AI applications using Python and FastAPI
Hands-on knowledge of LLM frameworks such as LangChain or LlamaIndex
Ability to work with unstructured data (PDFs, documents, chunking, search) and structured data
Experience designing RAG-based systems, including prompt engineering and retrieval optimization

Job Responsibility

Design, implement, and optimize Generative AI applications using Python and frameworks such as FastAPI
Build AI solutions using LLM frameworks like LlamaIndex and LangChain
Implement containerized deployments using Docker
Develop and optimize Retrieval-Augmented Generation (RAG) pipelines for improved information retrieval
Work with self-hosted and cloud-based vector databases for efficient search and retrieval
Design and manage knowledge graphs and graph-based RAG systems
Implement re-ranking models and retrieval optimization techniques
Apply prompt engineering and context engineering to enhance model performance
Establish guardrails to ensure safe, ethical, and compliant AI deployments
Build data preprocessing and transformation pipelines for structured and unstructured data

What we offer

Flexible work hours
Special support for mothers
Profit sharing starting from the second year
Health insurance for employees and families
Gym and cycle allowance

Fulltime

AI Systems Engineer - Agentic Autonomy

We are seeking an AI Systems Engineer with deep expertise in large language mode...

Location

United States , Greater Boston

Salary:

140000.00 - 180000.00 USD / Year

HavocAI

Expiration Date

Until further notice

Requirements

Bachelor’s, Master’s, or PhD in Computer Science, Machine Learning, Robotics, or a related field
Deep hands-on experience building with LLMs and multi-agent/agentic AI frameworks
Strong software engineering background in modern ML frameworks, cloud orchestration, and API development
Experience integrating AI systems into larger software architectures or robotics/autonomy workflows
Understanding of RAG pipelines, tool-use frameworks, LLM function-calling, memory systems, and agent orchestration
Experience with safety evaluation, model alignment, or mission-critical AI system validation
Ability to lead system-level design discussions and coordinate across multiple engineering disciplines
Must be a U.S. Citizen and eligible to obtain a Secret Clearance

Job Responsibility

Lead the design and development of LLM-powered software modules for mission reasoning, planning, operator interaction, and autonomous decision support
Integrate LLMs and agentic systems into HavocAI’s autonomy architecture, including ROS/ROS2 systems, planning engines, and mission software
Build multi-agent, tool-using AI systems that interact with perception data, mission databases, simulation systems, and operator inputs
Develop APIs, wrappers, and orchestration layers enabling LLMs to interface safely with embedded, cloud, and edge compute environments
Optimize LLM inference pipelines for performance, latency, and reliability in field-deployed systems
Evaluate model behavior, perform safety testing, and develop guardrails for mission-critical use cases
Collaborate with autonomy, embedded, simulation, and full-stack teams to define requirements and ensure robust system-level integration
Guide strategic decisions on model selection, fine-tuning approaches, safety frameworks, and long-term AI architecture
Contribute to field testing, operator evaluations, and iterative deployment cycles for AI-augmented autonomy systems

What we offer

100% Employer paid Health, Dental and Vision Insurance for you and your families
Life Insurance (Employer Paid)
Ability to participate in the companies 401k program (Matching)
Unlimited PTO policy with an enforced 2 week minimum
Equity Package
Work / Home Office Stipend
Global Entry
16 Week Paid Parental Leave
Monthly Health and Wellness Stipend

Fulltime

LLM Inference Frameworks and Optimization Engineer

Together AI

Location:
United States , San Francisco

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
February 18, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for LLM Inference Frameworks and Optimization Engineer

Director of AI Engineering

Senior Product Manager, AI Agents

AI Software Engineer - NLP/LLM

Technical Lead

AI / ML Engineer, Software Engineering

Principal Engineer, Data Analytics Engineering

Applied AI Engineer

AI Systems Engineer - Agentic Autonomy

LLM Inference Frameworks and Optimization Engineer

Together AI

Location:United States , San Francisco

Category:IT - Software Development

Contract Type:Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:February 18, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for LLM Inference Frameworks and Optimization Engineer

Director of AI Engineering

Senior Product Manager, AI Agents

AI Software Engineer - NLP/LLM

Technical Lead

AI / ML Engineer, Software Engineering

Principal Engineer, Data Analytics Engineering

Applied AI Engineer

AI Systems Engineer - Agentic Autonomy

Location:
United States , San Francisco

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
February 18, 2026