We are seeking an experienced and highly skilled LLMOps Engineer to join our team at Thrive. This newly created role will be responsible for deploying, optimizing, and scaling large language model (LLM) applications across our platform. The successful candidate will own the operational backbone of our AI-driven products, ensuring performance, reliability, and cost-efficiency while collaborating closely with our AI and engineering teams. If you are someone who thrives in fast-paced environments, enjoys building scalable AI infrastructure, and is excited about shaping the future of LLM capabilities at Thrive, this is the role for you.
Job Responsibilities:
Lead LLM infrastructure efforts across multiple engineering teams, ensuring scalable, secure, and efficient delivery of AI-powered features
Design, build, and maintain production-grade systems for deploying and managing LLMs, including versioning, A/B testing, and rollback strategies
Collaborate with the AI team to implement prompt management systems, prompt versioning, and token optimization strategies
Monitor and optimize inference latency, throughput, caching strategies, and multi-provider cost management (OpenAI, Anthropic, AWS Bedrock, etc.)
Develop observability pipelines including quality metrics, evaluation workflows, error monitoring, and user feedback loops
Implement and maintain Retrieval-Augmented Generation (RAG) systems, embedding pipelines, and vector database operations
Support fine-tuning workflows and manage model registries for both proprietary and open-source models
Implement AI safety guardrails, content filtering, and compliance measures to ensure responsible deployment
Support general DevOps initiatives roughly 10% of the time, including CI/CD improvements and cloud infrastructure updates
Maintain thorough documentation of all LLM infrastructure, processes, and best practices
Requirements:
3+ years of experience in LLMOps, MLOps, or similar production-focused AI/ML roles
Strong Python programming skills and familiarity with LLM libraries and frameworks