
LLMOps Engineer

Thrive Career Wellness Inc

Location:
Canada, Toronto

Contract Type:
Not provided

Salary:

140,000 - 160,000 CAD / year

Job Description:

We are seeking an experienced and highly skilled LLMOps Engineer to join our team at Thrive. This newly created role will be responsible for deploying, optimizing, and scaling large language model (LLM) applications across our platform. The successful candidate will own the operational backbone of our AI-driven products, ensuring performance, reliability, and cost-efficiency while collaborating closely with our AI and engineering teams. If you are someone who thrives in fast-paced environments, enjoys building scalable AI infrastructure, and is excited about shaping the future of LLM capabilities at Thrive, this is the role for you.

Job Responsibility:

  • Lead LLM infrastructure efforts across multiple engineering teams, ensuring scalable, secure, and efficient delivery of AI-powered features
  • Design, build, and maintain production-grade systems for deploying and managing LLMs, including versioning, A/B testing, and rollback strategies
  • Collaborate with the AI team to implement prompt management systems, prompt versioning, and token optimization strategies
  • Monitor and optimize inference latency, throughput, caching strategies, and multi-provider cost management (OpenAI, Anthropic, AWS Bedrock, etc.)
  • Develop observability pipelines including quality metrics, evaluation workflows, error monitoring, and user feedback loops
  • Implement and maintain Retrieval-Augmented Generation (RAG) systems, embedding pipelines, and vector database operations
  • Support fine-tuning workflows and manage model registries for both proprietary and open-source models
  • Implement AI safety guardrails, content filtering, and compliance measures to ensure responsible deployment
  • Support general DevOps initiatives ~10% of the time, including CI/CD improvements and cloud infrastructure updates
  • Maintain thorough documentation of all LLM infrastructure, processes, and best practices
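
The deployment and cost-management duties above typically converge on a provider-routing layer with caching, fallback, and latency tracking. The sketch below is purely illustrative (not Thrive's actual stack); the two provider functions are hypothetical stand-ins for real SDK calls to OpenAI, Anthropic, AWS Bedrock, etc.

```python
import hashlib
import time

# Hypothetical provider backends; in production these would wrap
# real SDK calls (OpenAI, Anthropic, AWS Bedrock, ...).
def call_primary(prompt: str) -> str:
    return f"primary:{prompt}"

def call_fallback(prompt: str) -> str:
    return f"fallback:{prompt}"

class LLMGateway:
    """Minimal routing layer: exact-match cache, primary/fallback order,
    and per-provider latency samples for observability."""

    def __init__(self):
        self.cache = {}       # prompt hash -> cached response
        self.latencies = []   # (provider name, seconds) samples

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:  # cache hit: no provider cost incurred
            return self.cache[key]
        for name, fn in (("primary", call_primary), ("fallback", call_fallback)):
            start = time.perf_counter()
            try:
                response = fn(prompt)
            except Exception:
                continue       # provider error: fall through to the next one
            self.latencies.append((name, time.perf_counter() - start))
            self.cache[key] = response
            return response
        raise RuntimeError("all providers failed")

gw = LLMGateway()
first = gw.complete("hello")
second = gw.complete("hello")  # served from cache, no second provider call
```

A real system would add TTLs or semantic caching, per-provider token accounting, and circuit breakers, but the shape (cache check, ordered fallback, latency log) is the same.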

Requirements:

  • 3+ years of experience in LLMOps, MLOps, or similar production-focused AI/ML roles
  • Strong Python programming skills and familiarity with LLM libraries and frameworks
  • Hands-on experience with LLM providers (OpenAI, Anthropic, AWS Bedrock, Azure, Vertex, Databricks)
  • Experience with vector databases such as Pinecone, Weaviate, Qdrant, or Chroma
  • Knowledge of model serving tools (vLLM, TGI, Ray Serve)
  • Proficiency with Docker, Kubernetes, and cloud environments (AWS preferred)
  • Familiarity with prompt engineering, token optimization, chain-of-thought approaches, and evaluation metrics
  • Experience with LLM-specific tooling (LangSmith, Weights & Biases, Phoenix, MLflow)
  • Ability to troubleshoot LLM issues such as latency improvements, hallucination mitigation, and context window strategies
  • Strong communication skills with both technical and non-technical stakeholders
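
The "context window strategies" mentioned above often reduce to a token-budget check before each call: keep the newest messages that fit, drop the rest. A minimal illustration, using a whitespace word count as a stand-in for a real tokenizer (production code would use the model's own, e.g. tiktoken for OpenAI models):

```python
def truncate_history(messages, max_tokens, count=lambda s: len(s.split())):
    """Keep the most recent messages that fit within max_tokens.

    `count` is a stand-in tokenizer; swap in the target model's
    real tokenizer for accurate budgeting.
    """
    kept, used = [], 0
    for msg in reversed(messages):   # walk newest-first
        cost = count(msg)
        if used + cost > max_tokens:
            break                    # budget exhausted: drop older messages
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order

history = ["hi there", "how are you today", "fine thanks"]
window = truncate_history(history, max_tokens=5)  # only the newest fits
```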

Nice to have:

  • Experience with open-source LLMs (Llama, Mistral, etc.)
  • Knowledge of advanced RAG techniques including hybrid search and re-ranking
  • Exposure to agent frameworks and real-time LLM applications
  • Background in traditional MLOps, data engineering, or multimodal models
  • Experience with Ruby on Rails
  • Understanding of AI safety and alignment principles

What we offer:
  • 3 weeks paid vacation + 1-week holiday shutdown
  • Health insurance & wellness coverage
  • Yearly Learning & Development Allowance
  • Yearly Workspace Allowance

Additional Information:

Job Posted:
February 24, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for LLMOps Engineer

AI Engineer

Guidepoint seeks an experienced AI Engineer as an integral member of the Toronto...
Location: Canada, Toronto
Salary: Not provided
Modoras Accounting Syd
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field with 6+ years of professional experience, or a Master’s degree with 4+ years of professional experience in backend software engineering and Generative AI
  • Proven track record of designing, building, and scaling distributed, production-grade systems
  • Deep expertise in Python, a major backend framework (e.g., FastAPI, Flask), and asynchronous programming (e.g., asyncio)
  • Proficiency in designing RESTful APIs, microservices, and the complete operational lifecycle, including comprehensive testing, CI/CD (e.g., ArgoCD), observability, monitoring, alerting, maintaining high uptime, and executing zero-downtime deployments
  • Hands-on experience deploying and managing applications on a major cloud platform (Azure preferred, AWS/GCP acceptable) using containerization (Docker) and orchestration (Kubernetes, Helm)
  • 2+ years of experience building applications that leverage large language models from providers like OpenAI, Anthropic, or Google Gemini
  • Direct experience with modern LLM patterns such as retrieval-augmented generation (RAG), hybrid search using vector databases (e.g., Pinecone, Elasticsearch), multi-agent AI systems with tool calls, and prompt engineering
  • Experience designing and implementing robust evaluation frameworks for LLM-based systems, including rubric-based scoring, LLM Judges, or using tools like MLflow, alongside monitoring for performance and drift
  • Familiarity with large-scale data processing platforms and tools (e.g., Databricks, Apache Spark)
Job Responsibility
  • Architect and Build Production Systems: Design, build, and operate scalable, low-latency backend services and APIs that serve Generative AI features, from retrieval-augmented generation (RAG) pipelines to complex agentic systems
  • Own the AI Application Lifecycle: Own the end-to-end lifecycle of AI-powered applications, including system design, development, deployment (CI/CD), monitoring, and optimization in production environments like Databricks and Azure Kubernetes Service (AKS)
  • Optimize RAG Pipelines: Continuously improve retrieval and generation quality through techniques like retrieval optimization (tuning k-values, chunk sizes), using re-rankers, advanced chunking strategies, and prompt engineering for hallucination reduction
  • Integrate Intelligent Systems: Engineer solutions that seamlessly combine LLMs with our proprietary knowledge repositories, external APIs, and real-time data streams to create powerful copilots and research assistants
  • Champion LLMOps and Engineering Best Practices: Collaborate with data science and engineering teams to establish and implement best practices for LLMOps, including automated evaluation using frameworks like LLM Judges or MLflow, AI observability, and system monitoring
  • Evaluate and Implement AI Strategies: Systematically evaluate and apply advanced prompt engineering methods (e.g., Chain-of-Thought, ReAct) and other model interaction techniques to optimize the performance and safety of proprietary and open-source LLMs
  • Mentor and Lead: Provide technical leadership to junior engineers through rigorous code reviews, mentorship, and design discussions, helping to elevate the team's engineering standards
  • Influence the Roadmap: Partner closely with product and business stakeholders to translate user needs into technical requirements, define priorities, and shape the future of our AI product offerings
What we offer
  • Paid Time Off
  • Comprehensive benefits plan
  • Company RRSP Match
  • Development opportunities through the LinkedIn Learning platform

Data/AI Engineer

Guidepoint seeks an experienced Data/AI Engineer as an integral member of the To...
Location: Canada, Toronto
Salary: Not provided
Modoras Accounting Syd
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field with 6+ years of professional experience, or a Master’s degree with 4+ years of professional experience in backend software engineering and Generative AI
  • Proven track record of designing, building, and scaling distributed, production-grade systems
  • Deep expertise in Python, a major backend framework (e.g., FastAPI, Flask), and asynchronous programming (e.g., asyncio)
  • Proficiency in designing RESTful APIs, microservices, and the complete operational lifecycle, including comprehensive testing, CI/CD (e.g., ArgoCD), observability, monitoring, alerting, maintaining high uptime, and executing zero-downtime deployments
  • Hands-on experience deploying and managing applications on a major cloud platform (Azure preferred, AWS/GCP acceptable) using containerization (Docker) and orchestration (Kubernetes, Helm)
  • 2+ years of experience building applications that leverage large language models from providers like OpenAI, Anthropic, or Google Gemini
  • Direct experience with modern LLM patterns such as retrieval-augmented generation (RAG), hybrid search using vector databases (e.g., Pinecone, Elasticsearch), multi-agent AI systems with tool calls, and prompt engineering
  • Experience designing and implementing robust evaluation frameworks for LLM-based systems, including rubric-based scoring, LLM Judges, or using tools like MLflow, alongside monitoring for performance and drift
  • Familiarity with large-scale data processing platforms and tools (e.g., Databricks, Apache Spark)
Job Responsibility
  • Architect and Build Production Systems: Design, build, and operate scalable, low-latency backend services and APIs that serve Generative AI features, from retrieval-augmented generation (RAG) pipelines to complex agentic systems
  • Own the AI Application Lifecycle: Own the end-to-end lifecycle of AI-powered applications, including system design, development, deployment (CI/CD), monitoring, and optimization in production environments like Databricks and Azure Kubernetes Service (AKS)
  • Optimize RAG Pipelines: Continuously improve retrieval and generation quality through techniques like retrieval optimization (tuning k-values, chunk sizes), using re-rankers, advanced chunking strategies, and prompt engineering for hallucination reduction
  • Integrate Intelligent Systems: Engineer solutions that seamlessly combine LLMs with our proprietary knowledge repositories, external APIs, and real-time data streams to create powerful copilots and research assistants
  • Champion LLMOps and Engineering Best Practices: Collaborate with data science and engineering teams to establish and implement best practices for LLMOps, including automated evaluation using frameworks like LLM Judges or MLflow, AI observability, and system monitoring
  • Evaluate and Implement AI Strategies: Systematically evaluate and apply advanced prompt engineering methods (e.g., Chain-of-Thought, ReAct) and other model interaction techniques to optimize the performance and safety of proprietary and open-source LLMs
  • Mentor and Lead: Provide technical leadership to junior engineers through rigorous code reviews, mentorship, and design discussions, helping to elevate the team's engineering standards
  • Influence the Roadmap: Partner closely with product and business stakeholders to translate user needs into technical requirements, define priorities, and shape the future of our AI product offerings
What we offer
  • Paid Time Off
  • Comprehensive benefits plan
  • Company RRSP Match
  • Development opportunities through the LinkedIn Learning platform

Senior LLMOps Engineer

Working closely with our Engineering Manager, you’ll be a Senior LLMOps Engineer...
Location: Australia, Sydney
Salary: Not provided
Heidi
Expiration Date: Until further notice
Requirements
  • Proven track record of designing, building, and maintaining MLOps or LLMOps infrastructure in a production environment
  • Previous hands-on experience building scalable, cloud-native infrastructure and platforms
  • Deployed and managed large-scale machine learning models in a production environment
  • Expert in Python, cloud platforms (AWS, GCP, or Azure), containerization (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, CloudFormation)
  • Deep and practical understanding of the entire machine learning lifecycle and the specific operational challenges of large language models
  • Ability to translate complex engineering and research requirements into concrete, robust, and automated platform solutions
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
Job Responsibility
  • Lead the architecture, design, and implementation of our end-to-end LLMOps platform, from data ingestion and model training pipelines to production deployment and monitoring
  • Build and maintain robust CI/CD/CT (Continuous Integration/Continuous Delivery/Continuous Training) pipelines to automate the testing, validation, and deployment of large language models
  • Engineer highly available and scalable model serving solutions using modern infrastructure like Kubernetes, ensuring low latency and high throughput for our production services
  • Collaborate closely with AI research and engineering teams to understand their needs, streamline workflows, and create the tooling that accelerates their development cycles
  • Champion and implement best practices for model versioning, experiment tracking, monitoring, and governance across the organization
  • Mentor mid-level and junior engineers, sharing your deep expertise in infrastructure, automation, and operational excellence to foster a culture of reliability and scalability
What we offer
  • Flexible hybrid working environment, with 3 days in the office
  • Additional paid day off for your birthday and wellness days
  • Special corporate rates at Anytime Fitness in Melbourne, Sydney tbc
  • A generous personal development budget of $500 per annum
  • Learn from some of the best engineers and creatives, joining a diverse team
  • Become an owner, with shares (equity) in the company
  • Full-time

LLM Engineer

You will join our global Machine Learning and Data Science unit — a core team of...
Location: Spain, Barcelona
Salary: Not provided
Gipo
Expiration Date: Until further notice
Requirements
  • At least one year of professional experience in LLM development or integration in a fast-paced, product-driven tech environment
  • Demonstrated expertise in production-grade LLM deployments, including prompt management systems, vector databases, semantic search implementation, and API integration with foundation models
  • Good understanding of transformer architectures and proficiency in LLM frameworks such as LangChain, LlamaIndex, or similar tools
  • Proficiency in Python
  • Experience in collaborative project development
  • Appreciation for good engineering practices and maintainable code
  • Proven experience in evaluating LLMs through systematic testing, benchmark design, and the development of custom metrics (e.g. accuracy, consistency, factuality, and bias), with a focus on aligning results to product and user needs
  • Proven ability to integrate, deploy, and optimize large language models in production-grade industry environments, ensuring scalability and robust performance
  • Strong knowledge in prompt engineering, agent-based workflows, and the generation and manipulation of embeddings
  • Experience with RAG (Retrieval-Augmented Generation) techniques, vector similarity search, and information retrieval methods to enhance LLM capabilities
Job Responsibility
  • Work closely with cross-functional teams, including scientists, engineers, and product stakeholders, to deliver LLM-driven initiatives that directly contribute to business objectives
  • Design, deploy and iterate over LLM services for text-based applications (and beyond), while proactively identifying and eliminating performance bottlenecks
  • Build small to medium-sized Python projects and collaborate with engineers on production code and deployments at scale
  • Assess platform engineering and LLMOps bottlenecks, research and design scalable prompt management strategies, and recommend solutions that balance performance, cost, and reliability
  • Research, architect, and deploy LLM-powered information retrieval solutions (e.g., RAG) to deliver accurate results in complex, multilingual product environments
  • Partner with the AI Platform team to refine LLMOps best practices, evolve frameworks, and establish efficient, scalable workflows
What we offer
  • Flexible remuneration and benefits system via Flexoh, which includes: restaurant card, transportation card, kindergarten, and training tax savings
  • Share options plan after 6 months of working with us
  • Remote or hybrid work model with our hub in Barcelona
  • Flexible working hours (fully flexible, as in most cases you only have to be on a couple of meetings weekly)
  • Summer intensive schedule during July and August (work 7 hours, finish earlier)
  • 23 paid holidays, with exchangeable local bank holidays
  • Additional paid holiday on your birthday or work anniversary (you choose what you want to celebrate)
  • Private healthcare plan with Adeslas for you and subsidized for your family (medical and dental)
  • Access to hundreds of gyms for a symbolic fee in partnership for you and your family with Wellhub
  • Access to iFeel, a technological platform for mental wellness offering online psychological support and counseling
  • Full-time

LLM Engineer

You will join our global Machine Learning and Data Science unit — a core team of...
Location: Poland, Warsaw
Salary: Not provided
Gipo
Expiration Date: Until further notice
Requirements
  • At least one year of professional experience in LLM development or integration in a fast-paced, product-driven tech environment
  • Demonstrated expertise in production-grade LLM deployments, including prompt management systems, vector databases, semantic search implementation, and API integration with foundation models
  • Good understanding of transformer architectures and proficiency in LLM frameworks such as LangChain, LlamaIndex, or similar tools
  • Proficiency in Python
  • Experience in collaborative project development
  • Appreciation for good engineering practices and maintainable code
  • Proven experience in evaluating LLMs through systematic testing, benchmark design, and the development of custom metrics (e.g. accuracy, consistency, factuality, and bias), with a focus on aligning results to product and user needs
  • Proven ability to integrate, deploy, and optimize large language models in production-grade industry environments, ensuring scalability and robust performance
  • Strong knowledge in prompt engineering, agent-based workflows, and the generation and manipulation of embeddings
  • Experience with RAG (Retrieval-Augmented Generation) techniques, vector similarity search, and information retrieval methods to enhance LLM capabilities
Job Responsibility
  • Work closely with cross-functional teams, including scientists, engineers, and product stakeholders, to deliver LLM-driven initiatives that directly contribute to business objectives
  • Design, deploy and iterate over LLM services for text-based applications (and beyond), while proactively identifying and eliminating performance bottlenecks
  • Build small to medium-sized Python projects and collaborate with engineers on production code and deployments at scale
  • Assess platform engineering and LLMOps bottlenecks, research and design scalable prompt management strategies, and recommend solutions that balance performance, cost, and reliability
  • Research, architect, and deploy LLM-powered information retrieval solutions (e.g., RAG) to deliver accurate results in complex, multilingual product environments
  • Partner with the AI Platform team to refine LLMOps best practices, evolve frameworks, and establish efficient, scalable workflows
What we offer
  • Share options plan after 6 months of working with us
  • Remote or hybrid work model with our hub in Warsaw
  • Flexible working hours (fully flexible, as in most cases you only have to be on a couple of meetings weekly)
  • 20/26 days of paid time off (depending on your contract)
  • Additional paid day off on your birthday or work anniversary (you choose what you want to celebrate)
  • Private healthcare plan with Signal Iduna for you and subsidized for your family
  • Multisport card co-financing for you to have access to sports facilities across Poland
  • Access to iFeel, a technological platform for mental wellness offering online psychological support and counseling
  • Free English classes
  • Full-time

Senior Manager, Engineering, AI & ML Infrastructure

As the Sr. Engineering Manager of our AI/ML Platform team, you will be central t...
Location: United States, San Francisco
Salary: 179,100 - 240,405 USD / year
Spring Health
Expiration Date: Until further notice
Requirements
  • Proven Leadership: 2-4+ years in a formal engineering management role; direct experience leading teams of 4+ engineers; a history of productionizing successful AI/ML platforms and solutions
  • LLM Operations Expertise: 1+ years of experience iteratively building AI-empowered tools and ensuring they are operating safely and at scale; hands-on experience with the modern AI stack, including orchestration frameworks like LangGraph, observability tools like LangSmith, and best practices for prompt engineering and building safety guardrails
  • Machine Learning Expertise: 5+ years of experience in software or machine learning engineering, with a background as a Senior MLE, SRE, or DevOps Engineer working on ML infrastructure; hands-on experience building, evaluating, and deploying machine learning models
  • Technical Proficiency: Strong understanding of the modern AI/ML stack, including cloud services (AWS, GCP, Azure), container orchestration (Kubernetes), IaC (Terraform), and CI/CD systems; proficient in Python; experience with LLM tools like LangGraph and LangSmith
Job Responsibility
  • Provide Technical Leadership: Guide the team through complex architectural decisions across the full AI/ML stack
  • Champion AI Trust & Safety: Work in close partnership with our AI Trust team to translate principles like clinical norms, fairness, and transparency into concrete technical controls and guardrails
  • Drive Operational Excellence: Improve our MLOps and LLMOps capabilities; establish robust, automated monitoring for model performance, latency, and cost; define SLOs for platform components; build CI/CD pipelines
  • Execute on Strategy and Drive Alignment: Break down large initiatives into clear, phased roadmaps; be a key partner for your product manager
  • Manage Stakeholders and Communicate Progress: Build strong relationships and manage dependencies across the organization; track and communicate KPI-focused metrics
What we offer
  • Health, Dental, Vision benefits start on your first day
  • Access to One Medical accounts
  • HSA and FSA plans are also available, with Spring contributing up to $1K for HSAs
  • Employer sponsored 401(k) match of up to 2%
  • A yearly allotment of no cost visits to the Spring Health network of therapists, coaches, and medication management providers for you and your dependents
  • Competitive paid time off policies including vacation, sick leave and company holidays
  • Parental leave of 18 weeks for birthing parents and 16 weeks for non-birthing parents at 6 months tenure
  • Access to Noom, a weight management program
  • Access to fertility care support through Carrot, in addition to $4,000 reimbursement for related fertility expenses
  • Access to Wellhub
  • Full-time

AI & Machine learning Engineer

We’re hiring an AI & Machine Learning Engineer to help design and deliver next‑g...
Location: Spain, Barcelona
Salary: Not provided
FSP
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in machine learning, AI, data science, or software development, with recent focus on GenAI and LLMs
  • Experience with GenAI frameworks (e.g. Azure Foundry, CrewAI and Hugging Face)
  • Proficient in context engineering, RAG, and LLMOps
  • Experience deploying ML/AI solutions on Azure (Azure OpenAI, Azure AI Foundry, Azure ML Studio)
  • Experience with Azure data and analytics services (Data Factory, Data Lake, Synapse Analytics, SQL Database)
  • Programming skills in Python, R, or similar languages
  • Familiarity with ML frameworks and libraries (TensorFlow, PyTorch, Scikit-learn)
  • Experience with Azure DevOps, GitHub, or similar tools
  • Experience in Computer Vision for Optical Character Recognition (OCR) and object recognition
Job Responsibility
  • Deploy, fine-tune and monitor Generative AI models and Agentic AI for enterprise use cases
  • Develop and implement Retrieval-Augmented Generation (RAG) pipelines and advanced context engineering strategies
  • Integrate Agentic AI into business workflows
  • Collaborate with data engineers to bring Agentic capabilities to production
  • Stay current with AI trends, tools, and best practices, and drive innovation within the team
What we offer
  • A collaborative and supportive environment in which you can grow and develop your career
  • The tools and opportunity to do work you can be proud of
  • A chance to work alongside some of the best people in the industry, who always seek to share their knowledge and experience
  • Hybrid working – we empower you to make smart choices about when and where to work to achieve great results
  • Industry leading coaching and mentoring
  • Competitive salary and an excellent benefits package
  • Full-time

AI & Machine learning Engineer

We’re hiring an AI & Machine Learning Engineer to help design and deliver next‑g...
Location: United Kingdom, Glasgow or Reading, Berkshire
Salary: Not provided
FSP
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in machine learning, AI, data science, or software development, with recent focus on GenAI and LLMs
  • Experience with GenAI frameworks (e.g. Azure Foundry, CrewAI and Hugging Face)
  • Proficient in context engineering, RAG, and LLMOps
  • Experience deploying ML/AI solutions on Azure (Azure OpenAI, Azure AI Foundry, Azure ML Studio)
  • Experience with Azure data and analytics services (Data Factory, Data Lake, Synapse Analytics, SQL Database)
  • Programming skills in Python, R, or similar languages
  • Familiarity with ML frameworks and libraries (TensorFlow, PyTorch, Scikit-learn)
  • Experience with Azure DevOps, GitHub, or similar tools
  • Experience in Computer Vision for Optical Character Recognition (OCR) and object recognition
  • Strong alignment with FSP values and ethos
Job Responsibility
  • Deploy, fine-tune and monitor Generative AI models and Agentic AI for enterprise use cases
  • Develop and implement Retrieval-Augmented Generation (RAG) pipelines and advanced context engineering strategies
  • Integrate Agentic AI into business workflows
  • Collaborate with data engineers to bring Agentic capabilities to production
  • Stay current with AI trends, tools, and best practices, and drive innovation within the team
What we offer
  • A collaborative and supportive environment in which you can grow and develop your career
  • The tools and opportunity to do work you can be proud of
  • A chance to work alongside some of the best people in the industry, who always seek to share their knowledge and experience
  • Hybrid working – we empower you to make smart choices about when and where to work to achieve great results
  • Industry leading coaching and mentoring
  • Competitive salary and an excellent benefits package
  • Full-time