Experience deploying, managing, operating, and troubleshooting containerized services at scale on Kubernetes (OpenShift) for mission-critical applications
Experience deploying, configuring, and tuning LLMs with TensorRT-LLM and Triton Inference Server
Managing, operating, and supporting MLOps/LLMOps pipelines that use TensorRT-LLM and Triton Inference Server to deploy inference services in production
Setting up and operating performance and availability monitoring for AI inference services (see the metrics sketch after this list)
Experience deploying and troubleshooting LLMs on a containerized platform, including monitoring and load balancing
Experience with standard processes for operating a mission-critical system: incident management, change management, event management, etc.
Managing scalable infrastructure for deploying and serving LLMs
Deploying models in production environments, including containerization, microservices, and API design
Triton Inference Server, including its architecture, configuration, and deployment (see the client sketch after this list)
Model optimization techniques using Triton with TensorRT-LLM
Model optimization techniques, including pruning, quantization, and knowledge distillation (a minimal quantization sketch follows this list)
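To give the Triton items above some concrete shape, here is a minimal client-side sketch. It assumes a Triton Inference Server reachable at localhost:8000 that serves a hypothetical model named "llm" with an INT64 "input_ids" input and an "output_ids" output; the model name, tensor names, shapes, and token values are illustrative assumptions, not part of this posting.

```python
# Minimal Triton HTTP client sketch. Assumes a Triton Inference Server on
# localhost:8000 serving a hypothetical model "llm" with an INT64 "input_ids"
# input and an "output_ids" output -- all names here are illustrative.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Basic availability checks, useful as readiness probes.
assert client.is_server_live()
assert client.is_model_ready("llm")

# Build the request: a batch of one token-ID sequence.
input_ids = np.array([[101, 2023, 2003, 102]], dtype=np.int64)
infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "INT64")
infer_input.set_data_from_numpy(input_ids)

result = client.infer(
    model_name="llm",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output_ids")],
)
print(result.as_numpy("output_ids"))
```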
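For the monitoring items, a minimal sketch follows, assuming Triton's Prometheus metrics endpoint is exposed on its default port 8002 of a local deployment; metric names beginning with "nv_inference" are the ones Triton reports, and the polling logic itself is illustrative.

```python
# Minimal availability/performance check against Triton's Prometheus metrics
# endpoint (default port 8002). The localhost URL is an assumption; the
# "nv_inference" prefix matches Triton's inference counters and latencies.
import requests

METRICS_URL = "http://localhost:8002/metrics"  # assumed local deployment

resp = requests.get(METRICS_URL, timeout=5)
resp.raise_for_status()

for line in resp.text.splitlines():
    # Skip Prometheus comment lines; keep inference request metrics.
    if not line.startswith("#") and line.startswith("nv_inference"):
        print(line)
```

In production, the same endpoint would normally be scraped by Prometheus and alerted on, rather than polled by a script like this.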
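For the model-optimization items, here is a minimal sketch of post-training dynamic quantization in PyTorch. The toy two-layer model is an assumption for illustration only; quantizing a production LLM would typically go through TensorRT-LLM's own quantization workflow instead.

```python
# Minimal post-training dynamic quantization sketch in PyTorch.
# The toy model is illustrative; real LLM quantization would normally
# use TensorRT-LLM's quantization toolchain.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Quantize Linear weights to int8; activations stay float and are
# quantized dynamically at runtime.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])
```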