Responsibilities:
Design, implement, and optimize end-to-end ML training workflows including infrastructure setup, orchestration, fine-tuning, deployment, and monitoring
Evaluate and integrate multi-cloud and single-cloud training options across AWS and other major platforms
Lead cluster configuration, orchestration design, environment customization, and scaling strategies
Compare and recommend hardware options (GPUs, TPUs, accelerators) based on performance, cost, and availability
Requirements:
Experience with cloud-based platforms (AWS, Azure), API integrations, and data models
Exposure to AI/ML-enabled platforms or decision-intelligence systems
Deep understanding of hardware architectures for AI workloads (NVIDIA, AMD, Intel Habana, TPU)
Expert knowledge of inference optimization techniques including speculative decoding, KV cache optimization (MQA/GQA/PagedAttention), and dynamic batching
Deep understanding of prefill vs. decode phases and of memory-bound vs. compute-bound operations
Experience with quantization methods (INT4/INT8, GPTQ, AWQ) and model parallelism strategies
Hands-on experience with production inference engines: vLLM, TensorRT-LLM, DeepSpeed-Inference, or TGI
Proficiency with serving frameworks: Triton Inference Server, KServe, or Ray Serve
Familiarity with kernel optimization libraries (FlashAttention, xFormers)
Proven ability to optimize inference metrics: TTFT (time to first token), ITL (inter-token latency), and throughput
Experience profiling and resolving GPU memory bottlenecks and OOM issues
Knowledge of hardware-specific optimizations for modern GPU architectures (A100/H100)
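To make the latency metrics named in the requirements concrete, here is a minimal sketch of how TTFT, mean ITL, and throughput can be derived from per-token arrival times. This is illustrative only and not part of the role description; the `token_timestamps` input is a hypothetical list of wall-clock times at which each generated token arrived.

```python
def latency_metrics(request_start, token_timestamps):
    """Compute TTFT, mean ITL, and throughput from per-token arrival times.

    request_start:    wall-clock time at which the request was issued
    token_timestamps: arrival time of each generated token, in order
    """
    if not token_timestamps:
        raise ValueError("no tokens generated")
    # Time to first token: delay before the first output token arrives
    ttft = token_timestamps[0] - request_start
    # Inter-token latency: gap between consecutive token arrivals
    gaps = [b - a for a, b in zip(token_timestamps, token_timestamps[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    # Throughput: tokens emitted per second over the whole generation
    total = token_timestamps[-1] - request_start
    throughput = len(token_timestamps) / total
    return {"ttft": ttft, "mean_itl": mean_itl, "throughput": throughput}
```

For example, four tokens arriving 0.25, 0.30, 0.35, and 0.45 s after the request give a TTFT of 0.25 s and a mean ITL of roughly 0.067 s.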
Drive end-to-end fine-tuning of LLMs, including model selection, dataset preparation/cleaning, tokenization, and evaluation with baseline metrics
Configure and execute fine-tuning experiments (LoRA, QLoRA, etc.) on large-scale compute setups, ensuring optimal hyperparameter tuning, logging, and checkpointing
Document fine-tuning outcomes by capturing performance metrics (losses, BERT/ROUGE scores, training time, resource utilization) and benchmarking against baseline models
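As one concrete instance of the evaluation metrics mentioned above, ROUGE-1 (unigram-overlap F1 between generated and reference text) can be sketched in plain Python. This is a deliberate simplification for illustration; production evaluations typically use a dedicated library (e.g. the `rouge-score` package), with stemming and tokenization rules not shown here.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram-overlap F-score between candidate and reference.

    Uses naive lowercase whitespace tokenization; library implementations
    add stemming and more careful tokenization.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For instance, a candidate "the cat sat on mat" scored against the reference "the cat sat" yields precision 3/5 and recall 1, for an F1 of 0.75.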
What we offer:
Opportunities for continuous learning and certification support
Collaborative and growth-oriented work culture
Competitive compensation and comprehensive benefits
Exposure to modern cloud and integration technologies