At Helsing, we are pioneering the future of autonomous decision-making for defence. Our work spans the full AI landscape, including high-volume data processing, RL agent training, and large-scale foundation models. As a member of a cross-functional team, you will architect and implement the tools and platforms that enable these breakthroughs. Your focus will be on abstracting complex distributed systems to maximise training throughput and developer velocity.
Job Responsibilities:
Extend our highly integrated deep learning frameworks (built on top of PyTorch), making them efficient and easy to use for a wide range of use cases
Scale our current infrastructure and tooling stack to support faster and larger distributed training
Design data strategies to support large-scale datasets and efficient storage, ensuring GPUs are never starved for data
Requirements:
You hold an MSc or PhD in Computer Science or a STEM field, with a focus on Machine Learning and Deep Learning
You have strong software engineering skills in Python and fluency with modern DL frameworks (PyTorch/JAX/TensorFlow)
You are a clear communicator who can build on complex theoretical concepts and contribute to the company's internal engineering culture
You have a first-principles mindset
You have debugged production ML pipelines
Nice to have:
You have hands-on experience training models on large-scale GPU clusters and implementing advanced parallelism strategies, and you understand the underlying cross-node communication patterns (NCCL, MPI)
You have worked with large-scale datasets spanning multiple modalities
You are proficient with workload orchestrators like Slurm, Kubernetes, or Ray
You understand GPU architecture at a low level: memory hierarchies, warp execution, and what makes a GPU suited for training versus inference workloads