Lead a high-performing team focused on building large-scale distributed training infrastructure and workflows using cutting-edge technologies for digital pathology, powering our state-of-the-art Foundational Model development. This is a hands-on leadership role where you'll spend approximately 50% of your time on technical contributions while guiding your team to push the boundaries of machine learning for cancer research and diagnostics.
Job Responsibilities:
Build and scale a high-performing team capable of tackling complex distributed ML challenges
Own the full employee lifecycle: recruiting, onboarding, performance management, career development, and retention
Empower your team members and help them grow in autonomy and technical expertise
Mentor engineers at all levels, fostering a culture of continuous learning and psychological safety
Create an inclusive environment where diverse perspectives drive innovation
Define and execute technical roadmaps aligned with company objectives and product needs
Lead resource allocation and capacity planning to balance team workload and business priorities
Own FinOps responsibilities: optimize cloud costs, track spending, and ensure efficient resource utilization
Ensure operational readiness through monitoring, incident response protocols, and system reliability practices
Establish and track KPIs for team performance, system efficiency and health
Design, develop, and maintain robust large-scale distributed training pipelines and ML infrastructure using cutting-edge technologies
Lead architecture decisions for distributed systems that enable efficient model development at scale
Contribute hands-on to critical technical challenges, including optimization of training pipelines and infrastructure
Drive technical excellence through code reviews and architectural guidance
Stay at the forefront of distributed training technologies and bring innovation to the team
Partner closely with Product teams to translate business requirements into technical solutions
Collaborate with (senior) Research Scientists to enable scalable model development and experimentation
Work with Platform Engineering to ensure robust infrastructure and tooling
Build strong relationships across engineering teams to drive alignment and knowledge sharing
Communicate technical concepts effectively to both technical and non-technical stakeholders
Requirements:
Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field
6+ years of software engineering or ML engineering experience, with at least 2 years in a technical leadership or team lead role
Proven track record of building and leading high-performing engineering teams
Experience guiding projects across the whole Software Development Life Cycle
Deep understanding of fundamental Machine Learning concepts and principles, and familiarity with advanced model optimization techniques
Significant experience with large-scale distributed training systems and frameworks (especially PyTorch and NCCL)
Familiarity with GPUs, distributed systems, parallel computing and scaling laws
Advanced programming skills in Python; experience with performance-critical languages (C/C++ or CUDA) is a plus
Familiarity with MLOps/DevOps best practices, including CI/CD, Docker, Kubernetes, and observability, as well as cloud platforms (GCP, AWS, or Azure) and infrastructure-as-code
Experience with Linux, version control, and container technologies
Demonstrated ability in resource allocation, capacity planning, and FinOps principles
Excellent problem-solving and data-driven decision-making skills in ambiguous situations
Effective communication and stakeholder management skills
Ability to give constructive feedback and navigate difficult conversations
Proven people leadership skills with experience managing the full employee lifecycle
Strategic thinking with ability to balance short-term execution and long-term vision
Experience with agile methodologies and iterative development processes
Proven ability to influence without authority and build consensus across teams
Track record of empowering team members and fostering autonomy
Nice to have:
Experience with production systems in regulated or healthcare environments, and familiarity with medical device standards (ISO 13485)
Experience working with biomedical or image data
Hands-on experience with Google Kubernetes Engine, SLURM, and the Ray distributed computing framework
Experience with an advanced ML stack (TorchDynamo, JAX, TensorRT)
Familiarity with Information Security standards (ISO 27001) in software development
Experience with FinOps tools and cloud cost optimization strategies
Demonstrated experience leveraging LLM/agentic systems to accelerate development
What we offer:
Learning & Development yearly budget of 1,000€ (plus 2 L&D days)
Language classes and internal development programs
Access to leadership development programs and executive coaching
Flexible working hours and teleworking policy
30 paid vacation days per year
Family- and pet-friendly workplace with flexible parental leave options
Subsidized membership of your choice among public transport, sports, and well-being
Social gatherings, lunches, and off-site events for a fun and inclusive work environment