This is a rare and foundational opportunity to define the future of multimodal AI. You will be at the forefront of building and training large-scale multimodal models, directly impacting how users interact with pixels. This role offers the chance to bridge cutting-edge research with magical, shipped products, working end-to-end on novel problems with no existing playbook.
Job Responsibilities:
Architect large-scale multimodal agentic models that use reasoning, planning, coding, and tool calling to accomplish complex, multi-step multimodal work
Hill-climb existing tasks and formulate new tasks through data
Design, implement, and run robust data pipelines for constructing, enriching, and filtering massive pixel datasets
Train large-scale multimodal models on massive datasets and GPU clusters
Define and build novel evaluation frameworks to measure multimodal agents
Requirements:
Strong foundation in machine learning, foundation models, and agentic systems
Deep understanding of agentic systems and approaches in LLM/VLM reasoning, coding models, and LLM/VLM tool calling
Hands-on experience with PyTorch and large-scale training (distributed, mixed precision, large datasets)
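As a rough illustration of the kind of hands-on PyTorch experience the last requirement refers to, here is a minimal sketch of a mixed-precision training step. This is an illustrative toy (a small linear model standing in for a large multimodal model), not a description of the team's actual training stack; it runs on CPU with bfloat16 autocast, whereas real large-scale runs would use `device_type="cuda"` on GPU clusters with the model wrapped in `DistributedDataParallel` or FSDP.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # stand-in for a large multimodal model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)  # toy batch of inputs
y = torch.randn(8, 4)   # toy targets

# Autocast runs the forward pass in a lower-precision dtype where it is
# numerically safe, reducing memory and compute cost. On GPUs this would
# typically be float16 with a GradScaler, or bfloat16 without one.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)

loss.backward()  # gradients are accumulated in full precision
opt.step()
opt.zero_grad()
print(float(loss))
```

In a distributed setting, each rank would run this same step on its shard of the dataset, with gradient synchronization handled by the parallelism wrapper.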
Nice to have:
Experience with data, modeling, or evaluation for any of the following:
State-of-the-art foundation models in reasoning
State-of-the-art foundation models in coding
State-of-the-art foundation models in tool calling