This is a rare and foundational opportunity to define the future of multimodal AI. You will be at the forefront of building and training large-scale multimodal models and systems that complete multimodal work. This role offers the chance to bridge cutting-edge research with magical, shipped products, working end-to-end on novel problems with no existing playbook. It involves both the “science” and “engineering” parts of research, which are of equal importance. This is a multi-stack opportunity: you will work at the intersection of modeling, data, systems, and evaluation to build agents that can complete multimodal work end-to-end.
Job Responsibilities:
Architect large-scale multimodal agentic models that use reasoning, planning, coding, and tool calling to achieve complex, multi-step multimodal work
Design, implement, and run robust data pipelines for constructing, enriching, and filtering agentic datasets
Train large-scale multimodal agents on massive datasets and GPU clusters
Define and build novel evaluation frameworks to measure multimodal agents
Requirements:
Strong foundation in machine learning, foundation models and agentic systems
Deep understanding of agentic systems and approaches in LLM/VLM reasoning, coding models, LLM/VLM tool calling
Hands-on experience with PyTorch and large-scale training (distributed, mixed precision, large datasets)
Able to commit to a continuous 6-month internship
Nice to have:
Experience in the following, around data, modeling, or evaluation:
State-of-the-art foundation models in reasoning
State-of-the-art foundation models in coding
State-of-the-art foundation models in tool calling