The VLM team builds vision-language models that run on-device, under tight latency and memory constraints, without sacrificing quality. We have released four best-in-class models and we're just getting started. This team owns the full VLM pipeline end-to-end: from researching new architectures and training algorithms through data curation, evaluation, and deployment. You'll join a focused, hands-on group that works directly on models and collaborates closely with our pretraining, post-training, and infrastructure teams. Success here is measured by the capability of the models we ship.
Job Responsibilities:
Lead a new model capability end-to-end from task spec through data curation, training recipe, ablations, evaluation, and into the final shipped model
Improve visual reasoning through reinforcement learning and preference optimization methods
Push the quality-efficiency frontier on token efficiency via encoder/connector design
Requirements:
Hands-on experience in training or evaluating VLMs with demonstrated experimental rigor
Ability to turn research ideas into scalable implementations and iterate rapidly on hypotheses
Proficiency in Python and at least one deep learning framework
M.S. or Ph.D. in Computer Science, Mathematics, or a related field, or equivalent industry experience
Nice to have:
Experience building or optimizing multimodal training or data pipelines
Experience with distributed training (DeepSpeed, FSDP, Megatron-LM, etc.)