Thoth AI is seeking detail-driven, STEM-literate individuals to train vision-language models (VLMs) to understand and process complex scientific content. As a Scientific Data Labeler, you will transcribe and standardize mathematical, physical, and chemical content from educational images into structured LaTeX code, directly contributing to the development of smarter, more capable AI. We are hiring across three language tracks: Spanish, Indonesian, and Portuguese.
Job Responsibilities:
Accurately transcribe mathematical formulas, chemical equations, and physical symbols from images using LaTeX
Clean and standardize educational text content in your native target language
Ensure structured data outputs meet VLM training quality specifications
Maintain high accuracy and consistency across large volumes of technical content
Follow annotation guidelines and meet productivity and quality targets
Requirements:
Proficient in LaTeX, including fractions, matrices, square roots, summations, and multi-line equations
First-language proficiency (C2 level) in one of the target languages: Spanish, Indonesian, or Portuguese (one language per applicant)
Strong English reading and writing skills, B1/B2 or equivalent (mandatory for all non-English tracks), because all project documentation and guidelines are in English
High attention to detail with the ability to follow complex, structured annotation guidelines
Ability to work independently and deliver consistent output in a remote setting
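For orientation, the LaTeX constructs named in the requirements above (fractions, square roots, summations, matrices, and multi-line equations) might look roughly like the sketch below. This is an illustration only: the formulas are generic textbook examples, not project material, and the use of the amsmath package is an assumption, since the actual annotation guidelines may prescribe different conventions.

```latex
% Illustrative sketch only; assumes \usepackage{amsmath} in the preamble.

% Fraction and square root
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]

% Summation
\[ \sum_{k=1}^{n} k = \frac{n(n+1)}{2} \]

% Matrix
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \]

% Multi-line equation, aligned at the equals signs
\begin{align*}
(a + b)^2 &= (a + b)(a + b) \\
          &= a^2 + 2ab + b^2
\end{align*}
```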
Nice to have:
A current student or graduate in a STEM field (Mathematics, Physics, Chemistry, Engineering, or related discipline) is strongly preferred
Experience using LaTeX in academic contexts (e.g., thesis writing, research papers, or teaching assistant roles) is a strong advantage
Prior experience in data labeling, annotation, or content processing projects is a plus
Familiarity with AI training data workflows is beneficial but not required