The Amazon Artificial General Intelligence (AGI) Data Services organization is responsible for developing diverse datasets to train and evaluate Amazon's AI models. We are looking for Language Engineers to join our science and engineering team to support the development of complex, multimodal datasets, using a range of approaches including synthetic data generation, model-supported data generation, and human-in-the-loop data collection. You will play a critical role in driving innovation and advancing the state of the art in evaluating and training AI models. You will work closely with cross-functional teams, including product managers, engineers, and data scientists, to ensure that our AI systems are best in class.
Job Responsibilities:
Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches
Analyze and extract insights from large amounts of data
Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language
Use modeling tools to bootstrap or test new AI functionalities
Collaborate with scientists, software engineers, and other data creators to evaluate performance of AI models
Requirements:
Advanced degree in computer science, mathematics, statistics, machine learning, or an equivalent quantitative field
Experience with language annotation and other forms of data markup
Knowledge of one or more scripting languages (e.g., Python, Ruby, Perl)
2+ years of experience in computational linguistics, language data processing, or AI data creation
Experience working with speech, text, and multimodal data in multiple languages
Excellent communication and organizational skills, and strong attention to detail
Comfortable working in a fast-paced, highly collaborative, dynamic work environment
Nice to have:
Experience with database queries and data analysis processes (SQL, R, MATLAB, etc.), or Unix
PhD in Computational Linguistics (or equivalent field with computational emphasis)
Expertise in bootstrapping AI data collections for quickly evolving requirements
Extensive experience working with speech, text, and multimodal data in multiple languages
Experience in data creation for complex agentic workflows
Practical experience with Machine Learning
Familiarity with technical concepts such as APIs
Practical knowledge of version control and agile development
Willingness to support several projects at one time, and to accept reprioritization as necessary
Creative thinking and strong analytical and problem-solving skills