Appen’s GenAI research team advances how frontier models are evaluated, improved, and deployed in production environments. The purpose of this role is to design and implement research and engineering workflows that strengthen model performance, create new benchmarks, and improve production models without regressing on core characteristics. This role provides hands-on ownership of training and evaluation pipelines, benchmark development, and model improvement initiatives that directly influence deployed systems.
Job Responsibilities:
Design and implement a lightweight supervised fine-tuning training pipeline using open-source LLMs
Create new benchmarks to evaluate frontier models across defined scientific and performance criteria
Analyze production models to identify measurable areas for improvement
Improve model performance through targeted retraining and hyperparameter search
Deploy improved models while maintaining core model characteristics and avoiding regression
Build Python tooling to automate training, evaluation, benchmarking, and experimentation workflows
Implement structured evaluation methods, including rubric-based scoring and LLM-as-a-judge workflows
Document experimental design, benchmark methodology, and performance results with clarity and precision
Iterate rapidly in a research-driven environment to increase model quality and reliability
Requirements:
Current enrollment in or recent completion of a Master’s or PhD in Computer Science, AI, Machine Learning, Computer Engineering, or a closely related technical field
Strong experience working with large language models, including supervised fine-tuning, prompt engineering, or model evaluation
Hands-on experience building machine learning pipelines or research infrastructure
Experience improving model performance through retraining or hyperparameter tuning
Proficiency in Python and comfort working with machine learning frameworks and open-source model ecosystems
Familiarity with cloud environments such as AWS or Azure
Strong technical problem-solving ability, including the use of LLMs as development aids for building and iterating
Ability to work independently with minimal hand-holding
Strong written communication skills for summarising research and drafting technical documentation
Ability to collaborate effectively in a remote research environment