Improving SOTA for Swedish LLMs through human preference training data
Title: |
Improving SOTA for Swedish LLMs through human preference training data |
DNr: |
Berzelius-2024-175 |
Project Type: |
LiU Berzelius |
Principal Investigator: |
Birger Moell <bmoell@kth.se> |
Affiliation: |
Kungliga Tekniska högskolan |
Duration: |
2024-05-14 – 2024-12-01 |
Classification: |
10208 |
Homepage: |
https://chatbotarena.se |
Keywords: |
|
Abstract
chatbotarena.se is a Swedish benchmark for LLMs where users compare responses from two models to the same prompt and indicate which model they prefer.
This output is valuable for training LLMs because it yields human preference data that can be used with state-of-the-art methods such as Direct Preference Optimization (DPO), a method that improves on RLHF, the technique used to train ChatGPT. A DPO training pair is collected each time a user chooses between the outputs of two models.
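As a minimal sketch of how such preference pairs drive DPO training (the function and variable names below are illustrative, and the per-response log-probabilities are assumed to be precomputed by the policy and a frozen reference model):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the total log-probability of the chosen or
    rejected response under the policy being trained or the frozen
    reference model. beta controls how far the policy may drift
    from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)) computed in a numerically stable form
    return math.log1p(math.exp(-logits))

# When policy and reference agree, the loss is log(2); it shrinks
# as the policy assigns relatively more probability to the response
# the user preferred.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

In practice libraries such as TRL wrap this objective, but the core signal is exactly the pairwise choice recorded by the arena.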
Our goal is to combine our high-quality DPO dataset with the newly released Llama 3 models to train a state-of-the-art large language model for Swedish. We will fine-tune both the 8B and 70B Llama 3 models.
We will also explore fine-tuning on other DPO datasets as well as large text corpora such as the Pile.
The collaborative nature of the project, with stakeholders from KTH, AI Sweden, and industry experts, gives us access to additional data resources and improves our chances of training a state-of-the-art model. We aim to release our trained models as open source.