Continual Learning Methods for Large-Scale Multi-Lingual Language Models
|Magnus Boman <firstname.lastname@example.org>
|Kungliga Tekniska högskolan
|2023-08-28 – 2024-03-01
Continual learning (CL) focuses on developing machine learning models that can accumulate knowledge from non-stationary input data, i.e. data whose distribution varies over time, within a fixed model capacity, while retaining previously learned information, i.e. avoiding catastrophic forgetting. With these two properties, a model can learn adaptively from new data by leveraging previously learned knowledge, without the additional computational cost of re-training from scratch. In this project, we will study the CL capabilities of large-scale generative language models (GPT architectures) in the scenario where each data distribution comes from a different language. Given the increasing amount of computational and data resources required to train a high-performing language model, we would like these models to be adaptable to unseen languages while preventing them from forgetting existing knowledge. In this way, existing language models could be reused for learning new languages, avoiding the computational cost of training a new model from scratch for each language separately. An additional aspect we want to investigate is how scale, in terms of both model size and the amount of data used for adaptation, affects the CL capabilities of the language model.