Continual Learning Methods for Large-Scale Language Models
DNr: Berzelius-2023-119
Project Type: LiU Berzelius
Principal Investigator: Magnus Boman
Affiliation: Kungliga Tekniska högskolan
Duration: 2023-05-01 – 2023-11-01
Classification: 10208


Continual learning (CL) concerns the development of machine learning models that, within a fixed model capacity, can accumulate knowledge from non-stationary input data, i.e. data whose distribution varies over time, while retaining previously learned information, i.e. avoiding catastrophic forgetting. A model with these two properties can adapt to new data by leveraging previously learned knowledge. In this project, we will study the CL capabilities of language models, for example transformer-based networks that learn language representations from large amounts of text data. Given the increasing amount of computational and data resources required to train a high-performing language model, such as GPT-4, we would like these models to be adaptable to unseen data distributions without forgetting existing knowledge. In this way, existing language models could be reused for learning new domains, e.g. the medical domain, or even new languages, reducing the computational cost of training a new model from scratch for each case. We will also investigate how scale, in terms of both model size and the amount of data used for adaptation, affects the CL capabilities of the language model.