Improving null space learning by pruning and rank limitation
||Improving null space learning by pruning and rank limitation|
||Dorian Staudt <firstname.lastname@example.org>|
||Chalmers tekniska högskola|
||2022-10-24 – 2023-05-01|
Continual Learning describes the topic of training a model on sequential tasks while minimising or eliminating negative backwards transfer, also known as "catastrophic forgetting", without greatly increasing the storage space required with each task. A recently proposed method (Wang et al., Training Networks in Null Space of Feature Covariance for Continual Learning, 2021) saves the uncentred covariance of the inputs each layer of a neural network receives from previously learned data. Due to the properties of this covariance, information from multiple tasks can simply be summed up in a fixed-size matrix. From these covariances a projection is computed that maps future gradients into the approximate null space of the previous tasks' data.
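The core mechanism can be illustrated with a minimal sketch (function names and the eigenvalue threshold `eps` are illustrative, not Wang et al.'s exact implementation): the uncentred covariances of separate tasks add up in a fixed-size matrix, and the eigendirections with near-zero variance span the approximate null space into which gradients are projected.

```python
import numpy as np

def update_covariance(cov, X):
    # X: (n_samples, n_features) inputs seen by the layer on the current task.
    # Uncentred covariances of separate tasks simply sum, so a fixed-size
    # matrix accumulates information from all tasks so far.
    return cov + X.T @ X

def null_space_projector(cov, eps=1e-3):
    # Eigendirections with (near-)zero variance span the approximate null
    # space of all previously seen data; eps controls the approximation.
    eigvals, eigvecs = np.linalg.eigh(cov)
    U0 = eigvecs[:, eigvals < eps * eigvals.max()]
    return U0 @ U0.T

np.random.seed(0)
d = 8
X = np.random.randn(20, d)
X[:, -2:] = 0.0                      # old-task data spans only 6 dimensions
cov = update_covariance(np.zeros((d, d)), X)
P = null_space_projector(cov)
g = np.random.randn(d)
g_proj = P @ g                       # gradient projected into the null space
print(np.allclose(X @ g_proj, 0.0))  # True: old-task outputs are unaffected
```

Because the projected gradient lies in the null space of the old data, weight updates along it leave the layer's outputs on previous tasks (approximately) unchanged, which is what prevents catastrophic forgetting.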
We plan to explore methods that ensure this approximate null space is large enough to accommodate future tasks while also keeping negative backwards transfer low. The first of these methods is a combination with a previously proposed pruning-based method (Golkar et al., Continual Learning via Neural Pruning, 2019), in which neurons and channels not contributing much to a task are pruned ("graceful forgetting") and re-initialised, whereas the non-pruned neurons and channels are frozen for the training of future tasks. This increases the available null space, and would also benefit from a method that allows the remaining capacity of pruned neurons and channels to be used.
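The pruning step can be sketched as follows. This is a simplified illustration, not Golkar et al.'s exact criterion: here each neuron is scored by the L2 norm of its outgoing weights, low-scoring neurons are re-initialised for future tasks, and the rest are frozen by masking their gradients.

```python
import numpy as np

def prune_and_freeze(W, keep_ratio=0.5, rng=None):
    # W: (n_neurons, n_outputs) weight matrix, one row per neuron.
    rng = np.random.default_rng(rng)
    scores = np.linalg.norm(W, axis=1)      # illustrative importance score
    k = int(np.ceil(keep_ratio * W.shape[0]))
    kept = np.argsort(scores)[-k:]          # indices of the k kept neurons
    frozen_mask = np.zeros(W.shape[0], dtype=bool)
    frozen_mask[kept] = True
    W_new = W.copy()
    # "Graceful forgetting": pruned neurons are re-initialised and remain
    # free to learn future tasks; frozen neurons keep their weights.
    W_new[~frozen_mask] = 0.01 * rng.standard_normal((W.shape[0] - k, W.shape[1]))
    return W_new, frozen_mask

W = np.random.randn(6, 4)
W_new, frozen = prune_and_freeze(W, keep_ratio=0.5, rng=0)
grad = np.random.randn(*W.shape)
grad[frozen] = 0.0   # freezing = zeroing the gradients of kept neurons
```

The gradient mask is what protects previous tasks: frozen neurons receive no updates, while the re-initialised rows provide fresh capacity, enlarging the null space available for future tasks.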
Second, we plan to restrict the rank of weight matrices using a nuclear norm penalty and factorisation, slowly increasing the allowed rank with the task number. While calculating the nuclear norm is normally slow, factorisation allows the faster Frobenius norm to be used instead (Burer & Monteiro, A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, 2003). This allows better control over the trade-off between stability and plasticity of the null space based method.
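The identity behind this trick is that the nuclear norm of W equals the minimum over factorisations W = UVᵀ of (‖U‖²_F + ‖V‖²_F)/2, so for any fixed rank-r factorisation the Frobenius expression is a cheap upper bound that can be penalised without computing an SVD. A small numerical check (the function names are our own):

```python
import numpy as np

def frobenius_surrogate(U, V):
    # Upper bound on the nuclear norm of U @ V.T, with equality at the
    # minimising ("balanced") factorisation (Burer & Monteiro, 2003).
    return 0.5 * (np.sum(U**2) + np.sum(V**2))

def nuclear_norm(W):
    # Sum of singular values; requires an SVD, hence slow for large W.
    return np.sum(np.linalg.svd(W, compute_uv=False))

rng = np.random.default_rng(0)
r = 2                                # allowed rank for the current task
U = rng.standard_normal((5, r))
V = rng.standard_normal((4, r))
W = U @ V.T                          # rank <= r by construction
print(nuclear_norm(W) <= frobenius_surrogate(U, V) + 1e-8)  # True
```

Training U and V directly both caps the rank at r and lets the Frobenius penalty stand in for the nuclear norm, so the allowed rank, and with it the plasticity of the model, can be raised gradually as tasks arrive.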