Memorization in Deep Networks
Mårten Björkman <email@example.com>
Kungliga Tekniska högskolan
2022-04-01 – 2022-10-01
Deep networks achieve state-of-the-art performance on several real-world tasks. At the same time, modern architectures have enough capacity to fit common benchmark datasets perfectly, that is, to shatter them.
Measures of neural network expressivity, chiefly the number of locally linear components realized by a deep network (linear region density), theoretically motivate the benefits of deep networks over wide, shallow ones, but they are hard to compute in practice for state-of-the-art models.
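To make the notion of linear regions concrete, the following is a minimal sketch, not the method used in this project. It assumes a toy fully connected ReLU network with random weights and estimates how many linear regions a line segment in input space crosses by counting changes in the ReLU activation pattern between consecutive sample points. The names make_mlp, activation_pattern and count_regions_on_segment are hypothetical and introduced only for illustration.

```python
import random


def make_mlp(sizes, seed=0):
    """Random weights for a toy ReLU MLP (illustrative assumption).

    sizes = [input_dim, hidden_1, hidden_2, ...]; every layer is a ReLU layer.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        layers.append((W, b))
    return layers


def activation_pattern(layers, x):
    """Return the on/off state of every ReLU unit for input x."""
    pattern = []
    h = x
    for W, b in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        pattern.extend(p > 0.0 for p in pre)
        h = [max(p, 0.0) for p in pre]
    return tuple(pattern)


def count_regions_on_segment(layers, a, b, n_samples=2000):
    """Estimate the number of linear regions crossed between points a and b.

    Each change of activation pattern between consecutive samples is counted
    as entering a new region; regions narrower than the sampling step are
    missed, so the result is a lower bound.
    """
    regions = 1
    prev = activation_pattern(layers, a)
    for i in range(1, n_samples + 1):
        t = i / n_samples
        x = [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]
        cur = activation_pattern(layers, x)
        if cur != prev:
            regions += 1
            prev = cur
    return regions


if __name__ == "__main__":
    net = make_mlp([2, 8, 8], seed=0)
    print(count_regions_on_segment(net, [-1.0, -1.0], [1.0, 1.0]))
```

Even this one-dimensional estimate depends on the sampling resolution n_samples, and exhaustive counting over a full input space grows far more expensive, which hints at why such measures are hard to compute for large models.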
At present, the mechanisms governing learning and memorization in deep networks are not fully understood, and there is growing debate over the practical utility of linear region density for measuring the expressivity of networks trained in practice.
In this work, we exploit the local geometry of convolutional and dense layers to empirically investigate the mechanisms underlying memorization and generalization. In particular, we study the connection between the local geometry of linear regions and the variation of the functions learned by ReLU networks trained in practice, across training epochs. The final aim of the study is to connect increased local redundancy of linear regions with improved generalization, in an epoch-wise double descent training regime.