Entropy regularized dynamics and birth-death dynamics for neural networks
Viktor Nilsson <email@example.com>
Kungliga Tekniska högskolan
2021-09-01 – 2022-03-01
Recently, a new collection of dynamics based on gradient flows associated with a particular form of birth-and-death processes has been proposed for the training of neural networks [3], GANs (more precisely, two-player zero-sum games) [1] and general MCMC simulation [2]. In the setting of neural networks and GANs, the dynamics combine regular gradient descent with a genealogical part in which each “particle” (e.g. a parameter in a neural network or a component in a mixed strategy) has an internal exponential clock that may cause the particle to duplicate or terminate. Upon either event, a second particle is chosen at random to terminate or duplicate, respectively, so that the number of particles stays fixed.
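The finite-particle mechanism described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm of [1, 2, 3]: the function name `birth_death_step`, the callables `potential` and `grad_potential`, and the choice of clock rate (the excess of a particle's potential over the population mean) are all hypothetical choices made here for concreteness.

```python
import numpy as np


def birth_death_step(particles, potential, grad_potential, dt, rng):
    """One time step: a gradient descent move, then birth-death events.

    particles:      array of shape (n, d) with n >= 2; n stays fixed.
    potential:      hypothetical callable, (n, d) -> (n,)
    grad_potential: hypothetical callable, (n, d) -> (n, d)
    """
    n = particles.shape[0]

    # Transport part: a plain gradient descent step for every particle.
    new = particles - dt * grad_potential(particles)

    # Genealogical part. Assumed rate: excess of the potential over the
    # population mean (computed once per step, not refreshed after events).
    v = potential(new)
    beta = v - v.mean()

    # An exponential clock with rate |beta_i| rings within dt with
    # probability 1 - exp(-|beta_i| * dt).
    rings = rng.random(n) < 1.0 - np.exp(-np.abs(beta) * dt)

    for i in np.flatnonzero(rings):
        # Choose a second particle uniformly among the other n - 1.
        j = rng.integers(n - 1)
        if j >= i:
            j += 1
        if beta[i] > 0:
            # Particle i terminates; particle j duplicates into its slot.
            new[i] = new[j]
        else:
            # Particle i duplicates; particle j terminates.
            new[j] = new[i]
    return new


# Usage on a toy quadratic potential (purely illustrative).
rng = np.random.default_rng(0)
particles = 3.0 * rng.normal(size=(200, 2))
potential = lambda x: 0.5 * np.sum(x**2, axis=1)
grad_potential = lambda x: x

initial_mean = potential(particles).mean()
for _ in range(100):
    particles = birth_death_step(particles, potential, grad_potential, 0.05, rng)
```

Overwriting one array slot with a copy of another implements "one particle terminates, another duplicates" in a single assignment, which is why the particle count never changes.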
The dynamics are introduced through a gradient flow formulation that can be viewed as the many-particle limit of an interacting particle system with some particular microscopic dynamics. This overall idea of formulating dynamics as flows in the space of probability measures, equipped with an appropriate metric, can be traced back to Chizat and Bach (at least in the machine learning setting). However, actual implementations need a process-level description of the dynamics: in practice a finite number of particles is used, so the limit description in terms of a PDE is not enough.
In our team's project, we expand upon these works by providing additional particle dynamics (i.e. training algorithms in the case of neural networks) and tighter results concerning the many-particle limit behavior. Evaluating these dynamics requires theoretical insight as well as numerical experiments. Therefore, we would like to run experiments on neural network training, using both real and simulated data, and on two-player games such as those in [1].
In our theoretical work on the dynamics/training algorithms, we have arrived at a conjectured large deviation principle, with respect to the number of particles. This concerns convergence to the mean field dynamics. The past works ([1, 2, 3]) do not provide any large deviation principle.
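For orientation, a large deviation principle with respect to the number of particles is usually stated in the generic form below. The notation is assumed here and does not come from the project: $\mu^N$ denotes the empirical measure of the $N$ particles, $A$ a measurable set of measures (or measure-valued paths) with interior $A^{\circ}$ and closure $\bar{A}$, and $I \ge 0$ the rate function. The specific rate function conjectured in the project is not stated in this abstract.

```latex
\[
  -\inf_{\nu \in A^{\circ}} I(\nu)
  \;\le\; \liminf_{N \to \infty} \frac{1}{N} \log \mathbb{P}\bigl(\mu^N \in A\bigr)
  \;\le\; \limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}\bigl(\mu^N \in A\bigr)
  \;\le\; -\inf_{\nu \in \bar{A}} I(\nu)
\]
```

Informally, $\mathbb{P}(\mu^N \approx \nu)$ decays like $e^{-N I(\nu)}$. In a statement of this kind one would expect $I$ to vanish at the mean-field dynamics, so that deviations from them become exponentially unlikely as $N$ grows.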
The goal of the project is thus to verify numerically that the conjectured large deviation principle holds. The impact of this would be a stronger basis for understanding neural networks and their training via tools from probability theory and statistical physics, more specifically the theory of large deviations, gradient flows and optimal transport.
[1] C. Domingo-Enrich, S. Jelassi, A. Mensch, G. Rotskoff and J. Bruna. A mean-field analysis of two-player zero-sum games. arXiv:2002.06277, 2020.
[2] Y. Lu, J. Lu and J. Nolen. Accelerated Langevin sampling with birth-death. arXiv:1905.09863, 2019.
[3] G. Rotskoff, S. Jelassi, J. Bruna and E. Vanden-Eijnden. Global convergence of neuron birth-death dynamics. arXiv:1902.01843, 2019.