# Parameter space representations

Parameter space representations are \(d \times d\) objects that define metrics in parameter space such as:

- Fisher Information Matrices / Gauss-Newton matrices
- Gradient second moments (e.g. the so-called empirical Fisher)
- Other covariance matrices, such as those used in Bayesian deep learning

These matrices are often too large to fit in memory, for instance when \(d\) is on the order of \(10^6 - 10^8\), as is typical in current deep networks. Below is the list of parameter space representations available in NNGeometry, computed on a small network and visualized as images where each pixel represents a component of the matrix and the color encodes its magnitude. The matrices are normalized by their diagonal (i.e. they are shown as correlation matrices) for better visualization:
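As a back-of-the-envelope illustration (the layer sizes below are hypothetical and chosen for illustration only, not taken from any NNGeometry example), here is how quickly the dense \(d \times d\) cost grows even for a small multilayer perceptron:

```python
# Hypothetical 3-layer MLP; the sizes are illustrative only.
layers = [(784, 256), (256, 128), (128, 10)]  # (n_inputs, n_outputs) per layer

# Each layer stores a weight matrix (n_in x n_out) plus a bias vector (n_out).
d = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(d)  # 235146 parameters

# A dense d x d matrix of float32 (4 bytes per entry):
dense_bytes = 4 * d * d
print(f"{dense_bytes / 1e9:.0f} GB")  # ~221 GB for this tiny network
```

Even this toy network would exceed typical GPU memory if stored densely, which motivates the structured representations listed below.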

`nngeometry.object.pspace.PMatDense`

representation: this is the usual dense matrix. Memory cost: \(d \times d\)

`nngeometry.object.pspace.PMatBlockDiag`

representation: a block-diagonal representation where diagonal blocks are
dense matrices corresponding to parameters of a single layer, and cross-layer interactions are ignored (their coefficients are
set to \(0\)). Memory cost: \(\sum_l d_l \times d_l\) where \(d_l\) is the number of parameters of layer \(l\).
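Using hypothetical layer sizes (assumed purely for illustration), the block-diagonal cost \(\sum_l d_l \times d_l\) can be worked out and compared against the dense cost:

```python
# Hypothetical 3-layer MLP; sizes are illustrative only.
layers = [(784, 256), (256, 128), (128, 10)]  # (n_inputs, n_outputs) per layer

# d_l: number of parameters of layer l (weights + biases).
d_ls = [n_in * n_out + n_out for n_in, n_out in layers]

dense_cost = sum(d_ls) ** 2                     # full d x d matrix
block_diag_cost = sum(dl * dl for dl in d_ls)   # PMatBlockDiag
print(block_diag_cost / dense_cost)             # fraction of the dense cost
```

Because the first layer dominates the parameter count in this example, the saving is modest (roughly 25%); block-diagonal storage pays off most when parameters are spread across many layers.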

`nngeometry.object.pspace.PMatKFAC`

representation [GM16, MG15]: a block-diagonal representation where diagonal blocks are
factored as the Kronecker product of two smaller matrices, and cross-layer interactions are ignored (their coefficients are
set to \(0\)). Memory cost: \(\sum_l g_l \times g_l + a_l \times a_l\) where \(a_l\) is the number of neurons of the
input of layer \(l\) and \(g_l\) is the number of pre-activations of the output of layer \(l\).
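With hypothetical layer sizes (illustrative only), and assuming the bias is folded into the input factor so that \(a_l\) is the layer input size plus one (a common convention, assumed here), the KFAC cost \(\sum_l g_l \times g_l + a_l \times a_l\) is dramatically smaller than the dense cost:

```python
# Hypothetical 3-layer MLP; sizes are illustrative only.
layers = [(784, 256), (256, 128), (128, 10)]  # (n_inputs, n_outputs) per layer

# a_l: input neurons plus 1 for the bias (an assumed convention here);
# g_l: pre-activations of the layer output.
kfac_cost = sum((n_in + 1) ** 2 + n_out ** 2 for n_in, n_out in layers)
dense_cost = sum(n_in * n_out + n_out for n_in, n_out in layers) ** 2

print(kfac_cost)                # 780935 stored coefficients
print(dense_cost // kfac_cost)  # tens of thousands of times smaller than dense
```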

`nngeometry.object.pspace.PMatEKFAC`

representation [GLB+18]: a block-diagonal representation where diagonal blocks are
factored as a diagonal matrix in a Kronecker factored eigenbasis, and cross-layer interactions are ignored (their coefficients are
set to \(0\)). Memory cost: \(\sum_l g_l \times g_l + a_l \times a_l + d_l\) where \(a_l\) is the number of neurons of the
input of layer \(l\), \(g_l\) is the number of pre-activations of the output of layer \(l\), and \(d_l\) is the number of parameters of layer \(l\).

`nngeometry.object.pspace.PMatDiag`

representation: a diagonal representation that ignores all interactions between parameters.
Memory cost: \(d\)

`nngeometry.object.pspace.PMatQuasiDiag`

representation [Oll15]: a diagonal representation where for each neuron, a coefficient is also
stored that measures the interaction between this neuron’s weights and the corresponding bias.
Memory cost: \(2 \times d\)
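To put the diagonal variants in perspective, with hypothetical layer sizes (illustrative only), the \(d\) and \(2 \times d\) costs of `PMatDiag` and `PMatQuasiDiag` are linear in the parameter count:

```python
# Hypothetical 3-layer MLP; sizes are illustrative only.
layers = [(784, 256), (256, 128), (128, 10)]  # (n_inputs, n_outputs) per layer
d = sum(n_in * n_out + n_out for n_in, n_out in layers)

diag_cost = d            # PMatDiag: one coefficient per parameter
quasi_diag_cost = 2 * d  # PMatQuasiDiag: doubles the diagonal storage
print(diag_cost, quasi_diag_cost)
```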