# Parameter space representations

Parameter space representations are $$d \times d$$ objects (where $$d$$ is the number of parameters) that define metrics in parameter space, such as:

• Fisher Information Matrix / Gauss-Newton matrix

• Gradient second moment (sometimes called the Empirical Fisher)

• Other covariance matrices, such as those used in Bayesian deep learning
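As a minimal sketch of the first two objects, assuming a toy linear model $$\hat{y} = Xw$$ trained with the squared loss $$\frac{1}{2}(\hat{y} - y)^2$$, both the gradient second moment and the Gauss-Newton matrix are $$d \times d$$ matrices built from per-example quantities. The helper `gradient_second_moment` is hypothetical and not an NNGeometry function:

```python
import numpy as np

def gradient_second_moment(per_example_grads):
    """Empirical Fisher: average of the outer products g_i g_i^T.

    per_example_grads has shape (n, d), one flattened gradient per row.
    Returns a (d, d) matrix.
    """
    n = per_example_grads.shape[0]
    return per_example_grads.T @ per_example_grads / n

# Toy setting (assumption for illustration): linear model y_hat = X @ w
# trained with the squared loss 0.5 * (y_hat - y) ** 2.
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

residuals = X @ w - y                        # r_i = y_hat_i - y_i, shape (n,)
per_example_grads = residuals[:, None] * X   # gradient of 0.5 * r_i^2 w.r.t. w is r_i * x_i

# Gradient second moment: depends on the targets y through the residuals.
F_emp = gradient_second_moment(per_example_grads)

# Gauss-Newton: mean of J_i^T H J_i with J_i = x_i^T and H = 1 for this loss.
GN = X.T @ X / n

print(F_emp.shape, GN.shape)
```

For this model and loss, the Gauss-Newton matrix also coincides with the Fisher of a unit-variance Gaussian likelihood, while the gradient second moment is reweighted by the squared residuals.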

These matrices are often too large to fit in memory, for instance when $$d$$ is of the order of $$10^6$$ to $$10^8$$, as is typical in current deep networks. Here is a list of the parameter space representations available in NNGeometry, computed on a small network and represented as images where each pixel represents a component of the matrix and its color the magnitude of that component. For better visualization, these matrices are normalized by their diagonal (i.e. they are shown as correlation matrices):
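The memory argument and the diagonal normalization can both be checked with a few lines of NumPy. This sketch assumes float32 storage (4 bytes per entry), so a dense matrix for $$d = 10^6$$ already needs $$4 \times 10^{12}$$ bytes, i.e. 4 TB. The normalization is $$C_{ij} = M_{ij} / \sqrt{M_{ii} M_{jj}}$$; the helper names are illustrative only:

```python
import numpy as np

def dense_memory_bytes(d, bytes_per_entry=4):
    """Memory of a dense d x d matrix, assuming float32 (4 bytes per entry)."""
    return d * d * bytes_per_entry

def to_correlation(M):
    """Normalize a PSD matrix by its diagonal: C_ij = M_ij / sqrt(M_ii * M_jj)."""
    std = np.sqrt(np.diag(M))
    return M / np.outer(std, std)

for d in (10**6, 10**8):
    print(f"d = {d:.0e}: {dense_memory_bytes(d) / 1e12:,.0f} TB")

# A random PSD matrix standing in for a metric (assumption for illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
M = A @ A.T
C = to_correlation(M)   # unit diagonal, off-diagonal entries in [-1, 1]
```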

nngeometry.object.pspace.PMatDense representation: this is the usual dense matrix. Memory cost: $$d \times d$$

nngeometry.object.pspace.PMatBlockDiag representation: a block-diagonal representation where diagonal blocks are dense matrices corresponding to parameters of a single layer, and cross-layer interactions are ignored (their coefficients are set to $$0$$). Memory cost: $$\sum_l d_l \times d_l$$ where $$d_l$$ is the number of parameters of layer $$l$$.
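A minimal NumPy sketch of the block-diagonal structure, assuming a hypothetical network with three layers of $$d_l = 6, 3, 4$$ parameters and a random positive semi-definite matrix standing in for a metric. The helpers `block_diag_from_dense` and `blocks_to_dense` are illustrative, not part of NNGeometry:

```python
import numpy as np

def block_diag_from_dense(M, layer_sizes):
    """Keep only the diagonal blocks of M, one dense block per layer."""
    blocks = []
    start = 0
    for d_l in layer_sizes:
        blocks.append(M[start:start + d_l, start:start + d_l].copy())
        start += d_l
    return blocks

def blocks_to_dense(blocks):
    """Rebuild the full matrix, with cross-layer coefficients set to 0."""
    d = sum(b.shape[0] for b in blocks)
    out = np.zeros((d, d))
    start = 0
    for b in blocks:
        n = b.shape[0]
        out[start:start + n, start:start + n] = b
        start += n
    return out

layer_sizes = [6, 3, 4]          # hypothetical d_l for three layers
d = sum(layer_sizes)             # 13 parameters in total
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d))
M = A @ A.T                      # stand-in dense metric

blocks = block_diag_from_dense(M, layer_sizes)
memory = sum(b.size for b in blocks)   # sum_l d_l * d_l
```

Here 61 numbers are stored instead of $$13 \times 13 = 169$$ for the dense matrix.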

nngeometry.object.pspace.PMatKFAC representation [GM16, MG15]: a block-diagonal representation where diagonal blocks are factored as the Kronecker product of two smaller matrices, and cross-layer interactions are ignored (their coefficients are set to $$0$$). Memory cost: $$\sum_l \left( g_l \times g_l + a_l \times a_l \right)$$ where $$a_l$$ is the number of input neurons of layer $$l$$ and $$g_l$$ is the number of output pre-activations of layer $$l$$.
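The Kronecker factorization can be sketched for a single linear layer, assuming no bias (in practice a bias is commonly handled by appending a constant 1 to the inputs) and a weight matrix of shape $$(g_l, a_l)$$. The per-example gradient is $$g a^\top$$, whose row-major flattening is $$g \otimes a$$, so the dense block $$\mathbb{E}[(g g^\top) \otimes (a a^\top)]$$ is approximated by $$G \otimes A$$ with $$A = \mathbb{E}[a a^\top]$$ and $$G = \mathbb{E}[g g^\top]$$. Inputs and backpropagated gradients are random stand-ins, and the function names are hypothetical:

```python
import numpy as np

def exact_block(acts, gouts):
    """Dense block E[vec(dW) vec(dW)^T] for a linear layer W of shape (g_l, a_l).

    The per-example gradient is dW = g a^T; flattening it row-major gives kron(g, a).
    """
    n = acts.shape[0]
    per_example = np.einsum('ni,nj->nij', gouts, acts).reshape(n, -1)
    return per_example.T @ per_example / n

def kfac_factors(acts, gouts):
    """KFAC factors: A = E[a a^T] (a_l x a_l) and G = E[g g^T] (g_l x g_l)."""
    n = acts.shape[0]
    return acts.T @ acts / n, gouts.T @ gouts / n

rng = np.random.default_rng(0)
n, a_dim, g_dim = 1000, 4, 3          # hypothetical sizes: a_l = 4, g_l = 3
acts = rng.normal(size=(n, a_dim))    # layer inputs a
gouts = rng.normal(size=(n, g_dim))   # gradients w.r.t. the pre-activations g

A, G = kfac_factors(acts, gouts)
kfac = np.kron(G, A)                  # KFAC approximation of the 12 x 12 block
exact = exact_block(acts, gouts)      # dense block, for comparison
kfac_memory = A.size + G.size         # g_l^2 + a_l^2
```

Storing the two factors costs $$3^2 + 4^2 = 25$$ numbers instead of $$12^2 = 144$$ for the dense block. The approximation is exact for a single example and in general treats $$a$$ and $$g$$ as independent. Because $$(G \otimes A)^{-1} = G^{-1} \otimes A^{-1}$$, it can also be inverted factor by factor.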

nngeometry.object.pspace.PMatEKFAC representation [GLB+18]: a block-diagonal representation where diagonal blocks are factored as a diagonal matrix in a Kronecker-factored eigenbasis, and cross-layer interactions are ignored (their coefficients are set to $$0$$). Memory cost: $$\sum_l \left( g_l \times g_l + a_l \times a_l + d_l \right)$$ where $$a_l$$ is the number of input neurons of layer $$l$$, $$g_l$$ is the number of output pre-activations of layer $$l$$, and $$d_l$$ is the number of parameters of layer $$l$$.
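A sketch of the eigenbasis construction for one bias-free linear layer, assuming the same random stand-in inputs and gradients as a KFAC setup. If $$A = U_A \Lambda_A U_A^\top$$ and $$G = U_G \Lambda_G U_G^\top$$, then $$U = U_G \otimes U_A$$ diagonalizes $$G \otimes A$$; the diagonal is then re-estimated as the second moment of the gradients projected on $$U$$. The function `ekfac` is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, a_dim, g_dim = 1000, 4, 3              # hypothetical sizes: a_l = 4, g_l = 3
acts = rng.normal(size=(n, a_dim))        # layer inputs a
gouts = rng.normal(size=(n, g_dim))       # gradients w.r.t. the pre-activations g
# Flattened per-example weight gradients kron(g, a) of a linear layer
per_example = np.einsum('ni,nj->nij', gouts, acts).reshape(n, -1)

A = acts.T @ acts / n                     # a_l x a_l factor
G = gouts.T @ gouts / n                   # g_l x g_l factor

def ekfac(per_example, A, G):
    """Return the factor eigenvectors, the Kronecker eigenbasis U and the diagonal D."""
    _, U_A = np.linalg.eigh(A)
    _, U_G = np.linalg.eigh(G)
    # Formed explicitly only because the layer is tiny; a practical
    # implementation keeps U_A and U_G and never builds the d_l x d_l matrix.
    U = np.kron(U_G, U_A)
    proj = per_example @ U                # gradients expressed in the eigenbasis
    D = (proj ** 2).mean(axis=0)          # second moment per eigen-direction (d_l values)
    return U_A, U_G, U, D

U_A, U_G, U, D = ekfac(per_example, A, G)
ekfac_block = U @ np.diag(D) @ U.T        # approximation of the 12 x 12 block
ekfac_memory = U_A.size + U_G.size + D.size   # g_l^2 + a_l^2 + d_l
```

By construction, $$D$$ equals the diagonal of the dense block expressed in the eigenbasis, whereas KFAC implicitly uses the products of the factors' eigenvalues there.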

nngeometry.object.pspace.PMatDiag representation: a diagonal representation that ignores all interactions between parameters. Memory cost: $$d$$
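A minimal sketch of the diagonal representation, assuming the metric is a gradient second moment estimated from per-example gradients (random stand-ins here). Only $$d$$ numbers are kept, and products and inverses become elementwise operations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
per_example_grads = rng.normal(size=(n, d))    # stand-in per-example gradients

# Diagonal of the gradient second moment: mean squared gradient per parameter.
diag = (per_example_grads ** 2).mean(axis=0)   # d numbers instead of d * d

# Matrix-vector products and solves are elementwise.
v = rng.normal(size=d)
Mv = diag * v          # diag(M) @ v
M_inv_v = v / diag     # diag(M)^{-1} @ v
```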

nngeometry.object.pspace.PMatQuasiDiag representation [Oll15]: a diagonal representation where, for each neuron, coefficients are also stored that measure the interaction between each of this neuron's weights and the corresponding bias. Memory cost: $$2 \times d$$
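A sketch of the quasi-diagonal structure for one linear layer, assuming a hypothetical parameter layout where each neuron's parameters are stored contiguously as (bias, $$w_1, \dots, w_{a}$$); this layout and the helper names are illustrative and not NNGeometry's internal ordering. The diagonal plus one bias-weight coefficient per incoming weight gives a storage of the order of $$2 \times d$$:

```python
import numpy as np

def quasi_diag_from_dense(M, n_out, n_in):
    """Extract the diagonal (length d) and bias-weight cross terms (n_out x n_in).

    Parameters are assumed ordered neuron by neuron as (bias, w_1, ..., w_n_in),
    a hypothetical layout chosen for this sketch.
    """
    k = n_in + 1
    diag = np.diag(M).copy()
    cross = np.empty((n_out, n_in))
    for j in range(n_out):
        b = j * k                        # index of neuron j's bias
        cross[j] = M[b, b + 1:b + k]     # bias <-> each incoming weight
    return diag, cross

def quasi_diag_to_dense(diag, cross, n_out, n_in):
    """Rebuild the full matrix; all other interactions are set to 0."""
    k = n_in + 1
    out = np.diag(diag)
    for j in range(n_out):
        b = j * k
        out[b, b + 1:b + k] = cross[j]
        out[b + 1:b + k, b] = cross[j]
    return out

n_out, n_in = 3, 4
d = n_out * (n_in + 1)            # 15 parameters
rng = np.random.default_rng(0)
B = rng.normal(size=(d, d))
M = B @ B.T                       # stand-in dense metric for this layer

diag, cross = quasi_diag_from_dense(M, n_out, n_in)
M_qd = quasi_diag_to_dense(diag, cross, n_out, n_in)
memory = diag.size + cross.size   # 15 + 12 = 27, slightly below 2 * d = 30
```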