Parameter space representations
===============================

Parameter space representations are :math:`d \times d` objects, where :math:`d` is the total number of parameters, that define metrics in parameter space such as:

 - Fisher Information Matrices/Gauss-Newton matrices
 - Second moment of the gradients (sometimes called the *Empirical Fisher*)
 - Other covariance matrices, such as those used in Bayesian Deep Learning

These matrices are often too large to fit in memory, for instance when :math:`d` is on the order of :math:`10^6` to :math:`10^8`,
as is typical in current deep networks: with :math:`d = 10^6`, a dense matrix stored as 32-bit floats already takes 4 TB.
The parameter space representations available in NNGeometry are listed below. Each one is computed on a small network and
shown as an image in which each pixel represents a component of the matrix and the color encodes its magnitude.
For better visualization, these matrices are normalized by their diagonal (i.e. these are correlation matrices):

:class:`nngeometry.object.pspace.PMatDense` representation: this is the usual dense matrix. Memory cost: :math:`d \times d`

.. image:: https://github.com/tfjgeorge/nngeometry/raw/main/docs/repr_img/PMatDense.png
  :width: 400
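
The images on this page show matrices normalized by their diagonal. As a minimal sketch of that normalization in plain Python, on a made-up :math:`3 \times 3` matrix (the helper name ``to_correlation`` and the toy values are illustrative and are not part of NNGeometry):

```python
import math


def to_correlation(matrix):
    """Divide entry (i, j) by sqrt(M[i][i] * M[j][j]) so that the diagonal becomes 1."""
    n = len(matrix)
    scale = [math.sqrt(matrix[i][i]) for i in range(n)]
    return [
        [matrix[i][j] / (scale[i] * scale[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric matrix with a positive diagonal (illustrative values only).
toy = [
    [4.0, 2.0, 0.0],
    [2.0, 9.0, 3.0],
    [0.0, 3.0, 16.0],
]
corr = to_correlation(toy)
```

After this step every diagonal entry equals 1 and the off-diagonal entries lie in :math:`[-1, 1]` for a positive semi-definite input, which is why a single color scale works across all of the images.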
  
:class:`nngeometry.object.pspace.PMatBlockDiag` representation: a block-diagonal representation where diagonal blocks are
dense matrices corresponding to parameters of a single layer, and cross-layer interactions are ignored (their coefficients are
set to :math:`0`). Memory cost: :math:`\sum_l d_l \times d_l` where :math:`d_l` is the number of parameters of layer :math:`l`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/main/docs/repr_img/PMatBlockDiag.png
  :width: 400
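
To make the block-diagonal structure concrete, here is a small sketch in plain Python that zeroes the cross-layer entries of a dense matrix. It assumes the parameters are ordered layer by layer; ``block_diagonal`` and the toy values are hypothetical and do not reflect how NNGeometry stores its blocks internally:

```python
def block_diagonal(dense, layer_sizes):
    """Keep entries whose row and column belong to the same layer, zero the rest."""
    d = len(dense)
    if sum(layer_sizes) != d:
        raise ValueError("layer sizes must sum to the matrix dimension")
    # layer_of[i] is the index of the layer that parameter i belongs to.
    layer_of = [layer for layer, size in enumerate(layer_sizes) for _ in range(size)]
    return [
        [dense[i][j] if layer_of[i] == layer_of[j] else 0.0 for j in range(d)]
        for i in range(d)
    ]


# Toy 3-parameter model: 2 parameters in the first layer, 1 in the second.
dense = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
layer_sizes = [2, 1]
blocks = block_diagonal(dense, layer_sizes)

# Only the diagonal blocks need to be stored: sum of d_l * d_l entries.
stored_entries = sum(size * size for size in layer_sizes)
```

In practice only the diagonal blocks are kept in memory; the zeros are materialized here purely for illustration.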

:class:`nngeometry.object.pspace.PMatKFAC` representation :cite:p:`martens2015optimizing, grosse2016kronecker`: a block-diagonal representation where diagonal blocks are
factored as the Kronecker product of two smaller matrices, and cross-layer interactions are ignored (their coefficients are
set to :math:`0`). Memory cost: :math:`\sum_l \left( g_l \times g_l + a_l \times a_l \right)` where :math:`a_l` is the number of neurons of the
input of layer :math:`l` and :math:`g_l` is the number of pre-activations of the output of layer :math:`l`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/main/docs/repr_img/PMatKFAC.png
  :width: 400
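
The saving comes from never forming the Kronecker product explicitly. The sketch below, in plain Python with made-up factor values, builds the product only to show its size. It ignores biases, and the factor order (output factor first) is an assumption that depends on how the layer's weights are flattened:

```python
def kron(left, right):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a * b for a in left_row for b in right_row]
        for left_row in left
        for right_row in right
    ]


# Hypothetical factors for one layer with g_l = 2 outputs and a_l = 3 inputs.
g_factor = [
    [2.0, 1.0],
    [1.0, 3.0],
]
a_factor = [
    [1.0, 0.5, 0.0],
    [0.5, 2.0, 0.1],
    [0.0, 0.1, 4.0],
]

block = kron(g_factor, a_factor)  # (g_l * a_l) x (g_l * a_l) diagonal block

stored_entries = len(g_factor) ** 2 + len(a_factor) ** 2  # g_l^2 + a_l^2
dense_block_entries = len(block) ** 2                      # (g_l * a_l)^2
```

For this toy layer the two factors hold 13 entries while the dense block they represent has 36, and the gap widens quickly as :math:`a_l` and :math:`g_l` grow.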

:class:`nngeometry.object.pspace.PMatEKFAC` representation :cite:p:`george2018fast`: a block-diagonal representation where diagonal blocks are
factored as a diagonal matrix in a Kronecker factored eigenbasis, and cross-layer interactions are ignored (their coefficients are
set to :math:`0`). Memory cost: :math:`\sum_l \left( g_l \times g_l + a_l \times a_l + d_l \right)` where :math:`a_l` is the number of neurons of the
input of layer :math:`l`, :math:`g_l` is the number of pre-activations of the output of layer :math:`l`, and :math:`d_l` is the number of parameters of layer :math:`l`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/main/docs/repr_img/PMatEKFAC.png
  :width: 400
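
For reference, the structure described above can be written as follows, ignoring biases so that :math:`d_l = g_l a_l`. The symbols :math:`F_l` (the diagonal block of layer :math:`l`), :math:`U_{g_l}`, :math:`U_{a_l}` and :math:`s_l` are notation introduced here for illustration, and the order of the Kronecker factors depends on how the layer's weights are flattened:

.. math::

   F_l \approx \left( U_{g_l} \otimes U_{a_l} \right) \mathrm{diag}(s_l) \left( U_{g_l} \otimes U_{a_l} \right)^\top

Here :math:`U_{g_l}` (of size :math:`g_l \times g_l`) and :math:`U_{a_l}` (of size :math:`a_l \times a_l`) hold the eigenvectors of the two Kronecker factors, and :math:`s_l` is a vector of :math:`d_l` diagonal coefficients. These three objects correspond to the three terms of the memory cost.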

:class:`nngeometry.object.pspace.PMatDiag` representation: a diagonal representation that ignores all interactions between parameters. 
Memory cost: :math:`d`

.. image:: https://github.com/tfjgeorge/nngeometry/raw/main/docs/repr_img/PMatDiag.png
  :width: 400

:class:`nngeometry.object.pspace.PMatQuasiDiag` representation :cite:p:`ollivier2015riemannian`: a diagonal representation where for each neuron, a coefficient is also
stored that measures the interaction between this neuron's weights and the corresponding bias. 
Memory cost: :math:`2 \times d`

.. image:: https://github.com/tfjgeorge/nngeometry/raw/main/docs/repr_img/PMatQuasiDiag.png
  :width: 400
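
To compare the memory costs listed on this page, the following sketch evaluates each formula for a hypothetical two-layer fully connected network. The layer sizes, the helper ``memory_costs``, and the convention :math:`d_l = a_l g_l + g_l` (weights plus biases) are assumptions made for illustration. The counts follow the formulas as stated above and may differ slightly from what NNGeometry actually allocates, for instance depending on how biases are handled:

```python
def memory_costs(layers):
    """Number of stored entries per representation.

    `layers` is a list of (a_l, g_l) pairs: input size and output size of each layer.
    """
    d_per_layer = [a * g + g for a, g in layers]  # weights + biases (assumption)
    d = sum(d_per_layer)
    kfac = sum(g * g + a * a for a, g in layers)
    return {
        "PMatDense": d * d,
        "PMatBlockDiag": sum(d_l * d_l for d_l in d_per_layer),
        "PMatKFAC": kfac,
        "PMatEKFAC": kfac + d,
        "PMatDiag": d,
        "PMatQuasiDiag": 2 * d,
    }


# Hypothetical network: 784 -> 512 -> 10.
layers = [(784, 512), (512, 10)]
costs = memory_costs(layers)

# Print the representations from cheapest to most expensive.
for name, entries in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:>14}: {entries:,} entries")
```

Even for this small network the dense and block-diagonal representations need on the order of :math:`10^{11}` entries, while the Kronecker-factored and diagonal variants stay around :math:`10^6`.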