dmgp.layers

Base Variational Layer

_BaseVariationalLayer

class dmgp.layers.base_variational_layer._BaseVariationalLayer[source]

The base variational layer is a torch.nn.Module that, given two Gaussian distributions \(Q\) and \(P\), returns a torch.Tensor holding the KL divergence \(D_{\text{KL}}\left( Q\parallel P \right)\).

\[\begin{equation*} D_{\text{KL}}\left( Q\parallel P \right)= \sum_{x\in \mathcal{X}}Q(x)\log\left( \frac{Q(x)}{P(x)} \right) \end{equation*}\]

kl_div(mu_q, sigma_q, mu_p, sigma_p)[source]

Calculates the KL divergence between two Gaussians \(\left( Q\parallel P \right)\).

Parameters:

mu_q (torch.Tensor) – Mean of distribution Q.
sigma_q – Standard deviation of distribution Q.
mu_p – Mean of distribution P.
sigma_p – Standard deviation of distribution P.

Returns:

The KL divergence between Q and P.
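For reference, the KL divergence between two univariate Gaussians has a well-known closed form. The sketch below evaluates it on scalars in plain Python. The function name gaussian_kl is hypothetical, and the library's own kl_div may differ in detail (for example, it operates on tensors and may reduce over elements).

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(Q || P) for univariate Gaussians (hypothetical helper)."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )

kl_same = gaussian_kl(0.0, 1.0, 0.0, 1.0)     # identical distributions
kl_shifted = gaussian_kl(1.0, 1.0, 0.0, 1.0)  # mean shifted by one sigma
```

The divergence is zero only when Q and P coincide, and it is not symmetric in its arguments, so the order (Q || P) matters.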

GP Activation Layers

Tensor Markov Kernel (TMK)

class dmgp.layers.TMK(in_features, n_level=2, input_lb=-2, input_ub=2, kernel=LaplaceProductKernel(), design_class=<class 'dmgp.utils.sparse_design.design_class.HyperbolicCrossDesign'>)[source]

Implements the tensor Markov GP as an activation layer using a sparse grid structure.

\[\begin{equation*} k\left( \mathbf{x}, X^{SG} \right)R^{-1} \end{equation*}\]

Parameters:

in_features (int) – Size of each input sample.
n_level (int, optional) – Level of sparse grid design. (Default: 2.)
input_lb (float, optional) – Input lower boundary. (Default: -2.)
input_ub (float, optional) – Input upper boundary. (Default: 2.)
design_class (class, dmgp.utils.sparse_design.design_class, optional) – Base design class of sparse grid. (Default: HyperbolicCrossDesign.)
kernel (class, dmgp.kernels, optional) – Kernel function of deep GP. (Default: LaplaceProductKernel(lengthscale=1.).)

forward(x)[source]

Computes the tensor Markov kernel activation of \(\mathbf x\).

Parameters:

x (torch.Tensor.float) – Tensor of size [N, C], where N is the batch size and C is the feature size of the input.

Returns:

Tensor of size [N, M], computed as kernel(input, sparse_grid) @ chol_inv.
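The product kernel(input, sparse_grid) @ chol_inv can be illustrated in one dimension. The sketch below is plain Python, not the library's code. It assumes that \(R\) is the upper Cholesky factor of the kernel matrix on the design points (so \(K = R^{\top}R\)), uses a hypothetical five-point grid in place of a real sparse grid design, and all function names are illustrative.

```python
import math

def laplace_kernel(a, b, lengthscale=1.0):
    """1-D Laplace kernel exp(-|a - b| / lengthscale)."""
    return math.exp(-abs(a - b) / lengthscale)

def cholesky_lower(K):
    """Return lower-triangular L with K = L L^T."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def forward_substitution(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z

# Hypothetical 1-D design on [input_lb, input_ub] = [-2, 2].
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
K = [[laplace_kernel(a, b) for b in grid] for a in grid]
L = cholesky_lower(K)  # K = L L^T, so the assumed R equals L^T

def activation(x):
    """k(x, grid) @ R^{-1}, computed as the solution of L z = k(x, grid)^T."""
    k_row = [laplace_kernel(x, g) for g in grid]
    return forward_substitution(L, k_row)

features = [activation(g) for g in grid]
```

Under this assumption the activations are whitened kernel features: the inner product of the activations at two design points reproduces the kernel value between them.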

Additive Markov Kernel (AMK)

class dmgp.layers.AMK(in_features, n_level=3, input_lb=-2, input_ub=2, kernel=LaplaceProductKernel(), design_class=<class 'dmgp.utils.sparse_design.design_class.HyperbolicCrossDesign'>)[source]

Implements the additive Markov GP as an activation layer using an additive structure.

\[\begin{equation*} \left\{ k\left( x_i, X^{SG} \right)R^{-1} \right\}^{d}_{i=1} \end{equation*}\]

Parameters:

in_features (int) – Size of each input sample.
n_level (int, optional) – Level of inducing points for approximating the GP. (Default: 3.)
input_lb (float, optional) – Input lower boundary. (Default: -2.)
input_ub (float, optional) – Input upper boundary. (Default: 2.)
design_class (class, dmgp.utils.sparse_design.design_class, optional) – Base design class of sparse grid. (Default: HyperbolicCrossDesign.)
kernel (class, dmgp.kernels, optional) – Kernel function of deep GP. (Default: LaplaceProductKernel(lengthscale=1.).)

forward(x)[source]

Computes the element-wise Markov kernel activation of \(\mathbf x\).

Parameters:

x (torch.Tensor.float) – Tensor of size [N, C], where N is the batch size, C is the number of input channels, and L is the sequence length.

Returns:

Tensor of size [N, C*L*M], computed as kernel(input, sparse_grid) @ chol_inv.
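The additive structure can be pictured as applying a 1-D kernel map to each input coordinate separately and concatenating the results. The plain-Python sketch below shows only that shape behavior for a 2-D input of size [N, C]. The three-point grid and the function names are hypothetical, and the multiplication by \(R^{-1}\) is left out for brevity.

```python
import math

def laplace_kernel(a, b, lengthscale=1.0):
    """1-D Laplace kernel exp(-|a - b| / lengthscale)."""
    return math.exp(-abs(a - b) / lengthscale)

grid = [-2.0, 0.0, 2.0]  # hypothetical 1-D design with M = 3 points

def additive_features(sample):
    """Map each coordinate x_i to k(x_i, grid) and concatenate over i."""
    out = []
    for x_i in sample:
        out.extend(laplace_kernel(x_i, g) for g in grid)
    return out

batch = [[0.0, 1.0], [-2.0, 2.0]]  # N = 2 samples, C = 2 features
activations = [additive_features(s) for s in batch]  # size [N, C*M]
```

Because each coordinate is handled independently, the output width grows linearly with the number of input features rather than with the size of a tensor-product grid.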

Linear Layers

Linear Reparameterization

class dmgp.layers.LinearReparameterization(in_features, out_features, prior_mean=0, prior_variance=1, posterior_mu_init=0, posterior_rho_init=-3.0, bias=True)[source]

Implements a linear layer with the reparameterization trick. Inherits from dmgp.layers._BaseVariationalLayer.

Parameters:

in_features (int) – Size of each input sample.
out_features (int) – Size of each output sample.
prior_mean (float, optional) – Mean of the prior distribution used in the complexity cost. (Default: 0.)
prior_variance (float, optional) – Variance of the prior distribution used in the complexity cost. (Default: 1.0.)
posterior_mu_init (float, optional) – Initial value of the trainable mu parameter, the mean of the approximate posterior. (Default: 0.)
posterior_rho_init (float, optional) – Initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function. (Default: -3.0.)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (Default: True.)
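To make the role of posterior_rho_init concrete, the sketch below evaluates the softplus map from rho to sigma in plain Python. It assumes the standard definition softplus(rho) = log(1 + exp(rho)); the variable names are illustrative.

```python
import math

def softplus(rho):
    """Standard softplus, log(1 + exp(rho)); always positive."""
    return math.log1p(math.exp(rho))

sigma_init = softplus(-3.0)  # sigma implied by the default rho, roughly 0.0486
sigma_zero = softplus(0.0)   # equals log(2)
```

Parameterizing sigma through rho keeps the standard deviation positive while rho itself can be optimized without constraints. The default of -3.0 starts the posterior with a small spread around posterior_mu_init.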

forward(x, return_kl=True)[source]

Forward pass of the Bayesian linear layer.

Parameters:

x (torch.Tensor.float) – Training data of shape \((n,d)\).
return_kl (bool, optional) – Whether to return the KL divergence. (Default: True.)

Returns:

The output and KL-divergence.
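A minimal sketch of such a forward pass, in plain Python rather than PyTorch, is shown here. It assumes a factorized Gaussian posterior over the weights, the standard softplus map from rho to sigma, and the closed-form Gaussian KL. The bias is omitted, and bayes_linear_forward and its helpers are hypothetical names, not the library's implementation.

```python
import math
import random

def softplus(rho):
    return math.log1p(math.exp(rho))

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(Q || P) for univariate Gaussians."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

def bayes_linear_forward(x, mu_w, rho_w, prior_mean=0.0,
                         prior_variance=1.0, rng=random):
    """x: list of d floats; mu_w and rho_w: nested lists of shape out x d."""
    prior_sigma = math.sqrt(prior_variance)
    out, kl = [], 0.0
    for mu_row, rho_row in zip(mu_w, rho_w):
        acc = 0.0
        for x_j, mu, rho in zip(x, mu_row, rho_row):
            sigma = softplus(rho)
            w = mu + sigma * rng.gauss(0.0, 1.0)  # reparameterized sample
            acc += w * x_j
            kl += gaussian_kl(mu, sigma, prior_mean, prior_sigma)
        out.append(acc)
    return out, kl

mu_w = [[0.0, 0.0] for _ in range(3)]    # posterior_mu_init = 0
rho_w = [[-3.0, -3.0] for _ in range(3)]  # posterior_rho_init = -3.0
out, kl = bayes_linear_forward([1.0, 2.0], mu_w, rho_w, rng=random.Random(0))
```

The output changes from call to call because the weights are resampled, while the KL term depends only on the posterior and prior parameters. In variational training the KL term is typically added, suitably scaled, to the data-fit loss.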

Linear Flipout

class dmgp.layers.LinearFlipout(in_features, out_features, prior_mean=0, prior_variance=1, posterior_mu_init=0, posterior_rho_init=-3.0, bias=True)[source]

Implements a linear layer with the Flipout reparameterization trick. Ref: https://arxiv.org/abs/1803.04386. Inherits from dmgp.layers._BaseVariationalLayer.

Parameters:

in_features (int) – Size of each input sample.
out_features (int) – Size of each output sample.
prior_mean (float, optional) – Mean of the prior distribution used in the complexity cost. (Default: 0.)
prior_variance (float, optional) – Variance of the prior distribution used in the complexity cost. (Default: 1.0.)
posterior_mu_init (float, optional) – Initial value of the trainable mu parameter, the mean of the approximate posterior. (Default: 0.)
posterior_rho_init (float, optional) – Initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function. (Default: -3.0.)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (Default: True.)

forward(x, return_kl=True)[source]

Forward pass of the Bayesian linear layer.

Parameters:

x (torch.Tensor.float) – Training data of shape \((n,d)\).
return_kl (bool, optional) – Whether to return the KL divergence. (Default: True.)

Returns:

The output and KL-divergence.
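The idea behind Flipout, as described in the referenced paper, is to share one sampled weight perturbation across a batch and decorrelate it per example with random sign vectors. The plain-Python sketch below checks, for a single example, that the efficient form matches the explicit per-example weight matrix. All names and values are illustrative, and the bias and KL term are omitted.

```python
import random

def matvec(W, v):
    """Multiply an out x d nested-list matrix by a length-d vector."""
    return [sum(w * v_j for w, v_j in zip(row, v)) for row in W]

rng = random.Random(0)
x = [0.5, -1.0, 2.0]                              # one example, d = 3
W_mean = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]      # posterior mean, out x d
delta_W = [[rng.gauss(0.0, 0.05) for _ in range(3)]
           for _ in range(2)]                     # shared sampled perturbation
s = [rng.choice((-1.0, 1.0)) for _ in range(3)]   # input-side random signs
r = [rng.choice((-1.0, 1.0)) for _ in range(2)]   # output-side random signs

# Efficient form: W_mean x + (delta_W (x * s)) * r, with element-wise products.
mean_out = matvec(W_mean, x)
perturbed = matvec(delta_W, [x_j * s_j for x_j, s_j in zip(x, s)])
y_flipout = [m + p * r_i for m, p, r_i in zip(mean_out, perturbed, r)]

# Explicit form: build the per-example weights W_mean + delta_W * (r s^T).
W_example = [[W_mean[i][j] + delta_W[i][j] * r[i] * s[j] for j in range(3)]
             for i in range(2)]
y_explicit = matvec(W_example, x)
```

The efficient form never materializes a separate weight matrix per example, which is what makes the estimator practical for mini-batches.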