dmgp.layers

Base Variational Layer

_BaseVariationalLayer

class dmgp.layers.base_variational_layer._BaseVariationalLayer

The base variational layer is implemented as a torch.nn.Module whose kl_div method, given two Gaussian distributions \(Q\) and \(P\), returns a torch.Tensor holding the KL divergence \(D_{\text{KL}}\left( Q\parallel P \right)\):

\[\begin{equation*} D_{\text{KL}}\left( Q\parallel P \right)= \int_{\mathcal{X}}Q(x)\log\left( \frac{Q(x)}{P(x)} \right)\,dx \end{equation*}\]
kl_div(mu_q, sigma_q, mu_p, sigma_p)

Calculates the KL divergence between two Gaussians, \(D_{\text{KL}}\left( Q\parallel P \right)\).

Parameters:
  • mu_q (torch.Tensor) – Mean of distribution Q.

  • sigma_q – Standard deviation of distribution Q.

  • mu_p – Mean of distribution P.

  • sigma_p – Standard deviation of distribution P.

Returns:

The KL divergence between Q and P.
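For Gaussians the integral above has a closed form. The following is a minimal NumPy sketch of that closed form, assuming sigma_q and sigma_p are standard deviations (not variances) and that the divergence is summed over independent, element-wise Gaussians as is usual for mean-field variational layers. It illustrates the formula only and is not dmgp's own implementation.

```python
import numpy as np


def kl_div(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(Q || P) between element-wise Gaussians, summed over elements.

    Assumes sigma_q and sigma_p are standard deviations (not variances).
    """
    mu_q, sigma_q = np.asarray(mu_q, dtype=float), np.asarray(sigma_q, dtype=float)
    mu_p, sigma_p = np.asarray(mu_p, dtype=float), np.asarray(sigma_p, dtype=float)
    # Per element: log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
    kl = (np.log(sigma_p / sigma_q)
          + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
          - 0.5)
    return float(np.sum(kl))


# Identical distributions give zero; shifting the mean by one unit gives 0.5.
print(kl_div(0.0, 1.0, 0.0, 1.0))   # 0.0
print(kl_div(1.0, 1.0, 0.0, 1.0))   # 0.5
```

Broadcasting lets a scalar prior (mu_p, sigma_p) be compared against a whole tensor of posterior parameters, which matches how a single prior is typically shared by all weights of a layer.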

GP Activation Layers

Tensor Markov Kernel (TMK)

class dmgp.layers.TMK(in_features, n_level=2, input_lb=-2, input_ub=2, kernel=LaplaceProductKernel(), design_class=<class 'dmgp.utils.sparse_design.design_class.HyperbolicCrossDesign'>)

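Implements a tensor Markov GP as an activation layer using a sparse grid structure. In the feature map below, \(X^{SG}\) denotes the sparse grid points and \(R^{-1}\) the inverse Cholesky factor of the kernel matrix on those points (chol_inv in the output of forward).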

\[\begin{equation*} k\left( \mathbf{x}, X^{SG} \right)R^{-1} \end{equation*}\]
Parameters:
  • in_features (int) – Size of each input sample.

  • n_level (int, optional) – Level of the sparse grid design. (Default: 2.)

  • input_lb (float, optional) – Input lower boundary. (Default: -2.)

  • input_ub (float, optional) – Input upper boundary. (Default: 2.)

  • design_class (class, dmgp.utils.sparse_design.design_class, optional) – Base design class of the sparse grid. (Default: HyperbolicCrossDesign.)

  • kernel (class, dmgp.kernels, optional) – Kernel function of the deep GP. (Default: LaplaceProductKernel(lengthscale=1.).)

forward(x)

Computes the tensor Markov kernel activation of \(\mathbf x\).

Parameters:

x (torch.Tensor.float) – Input tensor of size [N, C], where N is the batch size and C is the number of input features.

Returns:

Tensor of size [N, M], computed as kernel(input, sparse_grid) @ chol_inv, where M is the number of sparse grid points.
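The feature map \(k(\mathbf x, X^{SG})R^{-1}\) can be sketched in NumPy as follows. This is a hypothetical illustration under several assumptions: the kernel is a Laplace product kernel \(\prod_i \exp(-|x_i - y_i|/\ell)\) with \(\ell = 1\); \(R\) is the upper Cholesky factor with \(K = R^\top R\), where \(K = k(X^{SG}, X^{SG})\); and the grid is a Smolyak-type union of tensor products of nested dyadic points. The helpers dyadic_points, sparse_grid, laplace_product_kernel and tmk_features are illustrative names, and the point set may differ from what dmgp's HyperbolicCrossDesign produces.

```python
import itertools

import numpy as np


def dyadic_points(level, lb, ub):
    # Interior dyadic points lb + (ub - lb) * j / 2**level, j = 1, ..., 2**level - 1 (nested in level).
    j = np.arange(1, 2 ** level)
    return lb + (ub - lb) * j / 2 ** level


def sparse_grid(dim, n_level, lb, ub):
    # Hypothetical Smolyak-type grid: union of tensor grids whose levels satisfy sum(levels) <= n_level + dim - 1.
    points = set()
    for levels in itertools.product(range(1, n_level + 1), repeat=dim):
        if sum(levels) <= n_level + dim - 1:
            axes = [dyadic_points(l, lb, ub) for l in levels]
            for p in itertools.product(*axes):
                points.add(tuple(round(float(c), 12) for c in p))
    return np.array(sorted(points))  # [M, dim]


def laplace_product_kernel(a, b, lengthscale=1.0):
    # k(x, y) = prod_i exp(-|x_i - y_i| / lengthscale) for a [N, C] and b [M, C] -> [N, M]
    diff = np.abs(a[:, None, :] - b[None, :, :])
    return np.exp(-diff.sum(axis=-1) / lengthscale)


def tmk_features(x, n_level=2, lb=-2.0, ub=2.0):
    grid = sparse_grid(x.shape[1], n_level, lb, ub)      # X^{SG}, [M, C]
    K = laplace_product_kernel(grid, grid)                # [M, M]
    R = np.linalg.cholesky(K).T                           # upper factor, K = R^T R
    chol_inv = np.linalg.inv(R)                           # R^{-1}
    return laplace_product_kernel(x, grid) @ chol_inv     # [N, M]


print(tmk_features(np.zeros((4, 2))).shape)  # (4, 5)
```

Because \(R^{-1}R^{-\top} = K^{-1}\), inner products of these features equal \(k(\mathbf x, X^{SG})K^{-1}k(X^{SG}, \mathbf x')\); on the grid points themselves they reproduce the kernel matrix exactly.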

Additive Markov Kernel (AMK)

class dmgp.layers.AMK(in_features, n_level=3, input_lb=-2, input_ub=2, kernel=LaplaceProductKernel(), design_class=<class 'dmgp.utils.sparse_design.design_class.HyperbolicCrossDesign'>)

Implements an additive Markov GP as an activation layer using an additive structure: each input coordinate \(x_i\), \(i = 1, \dots, d\), is mapped separately.

\[\begin{equation*} \left\{ k\left( x_i, X^{SG} \right)R^{-1} \right\}^{d}_{i=1} \end{equation*}\]
Parameters:
  • in_features (int) – Size of each input sample.

  • n_level (int, optional) – Level of the inducing points used to approximate the GP. (Default: 3.)

  • input_lb (float, optional) – Input lower boundary. (Default: -2.)

  • input_ub (float, optional) – Input upper boundary. (Default: 2.)

  • design_class (class, dmgp.utils.sparse_design.design_class, optional) – Base design class of the sparse grid. (Default: HyperbolicCrossDesign.)

  • kernel (class, dmgp.kernels, optional) – Kernel function of the deep GP. (Default: LaplaceProductKernel(lengthscale=1.).)

forward(x)

Computes the element-wise Markov kernel activation of \(\mathbf x\).

Parameters:

x (torch.Tensor.float) – [N, C] size tensor, where N is the batch size, C is the number of input channels, and L is the sequence length.

Returns:

Tensor of size [N, C*L*M], computed as kernel(input, sparse_grid) @ chol_inv.
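The additive map \(\left\{ k(x_i, X^{SG})R^{-1} \right\}_{i=1}^{d}\) can be sketched in NumPy for a plain [N, C] input (so L = 1 and the output is [N, C*M]). Assumptions: a one-dimensional Laplace kernel \(\exp(-|x - y|/\ell)\) with \(\ell = 1\), a hypothetical one-dimensional design made of the \(2^{n\_level} - 1\) interior dyadic points on [input_lb, input_ub], and \(K = R^\top R\) with \(R\) upper triangular. The names laplace_kernel_1d and amk_features are illustrative only.

```python
import numpy as np


def laplace_kernel_1d(a, b, lengthscale=1.0):
    # k(x, y) = exp(-|x - y| / lengthscale) for a [N] and b [M] -> [N, M]
    return np.exp(-np.abs(a[:, None] - b[None, :]) / lengthscale)


def amk_features(x, n_level=3, lb=-2.0, ub=2.0):
    # Hypothetical 1-D design: the 2**n_level - 1 interior dyadic points on [lb, ub].
    grid = lb + (ub - lb) * np.arange(1, 2 ** n_level) / 2 ** n_level   # [M]
    K = laplace_kernel_1d(grid, grid)                                    # [M, M]
    chol_inv = np.linalg.inv(np.linalg.cholesky(K).T)                    # R^{-1}, K = R^T R
    # The same 1-D feature map is applied to every input coordinate x_i.
    feats = [laplace_kernel_1d(x[:, i], grid) @ chol_inv for i in range(x.shape[1])]
    return np.concatenate(feats, axis=1)                                 # [N, C*M]


print(amk_features(np.zeros((4, 3))).shape)  # (4, 21)
```

Compared with the tensor version, the number of features grows linearly in C (C * M) instead of with the size of a C-dimensional grid, at the cost of modelling only additive structure.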

Linear Layers

Linear Reparameterization

class dmgp.layers.LinearReparameterization(in_features, out_features, prior_mean=0, prior_variance=1, posterior_mu_init=0, posterior_rho_init=-3.0, bias=True)

Implements a Bayesian linear layer using the reparameterization trick. Inherits from dmgp.layers._BaseVariationalLayer.

Parameters:
  • in_features (int) – Size of each input sample.

  • out_features (int) – Size of each output sample.

  • prior_mean (float, optional) – Mean of the prior distribution used in the complexity cost (the KL term). (Default: 0.)

  • prior_variance (float, optional) – Variance of the prior distribution used in the complexity cost (the KL term). (Default: 1.0.)

  • posterior_mu_init (float, optional) – Initial value of the trainable mu parameter, the mean of the approximate posterior. (Default: 0.)

  • posterior_rho_init (float, optional) – Initial value of the trainable rho parameter, which sets the standard deviation of the approximate posterior through the softplus function, sigma = log(1 + exp(rho)). (Default: -3.0.)

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (Default: True.)

forward(x, return_kl=True)

Forward pass of the Bayesian linear layer.

Parameters:
  • x (torch.Tensor.float) – Training data of shape \((n,d)\).

  • return_kl (bool, optional) – If True, also return the KL divergence. (Default: True.)

Returns:

The layer output and, if return_kl is True, the KL divergence.
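The reparameterization trick samples weights as \(w = \mu + \sigma \odot \epsilon\) with \(\epsilon \sim \mathcal N(0, I)\) and \(\sigma = \mathrm{softplus}(\rho)\), so gradients flow to mu and rho. The following NumPy sketch of one forward pass is hypothetical: the function names are illustrative, weights are stored as [out_features, in_features], and prior_variance is converted to a standard deviation with a square root, which is an assumption about how the prior enters the KL term.

```python
import numpy as np


def softplus(rho):
    # sigma = log(1 + exp(rho)); keeps the posterior standard deviation positive.
    return np.log1p(np.exp(rho))


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    # Closed-form KL(Q || P) for element-wise Gaussians, summed over elements.
    return float(np.sum(np.log(sigma_p / sigma_q)
                        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
                        - 0.5))


def linear_reparam_forward(x, mu_w, rho_w, mu_b, rho_b,
                           prior_mean=0.0, prior_variance=1.0,
                           return_kl=True, rng=None):
    """Hypothetical forward pass: x [n, d], mu_w/rho_w [out, d], mu_b/rho_b [out]."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_w, sigma_b = softplus(rho_w), softplus(rho_b)
    # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, I).
    w = mu_w + sigma_w * rng.standard_normal(mu_w.shape)
    b = mu_b + sigma_b * rng.standard_normal(mu_b.shape)
    out = x @ w.T + b
    if not return_kl:
        return out
    prior_sigma = np.sqrt(prior_variance)  # assumption: variance -> standard deviation
    kl = (kl_gaussian(mu_w, sigma_w, prior_mean, prior_sigma)
          + kl_gaussian(mu_b, sigma_b, prior_mean, prior_sigma))
    return out, kl


# Usage with the documented initial values posterior_mu_init=0, posterior_rho_init=-3.
rng = np.random.default_rng(0)
d, out_dim = 4, 3
mu_w, rho_w = np.zeros((out_dim, d)), np.full((out_dim, d), -3.0)
mu_b, rho_b = np.zeros(out_dim), np.full(out_dim, -3.0)
y, kl = linear_reparam_forward(rng.standard_normal((5, d)), mu_w, rho_w, mu_b, rho_b, rng=rng)
print(y.shape)  # (5, 3)
```

With posterior_rho_init = -3.0 the initial posterior standard deviation is softplus(-3) ≈ 0.0486, so the layer starts close to a deterministic linear map.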

Linear Flipout

class dmgp.layers.LinearFlipout(in_features, out_features, prior_mean=0, prior_variance=1, posterior_mu_init=0, posterior_rho_init=-3.0, bias=True)

Implements a Bayesian linear layer with the Flipout reparameterization trick. Ref: https://arxiv.org/abs/1803.04386. Inherits from dmgp.layers._BaseVariationalLayer.

Parameters:
  • in_features (int) – Size of each input sample.

  • out_features (int) – Size of each output sample.

  • prior_mean (float, optional) – Mean of the prior distribution used in the complexity cost (the KL term). (Default: 0.)

  • prior_variance (float, optional) – Variance of the prior distribution used in the complexity cost (the KL term). (Default: 1.0.)

  • posterior_mu_init (float, optional) – Initial value of the trainable mu parameter, the mean of the approximate posterior. (Default: 0.)

  • posterior_rho_init (float, optional) – Initial value of the trainable rho parameter, which sets the standard deviation of the approximate posterior through the softplus function, sigma = log(1 + exp(rho)). (Default: -3.0.)

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (Default: True.)

forward(x, return_kl=True)

Forward pass of the Bayesian linear layer.

Parameters:
  • x (torch.Tensor.float) – Training data of shape \((n,d)\).

  • return_kl (bool, optional) – If True, also return the KL divergence. (Default: True.)

Returns:

The layer output and, if return_kl is True, the KL divergence.
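Flipout draws a single weight perturbation \(\Delta W\) per mini-batch and decorrelates it across examples with random sign vectors: for example \(n\), the effective weight is \(\mu_W + \Delta W \odot s_n r_n^\top\), which gives \(y_n = \mu_W x_n + s_n \odot \left(\Delta W (r_n \odot x_n)\right)\). The NumPy sketch below is hypothetical (illustrative names, weights stored as [out_features, in_features]); flipping the bias perturbation with the output signs is an assumed design choice, and the KL term is omitted because it is the same closed-form Gaussian KL used by the base layer.

```python
import numpy as np


def softplus(rho):
    # sigma = log(1 + exp(rho))
    return np.log1p(np.exp(rho))


def linear_flipout_forward(x, mu_w, rho_w, mu_b, rho_b, rng=None):
    """Hypothetical Flipout forward pass: x [n, d], mu_w/rho_w [out, d], mu_b/rho_b [out]."""
    rng = np.random.default_rng() if rng is None else rng
    n, out_dim = x.shape[0], mu_w.shape[0]
    # One perturbation shared by the whole mini-batch: delta ~ N(0, sigma^2).
    delta_w = softplus(rho_w) * rng.standard_normal(mu_w.shape)
    delta_b = softplus(rho_b) * rng.standard_normal(mu_b.shape)
    # Independent random +/-1 sign vectors per example (r_n on inputs, s_n on outputs).
    sign_in = rng.choice([-1.0, 1.0], size=(n, x.shape[1]))
    sign_out = rng.choice([-1.0, 1.0], size=(n, out_dim))
    mean_out = x @ mu_w.T + mu_b
    perturbed = ((x * sign_in) @ delta_w.T + delta_b) * sign_out
    return mean_out + perturbed


rng = np.random.default_rng(0)
x = np.ones((16, 4))  # identical rows
y = linear_flipout_forward(x, np.zeros((3, 4)), np.zeros((3, 4)), np.zeros(3), np.zeros(3), rng=rng)
print(y.shape)  # (16, 3)
```

Even though every row of x is identical, the rows of y differ, because each example sees a different sign pattern on the shared perturbation. This is what lets Flipout reduce gradient variance relative to sharing one sampled weight matrix across the whole batch.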