Exact GP Regression Tutorial

Introduction

In this notebook, we use DMGP to train a simplest regression model. We’ll be modeling the function \begin{align} &y = \sin(2\pi x) + \epsilon, \\ &\epsilon \sim \mathcal{N}(0,0.04). \end{align} with 100 trianing examples, and testing on 50 test examples.

Note: this notebook is not necessarily intended to teach the mathematical background of Gaussian processes, but rather how to train a simple one and make predictions in DMGP. For a mathematical treatment, please see the reference “A sparse expansion for deep Gaussian processes”: https://arxiv.org/abs/2112.05888

[1]:

import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from dmgp.models import DMGP
from dmgp.layers.activation import TMK, AMK
from dmgp.layers.linear import LinearFlipout

Set up training data

In the next cell, we set up the training data for this example. We’ll be using 100 regularly spaced points on [0,1] which we evaluate the function on and add Gaussian noise to get the training labels.

[2]:

# Training data is 100 points in [0,1] inclusive regularly spaced
train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2
x_tensor = train_x.view(-1, 1)
y_tensor = train_y.view(-1, 1)

Setting up the model

The next cell demonstrates the most critical features of a user-defined DGP model in DMGP. Basically, there are two ways to define a customized DGP model: use dmgp.models.DMGP or construct your model from scratch using dmgp.layers.

If we define the model using dmgp.models.DMGP, you need to construct the following DMGP objects: 1. Input and output dimension (input_dim, output_dim) - For regression output_dim=1. 2. Number of hidden layers (num_layers) - Usually we use 2 hidden layers. 3. Level of inducing (num_inducing) - This sets the level of inducing points we use for GP inference. 4. Dimension of hidden layers (hidden_dim) - This sets the width of hidden layers. 5. A kernel (kernel) - This defines the prior covariance of the GP. (In DMGP, DMGP.kernels.laplace_kernel.LaplaceProductKernel is a good choice to start). 6. A layer type (layer_type) - This defines the linear Bayesian layer we use for learnable GP weights. (DMGP.layers.linear.LinearFlipout is used by default).

[3]:

from dmgp.utils.sparse_design.design_class import HyperbolicCrossDesign
from dmgp.kernels.laplace_kernel import LaplaceProductKernel

# Define the model using DMGP
dmgp_model = DMGP(in_features=1,
                  out_features=1,
                  num_layers=2,
                  hidden_dim=8,
                  num_inducing=3,
                  input_lb=0,
                  input_ub=1,
                  kernel=LaplaceProductKernel(1.),
                  design_class=HyperbolicCrossDesign,
                  linear_layer=LinearFlipout,
                  option='additive')

If we define the customized model from scratch, the following is an example which consists of two GP layers, each with level-8 and level-5 sparse grid design, respectively.

[4]:

# Define a DMGP model from scratch
class DMGP_regression(nn.Module):
    def __init__(self, input_dim, output_dim, design_class, kernel):
        super(DMGP_regression, self).__init__()

        # 1st layer of DGP: input:[n, input_dim] size tensor, output:[n, 1] size tensor
        self.mk1 = AMK(
            in_features=input_dim,
            n_level=5,
            input_lb=0,
            input_ub=1,
            kernel=kernel,
            design_class=design_class,
        )
        self.fc1 = LinearFlipout(self.mk1.out_features, 1)

        # 2nd layer of DGP: input:[n, 1] size tensor, output:[n, output_dim] size tensor
        self.mk2 = AMK(
            in_features=1,
            n_level=5,
            input_lb=0,
            input_ub=1,
            kernel=kernel,
            design_class=design_class,
        )
        self.fc2 = LinearFlipout(in_features=self.mk2.out_features, out_features=output_dim)

    def forward(self, x, return_kl=True):
        kl_sum = 0
        x = self.mk1(x)
        x, kl = self.fc1(x)
        kl_sum += kl
        x = self.mk2(x)
        x, kl = self.fc2(x)
        kl_sum += kl
        if return_kl:
            return x, kl_sum
        else:
            return x

model = DMGP_regression(
    input_dim=1,
    output_dim=1,
    design_class=HyperbolicCrossDesign,
    kernel=LaplaceProductKernel(1.),
)

Training the model

In the next cell, we handle using variational inference (VI) to train the parameters of the DMGP.

[5]:

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
epochs = 1000
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output, kl = model(x_tensor)
    loss = criterion(output, y_tensor) + kl/len(train_x)
    loss.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print(f'Epoch [{epoch}/{epochs}], Loss: {loss.item()}')

Epoch [0/1000], Loss: 0.6518664360046387
Epoch [100/1000], Loss: 0.22241495549678802
Epoch [200/1000], Loss: 0.14881879091262817
Epoch [300/1000], Loss: 0.13053223490715027
Epoch [400/1000], Loss: 0.1292726993560791
Epoch [500/1000], Loss: 0.1213754415512085
Epoch [600/1000], Loss: 0.11744579672813416
Epoch [700/1000], Loss: 0.1487128734588623
Epoch [800/1000], Loss: 0.13028332591056824
Epoch [900/1000], Loss: 0.12365931272506714

Making predictions with the model

In the next cell, we make predictions with the model. To do this, we simply put the model in eval mode, and call both modules on the test data.

[6]:

# Test points are regularly spaced along [0,1]
test_x = torch.linspace(0, 1, 50)
test_y = torch.sin(test_x * (2 * math.pi))
test_x = test_x.view(-1, 1)
test_y = test_y.view(-1, 1)

# Make predictions by feeding test dat through DMGP model
test_loss = []
with torch.no_grad():
    predicts = []
    for mc_run in range(20):
        model.eval()
        output = model.forward(test_x, return_kl=False)
        loss = F.mse_loss(output, test_y).cpu().data.numpy()
        test_loss.append(loss)
        predicts.append(output.cpu().data.numpy())

    pred_mean = np.mean(predicts, axis=0)
    pred_std = np.std(predicts, axis=0)

print('test loss: ', np.mean(test_loss))

test loss:  0.06484331

Plot the model fit

In the next cell, we plot the mean and confidence region of the DMGP prediction.

[7]:

from matplotlib import pyplot as plt
from scipy import stats

# Confidence level
confidence_level = 0.95
z = stats.norm.ppf(1 - (1 - confidence_level) / 2)  # z-score for 95% confidence

# Calculate the confidence interval
lower = pred_mean - z * pred_std
upper = pred_mean + z * pred_std

with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))

    # Plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), pred_mean, 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy().squeeze(), lower.squeeze(), upper.squeeze(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])

../../_images/examples_01_Exact_GPs_Simple_GP_Regression_13_0.png