block_gp uses multiple time series given as input to generate an attractor reconstruction, and then applies Gaussian process regression to approximate the dynamics and make forecasts. This method is the generalized version of tde_gp, which constructs the block from lags of a time series to pass into this function.

block_gp(block, lib = c(1, NROW(block)), pred = lib, tp = 1,
  phi = 0, v_e = 0, eta = 0, fit_params = TRUE, columns = NULL,
  target_column = 1, stats_only = TRUE,
  save_covariance_matrix = FALSE, first_column_time = FALSE,
  silent = FALSE, ...)

Arguments

block

either a vector to be used as the time series, or a data.frame or matrix where each column is a time series

lib

a 2-column matrix (or 2-element vector) where each row specifies the first and last *rows* of the time series to use for attractor reconstruction

pred

(same format as lib), but specifying the sections of the time series to forecast.

tp

the prediction horizon (how far ahead to forecast)

phi

length-scale parameter. see 'Details'

v_e

noise-variance parameter. see 'Details'

eta

signal-variance parameter. see 'Details'

fit_params

specify whether to use MLE to estimate params over the lib

columns

either a vector with the columns to use (indices or names), or a list of such columns

target_column

the index (or name) of the column to forecast

stats_only

specify whether to output just the forecast statistics or the raw predictions for each run

save_covariance_matrix

specifies whether to include the full covariance matrix with the output (and forces the full output as if stats_only were set to FALSE)

first_column_time

indicates whether the first column of the given block is a time column (and therefore excluded when indexing)

silent

prevents warning messages from being printed to the R console

...

other parameters. see 'Details'

Value

If stats_only, then a data.frame with components for the parameters and forecast statistics:

embedding
embedding
tp
prediction horizon
phi
length-scale parameter
v_e
noise-variance parameter
eta
signal-variance parameter
fit_params
whether params were fitted or not
num_pred
number of predictions
rho
correlation coefficient between observations and
predictions
mae
mean absolute error
rmse
root mean square error
perc
percent correct sign
p_val
p-value that rho is significantly greater than 0 usingFisher's z-transformation
model_output
data.frame with columns for the time index,observations, mean-value for predictions, and independent variance for
predictions (ifstats_only == FALSE
or
save_covariance_matrix == TRUE)
covariance_matrix
the full covariance matrix for predictions(if
save_covariance_matrix == TRUE)

Details

The default parameters are set so that passing a vector as the only argument will use that vector to predict itself one time step ahead. If a matrix or data.frame is given as the only argument, the first column will be predicted (one time step ahead), using the remaining columns as the embedding. Rownames will be converted to numeric if possible to be used as the time index, otherwise 1:NROW will be used instead. The default lib and pred are to perform maximum likelihood estimation of the phi, v_e, and eta parameters over the whole time series, and return just the forecast statistics.

If phi, v_e, and eta parameters are given, all combinations of their values will be tried. If fit_params is also set to TRUE, these values will be the initial values for subsequent optimization of likelihood.

The basic model is: $$y = f(x) + \textnormal{noise} $$ in which the function f(x) is modeled using a Gaussian process prior: $$f \sim \textnormal{GP}(0, C) $$ with mean = 0, and covariance function, C, which is given by the squared-exponential kernel: $$C_{ij} = eta * \exp(-phi^2 * ||x_i - x_j||^2) $$

y is a realization from process f with normally-distributed i.i.d. process noise, $$noise \sim \mathcal(N)(0, v_e) $$ such that the covariance of observations y_i and y_j is $$K_{ij} = C_{ij} + v_e * \delta_{ij}$$ where \(\delta_{ij}\) is the kronecker delta (i.e. it is 1 if \(i = j\) and 0 otherwise)

From the model definition, the variance in y, after marginalizing over f, is given by eta + v_e. Thus to simplify specification of priors for the hyperparameters eta and v_e, the outputs y are normalized to zero mean and unit variance prior to fitting. This allows us to set (0, 1) bounds on eta and v_e which facilitates parameter estimation. We set Beta(2, 2) priors for both eta and v_e to partition prior uncertainty equally across structural and process uncertainty.

For a scalar input, the length-scale parameter phi controls the expected number of zero crossings on the unit interval as $$E(crossings) = \frac{\sqrt(2)}{\pi} \phi \approx 0.45 \phi $$

Thus to facilitate interpretation and prior specification, the distances in C are scaled by the max distance so that a model with \(\phi = 2\) would have roughly one zero crossing over the range of the data. We assign phi a half-Normal prior with variance \(\pi / 2\) so that the prior mean phi is 1, which tends to avoid overfitting. To fit the GP we estimate eta, v_e, and phi by maximizing the posterior after marginalizing over f(x). This is given by the multivariate normal likelihood $$logL = -1/2 \log{|K_d|}^{-1/2} y_d^T [K_d]^{-1} y_d $$ where \(K_d\) is the matrix obtained by evaluating the covariance function at all pairs of inputs and \(y_d\) is the column vector of outputs. Predictions for new values of x are obtained by setting eta, v_e, and phi to the Maximum a Posteriori (MAP) estimates and using the GP conditional on the observed data. Specifically, given \(x_d\) and \(y_d\), the mean and variance for y evaluated at a new value of x are $$E(y) = C(x, x_d) [K_d]^{-1} y_d $$ $$V(y) = eta + v_e - C(x, x_d)[K_d]^{-1} C(x_d, x) $$ where the vector \(C(x, x_d)\) is obtained by evaluating C at x and each of the observed inputs while holding eta, phi, and v_e at the MAP estimates.

Examples

data("two_species_model") block <- two_species_model[1:200,] block_gp(block, columns = c("x", "y"), first_column_time = TRUE)
#> embedding tp phi v_e eta fit_params num_pred rho #> 1 1, 2 1 0.5302931 -170.4109 6.904724 TRUE 199 0.9999775 #> mae rmse perc p_val #> 1 0.001004117 0.001284706 1 0