Introduction

This document discusses parallelization options for this package. Because the full analysis can involve many independent calculations (especially when identifying the pairwise interactions), running code in parallel can substantially reduce the total run time.

To accommodate different computing environments, we use the future package, which lets the user define the parallelization setup.
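As a minimal sketch of how the future package separates the code from the parallelization setup, the example below runs the same calculation under two different plans. The `slow_square` function is a hypothetical stand-in for an expensive, independent calculation and is not part of this package; the choice of two workers is likewise only illustrative.

```r
library(future)

# Hypothetical stand-in for an expensive, independent calculation.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Sequential plan: each future is evaluated in the current R session.
plan(sequential)
futures <- lapply(1:4, function(i) future(slow_square(i)))
res_sequential <- vapply(futures, value, numeric(1))

# Multisession plan: the same code, evaluated in background R sessions.
plan(multisession, workers = 2)
futures <- lapply(1:4, function(i) future(slow_square(i)))
res_parallel <- vapply(futures, value, numeric(1))

# Return to sequential evaluation and shut down the background workers.
plan(sequential)

# The results should not depend on which plan was active.
identical(res_sequential, res_parallel)
```

Only the call to `plan()` changes between the two runs; the code that creates and collects the futures is untouched, which is what allows each user to pick a setup suited to their machine.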

library(portalDS)
library(drake)

options(drake_make_menu = FALSE, 
        drake_clean_recovery_msg = FALSE)

Setup

Set up the data and compute the simplex results.

data("block_3sp", package = "rEDM")

block <- setNames(block_3sp[, c("time", "x_t", "y_t", "z_t")],
                  c("time", "x", "y", "z"))

simplex_results <- compute_simplex(block = block,
                                   E_list = 3:5,
                                   surrogate_method = "random_shuffle",
                                   num_surr = 20, 
                                   id_var = "time")

Calculations

  1. No specific plan for parallelization; calculations will be done sequentially.
  2. Same calculations as the previous step, but using asynchronous processes via future.callr.
  3. Same calculations as the previous step, but set up within a drake plan, while still allowing parallelization to occur within a single target.
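The three setups above might be sketched as follows. This is an illustration under stated assumptions rather than the exact code used to produce the results: it assumes that `compute_ccm` accepts the output of `compute_simplex` as its first argument, leaves all other arguments at their defaults (the argument values used for the original runs are not shown here), and assumes that those defaults include a fixed seed. The object names match those used in the validation section.

```r
library(portalDS)
library(future)
library(drake)

# Assumes `simplex_results` was created as in the Setup section.
stopifnot(exists("simplex_results"))

# 1. Sequential: no parallel backend is used.
plan(sequential)
ccm_results <- compute_ccm(simplex_results)  # assumed call signature

# 2. Asynchronous background R processes via future.callr.
plan(future.callr::callr)
ccm_results_parallel <- compute_ccm(simplex_results)

# 3. The same calculation as a single target in a drake plan.
#    The future plan set above remains active, so parallelization
#    can still happen inside the target.
ccm_plan <- drake_plan(
  ccm_results_drake = compute_ccm(simplex_results)
)
make(ccm_plan)

# Return to sequential evaluation when done.
plan(sequential)
```

In the third setup, the result is stored in the drake cache rather than in the workspace, so it has to be retrieved with `readd()` before it can be compared with the other two.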

Validation of results

Check that the results are identical across the three setups. (Note that the call to compute_ccm is set up to do random subsampling, but we supply a fixed seed so that the random subsamples are selected identically across runs and for each pair of variables.)

identical(ccm_results, ccm_results_parallel)
#> [1] TRUE
ccm_results_drake <- readd(ccm_results_drake)
identical(ccm_results, ccm_results_drake)
#> [1] TRUE