
simulateCRT generates simulated data for a cluster randomized trial (CRT) with geographic spillover between arms.

Usage

simulateCRT(
  trial = NULL,
  effect = 0,
  outcome0 = NULL,
  generateBaseline = TRUE,
  matchedPair = TRUE,
  scale = "proportion",
  baselineNumerator = "base_num",
  baselineDenominator = "base_denom",
  denominator = NULL,
  ICC_inp = NULL,
  kernels = 200,
  sigma_m = NULL,
  spillover_interval = NULL,
  tol = 0.005
)

Arguments

trial

an object of class "CRTsp" or a data frame containing locations in (x,y) coordinates, cluster assignments (factor cluster), and arm assignments (factor arm). Each location may also be assigned a propensity (see details).

effect

numeric. The simulated effect size, i.e. the proportional reduction in the expected outcome due to the intervention (defaults to 0)

outcome0

numeric. The anticipated value of the outcome in the absence of intervention

generateBaseline

logical. If TRUE then baseline data and the propensity will be simulated

matchedPair

logical. If TRUE then the function tries to carry out randomization using pair-matching on the baseline data (see details)

scale

measurement scale of the outcome. Options are: 'proportion' (the default); 'count'; 'continuous'.

baselineNumerator

optional name of numerator variable for pre-existing baseline data

baselineDenominator

optional name of denominator variable for pre-existing baseline data

denominator

optional name of denominator variable for the outcome

ICC_inp

numeric. Target intra-cluster correlation, provided as input when baseline data are to be simulated

kernels

number of kernels used to generate a de novo propensity

sigma_m

numeric. Standard deviation of the normal kernel that describes the spatial smoothing giving rise to spillover

spillover_interval

numeric. Spillover interval; an alternative to sigma_m for specifying the extent of spillover (see details)

tol

numeric. Tolerance of the output ICC relative to ICC_inp

Value

A list of class "CRTsp" containing the following components:

  • geom_full (list): summary statistics describing the site, cluster assignments, and randomization
  • design (list): values of the input parameters to the design
  • trial (data frame): rows correspond to geolocated points, with the following columns:
      • x (numeric vector): x-coordinates of locations
      • y (numeric vector): y-coordinates of locations
      • cluster (factor): assignment of each location to a cluster
      • arm (factor): assignment of each location to control or intervention
      • nearestDiscord (numeric vector): signed Euclidean distance to the nearest discordant location (km)
      • propensity (numeric vector): propensity for each location
      • base_denom (numeric vector): denominator for the baseline
      • base_num (numeric vector): numerator for the baseline
      • denom (numeric vector): denominator for the outcome
      • num (numeric vector): numerator for the outcome
      • ...: any other objects included in the input "CRTsp" object or data.frame
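Because trial is an ordinary data frame, arm-level summaries can be computed with base R. The sketch below builds a small hypothetical data frame standing in for simulation$trial (the values are invented for illustration, not output of simulateCRT) and tabulates the outcome proportion by arm.

```r
# Hypothetical stand-in for simulation$trial (made-up values)
trial <- data.frame(
  arm   = factor(c("control", "control", "intervention", "intervention")),
  num   = c(1, 1, 0, 1),
  denom = c(1, 1, 1, 1)
)

# Sum numerators and denominators within each arm, then take the ratio
agg <- aggregate(cbind(num, denom) ~ arm, data = trial, FUN = sum)
agg$proportion <- agg$num / agg$denom
print(agg)
```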

Details

Synthetic data are generated by sampling around the values of variable propensity, which is a numerical vector (taking positive values) of length equal to the number of locations. There are three ways in which propensity can arise:

  1. propensity can be provided as part of the input trial object.

  2. Baseline numerators and denominators (the variables named by baselineNumerator and baselineDenominator) may be provided. propensity is then generated as the numerator:denominator ratio for each location in the input object.

  3. Otherwise propensity is generated using a 2D Normal kernel density. The function OOR::StoSOO is used to search for a kernel bandwidth that gives an intra-cluster correlation coefficient (ICC) approximating the value of ICC_inp.
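Routes 2 and 3 can be sketched in base R as follows. This is not the CRTspat implementation: placing the kernel centres uniformly at random over the bounding box, and the fixed bandwidth, are assumptions made for illustration, and the ICC search itself (OOR::StoSOO) is not reproduced. The icc_loss helper is hypothetical; it encodes the assumption that the search minimizes the squared difference between achieved and target ICC, which is consistent with the bandwidth/ICC/loss line printed in the Examples output.

```r
set.seed(1)

# Hypothetical trial locations in a unit square
n <- 50
x <- runif(n)
y <- runif(n)

# Route 3 (assumed details): sum of isotropic 2D normal kernels,
# with kernel centres placed uniformly at random over the bounding box
n_kernels <- 10
bandwidth <- 0.2
cx <- runif(n_kernels, min(x), max(x))
cy <- runif(n_kernels, min(y), max(y))
propensity <- vapply(seq_len(n), function(i) {
  sum(dnorm(x[i], mean = cx, sd = bandwidth) * dnorm(y[i], mean = cy, sd = bandwidth))
}, numeric(1))

# Route 2: propensity as the baseline numerator:denominator ratio
base_num <- c(2, 1, 5)
base_denom <- c(4, 3, 10)
ratio_propensity <- base_num / base_denom

# Hypothetical objective for the bandwidth search: squared ICC error
icc_loss <- function(icc_achieved, icc_target) (icc_achieved - icc_target)^2
icc_loss(0.0539894078572401, 0.05)
```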

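num[i], the synthetic outcome for location i, is simulated with expectation:
$$E(\mathrm{num}_i) = \frac{\mathrm{outcome0}_i \cdot \mathrm{propensity}_i \cdot \mathrm{denom}_i \cdot (1 - \mathrm{effect} \cdot I_i)}{\operatorname{mean}_j\left(\mathrm{outcome0}_j \cdot \mathrm{propensity}_j\right)}$$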
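The expectation can be transcribed directly into R. The function name expected_num is hypothetical, and treating I as a vector of intervention exposures between 0 (none) and 1 (full) is an assumption, since the formula does not define I[i].

```r
# Transcription of the documented expectation E(num[i]).
# Assumption: I holds each location's intervention exposure (0 = none, 1 = full).
expected_num <- function(outcome0, propensity, denom, effect, I) {
  outcome0 * propensity * denom * (1 - effect * I) / mean(outcome0 * propensity)
}

# Toy usage: a scalar outcome0 and denom are recycled across three locations
e <- expected_num(outcome0 = 0.5, propensity = c(1, 2, 3), denom = 1,
                  effect = 0.25, I = c(0, 1, 1))
print(e)
```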
The sampling distribution of num[i] depends on the value of scale as follows:

  • scale='continuous': Values of num are sampled from Normal distributions with means E(num[i]) and variance determined by the fitting to ICC_inp.

  • scale='count': Simulated events are allocated to locations via multivariate hypergeometric distributions parameterised with E(num[i]).

  • scale='proportion': Simulated events are allocated to locations via multinomial distributions parameterised with E(num[i]).
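Two of these sampling schemes can be illustrated with base R. This is a sketch under stated assumptions, not the package's code: the standard deviation for the continuous case is fixed arbitrarily (simulateCRT fits it to ICC_inp), and for the multinomial case the total number of events is assumed to be the rounded sum of the expectations. The multivariate hypergeometric scheme for counts is not sketched.

```r
set.seed(42)
expected <- c(0.5, 0.75, 1.125, 1.6)  # illustrative E(num[i]) values

# 'continuous': Normal draws around E(num[i]); sd = 0.1 is arbitrary here
num_continuous <- rnorm(length(expected), mean = expected, sd = 0.1)

# 'proportion' (as described above): allocate a total number of events across
# locations with multinomial probabilities proportional to E(num[i]).
# Assumption: the total is the rounded sum of the expectations.
total_events <- round(sum(expected))
num_multinomial <- as.vector(
  rmultinom(1, size = total_events, prob = expected / sum(expected))
)
print(num_multinomial)
```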

denominator may name a variable containing numeric (non-zero) values in the input "CRTsp" object or data.frame; it is returned as variable denom. It acts as a scale factor for continuous outcomes, a rate multiplier for counts, or the denominator for proportions. For discrete data, all values of denom must be > 0.5; they are rounded to the nearest integer when calculating num.

By default, denom is generated as a vector of ones, leading to simulation of dichotomous outcomes if scale='proportion'.
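The rules for denom can be expressed as a small validation step. The helper prepare_denom is hypothetical and only mirrors the rules stated above (default of ones, values > 0.5 and rounding for discrete scales); it is not a CRTspat function.

```r
# Hypothetical helper mirroring the documented rules for denom
prepare_denom <- function(denom = NULL, n, scale = "proportion") {
  # Default: a vector of ones (dichotomous outcomes when scale = "proportion")
  if (is.null(denom)) denom <- rep(1, n)
  if (scale %in% c("count", "proportion")) {
    # Discrete outcomes: every value must exceed 0.5, then round to integers
    if (any(denom <= 0.5)) stop("all values of denom must be > 0.5 for discrete outcomes")
    denom <- round(denom)
  }
  denom
}

prepare_denom(n = 3)                        # default vector of ones
prepare_denom(c(0.6, 2.4, 3.7), n = 3)      # rounded for a discrete scale
prepare_denom(c(0.2, 1.5), n = 2, scale = "continuous")  # left unchanged
```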

If baseline numerators and denominators are provided then the output vectors base_denom and base_num are set to the input values. If baseline numerators and denominators are not provided then the synthetic baseline data are generated by sampling around propensity in the same way as the outcome data, but with the effect size set to zero.

If matchedPair is TRUE, then pair-matching on the baseline data is used in randomization, provided there is an even number of clusters. If there is an odd number of clusters, matched pairs are not generated and an unmatched randomization is output.
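A minimal sketch of pair-matched randomization, assuming clusters are paired by sorting on a single cluster-level baseline value and pairing neighbours in that order; the matching criterion used by simulateCRT may differ. The function pair_randomize is hypothetical, and the balanced fallback for an odd number of clusters is likewise an assumption.

```r
# Hypothetical pair-matched randomization on cluster-level baseline values
pair_randomize <- function(baseline) {
  k <- length(baseline)
  if (k %% 2 != 0) {
    # Odd number of clusters: unmatched (simple permuted) allocation
    return(sample(rep(c("control", "intervention"), length.out = k)))
  }
  ord <- order(baseline)          # clusters sorted by baseline value
  arm <- character(k)
  for (p in seq(1, k, by = 2)) {
    pair <- ord[p:(p + 1)]        # neighbouring clusters form a pair
    arm[pair] <- sample(c("control", "intervention"))  # one of each per pair
  }
  arm
}

set.seed(1)
pair_randomize(c(0.1, 0.5, 0.2, 0.9, 0.3, 0.7))
```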

Either sigma_m or spillover_interval must be provided. If both are provided then the value of sigma_m is overwritten by the standard deviation implicit in the value of spillover_interval. Spillover is simulated as arising from a diffusion-like process.
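One way to picture a diffusion-like spillover is to let each location's exposure to the intervention rise smoothly across the arm boundary as a normal cumulative distribution of its signed distance to the nearest discordant location. This is an assumption for illustration, not the documented CRTspat formula: spillover_exposure is hypothetical, the sign convention (positive distances inside the intervention arm) is assumed, and the conversion between spillover_interval and sigma_m is not reproduced.

```r
# Hypothetical exposure profile: normal CDF of signed distance scaled by sigma_m.
# Assumption: nearestDiscord > 0 inside the intervention arm, < 0 in control.
spillover_exposure <- function(nearestDiscord, sigma_m) {
  pnorm(nearestDiscord / sigma_m)
}

d <- c(-1.2, -0.3, 0, 0.3, 1.2)   # km from the arm boundary
exposure <- spillover_exposure(d, sigma_m = 0.6)
print(round(exposure, 3))

# Exposure could then scale the effect term, as in (1 - effect * I[i])
1 - 0.25 * exposure
```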

For further details, see Multerer (2021).

Examples

library(CRTspat)
smalltrial <- readdata('smalltrial.csv')
simulation <- simulateCRT(smalltrial,
  effect = 0.25,
  ICC_inp = 0.05,
  outcome0 = 0.5,
  matchedPair = FALSE,
  scale = 'proportion',
  sigma_m = 0.6,
  tol = 0.05)
summary(simulation)
#> 
#> =====================    SIMULATION OF CLUSTER RANDOMISED TRIAL    =================
#> *** computed distance to nearest measurements in discordant arm ***
#> Estimating the smoothing required to achieve the target ICC of 0.05
#> 
#> bandwidth: 40.5973869501909  ICC = 0.0539894078572401 loss = 1.59153750514092e-05
#> ===============================CLUSTER RANDOMISED TRIAL ===========================
#> 
#> Summary of coordinates
#> ----------------------
#>                Min.   : 1st Qu.: Median : Mean   : 3rd Qu.: Max.   :
#>       x        -0.70    -0.23    -0.00     0.01     0.30     0.58   
#>       y        -0.77    -0.22    -0.00     0.05     0.24     1.55   
#> nearestDiscord -0.34    -0.21    -0.06    -0.00     0.13     1.06   
#> 
#> Total area (within  0.2 km of a location) :  2.06 sq.km
#> Total area (convex hull) :  1.32 sq.km
#> 
#> Locations and Clusters
#> ----------------------                                          -                        
#> Coordinate system                      (x, y)                        
#> Locations:                                                      208                        
#> Available clusters (across both arms)                           18                        
#>   Per cluster mean number of points                             11.6                        
#>   Per cluster s.d. number of points                             4.9                        
#> Cluster randomization:                      Independently randomized                        
#> No power calculations to report          -                        
#> 
#> Other variables in dataset
#> --------------------------          denom  propensity  num  base_denom  base_num