# Simulation of cluster randomized trial with spillover

`simulateCRT` generates simulated data for a cluster randomized trial (CRT) with geographic spillover between arms.

## Usage

```
simulateCRT(
  trial = NULL,
  effect = 0,
  outcome0 = NULL,
  generateBaseline = TRUE,
  matchedPair = TRUE,
  scale = "proportion",
  baselineNumerator = "base_num",
  baselineDenominator = "base_denom",
  denominator = NULL,
  ICC_inp = NULL,
  kernels = 200,
  sd = NULL,
  theta_inp = NULL,
  tol = 0.005
)
```

## Arguments

- trial
an object of class `"CRTsp"` or a data frame containing locations in (x,y) coordinates, cluster assignments (factor `cluster`), and arm assignments (factor `arm`). Each location may also be assigned a `propensity` (see details).

- effect
numeric. The simulated effect size (defaults to 0)

- outcome0
numeric. The anticipated value of the outcome in the absence of intervention

- generateBaseline
logical. If `TRUE` then baseline data and the `propensity` will be simulated

- matchedPair
logical. If `TRUE` then the function tries to carry out randomization using pair-matching on the baseline data (see details)

- scale
measurement scale of the outcome. Options are: 'proportion' (the default); 'count'; 'continuous'.

- baselineNumerator
optional name of numerator variable for pre-existing baseline data

- baselineDenominator
optional name of denominator variable for pre-existing baseline data

- denominator
optional name of denominator variable for the outcome

- ICC_inp
numeric. Target intra cluster correlation, provided as input when baseline data are to be simulated

- kernels
number of kernels used to generate a de novo `propensity`

- sd
numeric. standard deviation of the normal kernel measuring spatial smoothing leading to spillover

- theta_inp
numeric. input spillover interval

- tol
numeric. tolerance of output ICC

## Value

A list of class `"CRTsp"` containing the following components:

| Component | Type | Description |
|---|---|---|
| `geom_full` | list | summary statistics describing the site, cluster assignments, and randomization |
| `design` | list | values of input parameters to the design |
| `trial` | data frame | rows correspond to geolocated points, with the columns listed below |

Columns of `trial`:

| Column | Type | Description |
|---|---|---|
| `x` | numeric vector | x-coordinates of locations |
| `y` | numeric vector | y-coordinates of locations |
| `cluster` | factor | assignments to cluster of each location |
| `arm` | factor | assignments to `control` or `intervention` for each location |
| `nearestDiscord` | numeric vector | signed Euclidean distance to nearest discordant location (km) |
| `propensity` | numeric vector | propensity for each location |
| `base_denom` | numeric vector | denominator for baseline |
| `base_num` | numeric vector | numerator for baseline |
| `denom` | numeric vector | denominator for the outcome |
| `num` | numeric vector | numerator for the outcome |
| `...` | | other objects included in the input `"CRTsp"` object or `data.frame` |

## Details

Synthetic data are generated by sampling around the values of variable `propensity`, which is a numerical vector (taking positive values) of length equal to the number of locations. There are three ways in which `propensity` can arise:

- `propensity` can be provided as part of the input `trial` object.
- Baseline numerators and denominators (values of `baselineNumerator` and `baselineDenominator`) may be provided. `propensity` is then generated as the numerator:denominator ratio for each location in the input object.
- Otherwise `propensity` is generated using a 2D Normal kernel density. The `OOR::StoSOO` function is used to achieve an intra-cluster correlation coefficient (ICC) that approximates the value of `ICC_inp` by searching for an appropriate value of the kernel bandwidth.
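The second route above can be illustrated with a minimal base R sketch. It assumes the default column names `base_num` and `base_denom`; the toy data frame and its values are hypothetical, and the package's internal code may differ.

```r
# Toy input: four locations with pre-existing baseline data.
# Column names follow the defaults of baselineNumerator and
# baselineDenominator; the values are made up for illustration.
trial <- data.frame(
  x = c(0.1, 0.4, -0.2, 0.3),
  y = c(0.0, 0.2, 0.5, -0.1),
  base_num = c(2, 1, 5, 1),
  base_denom = c(10, 8, 10, 4)
)

# Assumed rule from the text: propensity is the numerator:denominator
# ratio for each location.
trial$propensity <- trial$base_num / trial$base_denom
print(trial$propensity)
```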

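`num[i]`, the synthetic outcome for location `i`, is simulated with expectation:

$$E(\mathrm{num}[i]) = \frac{\mathrm{outcome0}[i] \cdot \mathrm{propensity}[i] \cdot \mathrm{denom}[i] \cdot (1 - \mathrm{effect} \cdot I[i])}{\mathrm{mean}(\mathrm{outcome0}[\,] \cdot \mathrm{propensity}[\,])}$$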
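As a rough illustration, the expression can be evaluated literally for a handful of locations. This is only a sketch: the text does not define `I[i]`, so it is assumed here to be a 0/1 indicator of intervention exposure (with spillover it may instead be a fractional weight), and all input values are hypothetical.

```r
outcome0   <- 0.5                    # anticipated outcome without intervention
effect     <- 0.25                   # simulated effect size
propensity <- c(0.2, 0.4, 0.6, 0.8)  # toy propensities, one per location
denom      <- c(10, 10, 20, 20)      # toy denominators (rate multipliers)
I          <- c(0, 0, 1, 1)          # ASSUMPTION: 0/1 exposure indicator

# Literal evaluation of the expectation given in the text
expected_num <- outcome0 * propensity * denom * (1 - effect * I) /
  mean(outcome0 * propensity)
print(expected_num)
```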

The sampling distribution of `num[i]` depends on the value of `scale` as follows:

- `scale = 'continuous'`: Values of `num` are sampled from Normal distributions with means `E(num[i])` and variance determined by the fitting to `ICC_inp`.
- `scale = 'count'`: Simulated events are allocated to locations via multivariate hypergeometric distributions parameterised with `E(num[i])`.
- `scale = 'proportion'`: Simulated events are allocated to locations via multinomial distributions parameterised with `E(num[i])`.
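The multinomial and Normal cases can be sketched in base R as follows. This is not the package's implementation: the way the total number of events is fixed, the variable names, and the noise standard deviation `noise_sd` are all assumptions made for illustration, and the multivariate hypergeometric case is omitted because base R has no sampler for it.

```r
set.seed(1)
expected_num <- c(4, 8, 18, 24)  # hypothetical expectations, one per location

# Multinomial allocation: a fixed total of events (ASSUMED here to be the
# rounded sum of the expectations) is shared among locations with
# probabilities proportional to the expectations.
total_events <- round(sum(expected_num))
num_multinom <- as.vector(
  rmultinom(1, size = total_events, prob = expected_num / sum(expected_num))
)

# Continuous outcome: Normal draws centred on the expectations.
# noise_sd is a hypothetical stand-in for the variance obtained by
# fitting to ICC_inp.
noise_sd <- 0.5
num_normal <- rnorm(length(expected_num), mean = expected_num, sd = noise_sd)
```

In the multinomial draw the event total is preserved exactly, whereas the Normal draws only match the expectations on average.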

`denominator` may specify a vector of numeric (non-zero) values in the input `"CRTsp"` or `data.frame`, which is returned as variable `denom`. It acts as a scale-factor for continuous outcomes, rate-multiplier for counts, or denominator for proportions. For discrete data all values of `denom` must be > 0.5 and are rounded to the nearest integer in calculations of `num`.

By default, `denom` is generated as a vector of ones, leading to simulation of dichotomous outcomes if `scale = 'proportion'`.

If baseline numerators and denominators are provided then the output vectors `base_denom` and `base_num` are set to the input values. If baseline numerators and denominators are not provided then the synthetic baseline data are generated by sampling around `propensity` in the same way as the outcome data, but with the effect size set to zero.

If `matchedPair` is `TRUE` then pair-matching on the baseline data will be used in randomization, provided there is an even number of clusters. If there is an odd number of clusters then matched pairs are not generated and an unmatched randomization is output.
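One common way to carry out pair-matched randomization, with the odd-number fallback described above, is sketched below. The helper `randomize_clusters` and the cluster-level `base_rate` column are hypothetical and are not part of the package; the package's own matching procedure may differ.

```r
set.seed(42)

# Toy cluster-level baseline rates (hypothetical values)
clusters <- data.frame(
  cluster = factor(1:6),
  base_rate = c(0.30, 0.10, 0.25, 0.12, 0.40, 0.38)
)

# Hypothetical helper: pair clusters with adjacent baseline rates and
# randomize one member of each pair to each arm. With an odd number of
# clusters no pairs are formed and a simple randomization is returned.
randomize_clusters <- function(clusters, matchedPair = TRUE) {
  n <- nrow(clusters)
  arms <- c("control", "intervention")
  if (matchedPair && n %% 2 == 0) {
    ord <- order(clusters$base_rate)
    pair <- integer(n)
    pair[ord] <- rep(seq_len(n / 2), each = 2)
    arm <- character(n)
    for (p in seq_len(n / 2)) {
      arm[which(pair == p)] <- sample(arms)  # random order within the pair
    }
    clusters$pair <- pair
  } else {
    arm <- sample(rep(arms, length.out = n))  # unmatched randomization
  }
  clusters$arm <- factor(arm, levels = arms)
  clusters
}

matched <- randomize_clusters(clusters)
unmatched <- randomize_clusters(clusters[1:5, ])
```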

Either `sd` or `theta_inp` must be provided. If both are provided then the value of `sd` is overwritten by the standard deviation implicit in the value of `theta_inp`. Spillover is simulated as arising from a diffusion-like process.
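To give an intuition for how a Normal kernel with standard deviation `sd` can blur the boundary between arms, the sketch below smooths a 0/1 arm indicator over toy locations. It is an assumption-laden illustration rather than the package's diffusion model: the variable names, the layout, and the weighted-average form of the smoothing are all hypothetical.

```r
# Toy layout: six locations on a line, intervention arm on the right
trial <- data.frame(
  x = c(-1.0, -0.5, -0.1, 0.1, 0.5, 1.0),
  y = 0,
  arm = c(0, 0, 0, 1, 1, 1)  # 0 = control, 1 = intervention
)
kernel_sd <- 0.3  # plays the role of the sd argument (km)

# Pairwise Euclidean distances and Gaussian kernel weights
d <- as.matrix(dist(trial[, c("x", "y")]))
w <- exp(-d^2 / (2 * kernel_sd^2))

# ASSUMED smoothing: each location's effective exposure is the
# kernel-weighted average of the arm indicator over all locations
exposure <- as.vector(w %*% trial$arm / rowSums(w))
print(round(exposure, 3))
```

Locations far from the boundary have exposure close to 0 or 1, while those near it take intermediate values, which is the qualitative pattern expected from spillover.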

For further details see Multerer (2021)

## Examples

```
{
  smalltrial <- readdata('smalltrial.csv')
  simulation <- simulateCRT(smalltrial,
    effect = 0.25,
    ICC_inp = 0.05,
    outcome0 = 0.5,
    matchedPair = FALSE,
    scale = 'proportion',
    sd = 0.6,
    tol = 0.05)
  summary(simulation)
}
#>
#> ===================== SIMULATION OF CLUSTER RANDOMISED TRIAL =================
#> Estimating the smoothing required to achieve the target ICC of 0.05
#>
#> bandwidth: 28.0316248945261 ICC = 0.0514723520788207 loss = 2.16782064400771e-06
#> ===============================CLUSTER RANDOMISED TRIAL ===========================
#>
#> Summary of coordinates
#> ----------------------
#> Min. : 1st Qu.: Median : Mean : 3rd Qu.: Max. :
#> x -0.70 -0.23 -0.00 0.01 0.30 0.58
#> y -0.77 -0.22 -0.00 0.05 0.24 1.55
#> nearestDiscord -0.34 -0.21 -0.06 -0.00 0.13 1.06
#>
#> Total area (within 0.2 km of a location) : 2.06 sq.km
#> Total area (convex hull) : 1.32 sq.km
#>
#> Locations and Clusters
#> ---------------------- -
#> Coordinate system (x, y)
#> row 3
#> Locations: 208
#> Available clusters (across both arms) 18
#> Per cluster mean number of points 11.6
#> Per cluster s.d. number of points 4.9
#> Cluster randomization: Independently randomized
#> row 10
#> row 11
#> row 12
#> row 13
#> row 14
#> row 15
#> row 16
#> No power calculations to report -
#> row 18
#> row 19
#> row 20
#> row 21
#> row 22
#>
#> Other variables in dataset
#> -------------------------- denom propensity num base_denom base_num
```