Use Case 05: Analysis of trials (including methods for analysing spillover)
Usecase5.Rmd
The CRTanalysis()
function is a wrapper for different statistical analysis packages that
can be used to analyse either simulated or real trial datasets. It is
designed for use in simulation studies of different analytical methods
for spatial CRTs by automating the data processing and selecting some
appropriate analysis options. It does not replace conventional use of
these packages. Real field trials very often entail complications that
are not catered for any of the analysis options in
CRTanalysis()
and it does not aspire to carry out the full
analytical workflow for a trial. It can be used as part of a wider
workflow. In particular the usual object output by the statistical
analysis package constitutes the model_object
element
within the CRTanalysis
object generated by
CRTanalysis()
. This can be accessed by the usual methods
(e.g predict()
, summary()
,
plot()
) which may be needed for diagnosing errors,
assessing goodness of fit, and for identifying needs for additional
analyses.
Statistical Methods
The options that can be specified using the method
parameter in the function call are:
-
method = "T"
summarises the outcome at the level of the cluster, and uses 2-sample t-tests to carry out statistical significance tests of the effect, and to compute confidence intervals for the effect size. The t.test function in thestats
package is used. -
method = "GEE"
uses Generalised Estimating Equations to estimate the efficacy in a model with iid random effects for the clusters. An estimate of the intracluster correlation (ICC) is also provided. This uses calls to the geepack package. -
method = "LME4"
fits linear (for continuous data) or generalized linear (for counts and proportions) mixed models with iid random effects for clusters in lme4. -
method = "MCMC"
uses Markov chain Monte Carlo simulation in package jagsUI, which calls r-JAGS. -
method = "INLA"
uses approximate Bayesian inference via the R-INLA package. This provides functionality for geostatistical analysis, which can be used for geographical mapping of model outputs (as illustrated in . INLA spatial analysis requires a prediction mesh. This can be generated usingCRTspat::new_mesh()
. This can be computationally expensive, so it is recommended to compute the mesh just once for each dataset.
All these analysis methods can be used to carry out a simple
comparision of outcomes between trial arms. Each offers different
additional functionality, and has its own limitations (see Table 5.1).
Some of these limitations are specific to the options offered within
CRTanalysis()
, which does not embrace the full range of
options of the packages that are ‘wrapped’. These are specified using
the method
argument of the function.
Table 5.1. Available statistical methods
method |
Package | What the CRTanalysis() implementation offers |
Limitations (as implemented) |
---|---|---|---|
T |
t.test | P-values and confidence intervals for efficacy based on comparison of cluster means | No analysis of spillover or degree of clustering |
GEE |
geepack | Interval estimates for efficacy and Intra-cluster correlations | No analysis of spillover or degree of clustering |
LME4 |
lme4 | Analysis of spillover | No geostatistical analysis |
INLA |
INLA | Analysis of spillover, geostatistical analysis and spatially structured outputs | Computationally intensive |
MCMC |
jagsUI | Interval estimates for spillover parameters | Identifiability issues and slow convergence are possible |
For the analysis of proportions, the outcome in the control arm is estimated as: \(\hat{p}_{C} = \frac{1}{1 + exp(-\beta_1)}\), in the intervention arm as \(\hat{p}_{I} = \frac{1}{1 + exp(-\beta_1-\beta_2)}\), and the efficacy is estimated as \(\tilde{E}_{s} = 1- \frac{\tilde{p}_{I}}{\tilde{p}_{C}}\) where \(\beta_1\) is the intercept term and \(\beta_2\) the incremental effect associated with the intervention.
summary("<analysis>"")
is used to view the key
results of the trial. To display the output from the statistical
procedure that is called, try <analysis>$model_object
or summary("<analysis>$model_object")
.
library(CRTspat)
example <- readdata("exampleCRT.txt")
analysisT <- CRTanalysis(example, method = "T")
summary(analysisT)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: T
## Link function: logit
## Model formula: arm + (1 | cluster)
## No modelling of spillover
## Estimates: Control: 0.364 (95% CL: 0.286 0.451 )
## Intervention: 0.21 (95% CL: 0.147 0.292 )
## Efficacy: 0.423 (95% CL: 0.208 0.727 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
##
## P-value (2-sided): 0.006879064
analysisT$model_object
##
## Two Sample t-test
##
## data: lp by arm
## t = 2.9818, df = 22, p-value = 0.006879
## alternative hypothesis: true difference in means between group control and group intervention is not equal to 0
## 95 percent confidence interval:
## 0.2332638 1.2989425
## sample estimates:
## mean in group control mean in group intervention
## -0.5561662 -1.3222694
Assessing model fit
The model = "LME4"
option outputs the deviance of the
model and the Akaike information criterion (AIC), which can be used to
select the best fitting model. The deviance information criterion (DIC)
and Bayesian information criterion (BIC) perform the same role for the
Bayesian methods ("INLA"
, and "MCMC"
). The
comparison of results with cfunc = "X"
and
cfunc = "Z"
is used to assess whether the intervention
effect is likely to be due to chance. With method = "T"
,
cfunc = "X"
provides a significance test of the
intervention effect directly. The models with spillover (see below) can
be compared by that with cfunc = "X"
to evaluate whether
spillover has led to an important bias.
Spillover
CRTanalysis()
provides options for analysing spillover
effects either as function of a Euclidean distance or as a function of a
surround measure:
Models that do not consider spillover
Models that do not consider spillover can be fitted using options
Z
and X
. These are included both to allow
conventional analyses (see above), and also to enable model selection
using and likelihood ratio tests, the Akaike information criterion
(AIC), deviance information criterion (DIC) or Bayesian information
criterion (BIC) .
Spillover as a function of distance
These methods require a measure of distance from the boundary between
the trial arms, with locations in the control arm assigned negative
values, and those in the intervention arm assigned positive values. The
functional forms for this relationship is specified by the value of
cfunc
(Table 5.2).
Table 5.2. Available spillover functions
cfunc |
Description | Formula for \(P\left( d \right)\) | Compatible method (s) |
---|---|---|---|
Z |
No intervention effect | \(P\left( d \right) = \ 0\ \) |
GEE LME4 INLA
MCMC
|
X |
Simple intervention effect | \(\begin{matrix} P\left( d \right) = \ 0\ for\ d\ < \ 0 \\ P\left( d \right) = \ 1\ for\ d\ > \ 0 \\ \end{matrix}\ \) |
T GEE LME4 INLA
MCMC
|
L |
inverse logistic (sigmoid) | \(P\left( d \right) = \ \frac{1}{\left( 1\ + \ exp\left( - d/S \right) \right)}\) |
LME4 INLA MCMC
|
P |
inverse probit (error function) | \(P\left( d \right) = 1\ +\ erf\left(\frac{d}{S\sqrt2}\right)\) |
LME4 INLA MCMC
|
S |
piecewise linear | \(\begin{matrix} P\left( d \right) = \ 0\ for\ d\ < \ - S/2\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \\ P\left( d \right) = \ \left(S/2\ + \ d \right)/S\ for\ - S/2 < d\ < \ S/2\\ P\left( d \right) = \ 1\ for\ d\ > \ S/2\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \\ \end{matrix}\ \) |
LME4 INLA MCMC
|
R |
rescaled linear | \(P\left( d \right) =\frac{d\ -\ min(d)}{max(d)\ -\ min(d)}\) |
LME4 INLA MCMC
|
cfunc
options P
, L
and
S
lead to non-linear models in which the spillover scale
parameter (S
) must be estimated. This is done by selecting
scale_par
using a one-dimensional optimisation of the
goodness of fit of the model in stats::optimize()
.
The different values for cfunc
lead to the fitted curves
shown in Figure 5.1. The light blue shaded part of the plot corresponds
to the spillover interval in those cases where this is estimated.
analysisLME4_Z <- CRTanalysis(example, method = "LME4", cfunc = "Z")
summary(analysisLME4_Z)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Model formula: (1 | cluster)
## No comparison of arms
## Estimates: Control: 0.285 (95% CL: NA )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1387.609
## AIC : 1391.609
analysisLME4_X <- CRTanalysis(example, method = "LME4", cfunc = "X")
summary(analysisLME4_X)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Model formula: arm + (1 | cluster)
## No modelling of spillover
## Estimates: Control: 0.366 (95% CL: 0.292 0.449 )
## Intervention: 0.216 (95% CL: 0.162 0.281 )
## Efficacy: 0.41 (95% CL: 0.165 0.584 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1379.898
## AIC : 1385.898
analysisLME4_P <- CRTanalysis(example, method = "LME4", cfunc = "P")
summary(analysisLME4_P)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.45
## Model formula: pvar + (1 | cluster)
## Error function model for spillover
## Estimates: Control: 0.418 (95% CL: 0.331 0.509 )
## Intervention: 0.186 (95% CL: 0.136 0.25 )
## Efficacy: 0.553 (95% CL: 0.327 0.703 )
## Spillover interval(km): 4.22 (95% CL: 4.2 4.23 )
## % locations contaminated: 91.6 (95% CL: 90.6 92 %)
## Total effect : 0.23 (95% CL: 0.114 0.344 )
## Ipsilateral Spillover : 0.0233 (95% CL: 0.0127 0.0323 )
## Contralateral Spillover : 0.0417 (95% CL: 0.0192 0.0651 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.215
## AIC : 1382.215 including penalty for the spillover scale parameter
analysisLME4_L <- CRTanalysis(example, method = "LME4", cfunc = "L")
summary(analysisLME4_L)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.249
## Model formula: pvar + (1 | cluster)
## Sigmoid (logistic) function for spillover
## Estimates: Control: 0.417 (95% CL: 0.332 0.51 )
## Intervention: 0.186 (95% CL: 0.136 0.249 )
## Efficacy: 0.552 (95% CL: 0.329 0.7 )
## Spillover interval(km): 4.26 (95% CL: 4.24 4.28 )
## % locations contaminated: 92.7 (95% CL: 92.2 93.1 %)
## Total effect : 0.229 (95% CL: 0.115 0.342 )
## Ipsilateral Spillover : 0.0219 (95% CL: 0.0121 0.0304 )
## Contralateral Spillover : 0.0388 (95% CL: 0.0183 0.0604 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.201
## AIC : 1382.201 including penalty for the spillover scale parameter
analysisLME4_S <- CRTanalysis(example, method = "LME4", cfunc = "S")
summary(analysisLME4_S)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 1.674
## Model formula: pvar + (1 | cluster)
## Piecewise linear function for spillover
## Estimates: Control: 0.423 (95% CL: 0.334 0.516 )
## Intervention: 0.185 (95% CL: 0.135 0.247 )
## Efficacy: 0.561 (95% CL: 0.341 0.711 )
## Spillover interval(km): 4.1 (95% CL: 4.1 4.11 )
## % locations contaminated: 86.6 (95% CL: 86.6 87.1 %)
## Total effect : 0.237 (95% CL: 0.12 0.356 )
## Ipsilateral Spillover : 0.029 (95% CL: 0.016 0.0403 )
## Contralateral Spillover : 0.0522 (95% CL: 0.0248 0.0818 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.094
## AIC : 1382.094 including penalty for the spillover scale parameter
analysisLME4_R <- CRTanalysis(example, method = "LME4", cfunc = "R")
summary(analysisLME4_R)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.584 (95% CL: 0.381 0.758 )
## Intervention: 0.116 (95% CL: 0.0587 0.216 )
## Efficacy: 0.801 (95% CL: 0.465 0.92 )
## Spillover interval(km): 6.64 (95% CL: 6.61 6.65 )
## % locations contaminated: 99.8 (95% CL: 99.8 99.8 %)
## Total effect : 0.468 (95% CL: 0.181 0.694 )
## Ipsilateral Spillover : 0.117 (95% CL: 0.0564 0.157 )
## Contralateral Spillover : 0.238 (95% CL: 0.0831 0.368 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1378.711
## AIC : 1384.711
p0 <- plotCRT(analysisLME4_Z, map = FALSE)
p1 <- plotCRT(analysisLME4_X, map = FALSE)
p2 <- plotCRT(analysisLME4_P, map = FALSE)
p3 <- plotCRT(analysisLME4_L, map = FALSE)
p4 <- plotCRT(analysisLME4_S, map = FALSE)
p5 <- plotCRT(analysisLME4_R, map = FALSE)
library(cowplot)
plot_grid(p0, p1, p2, p3, p4, p5, labels = c('Z', 'X', 'P', 'L', 'S', 'R'), label_size = 10, ncol = 2)
Fig 5.1 Fitted curves for the
example dataset with different options for cfunc
The piecewise linear spillover function, cfunc = "S"
, is
only linear on the scale of the linear predictor. When used in a
logistic model, as here, the transformation via the inverse of the link
function leads to a slightly curved plot (Figure 5.1S). The rescaled
linear function, cfunc = "R"
, is provided as a comparator
and for use with distance
values other than
distance = "nearestDiscord"
see below (it should not be
used to estimate the spillover interval).
The full set of different cfunc
options are available
for each of model options "LME4"
, "INLA"
, and
"MCMC"
. The performance of all these different models has
not yet been thoroughly investigated. The analyses of Multerer
et al. (2021b) found that that a model equivalent to
method = "MCMC"
, cfunc = "L"
gave estimates of
efficacy with low bias, even in simulations with considerable
spillover.
Spillover as a function of surround
Spillover can also be analysed by assuming the effect size to be a
function of the number of intervened locations in the surroundings of
the location Anaya-Izquierdo
& Alexander(2021). Several different surround functions are
available. These are specified by the distance
parameter
(Table 5.3).
Table 5.3. Available surround functions
distance |
Description | Details |
---|---|---|
nearestDiscord |
Distance to nearest discordant location | The default. This is used for analyses by distance (see above) |
hdep |
Tukey half-depth | Algorithm of Rousseeuw & Ruts(1996) |
sdep |
Simplicial depth | Algorithm of Rousseeuw & Ruts(1996) |
disc |
disc | The number of intervened locations within the specified radius (excluding the location itself) as described by Anaya-Izquierdo & Alexander(2021) |
kern |
Sum of kernels | The sum of normal kernels |
The compute_distance()
function is provided to compute these quantities, so that they can be
described, compared, and analysed independently of
CRTanalysis()
. Note that the values of the surround
calculated by compute_distance()
are scaled to avoid
correlation with the spatial density of the points (see documentation) and so are
not equivalent to the quantities reported in the original
publications.
Users can also devise other measures of surround or distance, add
them to a trial
data frame and specify them using
distance
. CRTanalysis()
computes the minimum
value for the specified field
examples <- compute_distance(example, distance = "hdep")
ps1 <- plotCRT(examples, distance = "hdep", legend.position = c(0.6, 0.8))
ps2 <- plotCRT(examples, distance = "sdep")
examples <- compute_distance(examples, distance = "disc", scale_par = 0.5)
ps3 <- plotCRT(examples, distance = "disc")
examples <- compute_distance(examples, distance = "kern", scale_par = 0.5)
ps4 <- plotCRT(examples, distance = "kern")
plot_grid(ps1, ps2, ps3, ps4, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)
Fig 5.2 Stacked bar plots for
different surrounds
If distance
is assigned a value of either
hdep
, sdep
, then cfunc = "R"
is
used by default and the overall effect size is computed by comparing the
fitted values of the model for a surround value of zero with that of the
maximum of the surround in the data. If distance = "disc"
or distance = "kern"
and scale_par
is assigned
a value, then cfunc = "R"
is also used. If
cfunc = "E"
is specified then an escape function is fitted
with the scale parameter estimated in the same way as in the scale
parameter in other models (see above Table 5.2).
examples_hdep <- CRTanalysis(examples, method = "LME4", distance = "hdep", cfunc = 'R')
summary(examples_hdep)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Tukey half-depth
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.381 (95% CL: 0.292 0.478 )
## Intervention: 0.209 (95% CL: 0.15 0.282 )
## Efficacy: 0.452 (95% CL: 0.167 0.639 )
## Spillover interval(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 55 (95% CL: 55 55 %)
## Total effect : 0.172 (95% CL: 0.0524 0.292 )
## Ipsilateral Spillover : 0.0313 (95% CL: 0.01 0.0512 )
## Contralateral Spillover : 0.0444 (95% CL: 0.0128 0.0785 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1379.89
## AIC : 1385.89
ps4 <- plotCRT(examples_hdep,legend.position = c(0.8, 0.8))
examples_sdep <- CRTanalysis(examples, method = "LME4", distance = "sdep", cfunc = 'R')
summary(examples_sdep)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Simplicial depth
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.393 (95% CL: 0.307 0.485 )
## Intervention: 0.199 (95% CL: 0.145 0.268 )
## Efficacy: 0.493 (95% CL: 0.243 0.66 )
## Spillover interval(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 52.4 (95% CL: 52.2 52.4 %)
## Total effect : 0.193 (95% CL: 0.0802 0.306 )
## Ipsilateral Spillover : 0.0299 (95% CL: 0.013 0.0456 )
## Contralateral Spillover : 0.0431 (95% CL: 0.0169 0.0704 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1376.417
## AIC : 1382.417
ps5 <- plotCRT(examples_sdep)
examples_disc <- CRTanalysis(examples, method = "LME4", distance = "disc", cfunc = 'R', scale_par = 0.15)
summary(examples_disc)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: disc of radius 0.15 km
## Precalculated scale parameter: 0.15
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.387 (95% CL: 0.312 0.467 )
## Intervention: 0.2 (95% CL: 0.149 0.26 )
## Efficacy: 0.482 (95% CL: 0.273 0.634 )
## Spillover interval(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 8.89 (95% CL: 8.89 8.89 %)
## Total effect : 0.186 (95% CL: 0.0912 0.282 )
## Ipsilateral Spillover : 0.00458 (95% CL: 0.00239 0.00656 )
## Contralateral Spillover : 0.00576 (95% CL: 0.00271 0.00905 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.274
## AIC : 1380.274
ps6 <- plotCRT(examples_disc)
examples_kern <- CRTanalysis(examples, method = "LME4", distance = "kern", cfunc = 'R', scale_par = 0.15)
summary(examples_kern)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: kern with kernel s.d. 0.15 km
## Precalculated scale parameter: 0.15
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.406 (95% CL: 0.327 0.491 )
## Intervention: 0.185 (95% CL: 0.136 0.245 )
## Efficacy: 0.542 (95% CL: 0.349 0.684 )
## Spillover interval(km): 0.979 (95% CL: 0.977 0.98 )
## % locations contaminated: 50.8 (95% CL: 50.6 50.9 %)
## Total effect : 0.22 (95% CL: 0.122 0.32 )
## Ipsilateral Spillover : 0.011 (95% CL: 0.00661 0.0152 )
## Contralateral Spillover : 0.0134 (95% CL: 0.00707 0.0203 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1369.677
## AIC : 1375.677
ps7 <- plotCRT(examples_kern)
plot_grid(ps4, ps5, ps6, ps7, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)
Fig 5.3 Fitted curves for the
example dataset with different surrounds
Geostatistical models and mapping results
To carry out a geostatistical analysis with
method = "INLA"
a prediction mesh is needed. By default a
very low resolution mesh is created (creating a high resolution mesh is
computationally expensive). To create a 100m INLA mesh for
<MyTrial>
, use:
mesh <- new_mesh(trial = <MyTrial> , pixel = 0.1)