
The Intracluster Correlation Coefficient (ICC) is one of the inputs to standard power and sample size calculations for cluster randomised trials (CRTs). Trialists often have difficulty identifying an appropriate source for the ICC value used in these calculations, or rely on a value from a source of questionable relevance.
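As a reminder of why the ICC matters for sample size, the sketch below uses the standard design effect for equal-sized clusters, DEFF = 1 + (m - 1) × ICC, where m is the number of individuals per cluster. The helper names design_effect and inflate_sample_size are hypothetical (they are not part of CRTspat), and the input values are illustrative only.

```r
# Design effect for equal-sized clusters: DEFF = 1 + (m - 1) * ICC
# (hypothetical helper, not a CRTspat function)
design_effect <- function(icc, cluster_size) {
    stopifnot(icc >= 0, icc <= 1, cluster_size >= 1)
    1 + (cluster_size - 1) * icc
}

# Inflate the sample size required under individual randomisation
# by the design effect, rounding up to a whole number of individuals
inflate_sample_size <- function(n_individual, icc, cluster_size) {
    ceiling(n_individual * design_effect(icc, cluster_size))
}

# Illustrative inputs only; not derived from the example dataset
design_effect(icc = 0.05, cluster_size = 21)
inflate_sample_size(n_individual = 400, icc = 0.05, cluster_size = 21)
```

Because the design effect grows with both the ICC and the cluster size, a poorly chosen ICC can change the required sample size substantially.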

The CRTanalysis function has an option to use Generalised Estimating Equations (GEE), which provide an estimate of the ICC. Because this analysis can be run on baseline data alone, it can be repeated for different cluster configurations of the same site. This makes it possible to estimate the ICC appropriate to any given cluster definition in the chosen geography, provided that baseline data are available.

# use the same dataset as for Use Case 1.
library(CRTspat)
example_locations <- readdata('example_site.csv')
example_locations$base_denom <- 1
library(dplyr)
example <- CRTsp(example_locations) %>%
    aggregateCRT(auxiliaries = c("RDT_test_result", "base_denom"))
summary(example)
## ===============================CLUSTER RANDOMISED TRIAL ===========================
## 
## Summary of coordinates
## ----------------------
##         Min.   : 1st Qu.: Median : Mean   : 3rd Qu.: Max.   :
##       x -3.20    -1.40    -0.30    -0.07     1.26     5.16   
##       y -5.08    -2.84     0.19     0.05     2.49     6.16   
## 
## Total area (within  0.2 km of a location) :  27.6 sq.km
## Total area (convex hull) :  48.2 sq.km
## 
## Locations and Clusters
## ----------------------
## Coordinate system                        (x, y)
## Locations:                               1181
## Available clusters (across both arms)    Not assigned
## No randomization                         -
## No power calculations to report          -
## 
## Other variables in dataset
## --------------------------          RDT_test_result  base_denom
# randomly sample an array of values of c (use a small sample size for testing;
# the plots were produced with n = 5000)
set.seed(5)
c_vec <- round(runif(50, min = 6, max = 150))

# a user function randomizes and analyses each simulated trial
CRTscenario3 <- function(c, CRT) {
    ex <- specify_clusters(CRT, c = c, algo = "kmeans") %>%
        randomizeCRT()
    GEEanalysis <- CRTanalysis(ex, method = "GEE", baselineOnly = TRUE, excludeBuffer = FALSE,
        baselineNumerator = "RDT_test_result", baselineDenominator = "base_denom")
    locations <- GEEanalysis$description$locations
    ICC <- GEEanalysis$pt_ests$ICC
    value <- c(c = c, ICC = ICC, mean_h = locations/c)
    return(value)
}

# The results are collected in a data frame
results <- t(sapply(c_vec, FUN = CRTscenario3, simplify = "array", CRT = example)) %>%
    data.frame()

There is a clear downward trend in the ICC estimates as cluster size increases (Figure 3.1). The ICC expected for a trial in this, or a similar, geography can be read off the curve. Note that the ICC is expected to vary not only with cluster size but also between outcomes.


Fig 3.1 Intracluster correlation by size of cluster
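
One way to read an ICC off the curve is to smooth the estimates against mean cluster size and interpolate at the size of interest. The following is a minimal sketch, assuming a lowess smoother (the smoother used for Figure 3.1 is not stated here) and a data frame with the columns mean_h and ICC, as in results above. The helper read_icc is hypothetical, and the toy data frame holds made-up values purely to keep the example self-contained.

```r
# Hypothetical helper: smooth ICC against mean cluster size with lowess,
# then linearly interpolate the smoothed curve at the requested size(s).
read_icc <- function(results, mean_h, f = 2/3) {
    smooth <- stats::lowess(results$mean_h, results$ICC, f = f)
    # ties = mean averages the smoothed values at duplicated cluster sizes
    stats::approx(smooth$x, smooth$y, xout = mean_h, ties = mean)$y
}

# Stand-in with the same columns as `results`; the values are made up
# for illustration and are NOT estimates from the example site.
toy <- data.frame(mean_h = c(10, 20, 40, 80, 160),
                  ICC = c(0.10, 0.08, 0.06, 0.04, 0.02))
read_icc(toy, mean_h = 50)
```

With the simulation output in hand, the call would be read_icc(results, mean_h = 50). Cluster sizes outside the simulated range return NA rather than an extrapolated value.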