
specify_clusters algorithmically assigns locations to clusters by grouping them geographically.

Usage

specify_clusters(
  trial = trial,
  c = NULL,
  h = NULL,
  algorithm = "NN",
  reuseTSP = FALSE,
  auxiliary = NULL
)

Arguments

trial

A CRT object or data frame containing (x,y) coordinates of households

c

integer: number of clusters in each arm

h

integer: number of locations per cluster

algorithm

algorithm for cluster boundaries, with options:

NN      Nearest neighbour: assigns equal numbers of locations to each cluster.
kmeans  kmeans clustering: aims to partition locations so that each belongs to the cluster with the nearest centroid.
TSP     Travelling salesman problem heuristic: assigns locations sequentially along a travelling salesman path.

reuseTSP

logical: indicator of whether a pre-existing path should be used by the TSP algorithm

auxiliary

"CRTsp" object containing external cluster and/or arm assignments.

Value

A list of class "CRTsp" containing the following components:

geom_full  list: summary statistics describing the site, and cluster assignments.
trial      data frame: rows correspond to geolocated points, as follows:
  x        numeric vector: x-coordinates of locations
  y        numeric vector: y-coordinates of locations
  cluster  factor: assignments to cluster of each location
  ...      other objects included in the input "CRTsp" object or data frame

Details

Either c or h must be specified. If both are specified, the input value of c is ignored.
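The relationship between c and h can be illustrated with base R alone. The sketch below is not the package's implementation: it assumes a two-arm trial, uses synthetic coordinates, and calls stats::kmeans directly. The names n_locations, k_total and c_per_arm are hypothetical and chosen only for this illustration.

```r
# Illustrative only: synthetic coordinates, not the package's internal code.
set.seed(1)
n_locations <- 400
coords <- data.frame(x = runif(n_locations), y = runif(n_locations))

h <- 40                               # target (average) locations per cluster
k_total <- ceiling(n_locations / h)   # total number of clusters implied by h
c_per_arm <- ceiling(k_total / 2)     # assumption: two arms of equal size

# k-means: each location is assigned to the cluster with the nearest centroid
fit <- kmeans(coords, centers = k_total, nstart = 5)
cluster <- factor(fit$cluster)

table(cluster)         # individual cluster sizes vary around h
mean(table(cluster))   # average cluster size is n_locations / k_total
```

With kmeans, h acts as an average rather than an exact cluster size, which matches the wording of the example at the end of this page.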

Setting reuseTSP = TRUE allows a previously computed path to be reused, so that alternative allocations with different cluster sizes can be created from the same path.
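The idea of reusing one path for several cluster sizes can be sketched in base R. This is a simplified illustration, not the package's code: a greedy nearest-neighbour tour stands in for the travelling salesman heuristic, and the helper functions greedy_path and assign_along_path are hypothetical.

```r
# Illustrative only: a greedy nearest-neighbour tour stands in for a TSP heuristic.
set.seed(2)
coords <- data.frame(x = runif(60), y = runif(60))

greedy_path <- function(xy) {
  d <- as.matrix(dist(xy))        # pairwise Euclidean distances
  n <- nrow(d)
  path <- integer(n)
  path[1] <- 1L                   # start at the first location
  visited <- rep(FALSE, n)
  visited[1] <- TRUE
  for (i in 2:n) {
    dists <- d[path[i - 1], ]
    dists[visited] <- Inf         # never revisit a location
    path[i] <- which.min(dists)   # move to the closest unvisited location
    visited[path[i]] <- TRUE
  }
  path
}

# Cut the path into consecutive runs of h locations
assign_along_path <- function(path, h) {
  cluster <- integer(length(path))
  cluster[path] <- ceiling(seq_along(path) / h)
  factor(cluster)
}

path <- greedy_path(coords)                   # computed once
clusters_h10 <- assign_along_path(path, 10)   # 6 clusters of 10 locations
clusters_h20 <- assign_along_path(path, 20)   # same path, 3 clusters of 20 locations
```

Because the expensive step is finding the path, storing it and only repeating the cutting step makes it cheap to compare allocations for several values of h.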

If an auxiliary "CRTsp" object is specified then the other options are ignored and the cluster assignments (and arm assignments if available) are taken from the auxiliary object. The trial data frame is augmented with a column "nearestPixel" containing the distance to the boundary of the nearest grid pixel in the auxiliary. If the auxiliary is a grid with design$geometry set to 'triangle', 'square' or 'hexagon' then the distance is computed to the edge of the nearest grid pixel in the discordant arm (using a circular approximation for the perimeter) rather than to the point location itself. If the point is within the pixel then the distance is given a negative sign.
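One possible reading of the circular approximation and the sign convention is sketched below. This is an assumption made for illustration, not the package's documented formula: the pixel is replaced by a circle of equal area around its centre, and the function signed_distance_to_pixel and its arguments are hypothetical.

```r
# Illustrative only: one possible reading of the circular approximation.
signed_distance_to_pixel <- function(x, y, centre_x, centre_y, pixel_area) {
  radius <- sqrt(pixel_area / pi)                         # circle with the same area as the pixel
  d_centre <- sqrt((x - centre_x)^2 + (y - centre_y)^2)   # distance to the pixel centre
  d_centre - radius                                       # negative when the point lies inside
}

signed_distance_to_pixel(0, 0, 0, 0, pixel_area = pi)   # -1: point at the centre of a unit circle
signed_distance_to_pixel(3, 0, 0, 0, pixel_area = pi)   #  2: point outside the circle
```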

Examples

# Assign clusters of average size h = 40 to a test set of co-ordinates, using the kmeans algorithm
exampletrial <- specify_clusters(trial = readdata('exampleCRT.txt'),
                                 h = 40, algorithm = 'kmeans', reuseTSP = FALSE)