# Assign locations to clusters in a CRT

`specify_clusters` algorithmically assigns locations to clusters by grouping them geographically.

## Usage

```
specify_clusters(
  trial = trial,
  c = NULL,
  h = NULL,
  algorithm = "NN",
  reuseTSP = FALSE,
  auxiliary = NULL
)
```

## Arguments

- trial
A `"CRTsp"` object or data frame containing (x,y) coordinates of households

- c
integer: number of clusters in each arm

- h
integer: number of locations per cluster

- algorithm
algorithm for cluster boundaries, with options:

  - `NN` (nearest neighbour): assigns equal numbers of locations to each cluster.
  - `kmeans` (kmeans clustering): aims to partition locations so that each belongs to the cluster with the nearest centroid.
  - `TSP` (travelling salesman problem heuristic): assigns locations sequentially along a travelling salesman path.

- reuseTSP
logical: indicator of whether a pre-existing path should be used by the TSP algorithm

- auxiliary
`"CRTsp"` object containing external cluster and/or arm assignments.
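
As a rough illustration of what the `kmeans` option aims for, the base R sketch below partitions simulated coordinates with `stats::kmeans()`. It is not the package's implementation; the simulated coordinates, the seed and the choice of 5 clusters are assumptions made for this example.

```r
# Illustration only: base R k-means on simulated household coordinates.
set.seed(1)
coords <- data.frame(x = runif(200), y = runif(200))

k <- 5                                    # assumed number of clusters
fit <- kmeans(coords, centers = k, nstart = 10)

# Each location is labelled with the cluster whose centroid is nearest
coords$cluster <- factor(fit$cluster)

# Cluster sizes generally differ, unlike an equal-size assignment
table(coords$cluster)
```

Because k-means minimises distance to centroids rather than balancing counts, the resulting cluster sizes are usually unequal, which is the main practical difference from an equal-size assignment.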

## Value

A list of class `"CRTsp"` containing the following components:

| Component | Type | Description |
|---|---|---|
| `geom_full` | list | summary statistics describing the site, and cluster assignments |
| `trial` | data frame | rows correspond to geolocated points, as follows: |
| `trial$x` | numeric vector | x-coordinates of locations |
| `trial$y` | numeric vector | y-coordinates of locations |
| `trial$cluster` | factor | assignments to cluster of each location |
| `...` | | other objects included in the input `"CRTsp"` object or data frame |

## Details

Either `c` or `h` must be specified. If both are specified, the input value of `c` is ignored.

The `reuseTSP` parameter allows the path to be reused for creating alternative allocations with different cluster sizes.
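
The idea of reusing one path for several cluster sizes can be sketched in base R as follows. This is not the package's code: `greedy_path()` and `assign_along_path()` are hypothetical helpers, and a greedy nearest-neighbour tour stands in for whatever travelling salesman heuristic is actually used.

```r
# Illustration only: cut a fixed ordering of locations into clusters of size h.
set.seed(1)
n <- 120
pts <- data.frame(x = runif(n), y = runif(n))

# Hypothetical stand-in for a travelling salesman path: a greedy
# nearest-neighbour tour starting from the first location.
greedy_path <- function(pts) {
  d <- as.matrix(dist(pts[, c("x", "y")]))
  n <- nrow(pts)
  path <- integer(n)
  path[1] <- 1L
  visited <- rep(FALSE, n)
  visited[1] <- TRUE
  for (i in 2:n) {
    cand <- d[path[i - 1], ]
    cand[visited] <- Inf          # never revisit a location
    path[i] <- which.min(cand)
    visited[path[i]] <- TRUE
  }
  path
}

# Assign consecutive runs of h locations along the path to the same cluster.
assign_along_path <- function(path, h) {
  cluster <- integer(length(path))
  cluster[path] <- ceiling(seq_along(path) / h)
  factor(cluster)
}

path <- greedy_path(pts)                     # computed once
clusters_h20 <- assign_along_path(path, 20)  # 6 clusters of 20 locations
clusters_h40 <- assign_along_path(path, 40)  # same path, 3 clusters of 40
```

Computing the path is the expensive step, so keeping it and only re-cutting it is what makes comparing allocations with different values of `h` cheap.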

If an `auxiliary` `"CRTsp"` object is specified then the other options are ignored and the cluster assignments (and arm assignments if available) are taken from the auxiliary object. The trial data frame is augmented with a column `"nearestPixel"` containing the distance to the boundary of the nearest grid pixel in the auxiliary. If the auxiliary is a grid with `design$geometry` set to `'triangle'`, `'square'` or `'hexagon'` then the distance is computed to the edge of the nearest grid pixel in the discordant arm (using a circular approximation for the perimeter) rather than to the point location itself. If the point is within the pixel then the distance is given a negative sign.
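
The sign convention can be illustrated with a small base R sketch. It assumes the circular approximation means replacing the pixel by a circle of equal area centred on the pixel; that reading, and the helper `signed_edge_distance()` with its arguments, are assumptions for illustration and not the package's implementation.

```r
# Illustration only: signed distance from a point to the edge of a pixel,
# treating the pixel as a circle of equal area (an assumption made here).
signed_edge_distance <- function(x, y, centre_x, centre_y, pixel_area) {
  radius <- sqrt(pixel_area / pi)                       # circle with the same area
  d_centre <- sqrt((x - centre_x)^2 + (y - centre_y)^2) # distance to pixel centre
  d_centre - radius                                     # negative inside the circle
}

# A pixel of area pi has approximating radius 1
signed_edge_distance(0, 0, centre_x = 0, centre_y = 0, pixel_area = pi)  # -1
signed_edge_distance(3, 0, centre_x = 0, centre_y = 0, pixel_area = pi)  # 2
```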

## Examples

```
# Assign clusters of average size h = 40 to a test set of co-ordinates,
# using the kmeans algorithm
exampletrial <- specify_clusters(
  trial = readdata('exampleCRT.txt'),
  h = 40, algorithm = 'kmeans', reuseTSP = FALSE
)
```