This vignette will walk you through the analyses presented in Winton et al. 2018, who describe the use of spatial point process models to estimate individual centers of activity from passive acoustic telemetry data. The vignette progresses from applying the simplest case, which assumes that detection probabilities/receiver detection ranges remain constant over time, to application of a test-tag integrated model, which incorporates detection data from one or more stationary test transmitters to estimate time-varying detection ranges. The models are fitted in a Bayesian framework using the Stan software (Carpenter et al. 2017); code was modified from that provided in Royle et al. 2013 for fitting spatial point process models to data from camera traps. We prefer the Bayesian approach for COA estimation due to its treatment of uncertainty, but realize the longer computational time required may be prohibitive for some applications. In the near future, we will update to include an option to fit in a maximum likelihood framework, which will reduce run times. We’d also like to note that the models described can support varying degrees of complexity - not all applications will require (or have the data to support) the most complex version of the model. The simpler the model, the shorter the run-time.
We realize that many users will have unique situations and may need
to modify the base code to suit their purposes. Users can copy the
.stan files contained in the src/stan_files
folder to their local machine to do so. We will be happy to accommodate
user requests (as our schedules allow). Our hope is to make this a
collaborative package that will evolve based on the needs of the
acoustic telemetry community and continue to improve over time. We have
tried to make the instructions outlined in this vignette user-friendly
since we are a group of applied biologists with varying degrees of
statistical experience. If some of the statistical notation outlined
here or in the paper remains unclear, feel free to contact us with
questions/for clarification. Most of it is actually very intuitive, and
we would like to make sure that comes through in the documentation. This
is a new package, so if you find bugs, places where code efficiency
could be improved, or instances where the documentation could be made
more user-friendly, please let us know!
The package includes two data sets:
To access the provided data, run:
library(TelemetrySpace)
library(dplyr)
library(tidyr)
library(ggplot2)
library(hexbin)
library(ggpubr)
rlocs # Receiver locations## # A tibble: 30 × 3
## Station east north
## <chr> <dbl> <dbl>
## 1 SB1 -1.33 1.61
## 2 SB2 -0.579 1.91
## 3 SB3 0.219 2.00
## 4 SB4 1.00 1.81
## 5 SB5 -1.69 0.896
## 6 SB6 -0.879 0.950
## 7 SB7 -0.231 1.20
## 8 SB8 0.464 1.21
## 9 SB9 1.28 1.04
## 10 SB10 -1.77 0.0952
## # ℹ 20 more rows
## # A tibble: 1 × 2
## east north
## <dbl> <dbl>
## 1 -0.329 0.713
## # A tibble: 6 × 5
## Station Transmitter east north hour
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 SB12 3447 -0.326 0.513 1
## 2 SB13 3447 0.125 0.575 1
## 3 SB7 3447 -0.231 1.20 1
## 4 SB6 3447 -0.879 0.950 1
## 5 SB13 3447 0.125 0.575 1
## 6 SB6 3447 -0.879 0.950 1
## # A tibble: 6 × 5
## Station Transmitter east north hour
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 SB15 3425 0.190 0.209 1
## 2 SB14 3425 -0.261 0.159 1
## 3 SB15 3425 0.190 0.209 1
## 4 SB14 3425 -0.261 0.159 1
## 5 SB12 3425 -0.326 0.513 1
## 6 SB14 3425 -0.261 0.159 1
Prior to doing anything else, we need to specify our ‘state space’ -
the spatial extent of interest. This will consist of your receiver array
and a ‘buffer’ area around the array. The buffer needs to be large
enough to allow for individuals that may have activity centers outside
of the receiver array. There is no general rule of thumb for this, but
it should scale with the geographic extent covered by your array. I
would suggest selecting a buffer that is large enough to allow you to
discriminate between individuals that are detected along the periphery
of the array and individuals that are not detected at all - a COA will
be estimated for all time steps unless you specify the individual has
left the area; time steps with 0 detections will have an estimated COA
somewhere outside the detection range of all receivers (note that this
could be within the array bounds if your array has non-continuous
coverage). If there are many time steps with zeros, you may want to
modify your data file to omit time periods with 0 detections or modify
the relevant .stan code chunk included in the
src/stan_files folder on your local machine to skip time
periods with no detections, which will speed up computing times. As
always, it is up to the user to make sure that the results make sense in
the context of the species/array of interest.
# Define state-space of point process - 'buffer' adds a fixed buffer to the outer extent of the recs
buffer <- 1
xlim <- c(min(rlocs$east - buffer), max(rlocs$east + buffer))
ylim <- c(min(rlocs$north - buffer), max(rlocs$north + buffer))Next, we need to prepare our test tag data. First, calculate the distance between the test tag’s location and each receiver. This can be done using the included ‘distf’ function.
# Set up a blank vector for storage
D <- NULL
# Loop over each hour
for(i in 1:nrow(testdat)){
D[i] <- distf(testloc[, c('east', 'north')], testdat[i, c('east', 'north')])
}
testdat$D <- unlist(D) # Assign to testdat data frame
# Plot to examine variation in detection rate over time
testdat |>
count(Station, hour, D, east, north) |>
mutate(
dist = round(D * 1000), #Round distance for plotting
label = paste(dist, "m", "(", Station, ")") #Create label by merging station name and distance
) |>
#Plot using ggplot
ggplot(aes(x = hour, y = n)) +
geom_bar(stat = "identity", fill = '#008080') +
facet_wrap(~ as.factor(label)) +
theme(text = element_text(size = 16)) +
scale_y_continuous(breaks = seq(0,30,10)) +
labs(x = "Hours since deployment",y = "Number of detections")Tally how many time intervals each receiver was operational for - this allows for individual receivers deployed for different periods and/or receivers that were lost.
# The full analysis takes several hours to run, so we'll subset out 10 hours to illustrate
fishdat <- fishdat |>
filter(hour < 11)
testdat <- testdat |>
filter(hour < 11)
# Create a copy of the receiver locations for tallying
rs <- rlocs
# Add column for each time interval to indicate whether receiver was operational or not
rs[, c(4:(max(fishdat$hour) + 3))] <- 1
# Create vector of the number of sampling occasions for each receiver
tsteps <- rowSums(rs[, 4:ncol(rs)])
tsteps # Will all be the same here because all operational over the entire time span## [1] 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
## [26] 10 10 10 10 10
Since detection records only include non-zero events (here assuming we’d like to keep time steps with zeros), we’ll need to convert the number of detections to include zeros for all receivers that did not detect the tag during that time period.
## Starting with the test tag data
test_agg <- testdat |>
mutate(
rec = as.numeric(substr(Station, 3, 4)),
# Add a column that indicates each record corresponds to 1 detection
count = 1
) |>
# Aggregate the number of detections for each individual at each receiver
# in each time step
count(Transmitter, rec, east, north, hour) |>
# Create a numeric identifier for each transmitter
mutate(tag = as.numeric(as.factor(Transmitter))) |>
# Rename hour to time for consistency when plotting below
rename(time = "hour")
# Specify quantities for indexing
# number of individual tags (here just the one test tag)
ntest <- length(unique(test_agg$Transmitter))
# number of receivers
nrec <- nrow(rlocs)
# number of time steps
tsteps <- max(test_agg$time)
# This chunk saves the total number of encounters of each individual (rows) in
#each trap (cols) at each sampling occasion (array elements)
testY <- array(NA, dim=c(ntest, nrec, tsteps))
for(t in 1:tsteps){
for (i in 1:ntest){
h1 <- test_agg[test_agg$tag == i,]
for(j in 1:nrec){
testY[i, j, t] <- ifelse(
# If there are no detections at that receiver in that time period, set to 0
nrow(h1[h1$time == t & h1$rec == j,]) == 0,
0,
# otherwise set to the number of detections
h1[h1$time == t & h1$rec == j,]$n
)
}
}
}Now we’ll do the same thing for the black sea bass detection data.
## Now do the same for each tagged fish
fish_agg <- fishdat |>
mutate(
rec = as.numeric(substr(Station, 3, 4)),
# Add a column that indicates each record corresponds to 1 detection
count = 1
) |>
# Aggregate the number of detections for each individual at each receiver
# in each time step
count(Transmitter, rec, east, north, hour) |>
# Create a numeric identifier for each transmitter
mutate(tag = as.numeric(as.factor(Transmitter))) |>
# Rename hour to time for consistency when plotting below
rename(time = "hour")
# Specify quantities for indexing
# number of individual tags (here just the one test tag)
nind <- length(unique(fish_agg$Transmitter))
# This chunk saves the total number of encounters of each individual (rows) in each trap (cols) at each sampling occasion (array elements)
Y <- array(NA, dim=c(nind, nrec, tsteps))
for(t in 1:max(tsteps)){
for (i in 1:nrow(Y)){
h1 <- fish_agg[fish_agg$tag == i,]
for(j in 1:nrow(rlocs)){
# If there are no detections at that receiver in that time period, set to 0; otherwise set to the number of detections
Y[i, j, t] <- ifelse(
# If there are no detections at that receiver in that time period, set to 0
nrow(h1[h1$time == t & h1$rec == j,]) == 0,
0,
# otherwise set to the number of detections
h1[h1$time == t & h1$rec == j,]$n
)
}
}
}After all of that data processing, we’re finally ready to fit our
model! We’ll start by fitting the model assuming that detection
probabilities are the same among all receivers and remain constant over
time (As we all know, this will pretty much never be the case in
‘real-life’ but best to start simple). Besides, this model is handy when
you don’t have test-tag (or other) data available to inform detection
probabilities but would like to be able to estimate COAs along the
periphery of the array. The only other information you’ll need to
provide is the expected number of transmissions per tag per time
interval (ntrans below).
### Format data for model fitting
fit <- COA_Standard(
nind = nind, # number of individuals
nrec = nrec, # number of receivers
ntime = tsteps, # number of time steps
ntrans = 30, # number of expected transmissions per tag per time interval
y = Y, # array of detections
recX = as.vector(rlocs$east), # E-W receiver coordinates
recY = as.vector(rlocs$north), # N-S receiver coordinates
xlim = xlim, # E-W boundary of spatial extent (receiver array + buffer)
ylim = ylim # N-S boundary of spatial extent (receiver array + buffer)
)## warmup sample
## chain:1 2.298 1.985
## chain:2 1.830 2.315
## chain:3 1.749 1.337
## chain:4 1.759 2.078
The function will automatically spit out the run-time associated with each of the 4 chains used for fitting and returns a list with four objects:
## Length Class Mode
## model 1 stanfit S4
## summary 20 -none- numeric
## time 1 -none- numeric
## coas 7 data.frame list
## all_estimates 325 data.frame list
The first contains the Stan model object (accessible via
fit$Model, which will allow you to use rstan
plotting tools and diagnostic plots - see rstan documentation for
details).
The second element is a table of parameter estimates and associated quantiles from the posterior distribution. The table also includes the effective sample size and the Rhat statistic (which should be ~1).
## mean se_mean sd 2.5% 25% 50% 75%
## p0 0.2822800 0.0004478783 0.02520083 0.2361652 0.2644027 0.2813702 0.2994305
## sigma 0.2845763 0.0002817552 0.01493595 0.2554774 0.2742519 0.2845919 0.2947178
## 97.5% n_eff Rhat
## p0 0.3329404 3165.990 1.000794
## sigma 0.3141591 2810.101 1.001010
The third returns the time required to run the model (in minutes).
Note that Stan will automatically detect and use multiple cores. If the
computer used to run this has multiple cores, the time returned will be
longer than the actual run time (because it will sum the time for each
core). To return the realized run time, divide fit$Time by
the number of cores.
## [1] 0.25585
The fourth returns an array of COA estimates, where each matrix corresponds to one individual, the rows correspond to each time step, and the columns include the median posterior estimate of the east-west (x) and north-south (y) coordinates. The 95% credible interval (Bayesian version of a confidence interval) for each coordinate is also provided.
## time x y x_lower x_upper y_lower y_upper
## 1 1 -0.04419208 0.2936790 -0.1462210 0.05951826 0.18845035 0.4023171
## 2 2 -0.02330482 0.3578317 -0.1420680 0.09773196 0.21932718 0.5005321
## 3 3 -0.09875625 0.3839595 -0.2569494 0.04730306 0.19878372 0.5686053
## 4 4 -0.12329253 0.2816735 -0.2786449 0.02481079 0.10911757 0.4669253
## 5 5 -0.16831392 0.3242430 -0.3259290 -0.02445424 0.15206068 0.5067735
## 6 6 -0.13019674 0.7062093 -0.4725975 0.17890991 0.17035957 0.8814318
## 7 7 -0.05163113 0.3667484 -0.1700821 0.07165783 0.23090152 0.4937734
## 8 8 -0.15522780 0.4049252 -0.2793242 -0.03514842 0.27458678 0.5346351
## 9 9 -0.05002765 0.3329523 -0.2220192 0.11698386 0.12850061 0.5501859
## 10 10 -0.09807742 0.3133353 -0.2885934 0.07443582 0.08394051 0.5651978
The last element (fit$All_estimates) contains the
estimates from each non-warm-up iteration for all 4 chains. (If you’re
not familiar with Bayesian lingo and this just seems like statistical
jargon, all you need to know is that this is what you’ll use to plot the
uncertainty cloud around COA estimates and/or the distribution of
parameter estimates as presented in the paper.) This includes estimates
of all latent parameters for each individual in each time step (aka this
element contains tons of parameter estimates), so we will need to subset
out parameters for plotting.
# Extract east-west and north-south coordinates of COAs from each iteration of each chain
# Note that 'sx' and 'sy' are the variable names used to store the coordinates for each individual (rows) in each time step (columns).
# We'll store them as a list, where each element corresponds to an individual
EN_fit <- list(NA)
for (i in 1:nind){
EN_fit[[i]] <- bind_cols(
# East-West
fit$all_estimates |>
select(starts_with(paste("sx[", i, ",", sep=''))) |>
pivot_longer(everything(), values_to = "X"),
# North-South
fit$all_estimates |>
select(starts_with(paste("sy[", i, ",", sep=''))) |>
pivot_longer(everything(), values_to = "Y") |>
select(Y)
) |>
# The ID field corresponds to the time step.
mutate(time = as.numeric(gsub(".*,|\\]", "", name)))
}
## Plot posterior estimates
# Select individual for plotting
post_fit <- EN_fit[[1]]
coa_fit <- as.data.frame(fit$coas[, , 1])
# Plot in ggplot
plotCOAs <- ggplot(aes(x = X, y = Y), data = post_fit) +
geom_hex(alpha = 1, bins = 100) +
geom_point(aes(x = x, y = y), data = coa_fit, pch = 25, cex = 1.5, alpha = .8, fill = NA) +
facet_wrap(~ time, ncol = 2) +
scale_fill_gradientn(colours = c("white", "blue"), name = "Frequency", na.value = NA) +
geom_point(aes(x = east, y = north), data = rlocs, cex = 1) +
geom_point(aes(x = east, y = north), data = fish_agg, pch = 21, cex = 1.5, fill = '#00BFC4') +
# Unhash line below to include test tag location
#geom_point(aes(x = east, y = north), data = testloc, cex = 1.5, pch = 21, fill = '#F8766D') +
coord_fixed(xlim = c(-1, 1), ylim = c(-0.5, 1.5)) +
scale_x_continuous(breaks = c(-1, 0, 1)) +
#scale_y_continuous(breaks = c(-1, 0, 1)) +
theme(legend.position = "none") +
labs(x = 'East-West (km)', y = 'North-South (km)')Now let’s fit a model that allows for variation in detection
probabilities. This model is a simple extension of the previous model -
it just allows the detection probability to vary between time steps and
receivers. The only other information you’ll need to provide is the
expected number of transmissions per tag per time interval
(ntrans).
### Format data for model fitting
fit_vary <- COA_TimeVarying(
nind = nind, # number of individuals
nrec = nrec, # number of receivers
ntime = tsteps, # number of time steps
ntrans = 30, # number of expected transmissions per tag per time interval
y = Y, # array of detections from tagged fish
recX = as.vector(rlocs$east), # E-W receiver coordinates
recY = as.vector(rlocs$north), # N-S receiver coordinates
xlim = xlim, # E-W boundary of spatial extent (receiver array + buffer)
ylim = ylim # N-S boundary of spatial extent (receiver array + buffer)
)## warmup sample
## chain:1 24.266 14.747
## chain:2 22.624 17.264
## chain:3 23.072 14.642
## chain:4 22.762 14.569
## Length Class Mode
## model 1 stanfit S4
## summary 3010 -none- numeric
## time 1 -none- numeric
## coas 7 data.frame list
## detection_probs 300 -none- numeric
## all_estimates 923 data.frame list
## [1] 2.565767
# As before, extract east-west and north-south coordinates of COAs from each iteration of each chain
# Note that 'sx' and 'sy' are the variable names used to store the coordinates for each individual (rows) in each time step (columns).
# We'll store them as a list, where each element corresponds to an individual
EN_vary <- list(NA)
for (i in 1:nind){
EN_vary[[i]] <- bind_cols(
# East-West
fit_vary$all_estimates |>
select(starts_with(paste("sx[", i, ",", sep=''))) |>
pivot_longer(everything(), values_to = "X"),
# North-South
fit_vary$all_estimates |>
select(starts_with(paste("sy[", i, ",", sep=''))) |>
pivot_longer(everything(), values_to = "Y") |>
select(Y)
) |>
# The ID field corresponds to the time step.
mutate(time = as.numeric(gsub(".*,|\\]", "", name)))
}
## Plot posterior estimates
# Select individual for plotting
post_vary <- EN_vary[[1]]
coa_vary <- as.data.frame(fit_vary$coas[, , 1])
# Plot in ggplot
plotTVary <- ggplot(aes(x = X, y = Y), data = post_vary) +
geom_hex(alpha = 1, bins = 100) +
geom_point(aes(x = x, y = y), data = coa_vary,
pch = 25, cex = 1.5, alpha = .8, fill = NA
) +
facet_wrap(~time, ncol = 2) +
scale_fill_gradientn(colours= c("white", "blue"), name = "Frequency", na.value=NA) +
geom_point(aes(x = east, y = north), data = rlocs, cex = 1) +
geom_point(aes(x = east,y = north), data = fish_agg, pch = 21, cex = 1.5, fill = '#00BFC4') +
# Unhash line below to include test tag location
#geom_point(aes(x=east,y=north),data=testloc,cex=1.5,pch=21,fill='#F8766D') +
coord_fixed(xlim = c(-1, 1), ylim = c(-0.5, 1.5)) +
scale_x_continuous(breaks = c(-1, 0, 1)) +
#scale_y_continuous(breaks = c(-1, 0, 1)) +
theme(legend.position = "none") +
labs(x = 'East-West (km)', y = 'North-South (km)')Now we’ll fit the most exciting model - the model that integrates
test tag data to inform detection probabilities. This model is a simple
extension of the previous models - it just adds a component that
specifies the distance of each test tag from each receiver, and shifts
the probability of detection by comparing the number of detections
logged at each receiver versus those emitted from the test tag in each
time step. In other words, it treats test tags as tagged individuals but
with known COAs. The only other information you’ll need to provide is
the expected number of transmissions per tag per time interval
(ntrans).
### Format data for model fitting
## Specify the number of sentinel tags (this step is necessary because of issues that arise with Stan indexing if you have only 1 test tag)
nsentinel <- 1
fit_tag <- COA_TagInt(
nind = nind, # number of individuals
nrec = nrec, # number of receivers
ntime = tsteps, # number of time steps
ntest = nsentinel, # number of test tags
ntrans = 30, # number of expected transmissions per tag per time interval
y = Y, # array of detections from tagged fish
test = testY, # array of detections from test tags
recX = as.vector(rlocs$east), # E-W receiver coordinates
recY = as.vector(rlocs$north), # N-S receiver coordinates
xlim = xlim, # E-W boundary of spatial extent (receiver array + buffer)
ylim = ylim, # N-S boundary of spatial extent (receiver array + buffer)
testX = array(testloc$east, dim = c(nsentinel)),
testY = array(testloc$north, dim = c(nsentinel))
)## warmup sample
## chain:1 19.695 10.553
## chain:2 18.609 10.535
## chain:3 19.143 10.438
## chain:4 20.521 9.958
## Length Class Mode
## model 1 stanfit S4
## summary 3010 -none- numeric
## time 1 -none- numeric
## coas 7 data.frame list
## detection_probs 300 -none- numeric
## all_estimates 953 data.frame list
## [1] 1.990867
# As before, extract east-west and north-south coordinates of COAs from each iteration of each chain
# Note that 'sx' and 'sy' are the variable names used to store the coordinates for each individual (rows) in each time step (columns).
# We'll store them as a list, where each element corresponds to an individual
EN_tag <- list(NA)
for (i in 1:nind){
EN_tag[[i]] <- bind_cols(
# East-West
fit_tag$all_estimates |>
select(starts_with(paste("sx[", i, ",", sep=''))) |>
pivot_longer(everything(), values_to = "X"),
# North-South
fit_tag$all_estimates |>
select(starts_with(paste("sy[", i, ",", sep=''))) |>
pivot_longer(everything(), values_to = "Y") |>
select(Y)
) |>
# The ID field corresponds to the time step.
mutate(time = as.numeric(gsub(".*,|\\]", "", name)))
}
## Plot posterior estimates
# Select individual for plotting
post_tag <- EN_tag[[1]]
coa_tag <- as.data.frame(fit_tag$coas[, , 1])
# Plot in ggplot
plotTagInt <- ggplot(aes(x = X, y = Y), data = post_tag) +
geom_hex(alpha = 1, bins = 100) +
geom_point(aes(x = x, y = y),
data = coa_tag, pch = 25, cex = 1.5, alpha = 0.8, fill = NA
) +
facet_wrap(~time, ncol = 2) +
scale_fill_gradientn(colours = c("white", "blue"), name = "Frequency", na.value = NA) +
geom_point(aes(x = east, y = north), data = rlocs, cex = 1) +
geom_point(aes(x = east, y = north), data = fish_agg, pch = 21, cex = 1.5, fill = '#00BFC4') +
# Unhash line below to plot test tag location
#geom_point(aes(x=east,y=north),data=testloc,cex=1.5,pch=21,fill='#F8766D') +
coord_fixed(xlim = c(-1, 1), ylim = c(-0.5, 1.5)) +
scale_x_continuous(breaks = c(-1, 0, 1)) +
theme(legend.position = "none") +
labs(x = 'East-West (km)', y = 'North-South (km)')Now plot them all up together to see how they compare!
ggarrange(plotCOAs, plotTVary, plotTagInt,
labels = c("Standard", "Time-varying", "Tag-integrated"),
ncol = 3, nrow = 1,
align = "hv",
#widths = c(8,8,8), heights = c(2,2,2),
common.legend = F)Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., … & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of statistical software, 76, 1-32.
Royle, J. A., Chandler, R. B., Sollmann, R., & Gardner, B. (2013). Spatial capture-recapture. Academic press.
Winton, M. V., Kneebone, J., Zemeckis, D. R., & Fay, G. (2018). A spatial point process model to estimate individual centres of activity from passive acoustic telemetry data. Methods in Ecology and Evolution, 9(11), 2262-2272.