Efficient Estimation of Large-Scale Spatial Capture-Recapture Models

Capture-recapture methods are a common tool in ecological statistics, which have been extended to spatial capture-recapture models for data accompanied by location information. However, standard formulations of these models can be unwieldy and computationally intractable for large spatial scales, many individuals, and/or activity center movement. We provide a cumulative series of methods that yield dramatic improvements in Markov chain Monte Carlo (MCMC) estimation for two examples. These include removing unnecessary computations, integrating out latent states, vectorizing declarations, and restricting calculations to the locality of individuals. Our approaches leverage the flexibility provided by the nimble R package. In our first example, we demonstrate an improvement in MCMC efficiency (the rate of generating effectively independent posterior samples) by a factor of 100. In our second example, we reduce the computing time required to generate 10,000 posterior samples from 4.5 hours down to five minutes, and realize an increase in MCMC efficiency by a factor of 25. We also explain how these approaches can be applied generally to other spatially-indexed hierarchical models. R code is provided for all examples, as well as an executable web-appendix.

where 1 is a unit data value used for the "ones trick" described below.
We use four refinements of the model and MCMC sampling, with the goal of improving MCMC efficiency.

Define the AC of individual $i$ on primary occasion $k$ as $s_{i,k} = (s^x_{i,k}, s^y_{i,k})$. On first capture, the components $s^x_{i,F_i}$ and $s^y_{i,F_i}$ are given uniform prior distributions spanning the mean location of captures during that occasion. The dispersal between primary occasions $k$ and $k+1$ uses a uniformly-distributed dispersal angle $\theta_{ik}$, and an exponentially-distributed dispersal distance $d_{ik}$ with rate parameter $\lambda_{G_i}$, where $G_i$ is the sex of individual $i$ (1: female; 2: male), and $\lambda_1$ and $\lambda_2$ are sex-specific parameters. Thus, the AC components are related across primary occasions as $s^x_{i,k+1} = s^x_{i,k} + d_{ik}\cos(\theta_{ik})$ and $s^y_{i,k+1} = s^y_{i,k} + d_{ik}\sin(\theta_{ik})$. The capture hazard model uses a baseline hazard rate $\lambda_0$; $\beta_1$ is the effect of morning trapping sessions, and $\beta_2$ is that of males.
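As an illustration, the dispersal step above can be simulated directly. The following Python sketch (the function name and use of numpy are ours, not from the paper's R code) draws a sequence of ACs under the uniform-angle, exponential-distance movement model:

```python
import numpy as np

def simulate_acs(s1, lam, L, rng):
    """Simulate activity centers s_1, ..., s_L under the dispersal model:
    angle theta_k ~ Uniform(0, 2*pi), distance d_k ~ Exponential(rate = lam),
    with s_{k+1} = s_k + d_k * (cos(theta_k), sin(theta_k))."""
    s = np.zeros((L, 2))
    s[0] = s1
    for k in range(L - 1):
        theta = rng.uniform(0.0, 2.0 * np.pi)
        # numpy parameterizes the exponential by scale = 1 / rate
        d = rng.exponential(scale=1.0 / lam)
        s[k + 1] = s[k] + d * np.array([np.cos(theta), np.sin(theta)])
    return s

rng = np.random.default_rng(1)
path = simulate_acs(np.array([0.0, 0.0]), lam=0.5, L=5, rng=rng)
```

With a sex-specific rate $\lambda_{G_i}$, males and females would simply use different `lam` values.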

The total capture hazard rate induces the probability $\pi_{ijkr}$ of capture in trap $r$, accounting for competing risks among traps and satisfying $\sum_{r=0}^{R} \pi_{ijkr} = 1$, where $r = 0$ denotes no capture.
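One common competing-risks construction satisfying this constraint converts per-trap hazard rates into multinomial cell probabilities as $\pi_{ijk0} = e^{-h_{ijk*}}$ and $\pi_{ijkr} = (h_{ijkr}/h_{ijk*})(1 - e^{-h_{ijk*}})$; whether equation (2) uses exactly this form is an assumption on our part. A minimal Python sketch:

```python
import numpy as np

def cell_probs(h):
    """Multinomial cell probabilities from per-trap capture hazard rates h_r,
    under competing risks over a unit-length session.
    Returns (pi_0, pi_1, ..., pi_R), where
      pi_0 = P(no capture)      = exp(-h_star),
      pi_r = P(capture, trap r) = (h_r / h_star) * (1 - exp(-h_star)),
    and h_star = sum_r h_r is the total capture hazard rate."""
    h = np.asarray(h, dtype=float)
    h_star = h.sum()
    pi0 = np.exp(-h_star)
    pi = (h / h_star) * (1.0 - pi0)
    return np.concatenate(([pi0], pi))

pi = cell_probs([0.3, 0.1, 0.05])
```

By construction the cells sum to one, since the capture probabilities split $1 - \pi_{ijk0}$ in proportion to the hazards.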

The "ones trick" is used to induce the correct likelihood calculation. The observation data $y$ form a 3-dimensional array, where $y_{ijk} = 0$ indicates that individual $i$ was not captured in trapping session $j$ of primary occasion $k$, and $y_{ijk} = r$ indicates a capture in trap $r$. The complete Vole model definition is given in (2), where all indices $j$ take the range of the number of secondary trapping sessions in the relevant primary occasion $k$, and all indices $r$ assume the range $1, \ldots, R$.
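The "ones trick" works because observing a unit datum under a Bernoulli distribution with success probability $L/C$, for a constant $C$ bounding the target likelihood $L$, contributes $\log L - \log C$ to the model's log-likelihood, i.e., the desired log-likelihood up to an additive constant. A generic Python illustration of this identity (not the paper's model code):

```python
import math

def ones_trick_logcontrib(target_lik, C):
    """Log-likelihood contribution from the unit datum 1 ~ Bernoulli(target_lik / C).
    Equals log(target_lik) - log(C): the target log-likelihood up to a constant,
    so MCMC sampling under this contribution targets the correct posterior."""
    p = target_lik / C
    assert 0.0 < p <= 1.0, "C must bound the target likelihood from above"
    return math.log(p)

contrib = ones_trick_logcontrib(0.2, C=1.0)
```

In BUGS-style code this is expressed by declaring the constant data value 1 as a Bernoulli observation with the scaled likelihood as its success probability.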
We apply three cumulative refinements to the model and MCMC sampling: (1) jointly sample correlated dimensions and marginalize over the $z_i$ indicator variables, (2) reparameterize the dispersal process using a custom distribution, and (3) restrict hazard-rate calculations to traps local to each individual.

We originally modeled dispersal distances and angles as random variables subject to MCMC sampling, a standard approach for movement models. This results in high computational cost because any proposed update to a dispersal distance or angle (especially for early primary occasions) triggers a long chain of calculations to determine the updated ACs, detection probabilities, and detection likelihoods for all subsequent occasions. Specifically, say we make an MCMC proposal modifying $d_{11}$, the dispersal distance for the first individual between the first and second primary occasions. This update requires re-evaluating each of $s_{1,2}, s_{1,3}, \ldots, s_{1,L}$, up through the AC of the final primary occasion. Further, the detection probabilities and data likelihoods for each AC must also be recalculated.

We reparameterize this model using a custom distribution of activity center locations that is induced by the distributions of dispersal angle and distance, as $s_{i,k+1} \sim \text{Dispersal}(s_{ik}, \lambda_{G_i})$.

This distribution is centered around the current AC and is mathematically equivalent to the original $\{d, \theta\}$ parameterization. Now, updates of $s_{i,k}$ do not induce a large chain of ensuing calculations; rather, only the likelihoods corresponding to $s_{i,k}$ and $s_{i,k+1}$ must be calculated. The custom distribution is given by
$$\text{Dispersal}(s_{i,k+1} \mid s_{ik}, \lambda_{G_i}) \propto \frac{\lambda_{G_i}\, e^{-\lambda_{G_i} d}}{d},$$
where $d = \lVert s_{i,k+1} - s_{ik} \rVert$, omitting constants of proportionality, which are not necessary for sampling. We recognize $\lambda_{G_i} e^{-\lambda_{G_i} d}$ as the exponential density for the dispersal distance $d$.

The factor of $1/d$ is the Jacobian of the transformation from the polar coordinates $(d, \theta)$ to the Cartesian AC coordinates.

When $s_{i,k}$ is "far" from a trap $r$, then $h_{ijkr}$ will be extremely low, and a capture in trap $r$ is exceedingly unlikely. Its contribution to $h_{ijk*} = \sum_{r=1}^{R} h_{ijkr}$ is negligible, as is the probability of capture in trap $r$. The original BUGS modeling language lacks the ability to conditionally disable calculations, and hence all the capture hazard rates must always be computed. We introduce logic such that $h_{ijkr}$ is only calculated for traps within a distance $d_{\min}$ of the individual's AC. For traps located further away, we assign $h_{ijkr}$ a small positive value. This will not affect the sum $h_{ijk*}$, but still allows for a non-zero probability of capture. Here, we let $h_{ijkr} = 10^{-14}$ for traps outside a radius $d_{\min} = 40$ from each individual's AC.

We introduce a discretized grid over the study area, and pre-compute the indices of traps within a radius $d_{\min}$ of each grid cell. Using this, a custom nimble function returns the indices of the "local traps" nearest to any $s_{i,k}$, and subsequently calculates hazard rates only for these local traps. This is similar to the local trap calculations used in the previous example, but the implementations differ on account of the discretized habitat mask used there.
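The grid-based lookup can be sketched as follows; the toy layout and function name are hypothetical, and the paper's actual implementation is a custom nimble function in R:

```python
import numpy as np

def precompute_local_traps(grid_xy, trap_xy, d_min):
    """For each grid cell center, pre-compute the indices of traps within
    radius d_min. At run time, an AC s_{i,k} is snapped to its nearest grid
    cell, and only that cell's 'local' traps have their hazards evaluated;
    all other traps receive a tiny fixed hazard (e.g., 1e-14)."""
    local = []
    for g in grid_xy:
        dists = np.linalg.norm(trap_xy - g, axis=1)
        local.append(np.flatnonzero(dists <= d_min))
    return local

# Hypothetical toy layout: 2 grid cells, 3 traps, d_min = 40.
grid_xy = np.array([[0.0, 0.0], [100.0, 0.0]])
trap_xy = np.array([[10.0, 0.0], [90.0, 0.0], [50.0, 0.0]])
local = precompute_local_traps(grid_xy, trap_xy, d_min=40.0)
```

The precomputation is done once, so each MCMC iteration pays only for an index lookup plus hazard evaluations at the handful of nearby traps, rather than at all $R$ traps.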

Wolverine Model
We assess performance of the Wolverine model using total population size ($N$), probability of detection ($p_0$), and scale factor ($\sigma$). Results for the four stages of iterative improvement described in Section 2.1 will be denoted as Nimble1 (vectorization), Nimble2 (joint sampling),

Nimble3 (evaluating local detectors), and Nimble4 (skipping unnecessary calculations). The Nimble3 formulation requires less than 6 hours to accomplish the same (Table 4).

Care should be taken when applying this model to other data. That said, our purpose has been to investigate the efficiency of estimation methods rather than the statistical properties (such as bias or goodness of fit) of particular models. Indeed, the ability to perform inference more efficiently will support a deeper exploration of alternative model structures.

Many software packages are available for fitting SCR models, making these analyses faster and more accessible to practitioners (e.g., secr or oSCR, among others). The prevalence of specialized software underscores the complex nature of SCR problems, and furthermore suggests that no single software package could be general enough to approach all SCR problems.

341
nimble does not attempt to provide "canned" algorithms for SCR, or for any other particular application, but rather a flexible programming environment suitable for customized (and highly efficient) analysis of complex data.