--- title: c-MoosePopR and SightabilityPopR Demonstration author: "Carl James Schwarz" date: '`r format(Sys.time(), "%Y-%m-%d")`' output: rmarkdown::html_vignette: toc: true number_sections: yes vignette: > %\VignetteIndexEntry{c-MoosePopR and SightabilityPopR Demonstration} %\VignetteEncoding{UTF-8} %\VignetteDepends{car, ggplot2, kableExtra, plyr, readxl} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) options(width=200) library(car) library(ggplot2) library(kableExtra) library(plyr) library(readxl) library(reshape2) library(SightabilityModel) ``` # Availability of code The code for this vignette is available in the GitHub repository noted in the *Description* file of this package. # Introduction This document illustrates how to use the *MoosePopR()* and *SightabilityPopR()* functions to replace the *MoosePop* (Reed, 1989) and *AerialSurvey* (Unsworth, 1999) programs. The results from a 2018 survey of moose conducted according to methods described in RISC (2002) will be used for illustrative purposes. ## Survey protocol The survey area is divided in blocks (which are the primary sampling units, PSUs). Blocks could be defined using a grid-system, or based on ecological factors such as elevation, type of vegetation, etc. The blocks do not have the same area, nor do they have to rectangular. If blocks are of different size and the area of the block is not used in the analysis, no biases are introduced into the estimation procedure, but the survey design could be inefficient, i.e., the uncertainty in the estimates could be larger compared to the standard error obtained from methods that account for different block area (see later in this document). Blocks are then stratified using a habitat model into categories of relative moose densities. In this case, the blocks are stratified into three strata corresponding to *low*, *medium*, and *high* density depending on past years results, vegetation cover, and other attributes. There is no limit on the number of strata that can be defined, but usually three to five blocks would be sufficient. Blocks within strata should have similar densities and strata should "as different" as possible. If the stratification is incorrect (e.g., a block is classified into the *high* stratum when in fact is should be in the *low* stratum), no biases are introduced into the estimates A simple random sample of blocks is selected from each stratum, i.e., each block is selected with equal probability within each stratum. The number of block to select from each stratum is determined using different allocation methods. The analysis of the completed survey does not depend on how the sample size was determined in each stratum. Some selected blocks may not be surveyed due to bad weather etc. These missed blocks are assumed to be missing completely at random (MCAR) within each stratum, i.e., the missingness is unrelated to the response or any other covariate. If a missed block is not replaced by another block chosen at random from the stratum, the sample size in this stratum will be reduced. The impact of a reduction in sample size in each stratum (with increased uncertainty in the estimates for each stratum compared to a survey with no missing blocks), but no biases are introduced. When a block is surveyed, moose are observed in groups. When a group is identified, a count of the number of moose is made separated by age (adult, juvenile, calves) and sex (bull or cow). Not all groups of moose can be observed due to sightability issues. A previous sightability study (e.g., Quayle et al. 2001; RISC, 2002) is used to estimate the probability of detecting a group of moose based on a categorical vegetation cover variable with five classes. **NOTE that sightability operates at the group level and not at the moose level**. This means that if a group is sighted, then all members of the group are enumerated. But entire groups could be "invisible" due to vegetation cover. This is a stratified random sampling design where the sampling unit is the *block* surveyed in each stratum and the observational unit is a group of moose. The observations on groups are pseudo-replicates and must be treated as observations within a cluster (being the block). This document will illustrate how to estimate the total number and density of moose (in various classes) and the ratio of types of moose (e.g., bulls to cow ratio) without and with correcting for sightability. The estimates are compared to those generated by the *MoosePop* program (Reed, 1989; no correction for sightability) and the *AerialSurvey* program (Unsworth, 1999; correction for sightability). # Data input Data input consists of several datasets placed into an Excel workbook using different worksheets for the different information. ## Moose data This consists of information on group of moose observed in a selected blocks. The data should have a header row with the variable names for each column. ```{r echo=TRUE, message=FALSE, warning=FALSE} # Get the actual survey data dir(system.file("extdata", package = "SightabilityModel")) survey.data <- readxl::read_excel(system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="BlockData", skip=1,.name_repair="universal") # Skip =xx says to skip xx lines at the top of the worksheet. In this case skip=1 implies start reading in line 2 of # the worksheet. # .name_repair="universal" changes all variable names to be compatible with R, e.g., no spaces, no special characters, etc ``` Variable names for the counts are somewhat arbitrary. Notice how *R* converts the column names from the Excel workbook to to proper *R* variable names (*.name_repair="universal"* controls this conversion. These (modified) variable names will be used to identify the quantities to be estimated so simpler names are easier to use than more complicated names. ```{r echo=TRUE, message=FALSE, warning=FALSE} # Check the variable names in the input data names(survey.data) ``` Missing values in the count fields are assume to be zeros and a zero value must be substituted for the missing values. ```{r echo=TRUE, message=FALSE, warning=FALSE} # convert all NA in moose counts to 0 survey.data$Bulls [ is.na(survey.data$Bulls) ] <- 0 survey.data$Lone.Cows [ is.na(survey.data$Lone.Cows) ] <- 0 survey.data$Cow.W.1...calf [ is.na(survey.data$Cow.W.1...calf)] <- 0 survey.data$Cow.W.2.calves [ is.na(survey.data$Cow.W.2.calves)] <- 0 survey.data$Lone...calf [ is.na(survey.data$Lone...calf) ] <- 0 survey.data$Unk.Age.Sex [ is.na(survey.data$Unk.Age.Sex) ] <- 0 #head(survey.data) ``` We check that the *NMoose* is computed properly by comparing the recorded total number of moose with a computed total number of moose. ```{r echo=TRUE, message=TRUE, warning=TRUE} # check the total moose read in vs computed number survey.data$myNMoose <- survey.data$Bulls + survey.data$Lone.Cows+ survey.data$Cow.W.1...calf*2+ survey.data$Cow.W.2.calves*3+ survey.data$Lone...calf+ survey.data$Unk.Age.Sex ggplot(data=survey.data, aes(x=myNMoose, y=NMoose))+ ggtitle("Compare my count vs. count on spreadsheet")+ geom_point(position=position_jitter(h=.1,w=.1))+ geom_abline(intercept=0, slope=1) ``` We count the number of observations by sub-unit and stratum to ensure that *Stratum* has been coded properly. For example ensure that - stratum case is consistent (e.g., *h* vs. *H*), - stratum codes are consistent (e.g., *H* vs. *HIGH*). ```{r echo=TRUE, message=TRUE, warning=TRUE} xtabs(~Stratum, data=survey.data, exclude=NULL, na.action=na.pass) ``` We can also add additional computed variables. For example, the number of cows is found as ```{r echo=TRUE, message=FALSE, warning=FALSE} survey.data$Cows <- survey.data$Lone.Cows + survey.data$Cow.W.1...calf + survey.data$Cow.W.2.calves ``` The first few records of the counts of interest are: ```{r echo=FALSE, message=FALSE, warning=FALSE} head(as.data.frame(survey.data[,c("Block.ID","Stratum","Bulls","Lone.Cows","Cow.W.1...calf", "Cow.W.2.calves","Cows","Lone...calf","Unk.Age.Sex","NMoose")])) ``` ## Block area The area of each surveyed block is also needed. ```{r echo=TRUE, message=TRUE, warning=TRUE} # Get the area of each block survey.block.area <- readxl::read_excel( system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="BlockArea", skip=1,.name_repair="universal") head(survey.block.area) ``` The name of the variable identifying the block should match the name used in the raw data from the previous section. ```{r echo=TRUE, message=FALSE, warning=FALSE} names(survey.block.area) names(survey.data) ``` Every block that is surveyed should have an area, but the block area file can have areas for blocks not surveyed (e.g., every block in the population of blocks). If a area is available for a block that was not surveyed, it is ignored. ```{r echo=TRUE, message=FALSE, warning=FALSE} # Check that every surveyed block has an area. The setdiff() should return null. setdiff(survey.data$Block.ID, survey.block.area$Block.ID) # It is ok if the block area file has areas for blocks not surveyed setdiff(survey.block.area$Block.ID, survey.data$Block.ID) ``` ## Stratum information Information about each stratum such as the total area and total number of blocks is required ```{r echo=TRUE, message=TRUE, warning=TRUE} stratum.data <- readxl::read_excel( system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="Stratum", skip=2,.name_repair="universal") stratum.data ``` Ensure that the stratum labels match those used in the raw data ## Sightability model. A logistic regression comparing observed groups of moose to known groups of moose (e.g., based on radio collared moose) was used to estimate the sightability model (e.g., Quayle, 2001). ### Extracting from *AerialSurvey* The sightability model used was extracted from the *AerialSurvey* (Unsworth, 1999) program input files and entered on the worksheet in the Excel workbook. The *AerialSurvey* program uses two files to describe the survey (and computations) and the model for the sightability. The *Moose.mdf* is a text-file that describes the model used for sightability corrections. A portion of the file is shown below ``` ;============================================================================== ; MOOSE.MDF - Model description file (MDF) for Aerial Survey - 7-Oct-94 ; -comments start with a semicolon (any text after it will be ignored) ; -blank value fields (right hand side) will default to nil, zero, or no! ;============================================================================== [Model] Title = "Moose, Bell JetRanger" Subtitle = '' ; optional Species = Moose Aircraft = "Bell JetRanger" Locale = xxxxxx Version = 1.00 Terms = 2 ; number of terms in the model, max = 10 Date = 04.11.00 ``` This appears to be the model for Bell JetRanger aircraft for moose. The model for sightability appears to be function only of vegetation cover with two term (intercept and vegetation cover class). Vegetation cover appears to be divided into 5 classes: ``` [Transformations] ;Type of transformations can be: ; Class, LookupT1, LookupT2, LookupT3, LookupT4, Table1, Table2, Power, None Number = 1 ; number of transformations, max = 5 ; for each transformation, give the name/index of the variable, ; and the type of transformation required on the variable ; TransformX = variable, type Transform1 = VegCover, Class [Class] ; Class - e.g., vegetation cover classes into intervals ; for each interval give lower and upper limits and the assigned value Intervals = 6 ; number of class intervals, max = 9 ; ClassX = >lower, <=upper, value - ascending, sequential order Class1 = -0.1, 20.0, 1.0 Class2 = 20.0, 40.0, 2.0 Class3 = 40.0, 60.0, 3.0 Class4 = 60.0, 80.0, 4.0 Class5 = 80.0, 100.0, 5.0 ``` For example, if vegetation cover is from 0 to 20%, this is class 1; from 20%+ to 40% is class 2; etc. The file also contains the estimated model coefficients and estimated covariance matrix for the sightability model.. ``` ; Map input variables into X's [X] ; label/index from a list of variables X01 = 1.0 ; constant X02 = VegCover ; class transformation ;------------------------------------------------------------------------------ ; Model coefficients ;------------------------------------------------------------------------------ [Beta] ; beta vector Beta01 = 4.9604 ; constant Beta02 = -1.8437 ; vegetation cover [Sigma] ; sigma matrix Sigma01 = 0.7822 Sigma02 = -0.2820, 0.1115 ``` The estimated probability of sighting a moose is $$logit(p)=4.9604 -1.8437(vegetation~class)$$ Notice that vegetation class is treated as continuous variable and used directly with values of 1 to 5 rather than as 5 indicator variables. This implies that the $logit(p)$ increases linearly at the same rate as the vegetation cover class goes from 1 to 2 or from 2 to 3, etc. In some cases, sightability may depend on a categorical variable (e.g., north or south aspect). Then a series of indicator variables will have to be created, e.g *AspectNorth* takes the value of 1 if the aspect is north and the value of 0 if the aspect is south. If the categorical variable has $K$ classes, you will need $K-1$ indicator variables. Contact me for help on this issue. ### Reading coefficients of the model The information from the *AerialSurvey* (Unsworth, 1999) program was transferred to the Excel workbook containing the survey information for this study. The estimated coefficients of the model and the covariance matrix (sigma matrix) of the estimated coefficients are input ```{r echo=TRUE, message=FALSE, warning=FALSE} # number of beta coefficients nbeta <- readxl::read_excel( system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="SightabilityModel", range="B2", col_names=FALSE)[1,1,drop=TRUE] # Here Range="B2" refers to a SINGLE cell (B2) on the spreadsheet # extract the names of the terms of the model beta.terms <- unlist(readxl::read_excel( system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="SightabilityModel", range=paste0("B3:",letters[1+nbeta],"3"), col_names=FALSE, col_type="text")[1,,drop=TRUE], use.names=FALSE) # Here the range is B3: C3 in the case of nbeta=2 etc cat("Names of the variables used in the sightability model:", beta.terms, "\n") # extract the beta coefficients beta <- unlist(readxl::read_excel( system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="SightabilityModel", range=paste0("B4:",letters[1+nbeta],"4"), col_names=FALSE, col_type="numeric")[1,,drop=TRUE], use.names=FALSE) # Here the range is B4: C4 in the case of nbeta=2 etc cat("Beta coefficients used in the sightability model:", beta, "\n") # extract the beta covariance matrix beta.cov <- matrix(unlist(readxl::read_excel( system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="SightabilityModel", range=paste0("B5:",letters[1+nbeta],4+nbeta), col_names=FALSE, col_type="numeric"), use.names=FALSE), ncol=nbeta, nrow=nbeta) cat("Beta covariance matrix used in the sightability model:\n") beta.cov ``` These values match those in the *AerialSurvey* (Unsworth, 1999) package. The estimated sightability and expansion factors at the group level (inverse of sightability) are: ```{r echo=FALSE, message=FALSE, warning=FALSE} Cover <- data.frame(intercept=1, VegCoverClass=1:5) Cover$logit.p <- as.matrix(Cover[,1:nbeta ]) %*% beta Cover$logit.p.se <- diag(sqrt( as.matrix(Cover[,1:nbeta ]) %*% beta.cov %*% t(as.matrix(Cover[,1:nbeta ])))) Cover$p <- 1/(1+exp(-Cover$logit.p)) Cover$p.se <- Cover$logit.p.se * Cover$p * (1-Cover$p) Cover$expansion <- 1/Cover$p Cover$expansion.se <- Cover$p.se / Cover$p^2 temp <- Cover[, -1] temp[,2:7]<- round(temp[,2:7],3) #temp kable(temp, row.names=FALSE, caption="Estimated sightability of groups and expansion by Vegetation Cover Class", col.names=c("Vegetation Cover Class","logit(p)","SE logit(p)","p","SE(p)","Expansion Factor","SE(EF)")) %>% column_spec(column=c(1:7), width="2cm") %>% kable_styling("bordered",position = "center", full_width=FALSE) ``` So a single moose sighted in Vegetation Cover Class 5 is expanded to almost 72 moose (!). ### Creation of *VegCoverClass* variable We need to compute a *VegCoverClass* variable in the survey data. The *recode()* function from the *car* package is used. ```{r echo=TRUE, warning=FALSE, message=FALSE} # convert the Veg Cover to Veg.Cover.Class xtabs(~..Veg.Cover, data=survey.data, exclude=NULL, na.action=na.pass) survey.data$VegCoverClass <- car::recode(survey.data$..Veg.Cover, " lo:20=1; 20:40=2; 40:60=3; 60:80=4; 80:100=5") xtabs(~VegCoverClass+..Veg.Cover, data=survey.data, exclude=NULL, na.action=na.pass) ``` Notice that there are 8 groups of moose where the *Vegetation Cover* was not recorded. This has implications for estimation as noted later. Generally speaking, you should not simply delete these observations because this will introduce substantial bias in the estimates. Rather, impute a vegetation cover class that gives the highest sightability for these groups to give a lower bound on the number of moose in this block. This will reduce (but not eliminate) the bias caused by simply dropping these observations. ## Key variables. There are several key variables used to match records across the raw data, block area, and stratum data frames. The default variable names are: - *Block.ID* representing the block identification number. The *Block.ID* should not be duplicated in different strata. It can be numeric or characters. - *Block.Area* representing the area of each surveyed block. This should be numeric. - *Stratum* representing the stratum identification. It can be character or numeric does not have to be a single character. - *Stratum.Blocks* representing the total number of blocks in each stratum. - *Stratum.Area* representing the total area of each stratum in the same units as the *Block.Area*. The analysis function has arguments that can be changed if the names of these key variables differ from above. ## Dealing with missing and zero values ### Group information For each group sighted, the number of animals is recorded and may be divided by age and/or sex groupings. If no animals of a particular sex/age are seen in a group, then it should be recorded as zero rather than as missing. In some cases, sex/age information is not available for an animal. It should be counted in a separate category for unknown sexed/aged animals. Of course the, a naive application of estimating the number of bulls may be biased downwards is some bulls are recorded in the unknown sexed/aged category; however, estimates of the total number moose will be fine. No missing values are allowed in any count. ### Covariate information The sightability models use covariates at the group level to adjust for detection less than 100%. No missing values are allowed for the covariates. If the covariates are not available, then you should not simply delete the group, but rather impute a value of the covariate that gives the highest sightability to form a lower bound on the number of moose in the block because this particular block is not fully corrected for sightability. There are two sources of bias that could occur when you drop a group with missing sightability covariate values: - If you delete the group and the block is now empty (and so vanishes), then your sample size (number of blocks surveyed) and area surveyed are too small and your total observed animals is too small. The bias could be positive or negative depending if the deleted groups are large or small. - If the number of blocks does not change (i.e., deleting this group does make the block empty), then deleting the group makes the total count for this block too small. If you keep the block with sightability set to 1, you partially correct against dropping the block, because rather than having a block correction factor of 2 (e.g., some lack of sightability due to high vegetation cover), you are using a sightability correction factor of 1. Trying to predict the combined effect of these sources of bias is not easy. The effect of the second item is to partially correct the abundance/density estimate compared to dropping the block. See the example in the Sightability section below on dealing with missing covariates. ### Blocks with no observed groups. It is possible that a block will be surveyed and NO animals seen in the entire block. In this case, you should insert a "dummy" observation for this block with counts of 0 for the number of moose to ensure that this surveyed block (with no animals seen) is properly accounted for. You can set the sightability variables to any value, because $0 \times \textrm{sightability correction}$ is still 0. ### Block information Each block must have a known area. Missing values are not allowed. ### A single block in a stratum If a stratum has only a single block surveyed, then no design based estimate of variance is possible. This is known as the *lonely primarily sampling unit (lonely PSU)*. Ad hoc adjustments are possible; refer to http://r-survey.r-forge.r-project.org/survey/exmample-lonely.html for some suggestions. The *MoosePopR()* function has an argument that can be set using ``` MoosePopR(..., survey.lonely.psu="remove") ``` The the variance for this stratum is not considered when estimating the variance of the grand totals or density. Unfortunately, it is not possible, at this time, to estimate a ratio with some strata having lonely PSU. The *SightabilityPopR()* function can deal with lonely PSUs by using within block variability (e.g., among groups) so this will provide a lower bound to the true uncertainty. The bottom line is avoid designs where you sample only a single block in a stratum! ### Stratum information Each stratum must have the known total area for the *MoosePop()* function and the known total number of blocks for the *SightabilityPopR()* function. # Analysis ## Analysis not adjusting for sightability ### Within a stratum #### Density Suppose a sample of $n_h$ blocks were surveyed in stratum $h$. For each block, the number of moose $m_{hi}$ and the area of the block $a_{hi}$ are obtained for block $i$ in stratum $h$. The estimated density of moose per unit area in stratum $h$ is obtained as: $$\widehat{D}_h = \frac{\sum_i m_{hi}}{\sum_i a_{hi}}$$ i.e., as the total number of moose in the surveyed blocks divided by the total survey area. This is a standard ratio estimator and the estimated standard error is found as using equation 6.9 of Cochran (1977). The estimator and its standard error can be automatically computed using the *svyratio()* function in the *survey* package (Lumley, 2019) of *R*. Similar formula are used for estimating the density of other sex/age categories. #### Abundance The estimated total number of moose in stratum $h$ ($\widehat{M}_h$) is found by multiplying the density estimate and its standard error by the stratum area $A_h$ $$\widehat{M}_h = \widehat{D}_h \times A_h$$ $$se(\widehat{M}_h) = se(\widehat{D}_h) \times A_h$$ Similar formula are used for estimating the abundance of other sex/age categories. #### Bull to Cow and other ratios Suppose a sample of $n_h$ blocks were surveyed in stratum $h$. For each block, the number of bulls $b_{hi}$ and the number of cows $c_{hi}$ are obtained for block $i$ in stratum $h$. The estimated bull to cow ratio in stratum $h$ is obtained as: $$\widehat{BC}_h = \frac{\sum_i b_{hi}}{\sum_i c_{hi}}$$ i.e., as the total number of bulls ($\sum_i b_{hi}$) in the surveyed blocks divided by the total number of cows ($\sum_i c_{hi}$) in the surveyed blocks. This is a standard ratio estimator and the estimated standard error is found as using equation 6.9 of Cochran (1977). The estimator and its standard error are automatically computed using the *svyratio()* function in the *survey* package (Lumley, 2019) of *R*. If the number of bulls per 100 cows is wanted, simply multiply the estimate and its standard error by $100\times$. Other ratios such as calves:cows are computed in similar ways. ### Across all strata #### Density The overall density for the number of moose is found as a weighted averaged of the stratum specific densities: $$\widehat{D}= \sum_h \frac{A_h}{\sum_h A_h}\widehat{D}_h$$ $$se(\widehat{D}) = \sqrt{\sum_h \left( \frac{A_h}{\sum_h A_h} \right)^2se(\widehat{D}_h)^2}$$ which is equivalent to the estimated total abundance (see below) divided by the total stratum area. This is known as a *separate-ratio* estimator of the total and is possible because the expansion factor going from density to abundance (the stratum area) is known for each stratum. Similar formula are used for estimating the density of other sex/age categories. #### Abundance The total abundance of moose is found by summing the estimated abundances across strata: $$\widehat{M} = \sum_h \widehat{M}_h$$ and the standard error of the overall abundance is found as: $$se(\widehat{M}) = \sqrt{\sum_h se(\widehat{M}_h)^2}$$ This is known as a *separate-ratio* estimator of the total and is possible because the expansion factor going from density to abundance (the stratum area) is known for each stratum. Similar formula are used for estimating the abundance of other sex/age categories. #### Bull to Cow and other ratios The estimation of the overall bull to cow ratio is computed using the *double-ratio* estimate (Cochran, 1977, Section 6.19) as $$\widehat{BC}_{\textit{double~ratio}}=\frac{\widehat{B}}{\widehat{C}}$$ where $\widehat{B}$ and $\widehat{C}$ are the estimated total number of bull and cows individually computed. This is equivalent to the ratio of the overall density of bulls and cows. The above formula can be expanded to give $$\widehat{BC}_{\textit{double~ratio}}=\frac{\sum_h \frac{\sum_i b_{hi}}{\sum_i a_{hi}} \times A_h} {\sum_h \frac{\sum_i c_{hi}}{\sum_i a_{hi}} \times A_h}$$ Its standard error is NOT directly available in the *survey* package but is readily computed by finding the estimates and the covariance of the estimates of number of bull and cows for each stratum and then using the delta method to find the overall ratio and standard error of the overall ratio. ### The *MoosePopR()* function. A *MoosePopR()* function was created in *R* to replicate the results from *MoosePop* (Reed, 1989). The key arguments to this function are: - *survey.data* - information on each group measured in the survey. This was read from the Excel workbook earlier. - *survey.block.area* - information about the area of each block surveyed. This was read from the Excel workbook earlier. - *stratum.data* - information about each stratum. This was read from the Excel workbook earlier. - *density, abundance, numerator, denominator* - variables for which the density, abundance, ratio are to be formed as right-sided formula (e.g., density=~Cows to estimate the density of Cows). - *conf.level=0.90* indicates the confidence level for confidence intervals to compute with the default being a 90% confidence interval. ### Sample Analyses **Results differ slightly from those reported on an Excel spreadsheet because of slight differences in areas of blocks between data provided to me and that used in MoosePop (Reed, 1989)**. We are now ready to do the analysis. I've gathered all of the analysis results into a list object for extraction later. **The *UC* suffix that I have placed on the estimate name indicates it is not corrected for sightability.** I have only computed abundance and density estimates for all moose, but similar code can be used for other age/sex categories. Similarly, I have only considered the bull:cow ratio and other ratio can be computed in similar fashion. #### Density of all moose We compute the (uncorrected for sightability) density of all moose (regardless of sex or age) using the number of moose (*NMoose*) variable. ```{r echo=TRUE, message=TRUE, warning=TRUE} result <- NULL result$"All.Density.UC" <- MoosePopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,density=~NMoose # which density ) result$"All.Density.UC" ``` The *estimate* and *SE* column as the estimated (uncorrected for sightability) density of all moose with a measure of uncertainty for each stratum. The overall density is found as a weighted average (by stratum area) of the density in each stratum. Similar function calls are used for estimating the density of other sex/age categories by changing the *density* argument. #### Abundance of all moose We compute the estimated (uncorrected for sightability) abundance of all moose (regardless of sex or age) by expanding the density by the *Stratum.Area*. The overall abundance is found as the sum of the expanded estimates. ```{r echo=TRUE, message=FALSE, warning=FALSE} result$"All.Abund.UC" <- MoosePopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~NMoose # which density ) result$"All.Abund.UC" ``` The *estimate* and *SE* column as the estimated (uncorrected for sightability) density of all moose with a measure of uncertainty for each stratum. The overall abundance is found as the sum of the estimates in each stratum. Similar function calls are used for estimating the abundance of other sex/age categories by changing the *density* argument. #### Number of bulls and number of cows We estimate the (uncorrected for sightability) abundance of Bulls and Cows. ```{r echo=TRUE, message=FALSE, warning=FALSE} result$"Bulls.Abund.UC" <- MoosePopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~Bulls # which density ) result$"Bulls.Abund.UC" result$"Cows.Abund.UC" <- MoosePopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~Cows # which density ) result$"Cows.Abund.UC" ``` These values are computed similarly as the total abundance. #### Bull to Cow and other ratios Finally, we estimate the Bulls:Cows ratio: ```{r echo=TRUE, message=FALSE, warning=FALSE} result$"Bulls/Cow.UC" <- MoosePopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,numerator=~Bulls, denominator=~Cows # which density ) result$"Bulls/Cow.UC" ``` Other ratios would be estimated in similar ways by changing the *numerator* and *denominator* arguments to the function call. #### Final object ```{r echo=TRUE, warning=FALSE, message=FALSE} names(result) ``` ## Adjusting for sightability The above computations are not adjusted for sightability, i.e., not all moose can be detected due to visibility concerns. An adjustment for sightability can be done by also performing a study where a known number of groups of moose (e.g., radio collared) are surveyed and information about detection/non detection is recorded at the group level. A logistic regression (see previous sections) is used to estimate the probability of detection of a group given covariates (such as Vegetation Class). Then the sightability model is used to expand the observed number of groups of moose by the a sightability correction factor (SCF). This is related to the inverse of the probability of detection, but includes a correction to account for the uncertainty in the estimates of the coefficients of the sightability model (see Fieberg, 2012, Equation 1 and later in this document). For example, consider a model to estimate detectability given in Quayle et al. (2001): ```{r echo=FALSE, warning=FALSE, message=FALSE} sightability.model <- ~VegCoverClass sightability.beta <- c(4.2138, -1.5847) sightability.beta.cov <- matrix(c(0.78216336, -0.282, -0.282, 0.11148921), nrow=2, ncol=2) sightability.table <- data.frame(VegCoverClass=1:5, VegPercent=c("00-20","21-40","41-60","61-80","81-100")) sightability.table$detect.prob <- SightabilityModel::compute.detect.prob(sightability.table, sightability.model, sightability.beta, sightability.beta.cov) sightability.table$SCF <- SightabilityModel::compute.SCF(sightability.table, sightability.model, sightability.beta, sightability.beta.cov) kable(sightability.table, row.names=FALSE, caption="Estimated sightability correction factor for each vegetation cover class", col.names=c("Veg Cover Class","Veg Cover %","Detection probability","Sightability Correction Factor"), digits=c(0,0, 3,2)) %>% kable_styling("bordered",position = "center", full_width=FALSE, latex_options = "HOLD_position") ``` Notice that that, for example, that 1/`r sightability.table$prob.detect[2]`=`r 1/sightability.table$prob.detect[2]` is only approximately equal to `r sightability.table$SCF[2]`. **It is assumed that all animals in a group (e.g., bulls, cows, calves) are observed if a group is detected.** Let $\theta_{hjk}$ represent the expansion factor for group $k$ in block $j$ of stratum $h$. In the analyses that follow the block area is NOT used in the analysis. In its place, a simple expansion based on the number of blocks surveyed within a stratum are used. This will be less efficient (i.e., have a larger standard error) compared to an analysis that uses the individual block areas, but unless there is a high correlation between the number of animals and the block area, the loss of efficiency is small. If all blocks had the same area, the two approaches are identical. We examined the correlation for the existing survey data and it is typically small. Cochran (1977) shows that unless the correlation is larger than a specified cutoff that depends on the coefficient of variation of the numerator and denominator variables, then there is no gain in using the block area, i.e., unless $$\rho > \frac{1}{2}\frac{SD(denominator~variable)/mean(denominator~variable)}{SD(numerator~variable)/mean(numerator~variable)} $$ where $\rho$ is the correlation between the numerator and denominator variable (i.e., between abundance of the numerator variable and block area)$, then a ratio estimator using the area of blocks is not more efficient. ```{r echo=FALSE,message=FALSE, warning=FALSE} # Look at correlations between total number of moose, bulls, and cows and block area survey.block.data <- plyr::ddply(survey.data, c("Block.ID","Stratum"), plyr::summarize, Bulls = sum(Bulls, na.rm=TRUE), Lone.cows = sum(Lone.Cows, na.rm=TRUE), Cow.W.1...calf = sum(Cow.W.1...calf, na.rm=TRUE), Cow.W.2.calves = sum(Cow.W.2.calves, na.rm=TRUE), Lone...calf = sum(Lone...calf, na.rm=TRUE), Unk.Age.Sex = sum(Unk.Age.Sex, na.rm=TRUE), Cows = sum(Cows, na.rm=TRUE), NMoose = sum(NMoose, na.rm=TRUE)) # add the area to the block totals survey.block.data <- merge(survey.block.data, survey.block.area, all.x=TRUE) # What is correlation between block area and number of moose etc survey.block.data.melt <- reshape2::melt(survey.block.data, id.vars=c("Stratum","Block.ID","Block.Area"), measure.vars=c("Bulls","Lone.cows","Cow.W.1...calf","Cow.W.2.calves", "Lone...calf","Unk.Age.Sex","Cows","NMoose"), variable.name="Classification", value.name="N.Animals") # find correlation between number of animals and area res <- plyr::ddply(survey.block.data.melt, c("Stratum","Classification"), plyr::summarize, corr=cor(N.Animals, Block.Area), cv.N.Animals=sd(N.Animals)/mean(N.Animals), cv.Area =sd(Block.Area)/mean(Block.Area), cut.off = cv.Area/2/cv.N.Animals) temp <- res[,c(1,2,3,6)] temp[,3:4] <- round(temp[,3:4],2) kable(temp, row.names=FALSE, caption="Estimated correlation between animal counts and block area", col.names=c("Stratum","Type of Animal","Corr","Cutoff")) %>% column_spec(column=c(1:2), width="3cm") %>% column_spec(column=c(3:4), width="1cm") %>% kable_styling("bordered",position = "center", full_width=FALSE) %>% footnote(threeparttable=TRUE, general=paste0("Cutoff is 0.5 ratio of sd/mean of area and abundance of each type of animal", " and unless the correlation exceeds the cutoff, there is no advantage", " in using the block area in the analysis")) ``` The correlations are not high. While about half of the response variables have correlations of abundance with block area that exceed the cutoff, the variation in the block areas is not that large, so we don't think that also adjusting for block area is necessary. Of course, this also does not account for sightability issues. A plot of the two variables for each types of animals shows no strong relationship between the number of observed animals and the block area: ```{r echo=FALSE, message=FALSE, warning=FALSE} ggplot(data=survey.block.data.melt, aes(x=Block.Area, y=N.Animals, color=Stratum))+ geom_point()+ facet_wrap(~Classification, ncol=3, scales="free")+ theme(legend.position=c(1,0), legend.justification=c(1,0)) ``` Consequently, it not expected that ignoring block area in the analysis will have a material effect. ### Within a stratum #### Abundance Suppose a sample of $n_h$ blocks were surveyed in stratum $h$ out of $N_h$ total number of blocks. For each block, the number of moose $m_{hjk}$ are obtained for group $k$ in block $j$ in stratum $h$. The estimated correction factor (for visibility of the group) is $\widehat{\theta}_{hjk}$ which is computed give the vector of covariates (x_{hjk}) as (see page 4 of Fieberg, 2012) $$\widehat{\theta}_{hjk}= 1+\exp{(-x_{hjk}^t \widehat{\beta}-x_{hjk}^t \Sigma x_{hjk} / 2)}$$ Notice that the sightability correction factor is not simply $1/\textit{probability of detection}$ or $1+\exp{(-x_{hjk}^t \widehat{\beta}$ because it includes a correction factor to account for the non-linear transformation from the logit to the probability scale. The estimated abundance of moose in stratum $h$ is obtained as: $$\widehat{M}_h = N_h \times \frac{\sum_{jk} m_{hjk}\widehat{\theta}_{hjk}}{n_h}$$ i.e., as the mean number of (sightability corrected) moose per surveyed block times the total number of blocks. The standard error of this estimate is complicated because of the need to account for the estimated sightability but the theory is covered in Steinhorst and Samuel (1989), Fieberg (2012) and Wong (1996) and implemented in the *AerialSurvey* (Unsworth, 1999) and *SightabilityModel* packages of *R*. The latter package corrects the variance computations used by Steinhorst and Samuel (1989) and Fieberg (2012). Similar computations are used for estimating the abundance of other sex/age categories. #### Density The estimated density of moose in stratum $h$ ($\widehat{D}_h$) is found by dividing the abundance estimate and its standard error by the stratum area $A_h$ $$\widehat{D}_h = \frac{\widehat{M{_h}}}{A_h} $$ $$se(\widehat{M}_h) = \frac{se(\widehat{M}_h)}{A_h}$$ Similar computations are used for estimating the density of other sex/age categories. #### Bull to Cow and other ratios The ratio of bulls to cows within a stratum is computed as the estimated abundances in each stratum: $$\widehat{BC}_h = \frac{\widehat{Bulls}_h}{\widehat{Cows}_h}$$ i.e., as the estimated total number of bulls in the stratum divided by the estimated total number of cows in the stratum. This ratio estimator is not available in the *SightabilityModel* package (Fieberg, 2012), but a new function was written and added to the package to add this functionality. The standard error is difficult to compute because of the need to account for the correlation between the number of bulls and moose in individual groups and the the uncertainty of the correction factors but is given in Wong (1996). Note that if the sightability is equal for all groups, then this common sightability correction factor would cancel everywhere and you would get the same results as the ratio estimator based on the observed groups only. In most cases, the impact of sightability will be small on the ratio estimator and its standard error. If the number of bulls per 100 cows is wanted, simply multiply the estimate and its standard error by $100\times$. Similar computations occur for other ratios such as calves:cows. ### Across all strata #### Abundance The total abundance of moose is found by summing the estimated abundances across strata: $$\widehat{M} = \sum_h \widehat{M}_h$$ The standard error is difficult to compute because the same sightability model is used across strata. This has been implemented in the *SightabilityModel* package (Fieberg, 2012) and the *AerialSurvey* (Unsworth, 1999) program. Similar computations are used for estimating the abundance of other sex/age categories. #### Density The overall density of moose and its standard error is found as estimated total abundance and its standard error divided by the total area over all strata.. $$\widehat{D}= \frac{ \widehat{M}}{\sum_h A_h}$$ $$se(\widehat{D}) = \frac{se(\widehat{M})}{{\sum_h A_h}}$$ Because the total area of all strata is known quantity, this is straightforward. Similar computations are used for estimating the density of other sex/age categories. #### Bull to Cow and other ratios The estimation of the overall bull to cow ratio is computed as the ratios of their respective estimated abundances $$\widehat{BC}=\frac{\widehat{Bulls}}{\widehat{Cows}}$$ This ratio estimator was not available in the *SightabilityModel* package (Fieberg, 2012), but its functionality has been added as part of this contract. Standard errors are computed as described in Wong (1996). Similar computations can be used for other ratios such as calves:cows. ### The *SightabilityPopR()* function I have created a function with the same calling arguments as the *MoosePopR()* function to estimate abundance, density, and ratios using the *SightabilityModel* package. Notice that additional functionality was added to the *SightabilityModel* package (Fieberg, 2012) to estimate ratios. A *SightabilityPopR()* function has a similar calling sequence as the *MoosePopR* function with the same key input variables. - *survey.data* - information on each group measured in the survey. This was read from the Excel workbook earlier. - *survey.block.area* - information about the area of each block surveyed. This was read from the Excel workbook earlier. - *stratum.data* - information about each stratum. This was read from the Excel workbook earlier. - *density, abundance, numerator, denominator* - variables for which the density, abundance, ratio are to be formed as right-sided formula (e.g., density=~Cows to estimate the density of Cows). - *conf.level=0.90* indicates the confidence level for confidence intervals to compute with the default being a 90% confidence interval. Additionally, the sightability formula, the beta coefficients and the estimated covariance matrix must also be specified using the arguments: - *sight.formula* - the formula for the sightability model. This should use the variables in the *survey.data* frame - *sight.beta* - the beta coefficients for the sightability model - *sight.beta.cov* - the covariance matrix for the beta coefficients - *sight.logCI* - should confidence intervals for abundance be computed using a logarithmic transformation - *sight.var.method* - what method (either *"Wong"* or *"SS"*) should be used to estimate the variances . using the methods of Wong (1996) or Steinhorst and Samuels (2012). The default (and preferred) method is *"Wong"* because she corrects the variance computations used by Steinhorst and Samuel (1989) and Fieberg (2012). These additional parameters are passed to the appropriate functions in the *SightabilityModel* package (Fieberg, 2012). Refer to the examples for details. ### Sample Analyses #### Dealing with missing values in the sightability covariates Because we are correcting for sightability, we need the *VegCoverClass* for all groups. However there are some groups of moose with missing data for this covariate: ```{r echo=TRUE, message=FALSE, warning=FALSE} select <- is.na(survey.data$VegCoverClass) survey.data[select,] ``` If you simply delete these observations, then estimates can be severely biased because these groups would be excluded from the expansions when corrected for sightability. About the best that can be done is to assign an imputed vegetation class to these observations with the highest sightability to reduce the bias. ```{r echo=TRUE, message=FALSE, warning=FALSE} # change the missing values to cover class with highest sightability survey.data$VegCoverClass[select] <- 1 survey.data[select,] ``` We are now ready to do the analysis. I've gathered all of the analysis results into a list object for extraction later. **The *C* suffix that I have placed on the estimate name indicates it is corrected for sightability.** I have only computed abundance and density estimates for all moose, but similar code can be used for other age/sex categories. Similarly, I have only considered the bull:cow ratio and other ratios can be estimated in similar fashion. #### Abundance of all moose We compute the (corrected for sightability) abundance of all moose (regardless of sex or age) using the number of moose (*NMoose*) variable. ```{r echo=TRUE, message=TRUE, warning=TRUE} result <- NULL result$"All.Abund.C" <- SightabilityPopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~NMoose # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) temp <- result$"All.Abund.C" temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0) # force to be numeric temp ``` The *estimate* and *SE* column as the estimated (corrected for sightability) density of all moose with a measure of uncertainty for each stratum. The overall abundance is found as the sum of the estimates in each stratum. The right most columns are information about the sources of uncertainty (due to sampling, sightability, or the model for sightability). Similar function calls are used for estimating the abundance of other sex/age categories by changing the *abundance* argument. We can show the impact of simply deleting those values missing the *VegCoverClass* covariate by setting the observed values of moose to .001 for these groups. This will retain the correct number of blocks surveyed. If you simply deleted those groups of moose with missing *VegCoverClass* values and a block then had no observations, the block would "disappear" from the analysis which is not correct. ```{r echo=TRUE, message=FALSE, warning=FALSE} # mimic impact of deleting observations with missing VegCoverClass survey.data2 <- survey.data survey.data2$NMoose[select] <- .001 wrong.way <- SightabilityPopR( # show impact of deleting observations but keeping # blocks the same survey.data = survey.data2 # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~NMoose # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) wrong.way ``` Notice the negative bias in the estimated abundance computed the incorrect way (`r round(wrong.way[wrong.way$Stratum==".OVERALL","estimate"])`) vs. the value when the *VegCoverClass* is set to the class with highest sightability (`r round(result$"All.Abund.C"[result$"All.Abund.C"$Stratum==".OVERALL","estimate"])`) computed above. In the real world, if a group has missing values for covariates using in the sightability model, do not change the count data, but change the missing covariates to values that result in the highest possible sightability for this group. In this way, the number of blocks surveyed is computed properly and you have partially the bias in dropping these groups by including them with an imputed high sightability value. #### Density of all moose We compute the estimated (corrected for sightability) density of all moose (regardless of sex or age). ```{r echo=TRUE, message=FALSE, warning=FALSE} result$"All.Density.C" <- SightabilityPopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,density=~NMoose # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) temp<- result$"All.Density.C" temp[,-(1:3)] <- round(temp[,-(1:3)]+.00001,2) # force to be numeric temp ``` The *estimate* and *SE* column as the estimated (corrected for sightability) density of all moose with a measure of uncertainty for each stratum. The overall density is found as a estimated abundance/total area over all strata. The variance components are the same for the total abundance because these values are used to derive the density. Similar function calls are used for estimating the density of other sex/age categories by changing the *density* argument. #### Number of bulls and number of cows We estimate the (corrected for sightability) abundance of Bulls and cows. ```{r echo=TRUE, message=FALSE, warning=FALSE} result$"Bulls.Abund.C" <- result$"All.Density.C" <- SightabilityPopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,density=~Bulls # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) result$"Bulls.Abund.C" result$"Cows.Abund.C" <- SightabilityPopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,density=~Cows # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) result$"Cows.Abund.C" ``` These values are computed similarly as the total abundance. #### Bull to Cow and other ratios Finally we estimates the Bulls/Cow ratio using the corrected for sightability values. ```{r echo=TRUE, message=FALSE, warning=FALSE} result$"Bulls/Cow.C" <- SightabilityPopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,numerator=~Bulls, denominator=~Cows # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) temp<- result$"Bulls/Cow.C" temp[,-(1:3)] <- round(temp[,-(1:3)]+.00001,3) # force to be numeric temp ``` The ratio is computed within each stratum and overall The sources of variation computations are set to missing as they are not sensible in this case. Notice that this ratio estimator is not that different from that computed from values uncorrected for sightability. This is a consequence of the approximately equal sightability correction factor for all groups. Other ratios can be computed by changing the *numerator* and *denominator* arguments in the function call. #### Final object The result is a list object with one element each of the analyses done above. ```{r echo=TRUE, message=FALSE, warning=FALSE} names(result) ``` # Domain estimation Domain estimation refer to estimating density, abundance, or ratios for subsets of the survey area. For example, in British Columbia, a survey may take place over a combined Buckley Valley and Lake District (sub units) survey area, but estimates of abundance are also wanted for Buckley Valley and Lake District individually. In this case, the sub-units are domains. There are four cases to consider depending if information about the primary sampling units (PSU) (i.e., blocks) and stratum areas are known or unknown at the domain level $\times$ the domain separation operates at the PSU (block) or group level. ## Domain at PSU (block) level; domain population information known This is the simplest case. Here the primary sampling units (PSU) (or blocks) are classified as belonging to the domain or not. For example, the study area could be divided into two or more management units and each PSU (block) belongs to one and only one management units in its entirety. The total area of the management unit and the total number of potential PSUs is known for each management unit. Because PSUs (blocks) were randomly selected within each stratum with equal probability, they are also randomly selected from each domain (management unit). Consequently, just subset the survey data (observed animals) to those PSUs (blocks) belonging to the domain of interest and change the stratum information to refer to the total area and total PSUs (block) for that domain and estimation for density, abundance, and ratios occurs exactly as before. ### Example For example, the survey data used in the examples was classified in Domain $A$ and $B$, and a separate worksheet in the workbook has information about stratum information for Domain $A$. We subset the survey data, read in the stratum information for Domain $A$ and then use the same methods as shown before. The PSUs (blocks) in the example are split between two domains (*A* and *B*) as shown below: ```{r echo=TRUE, message=FALSE, warning=FALSE} # PSU's are split between the domains xtabs(~Domain+Block.ID, data=survey.data, exclude=NULL, na.action=na.pass) ``` The number of groups of moose in each block is shown. We subset the survey data: ```{r echo=TRUE, message=FALSE, warning=FALSE} # Subset the survey data survey.data.A <- survey.data[ survey.data$Domain == "A",] ``` The block information (block area) is unchanged and does not need to be subset. We need information at the stratum level for domain *A* which is available on the workbook: ```{r echo=TRUE, message=FALSE, warning=FALSE} # Domain information for the population for each stratum stratum.data.A <- readxl::read_excel( system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE), sheet="Stratum-DomainA", skip=2,.name_repair="universal") kable(stratum.data.A, row.names=FALSE, caption="Stratum totals for Domain A", col.names=c("Stratum","Stratum Area","Stratum # blocks")) %>% column_spec(column=c(1:3), width="2cm") %>% kable_styling("bordered",position = "center", full_width=FALSE) ``` We can now estimate abundance, density, or ratios for Domain A in the same way as before with and without corrections for sightability. We show only the estimated abundance for all moose after correcting for sightability. Note the use of **survey.data.A** and **stratum.data.A*** ```{r echo=TRUE, message=FALSE, warning=FALSE} All.Abund.A.corrected <- SightabilityPopR( survey.data = survey.data.A # raw data for domain A ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data.A # stratum information for domain A ,abundance=~NMoose # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) temp <- All.Abund.A.corrected temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0) # force to be numeric temp[,1:10] ``` ## Domain at PSU (block) level; domain population information unknown. In some cases, only the surveyed PSUs can be classified by domain and the stratum total area and stratum total blocks is unknown for the domain. Cochran (1977, Sections 2.13 and 5A.14) discuss this case. Abundance and ratios can be estimated by setting all survey data to 0 for blocks/groups not part of the domain, and then using the estimating functions on the complete data after zeroing. This may seem strange, but consider the following table ``` PSU Domain A A A A B B B B B B Y 2 3 3 4 5 3 2 3 2 4 ``` The total of $Y$ in Domain $A$ is $2+3+3+4=12$. If we consider the modified data ``` PSU Domain A A A A B B B B B B Y' 2 3 2 3 0 0 0 0 0 0 ``` and find the mean of $\overline{Y'}=(2+3+3+4+0+0+0+0+0+0)/10=1.2$ and multiply the mean of *Y'* by the population number of PSU (10) we get the domain total of $Total(y)=\overline{Y'} \times 10=1.2 \times 10=12$. ### Example We set the survey data to 0 if the domain is not *A* (notice that we do NOT delete observations). ```{r echo=TRUE, message=FALSE, warning=FALSE} # Set number of moose to zero if not part of domain A survey.data.A.z <- survey.data survey.data.A.z$NMoose[ survey.data$Domain != "A"] <- 0 ``` We now use the modified survey data (**survey.data.A.Z** but the original group and stratum data: ```{r echo=TRUE, message=FALSE, warning=FALSE} All.Abund.A.corrected.z <- SightabilityPopR( survey.data = survey.data.A.z # raw data with non-domain counts set to zero ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~NMoose # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) temp <- All.Abund.A.corrected.z temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0) # force to be numeric temp[,1:10] ``` The estimates will be similar (but not exactly the same) between this analysis and the previous section. Estimates of ratios can be found in the same way because these are based on abundances **However, estimates of density cannot be found using this method**. Refer to Cochran (1977, Section 5A.14, *Estimating Domain Means*) for more details. ## Domain at group level; domain population information known The analysis in this case is similar to the comparable case when the domain occurs at the PSU (block) level. Now **groups** are classified as belonging to the domain or not. For example, interest may lie in the number of moose in each vegetation class. All blocks are used, but now in each block, you will subset the groups that belong to the domain. The hard part is the area in this domain of EACH PSU (block) must also be known. Presumably, this is a GIS exercise in the case of vegetation cover, assuming that entire survey area has been mapped. Similarly, the total area in the stratum for this domain must be known. This is presumably a GIS exercise. The analysis proceeds by subsetting the survey data to groups that fall within this domain, but **you need to ensure that groups without this domain are still included (with 0 counts for the response variable)**. You also need to use the modified block areas and modified stratum area in this domain. Please contact me if you are attempting to do this as the programming *R* is tricky, to say the least. ## Domain at group level; domain population information unknown. In some cases, only the surveyed groups can be classified by domain and the block area, stratum total area and stratum total blocks is unknown for the domain. We use the same trick as above by setting the survey data to 0 if the domain is not *A* (notice that we do NOT delete observations). ### Example Suppose we want moose abundance by vegetation cover class. The estimate of total abundance is ```{r echo=TRUE, message=FALSE, warning=FALSE} All.Abund <- SightabilityPopR( survey.data = survey.data # raw data ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~NMoose # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) temp <- All.Abund temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0) # force to be numeric temp[,1:10] ``` We will compute the total abundance for each vegetation cover class ```{r echo=TRUE, message=FALSE, warning=FALSE} All.Abund.vc <- plyr::llply(unique(survey.data$VegCoverClass), function(VegCover){ # Set number of moose to zero if not part of domain A survey.data.z <- survey.data survey.data.z$NMoose[ survey.data$VegCoverClass != VegCover] <- 0 # set to zero #browser() All.Abund.vc <- SightabilityPopR( survey.data = survey.data.z # raw data that has been zeroed out ,survey.block.area = survey.block.area # area of each block ,stratum.data = stratum.data # stratum information ,abundance=~NMoose # quantifies to estimate ,sight.formula = observed ~ VegCoverClass # sightability mode; ,sight.beta = beta # the beta coefficients for the sightability model ,sight.beta.cov = beta.cov # the covariance matrix for the beta coefficients ,sight.var.method = "Wong" # method used to estimate the variances ) list(VegCoverClass=VegCover, est=All.Abund.vc) }) # extract the total abundance All.Abund.vc.df <- plyr::ldply(All.Abund.vc, function(x){ #browser() res <- data.frame(VegCoverClass=as.character(x$VegCoverClass), stringsAsFactors=FALSE) res$estimate <- x$est[x$est$Stratum==".OVERALL", c("estimate")] res$SE <- x$est[x$est$Stratum==".OVERALL", c("SE" )] res }) All.Abund.vc.df <- plyr::rbind.fill( All.Abund.vc.df, data.frame(VegCoverClass=".OVERALL", estimate=sum(All.Abund.vc.df$estimate, stringAsFactors=FALSE))) All.Abund.vc.df ``` The estimates sum to the estimate firstly computed, but the estimates from the individual vegetation cover classes are not independent and so the overall standard error cannot be found from the separate estimates by vegetation cover class. Estimates of ratios can be found in the same way because they are based on abundances. **However, estimates of density cannot be found using this method**. Refer to Cochran (1977, Section 5A.14, *Estimating Domain Means*) for more details. # References Cochran, W.G. (1977). Sampling techniques. 3rd Edition. Wiley. New York. Fieberg, J. (2012). Estimating Population Abundance Using Sightability Models: R SightabilityModel Package. Journal of Statistical Software, 51(9), 1-20. URL https://doi.org/10.18637/jss.v051.i09. Gasaway, W.C., S.D. Dubois, D.J. Reed and S.J. Harbo. (1986). Estimating moose population parameters from aerial surveys. Biol. Pap. Univ. Alaska, No. 22. Institute of Arctic Biology, U. of Alaska, Fairbanks. 108 pp. Lumley, T. (2019) "survey: analysis of complex survey samples". R package version 3.35-1. Quayle, J. F., A. G. MacHutchon, and D. N. Jury. (2001). Modeling moose sightability in south-central British Columbia. Alces 37:43–54. R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. Resource Inventory Standards Committee (RISC). (2002). Aerial-based inventory methods for selected ungulates bison, mountain goat, mountain sheep, moose, elk, deer, and caribou. Version 2. Standards for Components of British Columbia’s Biodiversity No 32. British Columbia Ministry of Sustainable Resource Management, Victoria, B.C. Reed, D.J. (1989). MOOSEPOP program documentation and instructions. Alaska Department of Fish and Game. Unpublished report. Steinhorst, K. R. and Samuel, M. D. (1989). Sightability Adjustment Methods for Aerial Surveys of Wildlife Populations. Biometrics 45:415--425. Unsworth JW, A LF, O GE, Leptich DJ, Zager P (1999). Aerial Survey: User’s Manual. Idaho Department of Fish and Game, electronic edition edition. Wong, C. (1996). Population size estimation using the modified Horvitz-Thompson estimator with estimated sighting probabilities. Dissertation, Colorado State University, Fort Collins, USA.