---
title: c-MoosePopR and SightabilityPopR Demonstration
author: "Carl James Schwarz"
date: '`r format(Sys.time(), "%Y-%m-%d")`'
output: 
   rmarkdown::html_vignette:
     toc: true
     number_sections: yes
vignette: >
  %\VignetteIndexEntry{c-MoosePopR and SightabilityPopR Demonstration}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{car, ggplot2, kableExtra, plyr, readxl}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=200)

library(car)
library(ggplot2)
library(kableExtra)
library(plyr)
library(readxl)
library(reshape2)
library(SightabilityModel)


```

# Availability of code

The code for this vignette is available in the GitHub repository noted 
in the *Description* file of this package. 

# Introduction

This document illustrates how to use the *MoosePopR()* and *SightabilityPopR()* functions to 
replace the *MoosePop* (Reed, 1989) and *AerialSurvey* (Unsworth, 1999) programs.
The results from a 2018 survey of moose conducted according to methods
described in RISC (2002) will be used for illustrative purposes.

## Survey protocol

The survey area is divided in blocks (which are the primary sampling units, PSUs). 
Blocks could be defined using a grid-system, or based on ecological factors such as elevation, type of vegetation, etc.
The blocks do not have the same area, nor do they have to rectangular.
If blocks are of different size and the area of the block is not used in the analysis, 
no biases are introduced into the estimation procedure,
but the survey design could be inefficient, i.e., the uncertainty in the estimates could be 
larger compared to the standard error obtained from methods that account for different
block area (see later in this document).

Blocks are then stratified using a habitat model  into categories of relative moose densities.
In this case, the blocks are stratified into three strata corresponding to *low*, *medium*, and *high* density
depending on past years results, vegetation cover, and other attributes. There is no limit on the number of
strata that can be defined, but usually three to five blocks would be sufficient.
Blocks within strata should have similar densities and strata should "as different" as possible.
If the stratification is incorrect (e.g., a block is classified into the *high* stratum
when in fact is should be in the *low* stratum), no biases are introduced into the estimates

A simple random sample of blocks is selected from each stratum, i.e., each block is selected with equal probability
within each stratum. The number of block to select from each stratum is determined using different allocation methods.
The analysis of the completed survey does not depend on how the sample size was determined in each stratum.
Some selected blocks may not be surveyed due to bad weather etc. These missed blocks are assumed to be missing 
completely at random (MCAR) within each stratum, i.e., the missingness is unrelated to the response or any other covariate.
If a missed block is not replaced by another block chosen at random from the stratum, the sample size in this
stratum will be reduced.
The impact of a reduction in sample size in each stratum (with increased uncertainty in the estimates
for each stratum compared to 
a survey with no missing blocks), but no biases are introduced.

When a block is surveyed, moose are observed in groups. When a group is identified, a count of the number
of moose is made separated by age (adult, juvenile, calves) and sex (bull or cow).

Not all groups of moose can be observed due to sightability issues. 
A previous sightability study (e.g., Quayle et al. 2001; RISC, 2002) is used
to estimate the probability of detecting a group of moose based on a
categorical vegetation cover variable with five classes.
**NOTE that sightability operates at the group level and not at the moose level**. This
means that if a group is sighted, then all members of the group are enumerated. But entire groups
could be "invisible" due to vegetation cover.

This is a stratified random sampling design where the sampling unit is the *block* surveyed in each stratum
and the observational unit is a group of moose. The observations on groups are pseudo-replicates
and must be treated as observations within a cluster (being the block). 

This document will illustrate how to estimate the total number and density of moose (in various classes) and 
the ratio of types of moose (e.g., bulls to cow ratio) without and with correcting for sightability. The
estimates are compared to those generated by the *MoosePop* program (Reed, 1989; 
no correction for sightability) and the
*AerialSurvey* program (Unsworth, 1999; correction for sightability).

# Data input

Data input consists of several datasets placed into an Excel workbook using different
worksheets for the different information. 

## Moose data

This consists of information on group of moose observed in a selected blocks.
The data should have a header row with the variable names for each column.


```{r echo=TRUE, message=FALSE, warning=FALSE}
# Get the actual survey data
dir(system.file("extdata", package = "SightabilityModel"))

survey.data <- readxl::read_excel(system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
                                  sheet="BlockData",
                                  skip=1,.name_repair="universal")
# Skip =xx says to skip xx lines at the top of the worksheet. In this case skip=1 implies start reading in line 2 of
# the worksheet.
# .name_repair="universal" changes all variable names to be compatible with R, e.g., no spaces, no special characters, etc
```

Variable names for the counts are somewhat arbitrary.
Notice how *R* converts the column names from the Excel workbook to 
to proper *R* variable names (*.name_repair="universal"* controls this conversion. 
These (modified) variable names will be used to identify
the quantities to be estimated so simpler names are easier to use than more complicated names.
```{r echo=TRUE, message=FALSE, warning=FALSE}
# Check the variable names in the input data
names(survey.data)
```

Missing values in the count fields are assume to be zeros and a zero value must be 
substituted for the missing values. 

```{r echo=TRUE, message=FALSE, warning=FALSE}
# convert all NA in moose counts to 0
survey.data$Bulls          [ is.na(survey.data$Bulls)         ] <- 0
survey.data$Lone.Cows      [ is.na(survey.data$Lone.Cows)     ] <- 0
survey.data$Cow.W.1...calf [ is.na(survey.data$Cow.W.1...calf)] <- 0
survey.data$Cow.W.2.calves [ is.na(survey.data$Cow.W.2.calves)] <- 0
survey.data$Lone...calf    [ is.na(survey.data$Lone...calf)   ] <- 0
survey.data$Unk.Age.Sex    [ is.na(survey.data$Unk.Age.Sex)   ] <- 0
#head(survey.data)
```


We check that the *NMoose* is computed properly by comparing the recorded total number of moose with
a computed total number of moose. 

```{r echo=TRUE, message=TRUE, warning=TRUE}
# check the total moose read in vs computed number
survey.data$myNMoose <- survey.data$Bulls +
                       survey.data$Lone.Cows+
                       survey.data$Cow.W.1...calf*2+
                       survey.data$Cow.W.2.calves*3+
                       survey.data$Lone...calf+
                       survey.data$Unk.Age.Sex

ggplot(data=survey.data, aes(x=myNMoose, y=NMoose))+
  ggtitle("Compare my count vs. count on spreadsheet")+
  geom_point(position=position_jitter(h=.1,w=.1))+
  geom_abline(intercept=0, slope=1)
```


We count the number of observations by sub-unit and stratum
to ensure that *Stratum* has been coded properly. For example ensure that

-  stratum case is consistent (e.g., *h* vs. *H*), 
-  stratum codes are consistent (e.g., *H* vs. *HIGH*).

```{r echo=TRUE, message=TRUE, warning=TRUE}
xtabs(~Stratum, data=survey.data, exclude=NULL, na.action=na.pass)
```

We can also add additional computed variables. For example, the number of cows is found as

```{r echo=TRUE, message=FALSE, warning=FALSE}
survey.data$Cows <- survey.data$Lone.Cows + survey.data$Cow.W.1...calf + survey.data$Cow.W.2.calves
```

The first few records of the counts of interest are:

```{r echo=FALSE, message=FALSE, warning=FALSE}
head(as.data.frame(survey.data[,c("Block.ID","Stratum","Bulls","Lone.Cows","Cow.W.1...calf",
                    "Cow.W.2.calves","Cows","Lone...calf","Unk.Age.Sex","NMoose")]))
```

## Block area

The area of each surveyed block is also needed. 
```{r echo=TRUE, message=TRUE, warning=TRUE}
# Get the area of each block
survey.block.area <- readxl::read_excel(
                        system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
                        sheet="BlockArea",
                        skip=1,.name_repair="universal")
head(survey.block.area)
```


The name of the variable identifying
the block should match the name used in the raw data from the previous section.
```{r echo=TRUE, message=FALSE, warning=FALSE}
names(survey.block.area)
names(survey.data)
```

Every block that is surveyed should have an area, but the block area file can have areas 
for blocks not surveyed (e.g., every block 
in the population of blocks). If a area is available for a block that was not surveyed, it is ignored.


```{r echo=TRUE, message=FALSE, warning=FALSE}
# Check that every surveyed block has an area. The setdiff() should return null.
setdiff(survey.data$Block.ID, survey.block.area$Block.ID)

# It is ok if the block area file has areas for blocks not surveyed
setdiff(survey.block.area$Block.ID, survey.data$Block.ID)
```


## Stratum information

Information about each stratum such as the total area and total number of blocks is required

```{r echo=TRUE, message=TRUE, warning=TRUE}
stratum.data <- readxl::read_excel(
      system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
      sheet="Stratum",
      skip=2,.name_repair="universal")
stratum.data
```

Ensure that the stratum labels match those used in the raw data

## Sightability model.

A logistic regression comparing observed groups of moose to known groups of moose (e.g., based on radio collared moose)
was used to estimate the sightability model (e.g., Quayle, 2001). 

### Extracting from *AerialSurvey*

The sightability model used was extracted from the *AerialSurvey* (Unsworth, 1999)
program input files and entered on the
worksheet in the Excel workbook.
The *AerialSurvey* program uses two files to describe the survey (and computations) and the
model for the sightability.

The *Moose.mdf* is a text-file that describes the model used for sightability corrections.
A portion of the file is shown below

```
;==============================================================================
; MOOSE.MDF - Model description file (MDF) for Aerial Survey - 7-Oct-94
; -comments start with a semicolon (any text after it will be ignored)
; -blank value fields (right hand side) will default to nil, zero, or no!
;==============================================================================
[Model]
Title    = "Moose, Bell JetRanger"
Subtitle = '<Uses only the % veg cover>' ; optional
Species  = Moose
Aircraft = "Bell JetRanger"
Locale   = xxxxxx
Version  = 1.00
Terms    = 2           ; number of terms in the model, max = 10
Date     = 04.11.00
```

This appears to be the model for Bell JetRanger aircraft for moose.
The model for sightability appears to be function only of vegetation cover with two term
(intercept and vegetation cover class).

Vegetation cover appears to be divided into 5 classes:
```
[Transformations]
;Type of transformations can be:
;  Class, LookupT1, LookupT2, LookupT3, LookupT4, Table1, Table2, Power, None
Number = 1  ; number of transformations, max = 5
; for each transformation, give the name/index of the variable,
; and the type of transformation required on the variable
; TransformX = variable, type
Transform1 = VegCover, Class

[Class]  ; Class - e.g., vegetation cover classes into intervals
; for each interval give lower and upper limits and the assigned value
Intervals = 6  ; number of class intervals, max = 9
; ClassX = >lower, <=upper, value - ascending, sequential order
Class1 = -0.1,  20.0, 1.0
Class2 = 20.0,  40.0, 2.0
Class3 = 40.0,  60.0, 3.0
Class4 = 60.0,  80.0, 4.0
Class5 = 80.0,  100.0, 5.0
```
For example, if vegetation cover is from 0 to 20%, this is class 1; from 20%+ to 40% is class 2; etc.

The file also contains the estimated model coefficients and estimated covariance matrix for the sightability model..
```
; Map input variables into X's
[X] ; label/index from a list of variables
X01 = 1.0       ; constant
X02 = VegCover  ; class transformation

;------------------------------------------------------------------------------
; Model coefficients
;------------------------------------------------------------------------------
[Beta]   ; beta vector
Beta01 =  4.9604    ; constant
Beta02 = -1.8437    ; vegetation cover

[Sigma]  ; sigma matrix
Sigma01 =  0.7822
Sigma02 = -0.2820,  0.1115
```
The estimated probability of sighting a moose is
$$logit(p)=4.9604 -1.8437(vegetation~class)$$
Notice that vegetation class is treated as continuous variable and 
used directly with values of 1 to 5 rather than as 5 indicator variables. 
This implies that the $logit(p)$ increases linearly at the same rate as the vegetation cover class goes
from 1 to 2 or from 2 to 3, etc. In some cases, sightability may depend on a categorical
variable (e.g., north or south aspect). Then a series of indicator variables will have to be created, e.g
*AspectNorth* takes the value of 1 if the aspect is north and the value of 0 if the aspect is south.
If the categorical variable has $K$ classes, you will need $K-1$ indicator variables. Contact me
for help on this issue.

### Reading coefficients of the model

The information from the *AerialSurvey* (Unsworth, 1999) program was transferred to the
Excel workbook containing the survey information for this study.

The estimated coefficients of the model and the 
covariance matrix (sigma matrix) of the estimated coefficients are input

```{r echo=TRUE, message=FALSE, warning=FALSE}
# number of beta coefficients
nbeta <- readxl::read_excel(
   system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
   sheet="SightabilityModel",
   range="B2", col_names=FALSE)[1,1,drop=TRUE]
# Here Range="B2" refers to a SINGLE cell (B2) on the spreadsheet

# extract the names of the terms of the model
beta.terms <- unlist(readxl::read_excel(
     system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
     sheet="SightabilityModel",
     range=paste0("B3:",letters[1+nbeta],"3"), 
     col_names=FALSE, col_type="text")[1,,drop=TRUE], use.names=FALSE)
# Here the range is B3: C3 in the case of nbeta=2 etc
cat("Names of the variables used in the sightability model:", beta.terms, "\n")

# extract the beta coefficients
beta <- unlist(readxl::read_excel(
    system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
    sheet="SightabilityModel",
    range=paste0("B4:",letters[1+nbeta],"4"), 
    col_names=FALSE, col_type="numeric")[1,,drop=TRUE], use.names=FALSE)
# Here the range is B4: C4 in the case of nbeta=2 etc
cat("Beta coefficients used in the sightability model:", beta, "\n")

# extract the beta covariance matrix
beta.cov <- matrix(unlist(readxl::read_excel(
    system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
    sheet="SightabilityModel",
    range=paste0("B5:",letters[1+nbeta],4+nbeta), 
    col_names=FALSE, col_type="numeric"), use.names=FALSE), ncol=nbeta, nrow=nbeta)
cat("Beta covariance matrix used in the sightability model:\n")
beta.cov


```

These values match those in the *AerialSurvey* (Unsworth, 1999) package.

The estimated sightability and expansion factors at the group level (inverse of sightability) are:

```{r echo=FALSE, message=FALSE, warning=FALSE}
Cover <- data.frame(intercept=1, VegCoverClass=1:5)
Cover$logit.p <- as.matrix(Cover[,1:nbeta ]) %*% beta
Cover$logit.p.se <- diag(sqrt( as.matrix(Cover[,1:nbeta ]) %*% beta.cov %*% t(as.matrix(Cover[,1:nbeta ]))))
Cover$p       <- 1/(1+exp(-Cover$logit.p))
Cover$p.se    <- Cover$logit.p.se * Cover$p * (1-Cover$p)
Cover$expansion <- 1/Cover$p
Cover$expansion.se <- Cover$p.se / Cover$p^2

temp <- Cover[, -1]
temp[,2:7]<- round(temp[,2:7],3)
#temp

kable(temp, row.names=FALSE,
      caption="Estimated sightability of groups and expansion by Vegetation Cover Class",
      col.names=c("Vegetation Cover Class","logit(p)","SE logit(p)","p","SE(p)","Expansion Factor","SE(EF)")) %>%
      column_spec(column=c(1:7),         width="2cm") %>%
      kable_styling("bordered",position = "center", full_width=FALSE)


```

So a single moose sighted in Vegetation Cover Class 5 is expanded to almost 72 moose (!).


### Creation of *VegCoverClass* variable

We need to compute a *VegCoverClass* variable in the survey data. The *recode()* function
from the *car* package is used. 

```{r echo=TRUE, warning=FALSE, message=FALSE}
# convert the Veg Cover to Veg.Cover.Class

xtabs(~..Veg.Cover, data=survey.data, exclude=NULL, na.action=na.pass)
survey.data$VegCoverClass <- car::recode(survey.data$..Veg.Cover,
                                " lo:20=1; 20:40=2; 40:60=3; 60:80=4; 80:100=5")
xtabs(~VegCoverClass+..Veg.Cover, data=survey.data, exclude=NULL, na.action=na.pass)
```

Notice that there are 8 groups of moose where the *Vegetation Cover* was not recorded.
This has implications for estimation as noted later. Generally speaking,
you should not simply delete these observations because this will introduce
substantial bias in the estimates. Rather, impute a vegetation cover
class that gives the highest sightability for these groups to give a lower bound
on the number of moose in this block. This will reduce (but not eliminate) the bias caused by simply dropping
these observations.


## Key variables.

There are several key variables used to match records across the raw data, block area, and stratum data frames. The 
default variable names are:

- *Block.ID* representing the block identification number. The *Block.ID* should not be duplicated in different
strata. It can be numeric or characters.
- *Block.Area* representing the area of each surveyed block. This should be numeric. 
- *Stratum* representing the stratum identification. It can be character or numeric does not have to be a single
character.
- *Stratum.Blocks* representing the total number of blocks in each stratum.      
- *Stratum.Area* representing the total area of each stratum in the same units as the *Block.Area*. 

The analysis function has arguments that can be changed if the names of these key variables differ from above.

## Dealing with missing and zero values

### Group information

For each group sighted, the number of animals is recorded and may be divided by age and/or sex groupings.
If no animals of a particular sex/age are seen in a group, then it should be recorded as zero rather than
as missing.

In some cases, sex/age information is not available for an animal. It should be counted in a separate category
for unknown sexed/aged animals. Of course the, a naive application of estimating the number of bulls may be biased
downwards is some bulls are recorded in the unknown sexed/aged category; however, estimates of the total
number moose will be fine.

No missing values are allowed in any count.

### Covariate information

The sightability models use covariates at the group level to adjust for detection less than 100%.
No missing values are allowed for the covariates. If the covariates are not available, then you 
should not simply delete the group, but rather impute a value of the covariate that gives the highest 
sightability to form a lower bound on the number of moose in the block because this particular block is
not fully corrected for sightability.

There are two sources of bias that could occur when you drop a group with missing sightability covariate values:

-  If you delete the group and the block is now empty (and so vanishes), 
then your sample size (number of blocks surveyed) and area surveyed are too small 
and your total observed animals is too small. 
The bias could be positive or negative depending if the deleted groups are large or small.
- If the number of blocks does not change (i.e., deleting this group does make the block empty), 
then deleting the group makes the total count for this block too small. 
If you keep the block with sightability set to 1, you partially correct against 
dropping the block, because rather than having a block correction factor of 2 
(e.g., some lack of sightability due to high vegetation cover), you are using 
a sightability correction factor of 1.

Trying to predict the combined effect of these sources of bias is not easy. 
The effect of the second item is to partially correct the abundance/density estimate
compared to dropping the block.

See the example in the Sightability section below on dealing with missing covariates.

### Blocks with no observed groups.

It is possible that a block will be surveyed and NO animals seen in the entire block. In this case, you should
insert a "dummy" observation for this block with counts of 0 for the number of moose to ensure that
this surveyed block (with no animals seen) is properly accounted for. You can set the sightability variables
to any value, because $0 \times \textrm{sightability correction}$ is still 0.

### Block information

Each block must have a known area. Missing values are not allowed.

### A single block in a stratum

If a stratum has only a single block surveyed, then no design based estimate of variance
is possible. This is known as the *lonely primarily sampling unit (lonely PSU)*. Ad hoc
adjustments are possible; refer to
http://r-survey.r-forge.r-project.org/survey/exmample-lonely.html
for some suggestions.

The *MoosePopR()* function has an argument that can be set using
```
MoosePopR(...,  survey.lonely.psu="remove")
```
The the variance for this stratum is not considered when estimating the variance of the
grand totals or density. Unfortunately, it is not possible, at this time, to estimate
a ratio with some strata having lonely PSU. 

The *SightabilityPopR()* function can
deal with lonely PSUs by using within block variability (e.g., among groups) so this will provide a lower
bound to the true uncertainty.

The bottom line is avoid designs where you sample only a single block in a stratum!

### Stratum information

Each stratum must have the known total area for the *MoosePop()* function and the
known total number of blocks for the *SightabilityPopR()* function.

# Analysis

## Analysis not adjusting for sightability 

### Within a stratum

#### Density

Suppose a sample of $n_h$ blocks were surveyed in stratum $h$. 
For each block, the number of moose $m_{hi}$
and the area of the block $a_{hi}$ are obtained for block $i$ in stratum $h$. 
The estimated density of moose per unit area 
in stratum $h$ is obtained as:
$$\widehat{D}_h = \frac{\sum_i m_{hi}}{\sum_i a_{hi}}$$
i.e., as the total number of moose in the surveyed blocks divided by the total survey area. 
This is a standard ratio estimator
and the estimated standard error is found as using equation 6.9 of Cochran (1977). 
The estimator and its standard error
can be automatically computed using the *svyratio()* function 
in the *survey* package (Lumley, 2019) of *R*.

Similar formula are used for estimating the density of other sex/age categories.

#### Abundance

The estimated total number of moose in stratum $h$ ($\widehat{M}_h$) is
found by multiplying the density estimate and its standard error by the stratum area
$A_h$

$$\widehat{M}_h = \widehat{D}_h  \times A_h$$
$$se(\widehat{M}_h) = se(\widehat{D}_h) \times A_h$$
Similar formula are used for estimating the abundance of other sex/age categories.

#### Bull to Cow and other ratios

Suppose a sample of $n_h$ blocks were surveyed in stratum $h$. For each block, 
the number of bulls $b_{hi}$
and the number of cows $c_{hi}$ are obtained for block $i$ in stratum $h$. 
The estimated bull to cow ratio
in stratum $h$ is obtained as:
$$\widehat{BC}_h = \frac{\sum_i b_{hi}}{\sum_i c_{hi}}$$
i.e., as the total number of bulls ($\sum_i b_{hi}$) in the surveyed blocks divided by the 
total number of cows ($\sum_i c_{hi}$) in the surveyed blocks. This is a standard ratio estimator
and the estimated standard error is found as using equation 6.9 of Cochran (1977). The estimator and its standard error
are automatically computed using the *svyratio()* function in the *survey* package (Lumley, 2019) of *R*.

If the number of bulls per 100 cows is wanted, simply multiply the estimate and its standard error by $100\times$.

Other ratios such as calves:cows are computed in similar ways.

### Across all strata

#### Density

The overall density for the number of moose is found as a weighted averaged of the stratum specific densities:
$$\widehat{D}= \sum_h \frac{A_h}{\sum_h A_h}\widehat{D}_h$$

$$se(\widehat{D}) = \sqrt{\sum_h \left( \frac{A_h}{\sum_h A_h} \right)^2se(\widehat{D}_h)^2}$$

which is equivalent to the estimated total abundance (see below) divided by the total stratum area.
This is known as a *separate-ratio* estimator of the total and is possible 
because the expansion factor going from density to abundance (the stratum area) is known for each stratum.

Similar formula are used for estimating the density of other sex/age categories.


#### Abundance

The total abundance of moose is found by summing the estimated abundances across strata:

$$\widehat{M} = \sum_h \widehat{M}_h$$
and the standard error of the overall abundance is found as:
$$se(\widehat{M}) = \sqrt{\sum_h se(\widehat{M}_h)^2}$$
This is known as a *separate-ratio* estimator of the total and is possible because the expansion factor going from density to abundance (the stratum area) is known for each stratum.

Similar formula are used for estimating the abundance of other sex/age categories.

#### Bull to Cow and other ratios

The estimation of the overall bull to cow ratio is computed using the
*double-ratio* estimate (Cochran, 1977, Section 6.19) as
$$\widehat{BC}_{\textit{double~ratio}}=\frac{\widehat{B}}{\widehat{C}}$$
where $\widehat{B}$ and $\widehat{C}$ are the estimated total number of bull 
and cows individually computed.
This is equivalent to the ratio of the overall density of bulls and cows. 
The above formula can be expanded to give

$$\widehat{BC}_{\textit{double~ratio}}=\frac{\sum_h \frac{\sum_i b_{hi}}{\sum_i a_{hi}} \times A_h}
                    {\sum_h \frac{\sum_i c_{hi}}{\sum_i a_{hi}} \times A_h}$$

Its standard error is NOT directly available in the *survey* package 
but is readily computed by finding the estimates
and the covariance of the estimates of number of bull and cows 
for each stratum and then using the delta method
to find the overall ratio and standard error of the overall ratio. 

### The *MoosePopR()* function.

A *MoosePopR()* function was created in *R* to replicate the results from *MoosePop* (Reed, 1989).
The key arguments to this function are:

- *survey.data* - information on each group measured in the survey. This was read from the Excel workbook earlier.
- *survey.block.area* - information about the area of each block surveyed. This was read from the Excel workbook earlier.
- *stratum.data* - information about each stratum. This was read from the Excel workbook earlier.
- *density, abundance, numerator, denominator* - variables for which the density, 
abundance, ratio are to be formed as right-sided formula 
(e.g., density=~Cows to estimate the density of Cows).
- *conf.level=0.90* indicates the  confidence 
level for confidence intervals to compute with the default being a 90% 
confidence interval.

### Sample Analyses

**Results differ slightly from those reported on an Excel spreadsheet because of slight differences
in areas of blocks between data provided to me and that used in MoosePop (Reed, 1989)**.

We are now ready to do the analysis. I've gathered all of the analysis results into a list object
for extraction later.

**The *UC* suffix that I have placed on the estimate name indicates it is not corrected for sightability.**

I have only computed abundance and density estimates
for all moose, but similar code can be used for other age/sex categories. Similarly, I have only
considered the bull:cow ratio and other ratio can be computed in similar fashion.

#### Density of all moose

We compute the (uncorrected for sightability) density of all moose (regardless of sex or age) using the
number of moose (*NMoose*) variable. 

```{r echo=TRUE, message=TRUE, warning=TRUE}
result <- NULL
result$"All.Density.UC" <- MoosePopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,density=~NMoose                         # which density
)
result$"All.Density.UC"
```

The *estimate* and *SE* column as the estimated (uncorrected for sightability) 
density of all moose with a measure of uncertainty for
each stratum. The overall density is found as a weighted average 
(by stratum area) of the density in each stratum.

Similar function calls are used for estimating the density of other sex/age categories by changing the *density* argument.

#### Abundance of all moose

We compute the estimated (uncorrected for sightability) abundance of all moose (regardless of sex or age)
by expanding the density by the *Stratum.Area*. The overall abundance is found
as the sum of the expanded estimates.

```{r echo=TRUE, message=FALSE, warning=FALSE}
result$"All.Abund.UC" <- MoosePopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~NMoose                      # which density
)
result$"All.Abund.UC"
```

The *estimate* and *SE* column as the estimated (uncorrected for sightability) 
density of all moose with a measure of uncertainty for
each stratum. The overall abundance is found as the sum of the estimates in each stratum.

Similar function calls are used for estimating the abundance of 
other sex/age categories by changing the *density* argument.

#### Number of bulls and number of cows

We estimate the (uncorrected for sightability) abundance of Bulls and Cows.

```{r echo=TRUE, message=FALSE, warning=FALSE}
result$"Bulls.Abund.UC" <- MoosePopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~Bulls                      # which density
)
result$"Bulls.Abund.UC" 
  
result$"Cows.Abund.UC" <- MoosePopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~Cows                      # which density
)  
result$"Cows.Abund.UC"
```

These values are computed similarly as the total abundance.

#### Bull to Cow and other ratios

Finally, we estimate the Bulls:Cows ratio:

```{r echo=TRUE, message=FALSE, warning=FALSE}
result$"Bulls/Cow.UC" <- MoosePopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,numerator=~Bulls, denominator=~Cows    # which density
) 
result$"Bulls/Cow.UC"
```

Other ratios would be estimated in similar ways by changing the 
*numerator* and *denominator* arguments
to the function call.

#### Final object

```{r echo=TRUE, warning=FALSE, message=FALSE}
names(result)
```


## Adjusting for sightability

The above computations are not adjusted for sightability, i.e., not all moose
can be detected due to visibility concerns. 

An adjustment for sightability can be done by also performing a study where a known 
number of groups of moose (e.g., radio collared) are surveyed and information about detection/non detection
is recorded at the group level. A logistic regression (see previous sections) is used to estimate the
probability of detection of a group given covariates (such as Vegetation Class). 

Then the sightability model is used to expand the observed number of groups of moose by the
a sightability correction factor (SCF). This is related to the
inverse of the probability of detection, but includes a correction
to account for the uncertainty in the estimates of the coefficients 
of the sightability model (see Fieberg, 2012, Equation 1 and later in this document).

For example, consider a model to estimate detectability given in Quayle et al. (2001):


```{r echo=FALSE, warning=FALSE, message=FALSE}
sightability.model <- ~VegCoverClass
sightability.beta  <-  c(4.2138, -1.5847)
sightability.beta.cov <- matrix(c(0.78216336,	-0.282,	-0.282,	0.11148921), nrow=2, ncol=2)


sightability.table <- data.frame(VegCoverClass=1:5,
                                 VegPercent=c("00-20","21-40","41-60","61-80","81-100"))
sightability.table$detect.prob <- SightabilityModel::compute.detect.prob(sightability.table, 
                                                      sightability.model, 
                                                      sightability.beta, 
                                                      sightability.beta.cov)

sightability.table$SCF <- SightabilityModel::compute.SCF(sightability.table, 
                                                      sightability.model, 
                                                      sightability.beta, 
                                                      sightability.beta.cov)


kable(sightability.table, row.names=FALSE,
      caption="Estimated sightability correction factor for each vegetation cover class",
      col.names=c("Veg Cover Class","Veg Cover %","Detection probability","Sightability Correction Factor"),
      digits=c(0,0, 3,2)) %>%
      kable_styling("bordered",position = "center", full_width=FALSE, latex_options = "HOLD_position")


```

Notice that that, for example, that 1/`r sightability.table$prob.detect[2]`=`r 1/sightability.table$prob.detect[2]`
is only approximately equal to `r sightability.table$SCF[2]`.

**It is assumed that all animals in a group (e.g., bulls, cows, calves) are observed
if a group is detected.**

Let $\theta_{hjk}$ represent the expansion factor for group $k$ in block $j$ of stratum $h$.
In the analyses that follow the block area is NOT used in the analysis. In its place, a simple
expansion based on the number of blocks surveyed within a stratum are used. This will be less
efficient (i.e., have a larger standard error) compared to an analysis that uses the individual
block areas, but unless there is a high correlation between the number of animals and the block area,
the loss of efficiency is small. If all blocks had the same area, the two approaches are identical.

We examined the correlation for the existing survey data and it is typically small.
Cochran (1977) shows that unless the correlation is larger than a specified cutoff
that depends on the coefficient of variation of the numerator and denominator variables,
then there is no gain in using the block area, i.e., unless
$$\rho > \frac{1}{2}\frac{SD(denominator~variable)/mean(denominator~variable)}{SD(numerator~variable)/mean(numerator~variable)}  $$
where $\rho$ is the correlation between the numerator and denominator variable (i.e., between
abundance of the numerator variable and block area)$, then a ratio estimator using
the area of blocks is not more efficient. 

```{r echo=FALSE,message=FALSE, warning=FALSE}
# Look at correlations between total number of moose, bulls, and cows and block area
survey.block.data <- plyr::ddply(survey.data, 
                                 c("Block.ID","Stratum"),
                                 plyr::summarize,
                                 Bulls          = sum(Bulls,           na.rm=TRUE),
                                 Lone.cows      = sum(Lone.Cows,       na.rm=TRUE),
                                 Cow.W.1...calf = sum(Cow.W.1...calf,  na.rm=TRUE),
                                 Cow.W.2.calves = sum(Cow.W.2.calves,  na.rm=TRUE),
                                 Lone...calf    = sum(Lone...calf,     na.rm=TRUE),
                                 Unk.Age.Sex    = sum(Unk.Age.Sex,     na.rm=TRUE),
                                 Cows           = sum(Cows,            na.rm=TRUE),
                                 NMoose         = sum(NMoose,          na.rm=TRUE))
# add the area to the block totals
survey.block.data <- merge(survey.block.data, survey.block.area, all.x=TRUE)

# What is correlation between block area and number of moose etc
survey.block.data.melt <- reshape2::melt(survey.block.data,
                        id.vars=c("Stratum","Block.ID","Block.Area"),
                        measure.vars=c("Bulls","Lone.cows","Cow.W.1...calf","Cow.W.2.calves",
                                       "Lone...calf","Unk.Age.Sex","Cows","NMoose"),
                        variable.name="Classification",
                        value.name="N.Animals")

# find correlation between number of animals and area
res <- plyr::ddply(survey.block.data.melt, c("Stratum","Classification"), plyr::summarize,
                   corr=cor(N.Animals, Block.Area),
                   cv.N.Animals=sd(N.Animals)/mean(N.Animals),
                   cv.Area     =sd(Block.Area)/mean(Block.Area),
                   cut.off      = cv.Area/2/cv.N.Animals)

temp <- res[,c(1,2,3,6)]
temp[,3:4] <- round(temp[,3:4],2)
kable(temp, row.names=FALSE,
      caption="Estimated correlation between animal counts and block area",
      col.names=c("Stratum","Type of Animal","Corr","Cutoff")) %>%
      column_spec(column=c(1:2),         width="3cm") %>%
      column_spec(column=c(3:4),         width="1cm") %>%
      kable_styling("bordered",position = "center", full_width=FALSE) %>%
      footnote(threeparttable=TRUE,
               general=paste0("Cutoff is 0.5 ratio of sd/mean of area and abundance of each type of animal",
                              " and unless the correlation exceeds the cutoff, there is no advantage",
                              " in using the block area in the analysis"))

```

The correlations are not high. While about half of the response variables
have correlations of abundance with block area that exceed the cutoff, the variation
in the block areas is not that large, so we don't think that also adjusting for block area is necessary.
Of course, this also does not account for sightability issues. 

A plot of the two variables for each types of animals shows no strong relationship between 
the number of observed animals and the block area:

```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(data=survey.block.data.melt, aes(x=Block.Area, y=N.Animals, color=Stratum))+
  geom_point()+
  facet_wrap(~Classification, ncol=3, scales="free")+
  theme(legend.position=c(1,0), legend.justification=c(1,0))
```

Consequently, it not expected that ignoring block area in the analysis will have
a material effect.

### Within a stratum

#### Abundance

Suppose a sample of $n_h$ blocks were surveyed in stratum $h$ out of $N_h$ total number of blocks. 
For each block, the number of moose $m_{hjk}$
are obtained for group $k$ in block $j$ in stratum $h$. The
estimated correction factor (for visibility of the group) is 
$\widehat{\theta}_{hjk}$ which is computed give the vector of covariates (x_{hjk}) as (see page 4 of Fieberg, 2012)

$$\widehat{\theta}_{hjk}=  1+\exp{(-x_{hjk}^t \widehat{\beta}-x_{hjk}^t \Sigma x_{hjk} / 2)}$$

Notice that the sightability correction factor is not simply $1/\textit{probability of detection}$
or $1+\exp{(-x_{hjk}^t \widehat{\beta}$
because it includes a correction factor to account for the non-linear transformation from the logit to the probability scale.

The estimated abundance of moose in stratum $h$ is obtained as:
$$\widehat{M}_h = N_h \times \frac{\sum_{jk} m_{hjk}\widehat{\theta}_{hjk}}{n_h}$$
i.e., as the mean number of (sightability corrected) moose per surveyed block times the total number of blocks. The
standard error of this estimate is complicated because of the need to account for the estimated sightability
but the theory is covered in Steinhorst and Samuel (1989), Fieberg (2012) and Wong (1996) and
implemented in the *AerialSurvey* (Unsworth, 1999) and *SightabilityModel* packages of *R*. The latter 
package corrects the variance computations used by Steinhorst and Samuel (1989) and Fieberg (2012).

Similar computations are used for estimating the abundance of other sex/age categories.

#### Density

The estimated density of moose in stratum $h$ ($\widehat{D}_h$) is 
found by dividing the abundance estimate and its standard error by the stratum area
$A_h$

$$\widehat{D}_h = \frac{\widehat{M{_h}}}{A_h} $$
$$se(\widehat{M}_h) = \frac{se(\widehat{M}_h)}{A_h}$$

Similar computations are used for estimating the density of other sex/age categories.

#### Bull to Cow and other ratios

The ratio of bulls to cows within a stratum is computed as the estimated abundances in each stratum:
$$\widehat{BC}_h = \frac{\widehat{Bulls}_h}{\widehat{Cows}_h}$$
i.e., as the estimated total number of bulls in the stratum
divided by the estimated total number of cows in the stratum. This ratio estimator is not available
in the *SightabilityModel* package (Fieberg, 2012), but a new function was written and added to the package to
add this functionality. The standard error is difficult to compute because of the need to 
account for the correlation between the number of bulls and moose in individual groups and the
the uncertainty of the correction factors but is given in Wong (1996).

Note that if the sightability is equal for all groups, then this common sightability correction
factor would cancel everywhere and you would get the same results as the ratio estimator based
on the observed groups only. In most cases, the impact of sightability will be small on the ratio estimator
and its standard error.

If the number of bulls per 100 cows is wanted, simply multiply the estimate and its standard error by $100\times$.

Similar computations occur for other ratios such as calves:cows.

### Across all strata

#### Abundance

The total abundance of moose is found by summing the estimated abundances across strata:

$$\widehat{M} = \sum_h \widehat{M}_h$$
The standard error is difficult to compute because the same sightability model is used across strata. 
This has been implemented in the *SightabilityModel* package (Fieberg, 2012) and the *AerialSurvey* (Unsworth, 1999) program.

Similar computations are used for estimating the abundance of other sex/age categories.

#### Density

The overall density of moose and its standard error 
is found as estimated total abundance and its standard error
divided by the total area over all strata..
$$\widehat{D}= \frac{ \widehat{M}}{\sum_h A_h}$$
$$se(\widehat{D}) =  \frac{se(\widehat{M})}{{\sum_h A_h}}$$
Because the total area of all strata is known quantity, this is straightforward.

Similar computations are used for estimating the density of other sex/age categories.


#### Bull to Cow and other ratios

The estimation of the overall bull to cow ratio is computed as the ratios of their respective estimated
abundances
$$\widehat{BC}=\frac{\widehat{Bulls}}{\widehat{Cows}}$$
This ratio estimator was not available in the *SightabilityModel* package (Fieberg, 2012), 
but its functionality has been added
as part of this contract. Standard errors are computed as described in Wong (1996).

Similar computations can be used for other ratios such as calves:cows.

### The *SightabilityPopR()* function

I have created a function with the same calling arguments as the *MoosePopR()* function to
estimate abundance, density, and ratios using the *SightabilityModel* package. 
Notice that additional functionality was added to the *SightabilityModel* package (Fieberg, 2012) to estimate ratios.


A *SightabilityPopR()* function has a similar calling sequence as the *MoosePopR* function with 
the same key input variables.

- *survey.data* - information on each group measured in the survey. This was read from the Excel workbook earlier.
- *survey.block.area* - information about the area of each block surveyed. This was read from the Excel workbook earlier.
- *stratum.data* - information about each stratum. This was read from the Excel workbook earlier.
- *density, abundance, numerator, denominator* - variables for which the density, abundance, ratio are to be formed as right-sided formula (e.g., density=~Cows to estimate the density of Cows).
- *conf.level=0.90* indicates the  confidence 
level for confidence intervals to compute with the default being a 90% 
confidence interval.

Additionally, the sightability formula, the beta coefficients and the estimated covariance matrix
must also be specified using the arguments:

- *sight.formula* - the formula for the sightability model. This should use the variables in the *survey.data* frame
- *sight.beta* - the beta coefficients for the sightability model
- *sight.beta.cov* - the covariance matrix for the beta coefficients
- *sight.logCI* - should confidence intervals for abundance be computed using a logarithmic transformation
- *sight.var.method* - what method (either *"Wong"* or *"SS"*) should be used to estimate the variances .
using the methods of Wong (1996) or Steinhorst
and Samuels (2012). The default (and preferred) method is *"Wong"* because she corrects 
the variance computations used by Steinhorst and Samuel (1989) and Fieberg (2012).

These additional parameters are passed to the appropriate functions in the *SightabilityModel* package (Fieberg, 2012). 
Refer to
the examples for details.
 
### Sample Analyses

#### Dealing with missing values in the sightability covariates

Because we are correcting for sightability, we need the *VegCoverClass* for all groups. However
there are some groups of moose with missing data for this covariate:
```{r echo=TRUE, message=FALSE, warning=FALSE}
select <- is.na(survey.data$VegCoverClass)
survey.data[select,]
```
If you simply delete these observations, then estimates can be severely biased because these groups
would be excluded from the expansions when corrected for sightability. About the best that can be done is
to assign an imputed vegetation class to these observations with the highest sightability to reduce the
bias.

```{r echo=TRUE, message=FALSE, warning=FALSE}
# change the missing values to cover class with highest sightability
survey.data$VegCoverClass[select] <- 1
survey.data[select,]

```


We are now ready to do the analysis. I've gathered all of the analysis results into a list object
for extraction later.

**The *C* suffix that I have placed on the estimate name indicates it is  corrected for sightability.**

I have only computed abundance and density estimates
for all moose, but similar code can be used for other age/sex categories. Similarly, I have only
considered the bull:cow ratio and other ratios can be estimated in similar fashion.

#### Abundance of all moose

We compute the (corrected for sightability) abundance of all moose (regardless of sex or age) using the
number of moose (*NMoose*) variable. 

```{r echo=TRUE, message=TRUE, warning=TRUE}
result <- NULL
result$"All.Abund.C" <- SightabilityPopR(
      survey.data        = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~NMoose                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
temp <- result$"All.Abund.C"
temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0)  # force to be numeric
temp
```
The *estimate* and *SE* column as the estimated (corrected for sightability) 
density of all moose with a measure of uncertainty for
each stratum. The overall abundance is found as the sum of the estimates in each stratum. The right most columns
are information about the sources of uncertainty (due to sampling, sightability, or the model for sightability).

Similar function calls are used for estimating the abundance of other sex/age categories by changing the *abundance* argument.

We can show the impact of simply deleting those values missing the *VegCoverClass* covariate by
setting the observed values of moose to .001 for these groups. This will retain the correct number
of blocks surveyed. If you simply deleted those groups of moose with missing *VegCoverClass* values
and a block then had no observations, the block would 
"disappear" from the analysis which is not correct.

```{r echo=TRUE, message=FALSE, warning=FALSE}
# mimic impact of deleting observations with missing VegCoverClass
survey.data2 <- survey.data
survey.data2$NMoose[select] <- .001
wrong.way <- SightabilityPopR( # show impact of deleting observations but keeping # blocks the same
      survey.data        = survey.data2         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~NMoose                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
wrong.way
```

Notice the negative bias in the estimated abundance computed the incorrect way
(`r round(wrong.way[wrong.way$Stratum==".OVERALL","estimate"])`) vs. the value
when the *VegCoverClass* is set to the class with highest sightability 
(`r round(result$"All.Abund.C"[result$"All.Abund.C"$Stratum==".OVERALL","estimate"])`) computed above.

In the real world, if a group has missing values for covariates using in the sightability model, do not change
the count data, but change the missing covariates to values that result in the highest possible sightability
for this group. In this way, the number of blocks surveyed is computed properly and you have partially the
bias in dropping these groups by including them with an imputed high sightability value.


#### Density of all moose

We compute the estimated (corrected for sightability) density of all moose (regardless of sex or age).

```{r echo=TRUE, message=FALSE, warning=FALSE}
result$"All.Density.C" <- SightabilityPopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,density=~NMoose                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
temp<- result$"All.Density.C"
temp[,-(1:3)] <- round(temp[,-(1:3)]+.00001,2)  # force to be numeric
temp
```

The *estimate* and *SE* column as the estimated (corrected for sightability) 
density of all moose with a measure of uncertainty for
each stratum. The overall density is found as a estimated abundance/total area over all strata. The variance components are
the same for the total abundance because these values are used to derive the density.

Similar function calls are used for estimating the density of other sex/age categories by changing the *density* argument.


#### Number of bulls and number of cows

We estimate the (corrected for sightability) abundance of Bulls and  cows.

```{r echo=TRUE, message=FALSE, warning=FALSE}
result$"Bulls.Abund.C" <- result$"All.Density.C" <- SightabilityPopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,density=~Bulls                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
result$"Bulls.Abund.C"

result$"Cows.Abund.C" <- SightabilityPopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,density=~Cows                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
result$"Cows.Abund.C"
```
These values are computed similarly as the total abundance.

#### Bull to Cow and other ratios

Finally we estimates the Bulls/Cow ratio using the corrected for sightability values.

```{r echo=TRUE, message=FALSE, warning=FALSE}
result$"Bulls/Cow.C" <- SightabilityPopR(
      survey.data       = survey.data         # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,numerator=~Bulls, denominator=~Cows                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
  
temp<- result$"Bulls/Cow.C"
temp[,-(1:3)] <- round(temp[,-(1:3)]+.00001,3)  # force to be numeric
temp
```

The ratio is computed within each stratum and overall
The sources of variation computations are set to missing as they are not sensible in this case.

Notice that this ratio estimator is not that different from that computed from values uncorrected
for sightability. This is a consequence of the approximately equal sightability correction factor
for all groups.

Other ratios can be computed by changing the *numerator* and *denominator* arguments
in the function call.


#### Final object

The result is a list object with one element each of the analyses done above.
```{r echo=TRUE, message=FALSE, warning=FALSE}
names(result)
```

# Domain estimation

Domain estimation refer to estimating density, abundance, or ratios for subsets of the survey area.
For example, in British Columbia, a survey may take place over a combined Buckley Valley and Lake District (sub units)
survey area, but estimates of abundance are also wanted for Buckley Valley and Lake District individually.
In this case, the sub-units are domains.

There are four cases to consider depending if information about the primary sampling units (PSU) (i.e., blocks)
and stratum areas are known or unknown
at the domain level  $\times$ the domain separation operates at the PSU (block) or group level.

## Domain at PSU (block) level; domain population information known

This is the simplest case. Here the primary sampling units (PSU) (or blocks) are classified as belonging
to the domain or not. For example, the study area could be divided into two or more management units
and each PSU (block) belongs to one and only one management units in its entirety.
The total area of the management unit and the total number of potential PSUs is known
for each management unit.

Because PSUs (blocks) were randomly selected within each stratum with equal probability, they are also
randomly selected from each domain (management unit). 

Consequently, just subset the survey data (observed animals) to those PSUs (blocks) belonging to the domain of interest and
change the stratum information to refer to the total area and total PSUs (block) for that domain and 
estimation for density, abundance, and ratios occurs exactly as before.

### Example

For example, the survey data used in the examples was classified in Domain $A$ and $B$,
and a separate worksheet in the workbook has information about stratum information for Domain $A$. We 
subset the survey data, read in the stratum information for Domain $A$ and then
use the same methods as shown before.


The PSUs (blocks) in the example are split between two domains (*A* and *B*) as shown below:

```{r echo=TRUE, message=FALSE, warning=FALSE}
# PSU's are split between the domains
xtabs(~Domain+Block.ID, data=survey.data, exclude=NULL, na.action=na.pass)
```

The number of groups of moose in each block is shown. 

We subset the survey data:

```{r echo=TRUE, message=FALSE, warning=FALSE}
# Subset the survey data
survey.data.A <- survey.data[ survey.data$Domain == "A",]
```

The block information (block area) is unchanged and does not need to be subset.

We need information at the stratum level for domain *A* which
is available on the workbook:
```{r echo=TRUE, message=FALSE, warning=FALSE}
# Domain information for the population for each stratum
stratum.data.A <- readxl::read_excel(
      system.file("extdata", "SampleMooseSurveyData.xlsx", package = "SightabilityModel", mustWork=TRUE),
      sheet="Stratum-DomainA",
      skip=2,.name_repair="universal")

kable(stratum.data.A, row.names=FALSE,
      caption="Stratum totals for Domain A",
      col.names=c("Stratum","Stratum Area","Stratum # blocks")) %>%
      column_spec(column=c(1:3),         width="2cm") %>%
      kable_styling("bordered",position = "center", full_width=FALSE)

```

We can now estimate abundance, density, or ratios for Domain A in the same way as before with and without
corrections for sightability. We show only the estimated abundance for all moose after correcting
for sightability. Note the use of **survey.data.A** and **stratum.data.A***

```{r echo=TRUE, message=FALSE, warning=FALSE}
All.Abund.A.corrected <- SightabilityPopR(
      survey.data       = survey.data.A         # raw data for domain A
      ,survey.block.area = survey.block.area    # area of each block
      ,stratum.data      = stratum.data.A       # stratum information for domain A
      ,abundance=~NMoose                        # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
temp <- All.Abund.A.corrected
temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0)  # force to be numeric
temp[,1:10]

```


## Domain at PSU (block) level; domain population information unknown.

In some cases, only the surveyed PSUs can be classified by domain and the stratum total area and stratum total blocks
is unknown for the domain. Cochran (1977, Sections 2.13 and 5A.14) discuss this case. 

Abundance and ratios can be estimated by setting all survey data to 0 for blocks/groups not part of the domain,
and then using the estimating functions on the complete data after zeroing. 
This may seem strange, but consider the following table
```
                    PSU
Domain     A     A     A     A     B     B     B     B    B    B
Y          2     3     3     4     5     3     2     3    2    4
```

The total of $Y$ in Domain $A$ is $2+3+3+4=12$. If we consider the modified data
```
                     PSU
Domain     A     A     A     A     B     B     B     B    B    B
Y'         2     3     2     3     0     0     0     0    0    0
```
and find the mean of $\overline{Y'}=(2+3+3+4+0+0+0+0+0+0)/10=1.2$ and multiply the mean
of *Y'* by the population number of PSU (10) we get the domain total of $Total(y)=\overline{Y'} \times 10=1.2 \times 10=12$.


### Example

We set the survey data to 0 if the domain is not *A* (notice that we do NOT delete observations).

```{r echo=TRUE, message=FALSE, warning=FALSE}
# Set number of moose to zero if not part of domain A
survey.data.A.z <- survey.data
survey.data.A.z$NMoose[ survey.data$Domain != "A"] <- 0
```

We now use the modified survey data (**survey.data.A.Z** but the original group and stratum data:

```{r echo=TRUE, message=FALSE, warning=FALSE}
All.Abund.A.corrected.z <- SightabilityPopR(
      survey.data        = survey.data.A.z    # raw data with non-domain counts set to zero
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~NMoose                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
temp <- All.Abund.A.corrected.z
temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0)  # force to be numeric
temp[,1:10]

```

The estimates will be similar (but not exactly the same) between this analysis and the previous section.

Estimates of ratios can be found in the same way because these are based on abundances

**However, estimates of density cannot be found using this method**. 
Refer to Cochran (1977, Section 5A.14, *Estimating Domain Means*) for more details.


## Domain at group level; domain population information known
The analysis in this case is similar to the comparable case
when the domain occurs at the PSU (block) level. 
Now **groups** are classified as belonging
to the domain or not. For example, interest may lie in the number of moose in each
vegetation class. 

All blocks are used, but now in each block, you will subset the groups that belong to the domain.
The hard part is the area  in this domain of EACH PSU (block) must also be known. Presumably, this is a GIS
exercise in the case of vegetation cover, assuming that entire survey area has been mapped.

Similarly, the total area in the stratum for this domain must be known. This is presumably a GIS exercise.

The analysis proceeds by subsetting the survey data to groups that fall within this domain, but 
**you need to ensure that groups without this domain are still included (with 0 counts for the response variable)**.
You also need to use
the modified block areas and modified stratum area in this domain. 
Please contact me if you are attempting to do this as the programming *R* is tricky, to say the least.


## Domain at group level; domain population information unknown.
In some cases, only the surveyed groups can be classified by domain and the block area, 
stratum total area and stratum total blocks
is unknown for the domain. 

We use the same trick as above by setting the survey data 
to 0 if the domain is not *A* (notice that we do NOT delete observations).

### Example

Suppose we want moose abundance by vegetation cover class. The estimate of total abundance is

```{r echo=TRUE, message=FALSE, warning=FALSE}
All.Abund <- SightabilityPopR(
      survey.data        = survey.data        # raw data
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~NMoose                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
)
temp <- All.Abund
temp[,-c(1:3,8)] <- round(temp[,-c(1:3,8)]+.00001,0)  # force to be numeric
temp[,1:10]

```

We will compute the total abundance for each vegetation cover class

```{r echo=TRUE, message=FALSE, warning=FALSE}
All.Abund.vc <- plyr::llply(unique(survey.data$VegCoverClass), function(VegCover){
   # Set number of moose to zero if not part of domain A
   survey.data.z <- survey.data
   survey.data.z$NMoose[ survey.data$VegCoverClass != VegCover] <- 0  # set to zero
   #browser()
   All.Abund.vc <- SightabilityPopR(
      survey.data        = survey.data.z      # raw data that has been zeroed out
      ,survey.block.area = survey.block.area  # area of each block
      ,stratum.data      = stratum.data       # stratum information
      ,abundance=~NMoose                      # quantifies to estimate
      ,sight.formula     = observed ~ VegCoverClass # sightability mode;
      ,sight.beta        = beta                     # the beta coefficients for the sightability model
      ,sight.beta.cov    = beta.cov                 # the covariance matrix for the beta coefficients
      ,sight.var.method  = "Wong"                   # method  used to estimate the variances 
   )
   list(VegCoverClass=VegCover, est=All.Abund.vc)
})

# extract the total abundance

All.Abund.vc.df <- plyr::ldply(All.Abund.vc, function(x){
   #browser()
   res <- data.frame(VegCoverClass=as.character(x$VegCoverClass), stringsAsFactors=FALSE)
   res$estimate  <- x$est[x$est$Stratum==".OVERALL", c("estimate")]
   res$SE        <- x$est[x$est$Stratum==".OVERALL", c("SE"      )]
   res
   
})
All.Abund.vc.df <- plyr::rbind.fill( All.Abund.vc.df,
                                     data.frame(VegCoverClass=".OVERALL", estimate=sum(All.Abund.vc.df$estimate, stringAsFactors=FALSE)))
All.Abund.vc.df

```

The estimates sum to the estimate firstly computed, but the estimates from the individual 
vegetation cover classes are not independent and so the overall standard error cannot be found
from the separate estimates by vegetation cover class.

Estimates of ratios can be found in the same way because they are based on abundances.

**However, estimates of density cannot be found using this method**. 
Refer to Cochran (1977, Section 5A.14, *Estimating Domain Means*) for more details.


# References

Cochran, W.G. (1977). Sampling techniques. 3rd Edition. Wiley. New York.

Fieberg, J. (2012). Estimating Population Abundance Using Sightability Models: R SightabilityModel Package. 
Journal of Statistical Software, 51(9), 1-20. 
URL https://doi.org/10.18637/jss.v051.i09.

Gasaway, W.C., S.D. Dubois, D.J. Reed and S.J. Harbo. (1986). 
Estimating moose population parameters from aerial surveys. Biol. Pap. Univ. 
Alaska, No. 22. Institute of Arctic Biology, U. of Alaska, Fairbanks. 108 pp. 

Lumley, T. (2019) "survey: analysis of complex survey samples". R package version 3.35-1.

Quayle, J. F., A. G. MacHutchon, and D. N. Jury. (2001). Modeling moose sightability in south-central
British Columbia. Alces 37:43–54. 

R Core Team (2020). 
R: A language and environment for statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. URL https://www.R-project.org/.

Resource Inventory Standards Committee (RISC). (2002). Aerial-based inventory methods for selected 
ungulates bison, mountain goat, mountain sheep, moose, elk, deer, and 
caribou. Version 2. Standards for Components of British Columbia’s Biodiversity 
No 32. British Columbia Ministry of Sustainable Resource Management, Victoria, B.C.

Reed, D.J. (1989). MOOSEPOP program documentation and instructions. 
Alaska Department of Fish and Game. Unpublished report.

Steinhorst, K. R. and Samuel, M. D. (1989). 
Sightability Adjustment Methods for Aerial Surveys of Wildlife Populations. Biometrics 45:415--425.

Unsworth JW, A LF, O GE, Leptich DJ, Zager P (1999). Aerial Survey: User’s Manual.
Idaho Department of Fish and Game, electronic edition edition.

Wong, C. (1996). Population size estimation using the modified Horvitz-Thompson 
estimator with estimated sighting probabilities. 
Dissertation, Colorado State University, Fort Collins, USA.