Index: admbre.tex
===================================================================
--- admbre.tex (revision 1074)
+++ admbre.tex (revision 1075)
@@ -23,9 +23,9 @@
\begin{document}
\title{%
- \largetitlepart{Random Effects in\\ \ADM}
- \smalltitlepart{ADMB-RE User Guide}
- \vspace{4.5ex}\textsf{\textit{Version \admbversion~~(2013-05-01)\\[3pt]
+ \largetitlepart{Random Effects in\\\ADM}
+ \smalltitlepart{ADMB-RE User Guide}
+ \vspace{4.5ex}\textsf{\textit{Version \admbversion~~(2013-05-01)\\[3pt]
Revised manual~~(2013-06-24)}}\vspace{3ex}
}
% Author definition.
@@ -38,8 +38,8 @@
\noindent ADMB Foundation, Honolulu.\\\\
\noindent This is the manual for AD Model Builder with Random Effects (ADMB-RE)
version \admbversion.\\\\
-\noindent Copyright \copyright\ 2004, 2006, 2008, 2009, 2011, 2013 Hans Skaug \& David
-Fournier\\\\
+\noindent Copyright \copyright\ 2004, 2006, 2008, 2009, 2011, 2013 Hans Skaug \&
+David Fournier\\\\
\noindent The latest edition of the manual is available at:\\
\url{http://admb-project.org/documentation/manuals}
@@ -62,14 +62,14 @@
This document is a user's guide to random-effects modelling in \ADM\ (\scAB).
Random effects are a feature of \scAB, in the same way as profile likelihoods
are, but are sufficiently complex to merit a separate user manual. The work on
-the random-effects ``module'' (\scAR) started around~2003. The pre-existing part
+the random-effects ``module'' (\scAR) started around 2003. The pre-existing part
of \scAB\ (and its evolution) is referred to as ``ordinary'' \scAB\ in the
-following. This manual refers to Version~9.0.x of \scAB\ (and~\scAR).
+following.
Before you start with random effects, it is recommended that you have some
experience with ordinary \scAB. This manual tries to be self-contained, but it
is clearly an advantage if you have written (and successfully run) a few
-\textsc{tpl}~files. Ordinary \scAB\ is described in the \scAB\
+\textsc{tpl} files. Ordinary \scAB\ is described in the \scAB\
manual~\cite{admb_manual}, which is available from
\href{mailto:admb-project.org}{admb-project.org}. If you are new to \scAB, but
have experience with \cplus\ (or a similar programming language), you may
@@ -79,7 +79,7 @@
can only handle mixed-effect regression-type models, but this is very
misleading. ``Latent variable'' would have been a more precise term. It can be
argued that \scAR\ is the most flexible latent variable framework around. All
-the mixed-model stuff in software packages, such as allowed by~R, Stata,
+the mixed-model machinery in software packages such as R, Stata,
\textsc{spss}, etc., allows only very specific models to be fit. Also, it is
impossible to change the distribution of the random effects, should you, say,
want to do so. The \textsc{nlmixed} macro in \textsc{sas} is more flexible, but
@@ -101,12 +101,12 @@
Why use \ADM\ for creating nonlinear random-effects models? The answer consists
of three words: ``flexibility,'' ``speed,'' and ``accuracy.'' To illustrate
these points, we present a number of examples comparing \scAR\ with two existing packages:
-\textsc{nlme}, which runs on R and Splus, and \scWinBUGS. In general, \scNLME\
+\textsc{nlme}, which runs on R and S-Plus, and \scWinBUGS. In general, \scNLME\
is rather fast and it is good for the problems for which it was designed, but it
is quite inflexible. What is needed is a tool with at least the computational
power of \textsc{nlme} yet the flexibility to deal with arbitrary nonlinear
random-effects models. In Section~\ref{lognormal}, we consider a thread from the
-R~user list, where a discussion took place about extending a model to use random
+R user list, where a discussion took place about extending a model to use random
effects with a log-normal, rather than normal, distribution. This appeared to be
quite difficult. With \scAR, this change takes one line of code. \scWinBUGS, on
the other hand, is very flexible, and many random-effects models can be easily
@@ -220,7 +220,7 @@
\subsection{Writing an \scAB\ program}
-%\XX{\fontindexentry{sc}{tpl} file}{writing}
+% \XX{\fontindexentry{sc}{tpl} file}{writing}
\index{TPL@\textsc{tpl} file!writing}
To fit a statistical model to data, we must carry out certain fundamental tasks,
such as reading data from file, declaring the set of parameters that should be
@@ -238,17 +238,20 @@
A \textsc{tpl}~file is divided into a number of ``sections,'' each representing
one of the fundamental tasks mentioned above. See
Table~\ref{tab:required-sections} for the required sections.
-\begin{table}[h]
+\begin{table}[htbp]
\begin{center}
\begin{tabular}%
- {@{\vrule height 12pt depth 6pt width0pt}@{\extracolsep{1em}} l l }
+ {@{\vrule height 12pt depth 6pt width0pt}@{\extracolsep{1em}} ll}
\hline
- \textbf{Name} & \textbf{Purpose} \\ \hline\\[-16pt]
+ \textbf{Name}
+ & \textbf{Purpose} \\
+ \hline\\[-16pt]
\texttt{DATA\_SECTION}
- & Declare ``global'' data objects, initialization from file. \\
- \texttt{PARAMETER\_SECTION} & Declare independent parameters. \\
+ & Declare ``global'' data objects, initialization from file.\\
+ \texttt{PARAMETER\_SECTION}
+ & Declare independent parameters. \\
\texttt{PROCEDURE\_SECTION}
- & Specify model and objective function in \cplus. \\[3pt]
+ & Specify model and objective function in \cplus. \\[3pt]
\hline
\end{tabular}
\end{center}
@@ -260,7 +263,7 @@
reference card is available in Appendix~\ref{sec:quick}.
\subsection{Compiling an \scAB\ program}
-%\XX{\fontindexentry{sc}{tpl} file}{compiling}
+% \XX{\fontindexentry{sc}{tpl} file}{compiling}
\index{TPL@\textsc{tpl} file!compiling}
After having finished writing \texttt{simple.tpl}, we want to convert it into an
@@ -281,7 +284,7 @@
The compilation process really consists of two steps.
In the first step, \texttt{simple.tpl} is converted to a \cplus\ program by a
-preprosessor called \texttt{tpl2rem} in the case of \scAR, and \texttt{tpl2cpp}
+translator called \texttt{tpl2rem} in the case of \scAR, and \texttt{tpl2cpp}
in the case of ordinary \scAB\ (see Appendix~\ref{sec:quick}). An error message
from \texttt{tpl2rem} consists of a single line of text, with a reference to the
line in the \textsc{tpl}~file where the error occurs. If successful, the first
@@ -304,25 +307,27 @@
The executable program is run in the same window as it was compiled. Note that
data are not usually part of the \scAB\ program (e.g., \texttt{simple.tpl}).
Instead, data are being read from a file with the file name
-extension~\texttt{.dat} (e.g., \texttt{simple.dat}). This brings us to the
+extension \texttt{.dat} (e.g., \texttt{simple.dat}). This brings us to the
naming convention used by \scAB\ programs for input and output files: the
executable automatically infers file names by adding an extension to its own
name. The most important files are listed in Table~\ref{tab:important-files}.
-\begin{table}[h]
-\begin{center}
-\begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} l l l }
-\hline
-& \textbf{File name} & \textbf{Contents} \\ \hline\\[-17pt]
-Input & \texttt{simple.dat} & Data for the analysis \\
-& \texttt{simple.pin} & Initial parameter values \\ \hline
-Output & \texttt{simple.par} & Parameter estimates \\
-& \texttt{simple.std} & Standard deviations \\
-& \texttt{simple.cor} & Parameter correlations\\
-\hline
-\end{tabular}
-\end{center}
-\caption{File naming convention.}
-\label{tab:important-files}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lll}
+ \hline
+ ~ & \textbf{File name} & \textbf{Contents} \\
+ \hline\\[-17pt]
+ Input & \texttt{simple.dat} & Data for the analysis \\
+ ~ & \texttt{simple.pin} & Initial parameter values\\
+ \hline
+ Output & \texttt{simple.par} & Parameter estimates \\
+ ~ & \texttt{simple.std} & Standard deviations \\
+ ~ & \texttt{simple.cor} & Parameter correlations \\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{File naming convention.}
+ \label{tab:important-files}
\end{table}
You can use command line options to modify the behavior of the program at
runtime. The available command line options can be listed by typing:
@@ -380,19 +385,19 @@
To use random effects in \scAB, you must be familiar with the notion of a random
variable and, in particular, with the normal distribution. In case you are not,
-please consult a standard textbook in statistics. The notation $u\sim N(\mu
-,\sigma ^{2})$ is used throughout this manual, and means that $u$ has a normal
-(Gaussian) distribution with expectation $\mu$ and variance $\sigma ^{2}$. The
+please consult a standard textbook in statistics. The notation $u\sim
+N(\mu,\sigma^2)$ is used throughout this manual, and means that $u$ has a normal
+(Gaussian) distribution with expectation $\mu$ and variance $\sigma^2$. The
distribution placed on the random effects is called the ``prior,'' which is a
term borrowed from Bayesian statistics.
A central concept that originates from generalized linear models is that of a
-``linear predictor.'' Let $x_{1},\ldots,x_{p}$ denote observed covariates
-(explanatory variables), and let $\beta _{1},\ldots,\beta _{p}$ be the
-corresponding regression parameters to be estimated. Many of the examples in
-this manual involve a linear predictor $\eta_{i}=\beta_{1}x_{1,i}+\cdots
-+\beta_{p}x_{p,i}$, which we also will write in vector form as
-$\mathbf{\eta}=\mathbf{X\beta }$. \index{linear predictor}
+``linear predictor.'' Let $x_1,\ldots,x_p$ denote observed covariates
+(explanatory variables), and let $\beta_1,\ldots,\beta_p$ be the corresponding
+regression parameters to be estimated. Many of the examples in this manual
+involve a linear predictor $\eta_i=\beta_1x_{1,i}+\cdots+\beta_px_{p,i}$, which
+we also will write in vector form as $\mathbf{\eta}=\mathbf{X\beta}$.
+\index{linear predictor}
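As a concrete numerical illustration (in Python, with made-up covariate values), the vector form $\mathbf{\eta}=\mathbf{X\beta}$ is simply a matrix--vector product:

```python
import numpy as np

# Toy linear predictor: n = 3 observations, p = 2 covariates.
# The first column of X is a constant (intercept) covariate.
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])
beta = np.array([0.2, 1.0])

eta = X @ beta   # eta_i = beta_1 * x_{1,i} + beta_2 * x_{2,i}
```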
\subsection{Frequentist or Bayesian statistics?}
@@ -419,7 +424,7 @@
Y_i=ax_i+b+\varepsilon_i,\qquad i=1,\ldots,n,
\]
where $Y_i$ and $x_i$ are the data, $a$ and $b$ are the unknown parameters to be
-estimated, and $\varepsilon_i\sim N(0,\sigma ^{2})$ is an error term.
+estimated, and $\varepsilon_i\sim N(0,\sigma^2)$ is an error term.
Consider now the situation where we do not observe $x_i$ directly, but rather
observe
@@ -428,12 +433,12 @@
\]
where $e_i$ is a measurement error term. This situation frequently occurs in
observational studies, and is known as the ``error-in-variables'' problem.
-Assume further that $e_i\sim N(0,\sigma_{e}^{2})$, where $\sigma_{e}^{2}$ is the
+Assume further that $e_i\sim N(0,\sigma_e^2)$, where $\sigma_e^2$ is the
measurement error variance. For reasons discussed below, we shall assume that we
-know the value of $\sigma_e$, so we shall pretend that~$\sigma_e=0.5$.
+know the value of $\sigma_e$, so we shall pretend that $\sigma_e=0.5$.
Because $x_i$ is not observed, we model it as a random effect with $x_i\sim
-N(\mu,\sigma_{x}^{2})$. In \scAR, you are allowed to make such definitions
+N(\mu,\sigma_x^2)$. In \scAR, you are allowed to make such definitions
through the new parameter type called \texttt{random\_effects\_vector}.
\index{random effects!random effects vector} (There is also a
\texttt{random\_effects\_matrix}, which allows you to define a matrix of random
@@ -448,8 +453,8 @@
a better term.
\item The unknown parameters in our model are: $a$, $b$, $\mu$, $\sigma$,
- $\sigma_{x}$, and $x_{1},\ldots,x_{n}$. We have agreed to call
- $x_{1},\ldots,x_{n}$ ``random effects.'' The rest of the parameters are called
+ $\sigma_x$, and $x_1,\ldots,x_n$. We have agreed to call
+ $x_1,\ldots,x_n$ ``random effects.'' The rest of the parameters are called
``hyper-parameters.'' Note that we place no prior distribution on the
hyper-parameters.
@@ -468,15 +473,15 @@
to write the code without declaring $x_i$ to be of type
\texttt{random\_effects\_vector}. However, more important is that random
effects can be used also in non-Gaussian (nonlinear) models where we are
- unable to derive an analytical expression for the distribution of~$(X,Y)$.
+ unable to derive an analytical expression for the distribution of $(X,Y)$.
- \item Why didn't we try to estimate $\sigma_{e}$? Well, let us count the
- parameters in the model: $a$, $b$, $\mu$, $\sigma$, $\sigma_{x}$, and
- $\sigma_{e}$. There are six parameters total. We know that the bivariate
+ \item Why didn't we try to estimate $\sigma_e$? Well, let us count the
+ parameters in the model: $a$, $b$, $\mu$, $\sigma$, $\sigma_x$, and
+ $\sigma_e$. There are six parameters total. We know that the bivariate
Gaussian distribution has only five parameters (the means of $X$ and $Y$ and
three free parameters in the covariance matrix). Thus, our model is not
- identifiable if we also try to estimate $\sigma_{e}$. Instead, we pretend that
- we have estimated $\sigma_{e}$ from some external data source. This example
+ identifiable if we also try to estimate $\sigma_e$. Instead, we pretend that
+ we have estimated $\sigma_e$ from some external data source. This example
illustrates a general point in random effects modelling: you must be careful
to make sure that the model is identifiable!
\end{enumerate}
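The parameter-counting argument above can be made concrete with a short numerical check (Python; the helper function and the two parameter sets are invented for illustration): two different settings of the six model parameters imply exactly the same five identifiable moments of $(X,Y)$.

```python
import numpy as np

def implied_moments(a, b, mu, s2, s2_x, s2_e):
    # Mean and covariance of (X, Y) implied by the error-in-variables
    # model: X = x + e, Y = a*x + b + eps, with x ~ N(mu, s2_x),
    # e ~ N(0, s2_e), eps ~ N(0, s2).
    mean = np.array([mu, a * mu + b])
    cov = np.array([[s2_x + s2_e,          a * s2_x],
                    [a * s2_x,    a * a * s2_x + s2]])
    return mean, cov

# Two distinct parameter vectors (note the different s2_e values) ...
m1, c1 = implied_moments(a=1.00, b=0.0, mu=0.0, s2=1.00, s2_x=1.0, s2_e=0.25)
m2, c2 = implied_moments(a=1.25, b=0.0, mu=0.0, s2=0.75, s2_x=0.8, s2_e=0.45)
# ... yield identical distributions for the observed pair (X, Y), so
# sigma_e cannot be estimated from (X, Y) alone.
```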
@@ -486,32 +491,32 @@
Here is the random effects version of \texttt{simple.tpl}:
\begin{lstlisting}
DATA_SECTION
- init_int nobs
- init_vector Y(1,nobs)
- init_vector X(1,nobs)
+ init_int nobs
+ init_vector Y(1,nobs)
+ init_vector X(1,nobs)
PARAMETER_SECTION
- init_number a
- init_number b
- init_number mu
- vector pred_Y(1,nobs)
- init_bounded_number sigma_Y(0.000001,10)
- init_bounded_number sigma_x(0.000001,10)
- random_effects_vector x(1,nobs)
- objective_function_value f
+ init_number a
+ init_number b
+ init_number mu
+ vector pred_Y(1,nobs)
+ init_bounded_number sigma_Y(0.000001,10)
+ init_bounded_number sigma_x(0.000001,10)
+ random_effects_vector x(1,nobs)
+ objective_function_value f
-PROCEDURE_SECTION // This section is pure C++
- f = 0;
- pred_Y=a*x+b; // Vectorized operations
+PROCEDURE_SECTION // This section is pure C++
+ f = 0;
+ pred_Y=a*x+b; // Vectorized operations
- // Prior part for random effects x
- f += -nobs*log(sigma_x) - 0.5*norm2((x-mu)/sigma_x);
+ // Prior part for random effects x
+ f += -nobs*log(sigma_x) - 0.5*norm2((x-mu)/sigma_x);
- // Likelihood part
- f += -nobs*log(sigma_Y) - 0.5*norm2((pred_Y-Y)/sigma_Y);
- f += -0.5*norm2((X-x)/0.5);
+ // Likelihood part
+ f += -nobs*log(sigma_Y) - 0.5*norm2((pred_Y-Y)/sigma_Y);
+ f += -0.5*norm2((X-x)/0.5);
- f *= -1; // ADMB does minimization!
+ f *= -1; // ADMB does minimization!
\end{lstlisting}
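For readers who prefer to see the objective spelled out outside of \scAB, here is a plain-Python sketch of the same negative log-likelihood (the function and argument names are ours, and additive constants are dropped, as in the \textsc{tpl} code):

```python
import numpy as np

def joint_nll(a, b, mu, sigma_Y, sigma_x, x, X, Y, sigma_e=0.5):
    # Same three contributions as in simple.tpl, up to additive constants.
    n = len(Y)
    f = 0.0
    # Prior part for the random effects x ~ N(mu, sigma_x^2)
    f += -n * np.log(sigma_x) - 0.5 * np.sum(((x - mu) / sigma_x) ** 2)
    # Likelihood part: Y ~ N(a*x + b, sigma_Y^2)
    f += -n * np.log(sigma_Y) - 0.5 * np.sum(((a * x + b - Y) / sigma_Y) ** 2)
    # Measurement equation: X ~ N(x, sigma_e^2) with sigma_e known
    f += -0.5 * np.sum(((X - x) / sigma_e) ** 2)
    return -f  # ADMB does minimization, hence the sign change
```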
\paragraph{Comments}
@@ -532,7 +537,7 @@
(There is also a type called \texttt{random\_effects\_matrix}.) There can be
more than one such object, but they must all be defined after the
hyper-parameters are---otherwise, you will get an error message from the
- preprocessor \texttt{tpl2rem}.
+ translator \texttt{tpl2rem}.
\item Objects that are neither hyper-parameters nor random effects are
ordinary programming variables that can be used in the
@@ -575,14 +580,14 @@
\section{The flexibility of \scAR\label{lognormal}}
-Say that you doubt the distributional assumption $x_i\sim N(\mu,\sigma
-_{x}^{2})$ made in \texttt{simple.tpl}, and that you want to check if a skewed
-distribution gives a better fit. You could, for instance,~take
+Say that you doubt the distributional assumption $x_i\sim N(\mu,\sigma_x^2)$
+made in \texttt{simple.tpl}, and that you want to check if a skewed
+distribution gives a better fit. You could, for instance, take
\[
-x_i=\mu +\sigma_{x}\exp (z_i),\qquad z_i\sim N(0,1).
+x_i=\mu +\sigma_x\exp (z_i),\qquad z_i\sim N(0,1).
\]
Under this model, the standard deviation of $x_i$ is proportional to, but not
-directly equal to, $\sigma_{x}$. It is easy to make this modification in
+directly equal to, $\sigma_x$. It is easy to make this modification in
\texttt{simple.tpl}. In the \texttt{PARAMETER\_SECTION}, we replace the
declaration of \texttt{x} by
\begin{lstlisting}
@@ -660,16 +665,16 @@
\section{The random effects distribution (prior)}
-In \texttt{simple.tpl}, we declared $x_{1},\ldots,x_{n}$ to be of type
-\texttt{random\_effects\_vector}. This statement tells \scAB\ that $x_{1},\ldots
-,x_{n}$ should be treated as random effects (i.e., be the targets for the
-Laplace approximation), but it does not say anything about what distribution the
-random effects should have. We assumed that $x_i\sim N(\mu,\sigma_{x}^{2})$,
+In \texttt{simple.tpl}, we declared $x_1,\ldots,x_n$ to be of type
+\texttt{random\_effects\_vector}. This statement tells \scAB\ that
+$x_1,\ldots,x_n$ should be treated as random effects (i.e., be the targets for
+the Laplace approximation), but it does not say anything about what distribution
+the random effects should have. We assumed that $x_i\sim N(\mu,\sigma_x^2)$,
and (without saying it explicitly) that the $x_i$s were statistically
independent. We know that the corresponding prior contribution to the
log-likelihood is
\[
--n\log (\sigma_{x})-\frac{1}{2\sigma_x^2}\sum_{i=1}\left( x_i-\mu \right) ^{2}
+-n\log (\sigma_x)-\frac{1}{2\sigma_x^2}\sum_{i=1}^n\left(x_i-\mu \right)^2
\]
with \scAB\ implementation
\begin{lstlisting}
@@ -684,8 +689,8 @@
A frequent source of error when writing \scAR\ programs is that priors get
wrongly specified. The following trick can make the code easier to read, and has
the additional advantage of being numerically stable for small values of
-$\sigma_{x}$. From basic probability theory, we know that if $u\sim N(0,1)$,
-then $x=\sigma_{x}u+\mu$ will have a $N(\mu,\sigma_{x}^{2})$ distribution. The
+$\sigma_x$. From basic probability theory, we know that if $u\sim N(0,1)$,
+then $x=\sigma_xu+\mu$ will have a $N(\mu,\sigma_x^2)$ distribution. The
corresponding \scAB\ code would be
\begin{lstlisting}
f += - 0.5*norm2(u);
@@ -709,29 +714,29 @@
In some situations, you will need correlated random effects, and as part of your
problem, you may want to estimate the elements of the covariance matrix. A
-typical example is mixed regression, where the intercept random effect ($u_{i}$)
-is correlated with the slope random effect~($v_{i}$):
+typical example is mixed regression, where the intercept random effect ($u_i$)
+is correlated with the slope random effect~($v_i$):
\[
- y_{ij}=(a+u_{i})+\left(b+v_{i}\right)x_{ij}+\varepsilon_{ij}.
+y_{ij}=(a+u_i)+\left(b+v_i\right)x_{ij}+\varepsilon_{ij}.
\]
(If you are not familiar with the notation, please consult an introductory book
-on mixed regression, such \citeasnoun{pinh:bate:2000}.) In this case, we can
+on mixed regression, such as~\citeasnoun{pinh:bate:2000}.) In this case, we can
define the correlation matrix
\[
- C=\left[\begin{array}{cc}
- 1 & \rho\\
- \rho & 1\end{array}\right],
+C=\left[\begin{array}{cc}
+ 1 & \rho \\
+ \rho & 1\end{array}\right],
\]
-and we want to estimate $\rho$ along with the variances of $u_{i}$ and~$v_{i}$.
-Here, it is trivial to ensure that~$C$ is positive-definite, by requiring
+and we want to estimate $\rho$ along with the variances of $u_i$ and $v_i$.
+Here, it is trivial to ensure that $C$ is positive-definite, by requiring
$-1<\rho<1$, but in higher dimensions, this issue requires more careful
consideration.
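A quick numerical illustration (Python; the helper is hypothetical, not an \scAR\ function) of why the Cholesky parameterization is convenient: any unconstrained choice of the sub-diagonal elements of $L$, after normalizing the rows, yields a valid correlation matrix $C=LL^\prime$.

```python
import numpy as np

def corr_from_cholesky(free, q):
    # Build C = L L' from q(q-1)/2 unconstrained parameters placed on
    # the sub-diagonal of L; each row of L is scaled to unit length so
    # that the diagonal of C is exactly 1.
    L = np.eye(q)
    L[np.tril_indices(q, k=-1)] = free
    L /= np.sqrt((L ** 2).sum(axis=1, keepdims=True))
    return L @ L.T

C = corr_from_cholesky([0.5, -0.3, 0.8], q=3)  # unit diagonal, positive-definite
```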
To ensure that $C$ is positive-definite, you can parameterize the problem in
-terms of the Cholesky factor $L$, i.e.,~$C=LL^\prime$, where~$L$ is a lower
+terms of the Cholesky factor $L$, i.e., $C=LL^\prime$, where $L$ is a lower
triangular matrix with positive diagonal elements. There are $q(q-1)/2$ free
parameters (the non-zero elements of $L$) to be estimated, where $q$ is the
-dimension of~$C$. Since $C$ is a correlation matrix, we must ensure that its
+dimension of $C$. Since $C$ is a correlation matrix, we must ensure that its
diagonal elements are unity. An example with $q=4$ is
\begin{lstlisting}
PARAMETER_SECTION
@@ -753,7 +758,7 @@
\end{lstlisting}
Given the Cholesky factor $L$, we can proceed in different directions. One
option is to use the same transformation-of-variable technique as above: Start
-out with a vector~$u$ of independent $N(0,1)$ distributed random effects. Then,
+out with a vector $u$ of independent $N(0,1)$ distributed random effects. Then,
the vector
\begin{lstlisting}
x = L*u;
@@ -838,7 +843,7 @@
probability density
\begin{equation*}
f(x) = 0.95\frac{1}{\sqrt{2\pi}}e^{-0.5x^2}
- + 0.05\frac{1}{c\sqrt{2\pi}}e^{-0.5(x/c)^2}
+ + 0.05\frac{1}{c\sqrt{2\pi}}e^{-0.5(x/c)^2}
\end{equation*}
where $c$ is a ``robustness'' parameter which by default is set to $c=3$ in
\texttt{df1b2fun.h}. Note that this is a mixture distribution consisting of 95\%
@@ -882,7 +887,7 @@
implemented over time, and user contributions are welcomed!
We stress that these functions should only be used for data likelihoods, and
-in fact, they will not compile if you try to let~$X$ be a random effect. So, for
+in fact, they will not compile if you try to let $X$ be a random effect. So, for
instance, if you have observations $x_i$ that are Poisson distributed with
expectation $\mu_i$, you would write
\begin{lstlisting}
@@ -890,26 +895,32 @@
f -= log_density_poisson(x(i),mu(i));
\end{lstlisting}
Note that these functions do not accept vector arguments.
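As a cross-check of the Poisson entry in Table~\ref{tab:distributions}, the log-density is easy to reproduce outside \scAB\ (plain Python; the function name merely mimics the \scAR\ one):

```python
import math

def log_density_poisson(x, mu):
    # log of mu^x / Gamma(x+1) * exp(-mu), the Poisson expression
    # given in the table of high-level data distributions.
    return x * math.log(mu) - math.lgamma(x + 1) - mu

# Agrees with the density computed directly for x = 3, mu = 2.5:
direct = 2.5 ** 3 / math.factorial(3) * math.exp(-2.5)
```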
-\begin{table}[h]
-\begin{tabular}%
- {@{\vrule height 12pt depth 6pt width0pt} @{\extracolsep{1em}} cccc}
-\hline
-\textbf{Density} & \textbf{Expression} & \textbf{Parameters} & \textbf{Name} \\
-\hline\\[-16pt]
-Poisson & $\frac{\mu^{x}}{\Gamma(x+1)}e^{-\mu}$ & $\mu>0$
-& \texttt{log\_density\_poisson}\\[6pt]
-Neg.\ binomial %$^{1}$
-& $\mu=E(X)$, $\tau=\frac{Var(X)}{E(X)}$ & $\mu,\tau>0$
-& \texttt{log\_negbinomial\_density}\\[6pt]
-\hline
-\end{tabular}
-\caption{Distributions that currently can be used as high-level data
- distributions (for data $X$) in \scAR. % $^{1}$
-The expression for the negative binomial distribution is omitted, due to its
-somewhat complicated form. Instead, the parameterization, via the overdispersion
-coefficient, is given. The interested reader can look at the actual
-implementation in the source file \texttt{df1b2negb.cpp}}.
-\label{tab:distributions}
+\begin{table}[htbp]
+ \begin{tabular}%
+ {@{\vrule height 12pt depth 6pt width0pt} @{\extracolsep{1em}} cccc}
+ \hline
+ \textbf{Density}
+ & \textbf{Expression}
+ & \textbf{Parameters}
+ & \textbf{Name} \\
+ \hline\\[-16pt]
+ Poisson
+ & $\frac{\mu^x}{\Gamma(x+1)}e^{-\mu}$
+ & $\mu>0$
+ & \texttt{log\_density\_poisson} \\[6pt]
+ Neg.\ binomial
+ & $\mu=E(X)$, $\tau=\frac{Var(X)}{E(X)}$
+ & $\mu,\tau>0$
+ & \texttt{log\_negbinomial\_density} \\[6pt]
+ \hline
+ \end{tabular}
+  \caption{Distributions that can currently be used as high-level data
+    distributions (for data $X$) in \scAR. The expression for the negative
+    binomial distribution is omitted, due to its somewhat complicated form.
+    Instead, the parameterization, via the overdispersion coefficient, is given.
+    The interested reader can look at the actual implementation in the source
+    file \texttt{df1b2negb.cpp}.}
+ \label{tab:distributions}
\end{table}
\section{Phases}
@@ -920,29 +931,29 @@
parameters, with the remaining parameters being fixed at their initial values.
In the second phase, more parameters are turned on, and so it goes. The phase in
which a parameter becomes active is specified in the declaration of the
-parameter. By default, a parameter has phase~1. A simple example would~be:
+parameter. By default, a parameter has phase~1. A simple example would be:
\begin{lstlisting}
PARAMETER_SECTION
- init_number a(1)
- random_effects_vector b(1,10,2)
+ init_number a(1)
+ random_effects_vector b(1,10,2)
\end{lstlisting}
where \texttt{a} becomes active in phase~1, while \texttt{b}\ is a vector of
length~10 that becomes active in phase~2. With random effects, we have the
following rule-of-thumb (which may not always apply):
\begin{description}
-\item[Phase 1] Activate all parameters in the data likelihood, except those
-related to random effects.
-\item[Phase 2] Activate random effects and their standard deviations.
-\item[Phase 3] Activate correlation parameters (of random effects).
+ \item[Phase 1] Activate all parameters in the data likelihood, except those
+ related to random effects.
+ \item[Phase 2] Activate random effects and their standard deviations.
+ \item[Phase 3] Activate correlation parameters (of random effects).
\end{description}
In complicated models, it may be useful to break Phase~1 into several
sub-phases.
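The phase mechanism can be mimicked in a few lines of Python (an assumed simplification for illustration, not how \scAB\ is implemented): each phase enlarges the set of active parameters and re-optimizes, holding the inactive ones fixed.

```python
import numpy as np

def fit_in_phases(nll, x0, phases, steps=500, lr=0.1):
    # `phases` lists the parameter indices activated in each phase.
    # Within a phase, all active parameters are updated by simple
    # finite-difference gradient descent; inactive ones stay fixed.
    x = np.asarray(x0, dtype=float)
    active, h = [], 1e-6
    for idx in phases:
        active = sorted(set(active) | set(idx))
        for _ in range(steps):
            grad = np.array([(nll(x + h * e) - nll(x - h * e)) / (2 * h)
                             for e in np.eye(len(x))[active]])
            x[active] -= lr * grad
    return x

# Toy objective with minimum at (1, 2, 3); parameter 0 is active from
# phase 1, parameters 1 and 2 are switched on in phase 2.
est = fit_in_phases(lambda v: np.sum((v - np.array([1.0, 2.0, 3.0])) ** 2),
                    x0=[0.0, 0.0, 0.0], phases=[[0], [1, 2]])
```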
During program development, it is often useful to be able to completely switch
-off a parameter. A parameter is inactivated when given phase~$-1$, as in
+off a parameter. A parameter is inactivated when given phase $-1$, as in
\begin{lstlisting}
PARAMETER_SECTION
- init_number c(-1)
+ init_number c(-1)
\end{lstlisting}
The parameter is still part of the program, and its value will still be read
from the \texttt{pin}~file, but it does not take part in the optimization (in
@@ -992,7 +1003,7 @@
We can now return to the role of the initial values specified for the random
effects in the \texttt{.pin} file. Each time step~1 above is performed, these
values are used---unless you use the command line option \texttt{-noinit}, in
-which case the previous optimum is used as the starting~value.
+which case the previous optimum is used as the starting value.
\paragraph{Empirical Bayes} is commonly used to refer to Bayesian estimates of
the random effects, with the hyper-parameters fixed at their maximum likelihood
@@ -1007,17 +1018,16 @@
random effects is underestimated. \scAR\ does, however, take this into account
and uses the following formula:
\begin{equation}
-\textrm{cov}(u)
- =
- -\left[\frac{\partial^{2}\log p(u \mid, \hbox{data};\theta)}
- {\partial u\partial u^{\prime}}\right]^{-1}+ \frac{
- \partial u}{\partial\theta}
+ \textrm{cov}(u) =
+  -\left[\frac{\partial^2\log p(u \mid \hbox{data};\theta)}
+    {\partial u\,\partial u^{\prime}}\right]^{-1}
+  + \frac{\partial u}{\partial\theta}
\hbox{cov}(\theta) \left(\frac{\partial u}{\partial\theta}\right)^{\prime}
\label{eq:EB_variance}
\end{equation}
where $u$ is the vector of random effects, $\theta$ is the vector of
hyper-parameters, and $\partial u/\partial\theta$ is the sensitivity of the
-penalized likelihood estimator on the value of~$\theta$. The first term on the
+penalized likelihood estimator on the value of $\theta$. The first term on the
r.h.s.~is the ordinary Fisher-information-based variance of $u$, while the
second term accounts for the uncertainty in $\theta$.
@@ -1026,10 +1036,10 @@
In all nonlinear parameter estimation problems, there are two possible
explanations when your program does not produce meaningful results:
\begin{enumerate}
-\item The underlying mathematical model is not well-defined, e.g., it may be
-over-parameterized.
-\item You have implemented the model incorrectly, e.g., you have forgotten a
-minus sign somewhere.
+ \item The underlying mathematical model is not well-defined, e.g., it may be
+ over-parameterized.
+ \item You have implemented the model incorrectly, e.g., you have forgotten a
+ minus sign somewhere.
\end{enumerate}
In an early phase of the code development, it may not be clear which of these is
causing the problem. With random effects, the two-step iteration scheme
@@ -1043,41 +1053,41 @@
very easy to switch from a random-effects version of the program to a
penalized-likelihood version. In \texttt{simple.tpl}, we would simply redefine
the random effects vector \texttt{x} to be of type \texttt{init\_vector}. The
-parameters would then be $a$, $b$, $\mu $, $\sigma $, $\sigma_{x}$, and
-$x_{1}$,\ldots, $x_{n}$. It is not recommended, or even possible, to estimate
-all of these simultaneously, so you should fix~$\sigma_{x}$ (by giving it a
-phase~$-\kern-.1em1$) at some reasonable value. The actual value at which you
-fix~$\sigma_{x}$ is not critically important, and you could even try a range of
-$\sigma_{x}$ values. In larger models, there will be more than one parameter
+parameters would then be $a$, $b$, $\mu $, $\sigma $, $\sigma_x$, and
+$x_1$,\ldots, $x_n$. It is not recommended, or even possible, to estimate
+all of these simultaneously, so you should fix $\sigma_x$ (by giving it a
+phase $-1$) at some reasonable value. The actual value at which you
+fix $\sigma_x$ is not critically important, and you could even try a range of
+$\sigma_x$ values. In larger models, there will be more than one parameter
that needs to be fixed. We recommend the following scheme:
\begin{enumerate}
-\item Write a simulation program (in R, S-Plus, Matlab, or some other
-program) that generates data from the random effects model (using some
-reasonable values for the parameters) and writes to \texttt{simple.dat}.
+ \item Write a simulation program (in R, S-Plus, Matlab, or some other
+ program) that generates data from the random effects model (using some
+ reasonable values for the parameters) and writes to \texttt{simple.dat}.
-\item Fit the penalized likelihood program with $\sigma_{x}$ (or the
-equivalent parameters) fixed at the value used to simulate data.
+ \item Fit the penalized likelihood program with $\sigma_x$ (or the
+ equivalent parameters) fixed at the value used to simulate data.
-\item Compare the estimated parameters with the parameter values used to
-simulate data. In particular, you should plot the estimated $x_{1},\ldots,x_{n}$
-against the simulated random effects. The plotted points should center
-around a straight line. If they do (to some degree of approximation), you
-most likely have got a correct formulation of the penalized likelihood.
+ \item Compare the estimated parameters with the parameter values used to
+ simulate data. In particular, you should plot the estimated $x_1,\ldots,x_n$
+ against the simulated random effects. The plotted points should center
+ around a straight line. If they do (to some degree of approximation), you
+ most likely have got a correct formulation of the penalized likelihood.
\end{enumerate}
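Step 1 of this scheme might look as follows in Python (the parameter values and the function name are arbitrary choices; the output follows the \texttt{DATA\_SECTION} ordering of \texttt{simple.tpl}: \texttt{nobs}, then \texttt{Y}, then \texttt{X}):

```python
import numpy as np

def simulate_simple_dat(n=100, a=2.0, b=1.0, mu=0.0,
                        sigma_x=1.0, sigma_Y=0.5, sigma_e=0.5, seed=1):
    # Generate data from the error-in-variables model of simple.tpl
    # and return the lines of a simple.dat-style file.
    rng = np.random.default_rng(seed)
    x = mu + sigma_x * rng.standard_normal(n)         # latent covariate
    Y = a * x + b + sigma_Y * rng.standard_normal(n)  # response
    X = x + sigma_e * rng.standard_normal(n)          # noisy proxy for x
    lines = [str(n),
             " ".join(f"{v:.6f}" for v in Y),
             " ".join(f"{v:.6f}" for v in X)]
    return "\n".join(lines), x  # keep x to compare against estimates later

content, x_true = simulate_simple_dat(n=10)
```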
-If your program passes this test, you are ready to test the random effects
-version of the program. You redefine~\texttt{x} to be of type
-\texttt{random\_effects\_vector}, free up $\sigma_{x}$, and apply your program
+If your program passes this test, you are ready to test the random-effects
+version of the program. You redefine \texttt{x} to be of type
+\texttt{random\_effects\_vector}, free up $\sigma_x$, and apply your program
again to the same simulated data set. If the program produces meaningful
estimates of the hyper-parameters, you most likely have implemented your model
correctly, and you are ready to move on to your real data!
With random effects, it often happens that the maximum likelihood estimate of a
-variance component is zero ($\sigma_{x}=0$). Parameters bouncing against the
+variance component is zero ($\sigma_x=0$). Parameters bouncing against the
boundaries usually make one feel uncomfortable, but with random effects, the
-interpretation of $\sigma_{x}=0$ is clear and unproblematic. All it really means
+interpretation of $\sigma_x=0$ is clear and unproblematic. All it really means
is that data do not support a random effect, and the natural consequence is to
-remove (or inactivate) $x_{1},\ldots,x_{n}$, together with the corresponding
-prior (and hence $\sigma_{x}$), from the model.
+remove (or inactivate) $x_1,\ldots,x_n$, together with the corresponding
+prior (and hence $\sigma_x$), from the model.
\section{\scMCMC}
\index{MCMC@\textsc{mcmc}}
@@ -1126,7 +1136,7 @@
calculations relating to importance sampling into 5 (you can replace the 5 with
any number you like) batches. In combination with the techniques discussed in
Section~\ref{Memory_management}, this should reduce the storage requirements. An
-example of a command line~is:
+example of a command line is:
\begin{lstlisting}
lessafre -isb 1000 9811 -isf 20 -cbs 50000000 -gbs 50000000
\end{lstlisting}
@@ -1151,20 +1161,20 @@
mean-related parameters. The simplest example of a \scREML\ estimator is the
ordinary sample variance
$$
-s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2,
+s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2,
$$
-where the divisor $(n-1)$, rather than the~$n$ that occurs for the maximum
+where the divisor $(n-1)$, rather than the $n$ that occurs for the maximum
likelihood estimator, accounts for the fact that we have estimated a single mean
parameter.
There are many ways of deriving the \scREML\ correction, but in the current
context, the most natural explanation is that we integrate the likelihood
function (\textit{note:} not the log-likelihood) with respect to the mean
-parameters---$\beta$,~say. This is achieved in \scAR\ by defining~$\beta$ as
+parameters---$\beta$, say. This is achieved in \scAR\ by defining $\beta$ as
being of type \texttt{random\_effects\_vector}, but without specifying a
distribution/prior for the parameters. It should be noted that the only thing
that the \texttt{random\_effects\_vector} statement tells \scAR\ is that the
-likelihood function should be integrated with respect to~$\beta$. In
+likelihood function should be integrated with respect to $\beta$. In
linear-Gaussian models, the Laplace approximation is exact, and hence this
approach yields exact \scREML\ estimates. In nonlinear models, the notion of
\scREML\ is more difficult, but \scREML-like corrections are still being used.
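The mechanism can be checked numerically. The following Python/NumPy sketch (illustrative only, not ADMB code; the data, seed, and grids are arbitrary choices) integrates the Gaussian likelihood over $\mu$ by brute force and shows that the maximizing $\sigma^2$ is the $(n-1)$-divisor sample variance, whereas profiling (maximizing over $\mu$) gives the $n$-divisor ML estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=20)
n = len(x)
ss = np.sum((x - x.mean())**2)

sigma2_ml = ss / n        # ML estimate: divisor n
s2_reml = ss / (n - 1)    # REML estimate: divisor n-1 (the sample variance)

def integrated_loglik(sigma2, x, mu_grid):
    """log of int L(mu, sigma2) dmu: integrate the likelihood itself over mu."""
    n = len(x)
    ss = np.sum((x - x.mean())**2)
    # sum_i (x_i - mu)^2 = ss + n*(xbar - mu)^2
    ll = (-0.5 * n * np.log(2 * np.pi * sigma2)
          - (ss + n * (x.mean() - mu_grid)**2) / (2 * sigma2))
    top = ll.max()
    dx = mu_grid[1] - mu_grid[0]
    return top + np.log(np.sum(np.exp(ll - top)) * dx)  # stable log-integral

mu_grid = np.linspace(x.mean() - 10, x.mean() + 10, 2001)
sig2_grid = np.linspace(0.5 * s2_reml, 2.0 * s2_reml, 2001)
vals = [integrated_loglik(s2, x, mu_grid) for s2 in sig2_grid]
sigma2_hat = sig2_grid[int(np.argmax(vals))]  # ~= s2_reml, not sigma2_ml
```

The maximizer of the integrated likelihood agrees with the $(n-1)$-divisor estimate up to grid resolution.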
@@ -1199,21 +1209,23 @@
requirements increase dramatically, and \scAR\ deals with this by producing
(when needed) six temporary files (see Table~\ref{tab:temporary-files}):
\index{temporary files!f1b2list1@\texttt{f1b2list1}}
-\begin{table}[h]
-\begin{center}
-\begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} l c }
-\hline
-\textbf{ File name} & \textbf{Command line option} \\ \hline
-\texttt{ f1b2list1} & \texttt{ -l1 N} \\
-\texttt{ f1b2list12} & \texttt{ -l2 N} \\
-\texttt{ f1b2list13} & \texttt{ -l3 N} \\
-\texttt{ nf1b2list1} & \texttt{-nl1 N} \\
-\texttt{ nf1b2list12} & \texttt{-nl2 N} \\
-\texttt{ nf1b2list13} & \texttt{-nl3 N} \\ \hline
-\end{tabular}%
-\end{center}
-\caption{Temporary file command line options.}
-\label{tab:temporary-files}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lc}
+ \hline
+ \textbf{File name} & \textbf{Command line option}\\
+ \hline
+ \texttt{f1b2list1} & \texttt{-l1 N} \\
+ \texttt{f1b2list12} & \texttt{-l2 N} \\
+ \texttt{f1b2list13} & \texttt{-l3 N} \\
+ \texttt{nf1b2list1} & \texttt{-nl1 N} \\
+ \texttt{nf1b2list12} & \texttt{-nl2 N} \\
+ \texttt{nf1b2list13} & \texttt{-nl3 N} \\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{Temporary file command line options.}
+ \label{tab:temporary-files}
\end{table}
The table also shows the command line arguments you can use to manually set the
size (determined by~\texttt{N}) of the different memory buffers.
@@ -1222,13 +1234,13 @@
application and restart it with the appropriate command line options. In
addition to the options shown above, there is \texttt{-ndb N}, which splits the
computations into $N$~chunks. This effectively reduces the memory requirements
-by a factor of~$N$---at the cost of a somewhat longer run time. $N$~must be a
+by a factor of $N$---at the cost of a somewhat longer run time. $N$~must be a
divisor of the total number of random effects in the model, so that it is
-possible to split the job into~$N$ equally large parts. The \texttt{-ndb} option
+possible to split the job into $N$ equally large parts. The \texttt{-ndb} option
can be used in combination with the \texttt{-l} and \texttt{-nl} options listed
above. The following rule-of-thumb for setting $N$ in~\texttt{-ndb N} can be
used: if there are a total of $m$ random effects in the model, one should
-choose~$N$ such that $m/N\approx 50$. For most of the models in the example
+choose $N$ such that $m/N\approx 50$. For most of the models in the example
collection (Chapter~\ref{ch:random-effects-modeling}),
this choice of $N$ prevents any temporary files being created.
@@ -1260,7 +1272,7 @@
described in Section~\ref{separability} below.
In this case the memory options in Table~\ref{tab:temporary-files}
are less relevant (although sometimes useful), and instead the
-memory use can be controlled with the classical \ADM~command line options
+memory use can be controlled with the classical \ADM\ command line options
\texttt{-cbs}, \texttt{-gbs} etc.
\subsection{Limited memory Newton optimization}
@@ -1289,9 +1301,9 @@
\item Latent Markov random fields
\end{itemize}
The presence of separability allows \scAB\ to calculate the ``Hessian'' matrix
-very efficiently.The Hessian~$H$ is defined as the (negative) Fisher information
+very efficiently. The Hessian $H$ is defined as the (negative) Fisher information
matrix (inverse covariance matrix) of the posterior distribution of the
-random~effects, and is a key component of the Laplace approximation.
+random effects, and is a key component of the Laplace approximation.
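The role of $H$ in the Laplace approximation can be seen in a one-dimensional sketch (plain Python, not ADMB code; the toy model and its parameter values are arbitrary). For the linear-Gaussian model $y=u+e$, the Laplace approximation built from the inner optimum $\hat u$ and the curvature $H$ reproduces the marginal density of $y$ exactly:

```python
import numpy as np

# Toy model: y = u + e with u ~ N(0, su2) and e ~ N(0, se2),
# so the exact marginal is y ~ N(0, su2 + se2).
su2, se2, y = 1.5, 0.5, 0.7

def neg_log_joint(u):
    """-log f(y, u): Gaussian likelihood term plus Gaussian prior term."""
    return (0.5 * np.log(2 * np.pi * se2) + (y - u)**2 / (2 * se2)
            + 0.5 * np.log(2 * np.pi * su2) + u**2 / (2 * su2))

# Inner optimum and the (here 1x1) Hessian H of neg_log_joint at the optimum
u_hat = y * su2 / (su2 + se2)
H = 1 / se2 + 1 / su2

# Laplace approximation to -log of the marginal of y:
#   g(u_hat) - (q/2) log(2*pi) + (1/2) log det H,  with q = 1 here
laplace = neg_log_joint(u_hat) - 0.5 * np.log(2 * np.pi) + 0.5 * np.log(H)
exact = 0.5 * np.log(2 * np.pi * (su2 + se2)) + y**2 / (2 * (su2 + se2))
```

Because the joint density is Gaussian in $u$, the two quantities agree to machine precision, which is the sense in which the Laplace approximation is exact for linear-Gaussian models.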
How do we inform \scAR\ that the model is separable? We define
\texttt{SEPARABLE\_FUNCTION}s in the \texttt{PROCEDURE\_SECTION} to specify the
@@ -1305,10 +1317,10 @@
the model type:
\begin{itemize}
\item \textit{Grouped or nested random effects:} $H$ is block diagonal.
- \item \textit{State-space models:} $H$ is a banded matrix with a narrow band.
- \item \textit{Crossed random effects:} unstructured sparsity pattern.
- \item \textit{Latent Markov random fields:} often banded, but with a wide
- band.
+ \item \textit{State-space models:} $H$ is a banded matrix with a narrow band.
+ \item \textit{Crossed random effects:} unstructured sparsity pattern.
+ \item \textit{Latent Markov random fields:} often banded, but with a wide
+ band.
\end{itemize}
For block diagonal and banded $H$, \scAR\ will automatically detect the
structure from the \texttt{SEPARABLE\_FUNCTION} specification, and will print
@@ -1325,12 +1337,12 @@
A simple example is the one-way variance component model
\[
- y_{ij}=\mu +\sigma_u u_{i}+\varepsilon_{ij},
- \qquad i=1,\ldots,q,\quad j=1,\ldots,n_{i}
+y_{ij}=\mu +\sigma_u u_i+\varepsilon_{ij},
+\qquad i=1,\ldots,q,\quad j=1,\ldots,n_i
\]
-where $u_{i}\sim N(0,1)$ is a random effect and $\varepsilon_{ij}\sim
+where $u_i\sim N(0,1)$ is a random effect and $\varepsilon_{ij}\sim
N(0,\sigma^2)$ is an error term. The straightforward (non-separable)
-implementation of this model (shown only in part)~is
+implementation of this model (shown only in part) is
\begin{lstlisting}
PARAMETER_SECTION
random_effects_vector u(1,q)
@@ -1359,15 +1371,15 @@
It is the function call \texttt{g\_cluster(i,u(i),mu,sigma,sigma\_u)} that
enables \scAR\ to identify that the posterior distribution
-factors (over~$i$):
+factors (over $i$):
\[
- p(u \mid y) \propto \prod_{i=1}^{q}
- \left\{ \prod_{j=1}^{n_{i}}p \bigl(u_{i} \mid y_{ij}\bigr)\right\}
+p(u \mid y) \propto \prod_{i=1}^q
+\left\{\prod_{j=1}^{n_i}p \bigl(u_i \mid y_{ij}\bigr)\right\}
\]
and hence that the Hessian is block diagonal (with block size~1). Knowing that
the Hessian is block diagonal enables \scAR\ to do a series of univariate
Laplace approximations, rather than a single Laplace approximation in full
-dimension~$q$. It should then be possible to fit models where~$q$ is in the
+dimension $q$. It should then be possible to fit models where $q$ is in the
order of thousands, but this clearly depends on the complexity of the function
\texttt{g\_cluster}.
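The saving from block diagonality can be illustrated with a small Python/NumPy sketch (not ADMB code; cluster sizes, seed, and parameter values are arbitrary). For the one-way model, each cluster gives a univariate quadratic inner problem, so $q$ univariate Laplace approximations reproduce the exact negative log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
q, n_i = 6, 4                        # q clusters, n_i observations each
mu, sigma, sigma_u = 1.0, 0.8, 1.2
u_true = rng.normal(size=q)
y = mu + sigma_u * u_true[:, None] + rng.normal(0.0, sigma, (q, n_i))

def cluster_neg_log_joint(u, yi):
    """-log of the N(0,1) prior for u times the data density of one cluster."""
    return (0.5 * np.log(2 * np.pi) + 0.5 * u**2
            + n_i * 0.5 * np.log(2 * np.pi * sigma**2)
            + np.sum((yi - mu - sigma_u * u)**2) / (2 * sigma**2))

total = 0.0
H = 1.0 + n_i * sigma_u**2 / sigma**2          # the 1x1 "block" of the Hessian
for i in range(q):
    # the inner problem is quadratic in u, so the optimum is closed-form
    u_hat = (sigma_u / sigma**2) * np.sum(y[i] - mu) / H
    total += (cluster_neg_log_joint(u_hat, y[i])
              - 0.5 * np.log(2 * np.pi) + 0.5 * np.log(H))

# exact marginal: y_i ~ N(mu*1, sigma^2 I + sigma_u^2 J), independent over i
S = sigma**2 * np.eye(n_i) + sigma_u**2 * np.ones((n_i, n_i))
Sinv = np.linalg.inv(S)
_, logdet = np.linalg.slogdet(S)
exact = 0.0
for i in range(q):
    r = y[i] - mu
    exact += 0.5 * (n_i * np.log(2 * np.pi) + logdet + r @ Sinv @ r)
```

Each iteration of the loop is a one-dimensional Laplace approximation; a non-separable formulation would instead require a single $q$-dimensional one.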
@@ -1380,7 +1392,7 @@
\item[$\bigstar$] Objects defined in the \texttt{PARAMETER\_SECTION}
\textit{must} be passed as arguments to \texttt{g\_cluster}. There is one
- exception: the objective function~\texttt{g} is a global object, and does not
+ exception: the objective function \texttt{g} is a global object, and does not
need to be an argument. Temporary/programming variables should be defined
locally within the \texttt{SEPARABLE\_FUNCTION}.
@@ -1426,7 +1438,7 @@
variables.
\end{itemize}
All temporary variables should be defined locally in the
-\texttt{SEPARABLE\_FUNCTION}, as shown~here:
+\texttt{SEPARABLE\_FUNCTION}, as shown here:
\begin{lstlisting}
SEPARABLE_FUNCTION void prior(const dvariable& log_s, const dvariable& u)
dvariable sigma_u = exp(log_s);
@@ -1435,7 +1447,7 @@
See a full example
\href{http://otter-rsch.com/admbre/examples/orange/orange.html}{here}. The
-orange model has block size~1.
+orange model has block size 1.
\section{Nested or clustered random effects:\br block diagonal $H$}
\label{sec:nested}
@@ -1444,11 +1456,11 @@
variables (the \texttt{u}s). A more complicated example is provided by the
following model:
\[
- y_{ijk}= \sigma_v v_{i} + \sigma_u u_{ij}+\varepsilon_{ijk},
- \qquad i=1,\ldots,q,\quad j=1,\ldots,m,\quad k=1,\ldots,n_{ij},
+y_{ijk}= \sigma_v v_i + \sigma_u u_{ij}+\varepsilon_{ijk},
+\qquad i=1,\ldots,q,\quad j=1,\ldots,m,\quad k=1,\ldots,n_{ij},
\]
-where the random effects $v_{i}$ and $u_{ij}$ are independent
-$N(0,1)$~distributed, and $\varepsilon_{ijk}\sim N(0,\sigma^2)$ is still the
+where the random effects $v_i$ and $u_{ij}$ are independent
+$N(0,1)$ distributed, and $\varepsilon_{ijk}\sim N(0,\sigma^2)$ is still the
error term. One often says that the \texttt{u}s are nested within
the~\texttt{v}s.
@@ -1487,14 +1499,14 @@
a \texttt{SEPARABLE\_FUNCTION}.
\end{itemize}
To ensure that you have not broken this rule, you should look for a message
-like this at run~time:
+like this at run time:
\begin{lstlisting}
Block diagonal Hessian (Block size = 3)
\end{lstlisting}
The ``block size'' is the number of random effects in each call
to the \texttt{SEPARABLE\_FUNCTION}, which in this case is
one \texttt{v(i)} and a vector \texttt{u(i)} of length two.
-It is possible that the groups or clusters (as indexed by~$i$, in this case) are
+It is possible that the groups or clusters (as indexed by $i$, in this case) are
of different sizes. Then, the block size printed in the ``Block diagonal
Hessian'' message is an average.
@@ -1514,7 +1526,7 @@
take advantage of the block diagonal Hessian, as indicated by
the absence of the runtime message
\begin{lstlisting}
- Block diagonal Hessian (Block size = 3).
+ Block diagonal Hessian (Block size = 3)
\end{lstlisting}
\subsection{Gauss-Hermite quadrature}
@@ -1525,7 +1537,7 @@
available as an option to improve upon the Laplace approximation. It is invoked
with the command line option \texttt{-gh N}, where \texttt{N} is the number of
quadrature points determining the accuracy of the integral approximation. For a
-block size of~1, the default choice should be $N=10$ or greater, but for larger
+block size of 1, the default choice should be $N=10$ or greater, but for larger
block sizes the computational and memory requirements very quickly limit the
range of $N$. If $N$ is chosen too large, \scAR\ will crash without giving a
meaningful error message. To avoid \scAR\ creating large temporary files the
@@ -1535,50 +1547,50 @@
(\texttt{-is}).
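The quadrature rule behind \texttt{-gh N} can be sketched in a few lines of Python (illustrative only, not ADMB code). Gauss-Hermite integration with $N$ points evaluates $\mathrm{E}[g(u)]$ for $u\sim N(0,1)$ after the substitution $u=\sqrt{2}\,x$:

```python
import numpy as np

# Gauss-Hermite rule: int exp(-x^2) f(x) dx ~= sum_i w_i f(x_i).
# For E[g(u)] with u ~ N(0,1), substitute u = sqrt(2)*x:
#   E[g(u)] ~= (1/sqrt(pi)) * sum_i w_i g(sqrt(2)*x_i)
N = 10
x, w = np.polynomial.hermite.hermgauss(N)

def gh_expect(g):
    return np.sum(w * g(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

m2 = gh_expect(lambda u: u**2)              # E[u^2] = 1, exact for N >= 2
mgf = gh_expect(lambda u: np.exp(0.3 * u))  # E[exp(t*u)] = exp(t^2/2)
```

With block size 1, the integral over each random effect is one such sum; for block size $b$, the cost grows like $N^b$, which is why larger blocks quickly limit the usable range of $N$.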
-%\subsection{Frequency weighting for multinomial likelihoods}
+% \subsection{Frequency weighting for multinomial likelihoods}
%
-%In situations where the response variable only can take on a finite number of
-%different values, it is possibly to reduce the computational burden enormously.
-%As an example, consider a situation where observation $y_{i}$ is binomially
-%distributed with parameters $N=2$ and $p_{i}$. Assume that
-%\begin{equation*}
-% p_{i}=\frac{\exp (\mu +u_{i})}{1+\exp (\mu +u_{i})},
-%\end{equation*}
-%where $\mu$ is a parameter and $u_{i}\sim N(0,\sigma ^{2})$ is a random effect.
-%For independent observations $y_1,\ldots,y_n$, the log-likelihood function for
-%the parameter $\theta =(\mu,\sigma )$ can be written as
-%\begin{equation}
-% l(\theta )=\sum_{i=1}^{n}\log \bigl[\, p(x_{i};\theta )\bigr] .
-%\end{equation}
-%In \scAR, $p(x_{i};\theta)$ is approximated using the Laplace approximation.
-%However, since $y_i$ only can take the values~$0$, $1$, and~$2$, we can rewrite
-%the log-likelihood~as
-%$$
-%l(\theta )=\sum_{j=0}^{2}n_{j}\log \left[ p(j;\theta )\right],
-%$$
-%where $n_j$ is the number $y_i$s being equal to $j$. Still, the Laplace
-%approximation must be used to approximate $p(j;\theta )$, but now only for
-%$j=0,1,2$, as opposed to $n$~times above. For large~$n$, this can give large a
-%large reduction in computing time.
+% In situations where the response variable can only take on a finite number of
+% different values, it is possible to reduce the computational burden
+% enormously. As an example, consider a situation where observation $y_i$ is
+% binomially distributed with parameters $N=2$ and $p_i$. Assume that
+% \begin{equation*}
+% p_i=\frac{\exp (\mu +u_i)}{1+\exp (\mu +u_i)},
+% \end{equation*}
+% where $\mu$ is a parameter and $u_i\sim N(0,\sigma^2)$ is a random effect.
+% For independent observations $y_1,\ldots,y_n$, the log-likelihood function for
+% the parameter $\theta =(\mu,\sigma)$ can be written as
+% \begin{equation}
+% l(\theta)=\sum_{i=1}^n\log \bigl[\, p(x_i;\theta)\bigr] .
+% \end{equation}
+% In \scAR, $p(x_i;\theta)$ is approximated using the Laplace approximation.
+% However, since $y_i$ can only take the values $0$, $1$, and $2$, we can
+% rewrite the log-likelihood as
+% $$
+% l(\theta)=\sum_{j=0}^2n_j\log \left[ p(j;\theta)\right],
+% $$
+% where $n_j$ is the number of $y_i$s equal to $j$. Still, the Laplace
+% approximation must be used to approximate $p(j;\theta)$, but now only for
+% $j=0,1,2$, as opposed to $n$ times above. For large $n$, this can give a
+% large reduction in computing time.
%
-%To implement the weighted log-likelihood~(\ref{l_w}), we define a weight vector
-%$(w_1,w_2,w_3)=(n_{0},n_{1},n_{2})$. To read the weights from file, and to tell
-%\scAR\ that~\texttt{w} is a weights vector, the following code is used:
-%\begin{lstlisting}
-%DATA_SECTION
+% To implement the weighted log-likelihood~(\ref{l_w}), we define a weight
+% vector $(w_1,w_2,w_3)=(n_0,n_1,n_2)$. To read the weights from file, and to
+% tell \scAR\ that~\texttt{w} is a weights vector, the following code is used:
+% \begin{lstlisting}
+% DATA_SECTION
% init_vector w(1,3)
%
-%PARAMETER_SECTION
+% PARAMETER_SECTION
% !! set_multinomial_weights(w);
-%\end{lstlisting}
-%In addition, it is necessary to explicitly multiply the likelihood
-%contributions in~(\ref{l_w}) by~$w$. The program must be written with
-%\texttt{SEPARABLE\_FUNCTION}, as explained in Section~\ref{sec:nested}. For the
-%likelihood~(\ref{l_w}), the \texttt{SEPARABLE\_FUNCTION} will be invoked three
-%times.
+% \end{lstlisting}
+% In addition, it is necessary to explicitly multiply the likelihood
+% contributions in~(\ref{l_w}) by $w$. The program must be written with
+% \texttt{SEPARABLE\_FUNCTION}, as explained in Section~\ref{sec:nested}. For
+% the likelihood~(\ref{l_w}), the \texttt{SEPARABLE\_FUNCTION} will be invoked
+% three times.
%
-%See a full example
-%\href{http://www.otter-rsch.com/admbre/examples/weights/weights.html}{here}.
+% See a full example
+% \href{http://www.otter-rsch.com/admbre/examples/weights/weights.html}{here}.
\section{State-space models: banded $H$}
\label{sec:state-space}\index{state-space models}
@@ -1589,10 +1601,10 @@
u_i &= \rho u_{i-1} + e_i,
\end{align*}
where $e_i\sim N(0,\sigma^2)$ is an innovation term. The log-likelihood
-contribution coming from the state vector $(u_1,\ldots,u_n)$~is
+contribution coming from the state vector $(u_1,\ldots,u_n)$ is
\[
- \sum_{i=2}^n \log\left(\frac{1}{\sqrt{2\pi }\sigma }
- \exp\left[-\frac{(u_{i}-\rho u_{i-1})^{2}}{2\sigma^2}\right]\right),
+\sum_{i=2}^n \log\left(\frac{1}{\sqrt{2\pi}\sigma}
+ \exp\left[-\frac{(u_i-\rho u_{i-1})^2}{2\sigma^2}\right]\right),
\]
To make \scAR\ exploit this
special structure, we write a \texttt{SEPARABLE\_FUNCTION} named
@@ -1620,35 +1632,35 @@
where $m=2n$. The call to the \texttt{SEPARABLE\_FUNCTION} would now look like
\begin{lstlisting}
for(i=2;i<=n;i++)
- g_conditional(u(2*(i-2)+1),u(2*(i-2)+2),u(2*(i-2)+3),u(2*(i-2)+4),...);
+ g_conditional(u(2*(i-2)+1),u(2*(i-2)+2),u(2*(i-2)+3),u(2*(i-2)+4),...);
\end{lstlisting}
where the ellipsis (\texttt{...}) denotes the arguments $\rho_1$, $\rho_2$,
-$\sigma_e$, and~$\sigma_d$.
+$\sigma_e$, and $\sigma_d$.
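The banded structure is easy to verify directly. Below is a Python/NumPy sketch (not ADMB code; $n$, $\rho$, and $\sigma$ are arbitrary) that builds the Hessian of the AR(1) negative log-prior analytically and confirms it is tridiagonal, i.e., banded with bandwidth one:

```python
import numpy as np

n, rho, sigma = 8, 0.6, 0.5

def neg_logprior(u):
    """Sum over i >= 2 of -log N(u_i - rho*u_{i-1}; 0, sigma^2)."""
    d = u[1:] - rho * u[:-1]
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + d**2 / (2 * sigma**2))

# analytic Hessian: each increment term contributes to at most a 2x2 sub-block
H = np.zeros((n, n))
for i in range(1, n):
    H[i, i] += 1 / sigma**2
    H[i - 1, i - 1] += rho**2 / sigma**2
    H[i, i - 1] += -rho / sigma**2
    H[i - 1, i] += -rho / sigma**2

# every entry more than one diagonal away from the main diagonal is zero
off_band = [H[i, j] for i in range(n) for j in range(n) if abs(i - j) > 1]

# finite-difference spot check of one diagonal entry
u0 = np.zeros(n)
e = np.zeros(n); e[3] = 1.0
h = 1e-4
fd = (neg_logprior(u0 + h * e) - 2 * neg_logprior(u0)
      + neg_logprior(u0 - h * e)) / h**2
```

A two-dimensional state, as in the \texttt{g\_conditional} example above, gives the same picture with a wider band.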
\section{Crossed random effects: sparse $H$}
\index{crossed effects}
The simplest instance of a crossed random effects model is
\[
- y_{k}= \sigma_u u_{i(k)} + \sigma_v v_{j(k)}+\varepsilon_{k},
- \qquad i=1,\ldots n,
+y_k = \sigma_u u_{i(k)} + \sigma_v v_{j(k)}+\varepsilon_k,
+\qquad k=1,\ldots,n,
\]
-where $u_{1},\ldots,u_{N}$ and $v_{1},\ldots,v_{M}$ are random effects, and
+where $u_1,\ldots,u_N$ and $v_1,\ldots,v_M$ are random effects, and
where $i(k)\in\{1,\ldots,N\}$ and $j(k)\in\{1,\ldots,M\}$ are index maps. The $y$s sharing
-either a~$u$ or a~$v$ will be dependent, and in general, no complete factoring
+either a $u$ or a $v$ will be dependent, and in general, no complete factoring
of the likelihood will be possible. However, it is still important to exploit
the fact that the $u$s and $v$s only enter the likelihood through pairs
$(u_{i(k)},v_{j(k)})$. Here is the code for the crossed model:
\begin{lstlisting}
for (k=1;k<=n;k++)
- log_lik(k,u(i(k)),v(j(k)),mu,s,s_u,s_v);
+ log_lik(k,u(i(k)),v(j(k)),mu,s,s_u,s_v);
SEPARABLE_FUNCTION void log_lik(int k, const dvariable& u,...)
- g -= -log(s) - 0.5*square((y(k)-(mu + s_u*u + s_v*v))/s);
+ g -= -log(s) - 0.5*square((y(k)-(mu + s_u*u + s_v*v))/s);
\end{lstlisting}
If only a small proportion of all the possible combinations of $u_i$ and $v_j$
actually occurs in the data, then the posterior covariance matrix of
-$(u_{1},\ldots,u_{N},v_{1},\ldots,v_{M})$ will be sparse. When an executable
+$(u_1,\ldots,u_N,v_1,\ldots,v_M)$ will be sparse. When an executable
program produced by \scAR\ is invoked with the \texttt{-shess} command line
option, sparse matrix calculations are used.
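The sparsity pattern induced by the index maps can be visualized with a short Python/NumPy sketch (not ADMB code; $N$, $M$, $n$, and the random index maps are arbitrary). A $u_i$ and a $v_j$ are structurally coupled only when some observation $k$ has $i(k)=i$ and $j(k)=j$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, n = 12, 10, 30                  # N u-effects, M v-effects, n observations
i_map = rng.integers(0, N, size=n)    # i(k): which u each y_k uses
j_map = rng.integers(0, M, size=n)    # j(k): which v each y_k uses

# sparsity pattern of the (N+M) x (N+M) posterior Hessian
pattern = np.zeros((N + M, N + M), dtype=bool)
np.fill_diagonal(pattern, True)
for k in range(n):
    a, b = i_map[k], N + j_map[k]     # u-index and (offset) v-index
    pattern[a, b] = pattern[b, a] = True

fill = pattern.sum() / pattern.size   # fraction of structurally nonzero entries
```

When few of the $N\times M$ combinations occur in the data, \texttt{fill} is small, which is the situation the \texttt{-shess} sparse-matrix option exploits.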
@@ -1674,7 +1686,7 @@
situations, such as in spatial statistics, all the individual components of the
random effects vector will be jointly correlated. \scAB\ contains a special
feature (the \texttt{normal\_prior} keyword) for dealing efficiently with such
-models. The construct used to declaring a correlated Gaussian prior~is
+models. The construct used to declare a correlated Gaussian prior is
\begin{lstlisting}
random_effects_vector u(1,n)
normal_prior S(u);
@@ -1682,24 +1694,24 @@
The first of these lines is an ordinary declaration of a random effects vector.
The second line tells \scAB\ that \texttt{u} has a multivariate Gaussian
distribution with zero expectation and covariance matrix~\texttt{S}, i.e., the
-probability density of $\mathbf{u}$~is
+probability density of $\mathbf{u}$ is
\[
-h(\mathbf{u)=}\left( 2\pi \right) ^{-\dim(S)/2}\det (S)^{-1/2}\,
-\exp \left( - \tfrac{1}{2}\mathbf{u}^{\prime} \, S^{-1}u\right) .
+h(\mathbf{u})=\left(2\pi \right)^{-\dim(S)/2}\det (S)^{-1/2}\,
+\exp \left(-\tfrac{1}{2}\mathbf{u}^{\prime} \, S^{-1}\mathbf{u}\right).
\]
Here, $S$ is allowed to depend on the hyper-parameters of the model. The part of
the code where \texttt{S} gets assigned its value must be placed in a
-\texttt{SEPARABLE\_FUNCTION}. % (see
+\texttt{SEPARABLE\_FUNCTION}.
\begin{itemize}
- \item[$\bigstar$] The log-prior $\log \left( h\left( \mathbf{u}\right) \right)
+ \item[$\bigstar$] The log-prior $\log \left(h\left(\mathbf{u}\right) \right)
$ is automatically subtracted from the objective function. Therefore, the
objective function must hold the negative log-likelihood when using the
\texttt{normal\_prior}.
-% \item[$\bigstar$] To verify that your model really is partially separable,
-% you should try replacing the \texttt{SEPARABLE\_FUNCTION} keyword with an
-% ordinary \texttt{FUNCTION}. Then verify on a small subset of your data that
-% the two versions of the program produce the same results. You should be able
-% to observe that the \texttt{SEPARABLE\_FUNCTION} version runs faster.
+ % \item[$\bigstar$] To verify that your model really is partially separable,
+ % you should try replacing the \texttt{SEPARABLE\_FUNCTION} keyword with an
+ % ordinary \texttt{FUNCTION}. Then verify on a small subset of your data that
+ % the two versions of the program produce the same results. You should be able
+ % to observe that the \texttt{SEPARABLE\_FUNCTION} version runs faster.
\end{itemize}
See a full example
\href{http://otter-rsch.com/admbre/examples/spatial/spatial.html}{here}.
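The density $h(\mathbf{u})$ above is easy to evaluate directly, which is useful when checking a \texttt{normal\_prior} implementation against hand computation. A minimal Python/NumPy sketch (not ADMB code; the exponential covariance on a 1-D grid is just an illustrative choice of $S$):

```python
import numpy as np

def log_normal_prior(u, S):
    """log h(u) = -(d/2) log(2*pi) - (1/2) log det S - (1/2) u' S^{-1} u."""
    d = len(u)
    sign, logdet = np.linalg.slogdet(S)
    assert sign > 0, "S must be positive definite"
    return (-0.5 * d * np.log(2 * np.pi) - 0.5 * logdet
            - 0.5 * u @ np.linalg.solve(S, u))

# example: exponential correlation on a 1-D grid, as in spatial models
x = np.linspace(0, 1, 5)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
u = np.zeros(5)
val = log_normal_prior(u, S)
```

Since \scAR\ subtracts $\log h(\mathbf{u})$ from the objective function automatically, the user's code must not add this term a second time.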
@@ -1736,13 +1748,13 @@
Let $\mathbf{y}=(y_1,\ldots,y_n)$ be a vector of dichotomous observations
($y_i\in\{0,1\}$), and let $\mathbf{u}=(u_1,\ldots,u_q)$ be a vector of
-independent random effects, each with Gaussian distribution (expectation~$0$ and
-variance~$\sigma^2$). Define the success probability $\pi_i=\Pr(y_i=1)$. The
+independent random effects, each with Gaussian distribution (expectation $0$ and
+variance $\sigma^2$). Define the success probability $\pi_i=\Pr(y_i=1)$. The
following relationship between $\pi_i$ and explanatory variables (contained in
-matrices~$\mathbf{X}$ and~$\mathbf{Z}$) is assumed:
+matrices $\mathbf{X}$ and $\mathbf{Z}$) is assumed:
\[
- \log\left(\frac{\pi_i}{1-\pi_i}\right) = \mathbf{X}_i\mathbf{\beta} +
- \mathbf{Z}_i\mathbf{u},
+\log\left(\frac{\pi_i}{1-\pi_i}\right) = \mathbf{X}_i\mathbf{\beta} +
+\mathbf{Z}_i\mathbf{u},
\]
where $\mathbf{X}_i$ and $\mathbf{Z}_i$ are the $i^{\textrm{th}}$ rows of the
known covariates matrices $\mathbf{X}$ ($n\times p$) and $\mathbf{Z}$ ($n\times
@@ -1757,35 +1769,34 @@
hyper-parameters, as shown in Table~\ref{tab:true-values}. The matrices
$\mathbf{X}$ and $\mathbf{Z}$ were generated randomly with each element
uniformly distributed on $[-2,2\,]$. As start values for both \ADM\ and \scBUGS,
-we used $\beta _{\mathrm{init},j}=-1$ and $\sigma_\mathrm{init}=4.5$. In
+we used $\beta_{\mathrm{init},j}=-1$ and $\sigma_\mathrm{init}=4.5$. In
\scBUGS, we used a uniform $[-10,10\,]$ prior on $\beta_j$ and a standard (in
the \scBUGS\ literature) noninformative gamma prior on $\tau=\sigma^{-2}$. In
\ADM, the parameter bounds $\beta_j\in[-10,10\,]$ and $\log\sigma\in[-5,3\,]$
were used in the optimization process.
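The simulation setup described above can be sketched in Python/NumPy (not the actual simulation code; $n$, $p$, $q$, and the seed are illustrative, and the true parameter values follow the table below):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 200, 5, 30
beta_true = np.zeros(p)            # true beta_j = 0
sigma_true = 0.1                   # true sigma
X = rng.uniform(-2, 2, (n, p))     # covariates uniform on [-2, 2]
Z = rng.uniform(-2, 2, (n, q))
u = rng.normal(0.0, sigma_true, q) # random effects with variance sigma^2
eta = X @ beta_true + Z @ u        # linear predictor
pi = 1 / (1 + np.exp(-eta))        # logit link: log(pi/(1-pi)) = eta
y = rng.binomial(1, pi)            # dichotomous responses
```

Fitting the model to such data and comparing the estimates with the known true values is the same consistency check recommended earlier in this chapter.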
-\begin{table}[h]
-\begin{center}
- \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrrr}
- \hline
- ~ & $\beta_1$ & $\beta_2$ & $\beta_3$ & $\beta_4$ & $\beta_5$
- & $\sigma$\\
- \hline\\[-16pt]
- True values & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000
- & 0.1000 \\
- \scAR\ & 0.0300 & -0.0700 & 0.0800 & 0.0800 & -0.1100
- & 0.1700 \\
- Std.\ dev. & 0.1500 & 0.1500 & 0.1500 & 0.1400 & 0.1600
- & 0.0500 \\
- \scWinBUGS\ & 0.0390 & -0.0787 & 0.0773 & 0.0840 & -0.1041
- & 0.1862 \\
- \hline
- \end{tabular}
-\end{center}
-\caption{True values}
-\label{tab:true-values}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrrr}
+ \hline
+ & $\beta_1$ & $\beta_2$ & $\beta_3$ & $\beta_4$ & $\beta_5$ & $\sigma$\\
+ \hline\\[-16pt]
+ True values
+ & $0.0000$ & $ 0.0000$ & $0.0000$ & $0.0000$ & $ 0.0000$ & $0.1000$\\
+ \scAR
+ & $0.0300$ & $-0.0700$ & $0.0800$ & $0.0800$ & $-0.1100$ & $0.1700$\\
+ Std.\ dev.
+ & $0.1500$ & $ 0.1500$ & $0.1500$ & $0.1400$ & $ 0.1600$ & $0.0500$\\
+ \scWinBUGS
+ & $0.0390$ & $-0.0787$ & $0.0773$ & $0.0840$ & $-0.1041$ & $0.1862$\\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{True values}
+ \label{tab:true-values}
\end{table}
%
On the simulated data set, \ADM\ used $27$ seconds to converge to the optimum of
-likelihood surface. On the same data set, we first ran \scWinBUGS\ (Version~1.4)
+the likelihood surface. On the same data set, we first ran \scWinBUGS\
for $5,000$ iterations. The recommended convergence diagnostic in \scWinBUGS\ is
the Gelman-Rubin plot (see the help files available from the menus in
\scWinBUGS), which requires that two Markov chains are run in parallel. From the
@@ -1796,7 +1807,7 @@
See the files
\href{http://otter-rsch.com/admbre/examples/logistic/logistic.html}{here}.
-%x\newpage
+% x\newpage
\subsection{Generalized additive models (\scGAM{}s)}
\label{sec:gam}
@@ -1806,34 +1817,34 @@
A very useful generalization of the ordinary multiple regression
\[
-y_{i}=\mu +\beta _{1}x_{1,i}+\cdots +\beta _{p}x_{p,i}+\varepsilon _{i},
+y_i=\mu +\beta_1x_{1,i}+\cdots +\beta_px_{p,i}+\varepsilon_i,
\]%
is the class of additive models
\begin{equation}
-y_{i}=\mu +f_{1}(x_{1,i})+\cdots +f_{p}(x_{p,i})+\varepsilon _{i}.
-\label{eqn:gam}
+ y_i=\mu+f_1(x_{1,i})+\cdots +f_p(x_{p,i})+\varepsilon_i.
+ \label{eqn:gam}
\end{equation}%
\index{GAM@\textsc{gam}}
-Here, the $f_{j}$ are ``nonparametric'' components that can be modelled by
+Here, the $f_j$ are ``nonparametric'' components that can be modelled by
penalized splines. When this generalization is carried over to generalized
linear models, we arrive at the class of \scGAM{}s~\cite{hast:tibs:1990}.
From a computational perspective, penalized splines are equivalent to random
effects, and thus GAMs fall naturally into the domain of \scAR.
-For each component $f_{j}$ in equation~(\ref{eqn:gam}), we construct a design
-matrix $\mathbf{X}$ such that $f_{j}(x_{i,j})=\mathbf{X}^{(i)}\mathbf{u}$, where
+For each component $f_j$ in equation~(\ref{eqn:gam}), we construct a design
+matrix $\mathbf{X}$ such that $f_j(x_{i,j})=\mathbf{X}^{(i)}\mathbf{u}$, where
$\mathbf{X}^{(i)}$ is the $i^{\textrm{th}}$ row of $\mathbf{X}$ and
-$\mathbf{u}$\ is a coefficient vector. We use the R-function
+$\mathbf{u}$\ is a coefficient vector. We use the R function
\texttt{splineDesign} (from the \texttt{splines} library) to construct a design
-matrix~$\mathbf{X}$. To avoid overfitting, we add a first-order difference
+matrix $\mathbf{X}$. To avoid overfitting, we add a first-order difference
penalty~\cite{eile:marx:1996}
\index{splines!difference penalty}
\begin{equation}
--\lambda ^{2}\sum_{k=2}\left( u_{k}-u_{k-1}\right) ^{2},
-\label{eqn:first_order}
+ -\lambda^2\sum_{k=2}\left(u_k-u_{k-1}\right)^2,
+ \label{eqn:first_order}
\end{equation}
to the ordinary \scGLM\ log-likelihood, where $\lambda $ is a smoothing
-parameter to be estimated. By viewing~$\mathbf{u}$ as a random effects vector
+parameter to be estimated. By viewing $\mathbf{u}$ as a random effects vector
with the above Gaussian prior, and by taking $\lambda $ as a hyper-parameter, it
becomes clear that \scGAM{}s are naturally handled in \scAR.
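The penalized-spline idea can be sketched numerically in Python/NumPy (not ADMB code; the truncated-line basis below is a self-contained stand-in for the B-spline basis that R's \texttt{splineDesign} would produce, and the data and $\lambda$ are arbitrary). The first-order difference penalty is the quadratic form $\lambda^2\|\mathbf{Du}\|^2$ with $\mathbf{D}$ the first-difference matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 100, 20                       # observations, spline coefficients
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# stand-in design matrix: truncated-line basis at K knots
knots = np.linspace(0, 1, K)
X = np.maximum(x[:, None] - knots[None, :], 0.0)

# first-order difference penalty: lam^2 * sum_k (u_k - u_{k-1})^2 = lam^2*||D u||^2
D = np.diff(np.eye(K), axis=0)       # (K-1) x K first-difference matrix
lam = 2.0

# penalized least squares: minimize ||y - X u||^2 + lam^2 ||D u||^2
u_hat = np.linalg.solve(X.T @ X + lam**2 * D.T @ D, X.T @ y)
fit = X @ u_hat
```

Treating $\mathbf{u}$ as a random effects vector and $\lambda$ as a hyper-parameter, as in the text, amounts to estimating $\lambda$ by (approximate) maximum likelihood rather than fixing it.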
@@ -1841,12 +1852,12 @@
\begin{itemize}
\item A computationally more efficient implementation is obtained by moving
$\lambda $ from the penalty term to the design matrix, i.e.,
- $f_{j}(x_{i,j})=\lambda ^{-1}\mathbf{X}^{(i)}\mathbf{u}$.
+ $f_j(x_{i,j})=\lambda^{-1}\mathbf{X}^{(i)}\mathbf{u}$.
\item Since equation~(\ref{eqn:first_order}) does not penalize the mean
- of~$\mathbf{u}$, we impose the restriction that $\sum_{k=1}u_{k}=0$ (see
+ of $\mathbf{u}$, we impose the restriction that $\sum_{k=1}u_k=0$ (see
\texttt{union.tpl} for details). Without this restriction, the model would be
- over-parameterized, since we already have an overall mean~$\mu $ in
+ over-parameterized, since we already have an overall mean $\mu $ in
equation~(\ref{eqn:gam}).
\item To speed up computations, the parameter $\mu $ (and other regression
@@ -1854,24 +1865,24 @@
the $\mathbf{u}$s should be given ``phase 2.''
\end{itemize}
-\begin{figure}[h]
-\centering\hskip1pt
-\includegraphics[width=6in]{union_fig.pdf}
-\caption{Probability of membership as a function of covariates. In each plot,
-the remaining covariates are fixed at their sample means. The effective
-degrees of freedom (df) are also given \protect\cite{hast:tibs:1990}.}
-\label{fig:union}
+\begin{figure}[htbp]
+ \centering\hskip1pt
+ \includegraphics[width=6in]{union_fig.pdf}
+ \caption{Probability of membership as a function of covariates. In each plot,
+ the remaining covariates are fixed at their sample means. The effective
+ degrees of freedom (df) are also given~\protect\cite{hast:tibs:1990}.}
+ \label{fig:union}
\end{figure}
\subsubsection{The Wage-union data}
The data, which are available from \href{http://lib.stat.cmu.edu/}{Statlib}, contain
-information for each of 534 workers about whether they are members ($y_{i}=1$)
-of a workers union or are not ($y_{i}=0$). We study the probability of
+information for each of 534 workers about whether they are members ($y_i=1$)
+of a workers union or are not ($y_i=0$). We study the probability of
membership as a function of six covariates. Expressed in the notation used by
the R (S-Plus) function \texttt{gam}, the model is:
\begin{verbatim}
- union ~race + sex + south + s(wage) + s(age) + s(ed), family=binomial
+ union ~ race + sex + south + s(wage) + s(age) + s(ed), family=binomial
\end{verbatim}
Here, \texttt{s()} denotes spline functions with 20 knots each. For
@@ -1885,13 +1896,14 @@
\begin{itemize}
\item The linear predictor may be a mix of ordinary regression terms
- ($f_{j}(x)=\beta _{j}x$) and nonparametric terms. \scAR\ offers a unified
- approach to fitting such models, in which the smoothing parameters $\lambda
- _{j}$ and the regression parameters $\beta _{j}$ are estimated simultaneously.
+ ($f_j(x)=\beta_jx$) and nonparametric terms. \scAR\ offers a unified
+ approach to fitting such models, in which the smoothing parameters
+ $\lambda_j$ and the regression parameters $\beta_j$ are estimated
+ simultaneously.
\item It is straightforward in \scAR\ to add ``ordinary'' random effects to
the model, for instance, to accommodate for correlation within groups of
- observations, as in \citeasnoun{lin:zhan:1999}.
+ observations, as in~\citeasnoun{lin:zhan:1999}.
\end{itemize}
See the files
@@ -1904,30 +1916,30 @@
\subsubsection{Model description}
An assumption underlying the ordinary regression
\[
-y_{i}=a+bx_{i}+\varepsilon _{i}^{\prime }
+y_i=a+bx_i+\varepsilon_i^{\prime}
\]
is that all observations have the same variance, i.e.,
-Var$\left( \varepsilon_{i}^{\prime }\right) =\sigma^{2}$. This assumption does
+$\textrm{Var}\left(\varepsilon_i^{\prime}\right)=\sigma^2$. This assumption does
not always hold, e.g., for the data shown in the upper panel of
Figure~\ref{fig:lidar}. This example is taken
from~\citeasnoun{rupp:wand:carr:2003}.
-It is clear that the variance increases to the right (for large values of~$x$).
-It is also clear that the mean of~$y$ is not a linear function of~$x$. We thus
+It is clear that the variance increases to the right (for large values of $x$).
+It is also clear that the mean of $y$ is not a linear function of $x$. We thus
fit the model
\[
-y_{i}=f(x_{i})+\sigma (x_{i})\varepsilon _{i},
+y_i=f(x_i)+\sigma (x_i)\varepsilon_i,
\]
-where $\varepsilon _{i}\sim N(0,1),$ and $f(x)$ and $\sigma (x)$ are modelled
-nonparametrically. We take~$f$ to be a penalized spline. To ensure that $\sigma
+where $\varepsilon_i\sim N(0,1)$, and $f(x)$ and $\sigma (x)$ are modelled
+nonparametrically. We take $f$ to be a penalized spline. To ensure that $\sigma
(x)>0$, we model $\log \left[ \sigma (x)\right] $, rather than $\sigma (x)$, as
-a spline function. For~$f$, we use a cubic spline (20 knots) with a second-order
+a spline function. For $f$, we use a cubic spline (20 knots) with a second-order
difference penalty%
\[
--\lambda ^{2}\sum_{k=3}^{20}\left( u_{j}-2u_{j-1}+u_{j-2}\right) ^{2},
+-\lambda^2\sum_{k=3}^{20}\left(u_k-2u_{k-1}+u_{k-2}\right)^2,
\]
while we take $\log \left[ \sigma (x)\right]$ to be a linear spline (20 knots)
-with the first-order difference penalty (see equation~(\ref{eqn:first_order})).
+with the first-order difference penalty (see equation~\ref{eqn:first_order}).
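The likelihood for this heteroscedastic model is simple to write down: each observation contributes $\log\sigma(x_i)+(y_i-f(x_i))^2/(2\sigma(x_i)^2)$ to the negative log-likelihood. A minimal Python/NumPy sketch (not ADMB code; the spot-check values are arbitrary), with $f$ and $\log\sigma$ evaluated at the data points:

```python
import numpy as np

def neg_log_lik(y, f_vals, log_sigma_vals):
    """-log-likelihood for y_i = f(x_i) + sigma(x_i)*eps_i, eps_i ~ N(0,1).

    f_vals and log_sigma_vals are f(x_i) and log(sigma(x_i)) at the data points;
    modelling log(sigma) keeps sigma(x) > 0 automatically.
    """
    s = np.exp(log_sigma_vals)
    return np.sum(0.5 * np.log(2 * np.pi) + log_sigma_vals
                  + (y - f_vals)**2 / (2 * s**2))

# sanity check against the standard normal density: f = 0, sigma = 1
y = np.array([0.2, -0.5, 1.0])
nll = neg_log_lik(y, np.zeros(3), np.zeros(3))
```

In the full model, \texttt{f\_vals} and \texttt{log\_sigma\_vals} are the spline evaluations described above, with their coefficients penalized as in the two difference penalties.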
\subsubsection{Implementation details}
@@ -1936,12 +1948,12 @@
\begin{itemize}
\item Parameters associated with $f$ should be given ``phase~1'' in \scAB,
- while those associated with~$\sigma $ should be given ``phase~2.'' The reason
+ while those associated with $\sigma $ should be given ``phase~2.'' The reason
is that in order to estimate the variation, one first needs to have fitted the
mean part.
\end{itemize}
@@ -1949,9 +1961,9 @@
\begin{figure}[h]
\centering\hskip1pt
\includegraphics[width=4in]{lidar_fig.pdf}
- \caption{\scLIDAR\ data (upper panel) used by
- \protect\citeasnoun{rupp:wand:carr:2003}, with fitted mean. Fitted standard
- deviation is shown in the lower panel.}
+ \caption{\scLIDAR\ data (upper panel) used
+ by~\protect\citeasnoun{rupp:wand:carr:2003}, with fitted mean. Fitted
+ standard deviation is shown in the lower panel.}
\label{fig:lidar}
\end{figure}
@@ -1962,27 +1974,27 @@
\subsubsection{Model description}
\label{sec:kidney_example}
-A typical setting in survival analysis is that we observe the time point~$t$ at
+A typical setting in survival analysis is that we observe the time point $t$ at
which the death of a patient occurs. Patients may leave the study (for some
reason) before they die. In this case, the survival time is said to be
-``censored,'' and~$t$ refers to the time point when the patient left the study.
-The indicator variable~$\delta$ is used to indicate whether~$t$ refers to the
+``censored,'' and $t$ refers to the time point when the patient left the study.
+The indicator variable $\delta$ is used to indicate whether $t$ refers to the
death of the patient ($\delta=1$) or to a censoring event ($\delta=0$). The key
-quantity in modelling the probability distribution of~$t$ is the hazard
-function~$h(t)$, which measures the instantaneous death rate at time~$t$. We
+quantity in modelling the probability distribution of $t$ is the hazard
+function $h(t)$, which measures the instantaneous death rate at time $t$. We
also define the cumulative hazard function $\Lambda(t)=\int_0^t
h(s)\,\mathrm{d}s$, implicitly assuming that the study started at time $t=0$.
The log-likelihood contribution from our patient is $\delta\log(h(t))-\Lambda(t)$. A
commonly used model for $h(t)$ is Cox's proportional hazard model, in which the
hazard rate for the $i^{\textrm{th}}$ patient is assumed to be of the form
\[
- h_it = h_0(t)\exp(\eta_i\mathbf), \qquad i=1,\ldots n.
+h_i(t) = h_0(t)\exp(\eta_i), \qquad i=1,\ldots,n.
\]
Here, $h_0(t)$ is the ``baseline'' hazard function (common to all patients) and
$\eta_i=\mathbf{X}_i\mathbf{\beta}$, where $\mathbf{X}_i$ is a covariate vector
specific to the $i^{\textrm{th}}$ patient and $\mathbf{\beta}$ is a vector of
regression parameters. In this example, we shall assume that the baseline hazard
-belongs to the Weibull family: $h_0(t)=rt^{r-1}$ for~$r>0$.
+belongs to the Weibull family: $h_0(t)=rt^{r-1}$ for $r>0$.
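With the Weibull baseline, the cumulative hazard has the closed form $\Lambda_i(t)=t^r\exp(\eta_i)$, so the likelihood contribution of one patient can be written down directly. A small Python illustration (not \scAR\ code; the function name is hypothetical):

```python
import numpy as np

def loglik_weibull(t, delta, eta, r):
    """Log-likelihood contribution delta*log(h_i(t)) - Lambda_i(t) for the
    proportional-hazards model with Weibull baseline h0(t) = r*t**(r-1):
    h_i(t) = h0(t)*exp(eta), Lambda_i(t) = t**r * exp(eta)."""
    h = r * t**(r - 1) * np.exp(eta)       # hazard at the observed time
    Lambda = t**r * np.exp(eta)            # cumulative hazard up to time t
    return delta * np.log(h) - Lambda

# With r = 1 (exponential baseline) and eta = 0, a death at time t
# contributes log(1) - t = -t:
assert np.isclose(loglik_weibull(2.0, 1, 0.0, 1.0), -2.0)
```

A censored observation ($\delta=0$) contributes only the survival term $-\Lambda_i(t)$, which is why censoring enters the code as a simple indicator.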
In the collection of examples accompanying the distribution of \scWinBUGS, this
model is used to analyse a data set on times to kidney infection for a set of
@@ -1996,34 +2008,33 @@
individual-specific random effect $u_i\sim N(0,\sigma^2)$. Thus, the linear
predictor becomes
\[
- \eta_i = \beta_0 + \beta_\mathrm{sex} \cdot \mathrm{sex}_i +
- \beta_\mathrm{age} \cdot \mathrm{age}_i +
- \mathbf{\beta}_\mathrm{D}\,\mathbf{x}_i + u_i,
+\eta_i = \beta_0 + \beta_\mathrm{sex} \cdot \mathrm{sex}_i +
+\beta_\mathrm{age} \cdot \mathrm{age}_i +
+\mathbf{\beta}_\mathrm{D}\,\mathbf{x}_i + u_i,
\]
where $\mathbf{\beta}_\mathrm{D}=(\beta_1,\beta_2,\beta_3)$ and $\mathbf{x}_i$
is a dummy vector coding for the disease type. Parameter estimates are shown in
Table~\ref{kidney-parameter-estimates}.
-\begin{table}[h]
-\begin{center}
- %\footnotesize
- \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrrrrr}
- \hline
- & $\beta_0$ & $\beta_\mathrm{age}$ & $\beta_1$ & $\beta_2$
- & $ \beta_3$ & $\beta_\mathrm{sex}$ & $r$ & $\sigma$\\
- \hline\\[-16pt]
- \scAR\ & -4.3440 & 0.0030 & 0.1208 & 0.6058
- & -1.1423 & -1.8767 & 1.1624 & 0.5617 \\
- Std.\ dev. & 0.8720 & 0.0137 & 0.5008 & 0.5011
- & 0.7729 & 0.4754 & 0.1626 & 0.2970 \\
- BUGS & -4.6000 & 0.0030 & 0.1329 & 0.6444
- & -1.1680 & -1.9380 & 1.2150 & 0.6374 \\
- Std.\ dev. & 0.8962 & 0.0148 & 0.5393 & 0.5301
- & 0.8335 & 0.4854 & 0.1623 & 0.3570 \\
- \hline
- \end{tabular}
-\end{center}
-\caption{Parameter estimates for Weibull regression with random effects.}
-\label{kidney-parameter-estimates}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrrrrr}
+ \hline
+ ~ & $\beta_0$ & $\beta_\mathrm{age}$ & $\beta_1$
+ & $\beta_2$ & $\beta_3$ & $\beta_\mathrm{sex}$ & $r$ & $\sigma$\\
+ \hline\\[-16pt]
+ \scAR\ & $-4.3440$ & $ 0.0030$ & $0.1208$
+ & 0.6058 & $-1.1423$ & $-1.8767$ & $1.1624$ & $0.5617$\\
+ Std.\ dev. & $ 0.8720$ & $ 0.0137$ & $0.5008$
+ & 0.5011 & $ 0.7729$ & $ 0.4754$ & $0.1626$ & $0.2970$\\
+ BUGS & $-4.6000$ & $ 0.0030$ & $0.1329$
+ & 0.6444 & $-1.1680$ & $-1.9380$ & $1.2150$ & $0.6374$\\
+ Std.\ dev. & $ 0.8962$ & $ 0.0148$ & $0.5393$
+ & 0.5301 & $ 0.8335$ & $ 0.4854$ & $0.1623$ & $0.3570$\\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{Parameter estimates for Weibull regression with random effects.}
+ \label{kidney-parameter-estimates}
\end{table}
See the files
\href{http://otter-rsch.com/admbre/examples/kidney/kidney.html}{here}.
@@ -2037,45 +2048,45 @@
\subsubsection{Model description}
-The orange tree growth data was used by \citeasnoun[Ch.8.2]{pinh:bate:2000} to
+The orange tree growth data was used by~\citeasnoun[Ch.~8.2]{pinh:bate:2000} to
illustrate how a logistic growth curve model with random effects can be fit with
the S-Plus function \texttt{nlme}. The data contain measurements made at seven
occasions for each of five orange trees. See Table~\ref{tab:orange-trees}.
-\begin{table}[h]
-\begin{center}
-\begin{tabular}{ll}
-$t_{ij}$
-& Time point when the $j^{\textrm{th}}$ measurement was made on tree~$i$. \\
-$y_{ij}$
-& Trunk circumference of tree $i$ when measured at time point~$t_{ij}$.
-\end{tabular}
-\end{center}
-\caption{Orange tree data.}
-\label{tab:orange-trees}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular}{ll}
+ $t_{ij}$
+ & Time when the $j^{\textrm{th}}$ measurement was made on tree $i$.\\
+ $y_{ij}$
+ & Trunk circumference of tree $i$ when measured at time $t_{ij}$. \\
+ \end{tabular}
+ \end{center}
+ \caption{Orange tree data.}
+ \label{tab:orange-trees}
\end{table}
-The following logistic model is used:
+\noindent The following logistic model is used:
\[
-y_{ij}=\frac{\phi_{1}+u_i}{1+\exp \left[ -\left( t_{ij}-\phi_{2}\right)
-/\phi_{3}\right] }+\varepsilon_{ij},
+y_{ij}=\frac{\phi_1+u_i}{1+\exp \left[ -\left(t_{ij}-\phi_2\right)
+ /\phi_3\right]}+\varepsilon_{ij},
\]%
-where $(\phi_{1},\phi_{2},\phi_{3})$ are hyper-parameters, and
-$u_i\sim N(0,\sigma_{u}^{2})$ is a random effect, and
-$\varepsilon_{ij}\sim N(0,\sigma ^{2})$ is the residual noise term.
+where $(\phi_1,\phi_2,\phi_3)$ are hyper-parameters,
+$u_i\sim N(0,\sigma_u^2)$ is a random effect, and
+$\varepsilon_{ij}\sim N(0,\sigma^2)$ is the residual noise term.
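The mean part of this curve is easy to evaluate outside of \scAB. A Python illustration (hypothetical function name, illustrative parameter values of the same magnitude as the estimates reported for this example):

```python
import numpy as np

def orange_mean(t, phi1, phi2, phi3, u=0.0):
    """Mean trunk circumference at time t for a tree with random effect u:
    (phi1 + u) / (1 + exp(-(t - phi2)/phi3))."""
    return (phi1 + u) / (1.0 + np.exp(-(t - phi2) / phi3))

# At t = phi2 the logistic curve is at half its asymptote phi1 + u:
assert np.isclose(orange_mean(727.9, 192.1, 727.9, 348.1), 192.1 / 2)
```

The random effect $u_i$ shifts only the asymptote $\phi_1+u_i$, so trees differ in their final size but share the timing parameters $\phi_2$ and $\phi_3$.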
\subsubsection{Results}
Parameter estimates are shown in Table~\ref{tab:parameter-estimates}.
-\begin{table}[h]
+\begin{table}[htbp]
\begin{center}
- \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} llllll}
- \hline
-& $\phi_{1}$ & $\phi_{2}$ & $\phi_{3}$ & $\sigma $ & $\sigma_{u}$ \\
- \hline\\[-16pt]
- \scAR\ & 192.1 & 727.9 & 348.1 & 7.843 & 31.65 \\
- Std. dev. & 15.658 & 35.249 & 27.08 & 1.013 & 10.26 \\
- \texttt{nlme} & 191.0 & 722.6 & 344.2 & 7.846 & 31.48 \\
- \hline
- \end{tabular}
+ \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrr}
+ \hline
+ ~ & $\phi_1$ & $\phi_2$ & $\phi_3$ & $\sigma$ & $\sigma_u$\\
+ \hline\\[-16pt]
+ \scAR & 192.1 & 727.9 & 348.1 & 7.843 & 31.65\\
+ Std.\ dev. & 15.7 & 35.2 & 27.1 & 1.013 & 10.26\\
+ \texttt{nlme} & 191.0 & 722.6 & 344.2 & 7.846 & 31.48\\
+ \hline
+ \end{tabular}
\end{center}
\caption{Parameter estimates.}
\label{tab:parameter-estimates}
@@ -2098,45 +2109,48 @@
\subsubsection{Model description}
The ``one-compartment open model'' is commonly used in pharmacokinetics. It can
-be described as follows. A patient receives a dose~$D$ of some substance at
-time~$t_{d}$. The concentration~$c_t$ at a later time point~$t$ is governed by
+be described as follows. A patient receives a dose $D$ of some substance at
+time $t_d$. The concentration $c_t$ at a later time point $t$ is governed by
the equation
\[
-c_t=\tfrac{D}{V}\exp \left[ -\tfrac{Cl}{V}(t-t_{d})\right]
+c_t=\tfrac{D}{V}\exp \left[ -\tfrac{Cl}{V}(t-t_d)\right]
\]
where $V$ and $Cl$ are parameters (the so-called ``Volume of concentration'' and
the ``Clearance''). Doses given at different time points contribute additively
-to~$c_t$. \citeasnoun[Ch.~6.4]{pinh:bate:2000} fitted this model to a data set
-using the S-Plus routine \texttt{nlme}. The linear predictor used by
-\citeasnoun[p.~300]{pinh:bate:2000} is:
+to $c_t$. This model has been fitted to a data set using the S-Plus routine
+\texttt{nlme}, see~\citeasnoun[Ch.~6.4]{pinh:bate:2000}, with the linear
+predictor
\begin{align*}
-\log \left( V\right) &=\beta_{1}+\beta_{2}Wt+u_{V}, \\
-\log \left( Cl\right) &=\beta_{3}+\beta_{4}Wt+u_{Cl},
+ \log\left(V\right) &=\beta_1+\beta_2W+u_V, \\
+ \log\left(Cl\right) &=\beta_3+\beta_4W+u_{Cl},
\end{align*}
-where $Wt$ is a continuous covariate, while $u_{V}\sim N(0,\sigma_{V}^{2})$ and
-$u_{Cl}\sim N(0,\sigma_{Cl}^{2})$ are random effects. The model specification is
-completed by the requirement that the observed concentration~$y$ in the patient
+where $W$ is a continuous covariate, while $u_V\sim N(0,\sigma_V^2)$ and
+$u_{Cl}\sim N(0,\sigma_{Cl}^2)$ are random effects. The model specification is
+completed by the requirement that the observed concentration $y$ in the patient
is related to the true concentration by $y=c_t+\varepsilon $, where $\varepsilon
-\sim N(0,\sigma ^{2})$ is a measurement error term.
+\sim N(0,\sigma^2)$ is a measurement error term.
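Since doses contribute additively to $c_t$, the predicted concentration at any time is a sum over all doses given before that time. A Python sketch of the prediction (illustration only, not \scAR\ code; names and values are hypothetical):

```python
import numpy as np

def concentration(t, doses, V, Cl):
    """Predicted concentration at time t in the one-compartment open model:
    each dose (D, t_d) administered at or before t contributes
    (D/V) * exp(-(Cl/V) * (t - t_d)); contributions are additive."""
    c = 0.0
    for D, t_d in doses:
        if t >= t_d:
            c += (D / V) * np.exp(-(Cl / V) * (t - t_d))
    return c

# A single dose, measured at the dosing time, gives exactly D/V:
assert np.isclose(concentration(0.0, [(100.0, 0.0)], V=25.0, Cl=5.0), 4.0)
```

In the mixed model, $V$ and $Cl$ are patient-specific through $u_V$ and $u_{Cl}$, so this function would be evaluated with $V_i=\exp(\beta_1+\beta_2W_i+u_{V,i})$ and similarly for $Cl_i$.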
\subsubsection{Results}
Estimates of hyper-parameters are shown in Table~\ref{tab:hyper-estimates}.
-\begin{table}[h]
-\begin{center}
-\begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} llllllll}
- \hline
-& $\beta_{1}$ & $\beta_{2}$ & $\beta_{3}$ & $\beta_{4}$ & $\sigma $ & $
-\sigma_{V}$ & $\sigma_{Cl}$ \\
- \hline\\[-17pt]
-\scAR\ & -5.99 & 0.622 & -0.471 & 0.532 & 2.72 & 0.171 & 0.227 \\
-Std. Dev & 0.13 & 0.076 & 0.067 & 0.040 & 0.23 & 0.024 & 0.054 \\
-\texttt{nlme} & -5.96 & 0.620 & -0.485 & 0.532 & 2.73 & 0.173 & 0.216\\
- \hline
-\end{tabular}
-\end{center}
-\caption{Hyper-parameter estimates: pharmacokinetics.}
-\label{tab:hyper-estimates}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrrrr}
+ \hline
+ ~ & $\beta_1$ & $\beta_2$ & $\beta_3$
+ & $\beta_4$ & $\sigma$ & $\sigma_V$ & $\sigma_{Cl}$\\
+ \hline\\[-17pt]
+ \scAR & $-5.99$ & $0.622$ & $-0.471$
+ & $0.532$ & $ 2.72$ & $0.171$ & $ 0.227$\\
+ Std.\ dev. & $ 0.13$ & $0.076$ & $ 0.067$
+ & $0.040$ & $ 0.23$ & $0.024$ & $ 0.054$\\
+ \texttt{nlme} & $-5.96$ & $0.620$ & $-0.485$
+ & $0.532$ & $ 2.73$ & $0.173$ & $ 0.216$\\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{Hyper-parameter estimates: pharmacokinetics.}
+ \label{tab:hyper-estimates}
\end{table}
The differences between the estimates obtained with \scAR\ and \texttt{nlme} are
caused by the fact that the two methods use different approximations of the
@@ -2154,26 +2168,26 @@
\subsubsection{Model description}
-Let $X_{i}$ be binomially distributed with parameters $N=2$ and $p_{i}$, and
+Let $X_i$ be binomially distributed with parameters $N=2$ and $p_i$, and
further assume that
\begin{equation}
-p_{i}=\frac{\exp (\mu +u_{i})}{1+\exp (\mu +u_{i})},
+ p_i=\frac{\exp (\mu +u_i)}{1+\exp (\mu +u_i)},
\end{equation}%
-where $\mu $ is a parameter and $u_{i}\sim N(0,\sigma ^{2})$ is a random effect.
+where $\mu $ is a parameter and $u_i\sim N(0,\sigma^2)$ is a random effect.
Assuming independence, the log-likelihood function for the parameter $\theta
-=(\mu,\sigma )$ can be written as
+=(\mu,\sigma)$ can be written as
\begin{equation}
-l(\theta )=\sum_{i=1}^{n}\log \bigl[\, p(x_{i};\theta )\bigr] .
+ l(\theta)=\sum_{i=1}^n\log \bigl[\, p(x_i;\theta)\bigr] .
\end{equation}%
-In \scAR, $p(x_{i};\theta )$ is approximated using the Laplace approximation.
-However, since $x_{i}$ only can take the values~$0$, $1$, and~$2$, we can
+In \scAR, $p(x_i;\theta)$ is approximated using the Laplace approximation.
+However, since $x_i$ can only take the values $0$, $1$, and $2$, we can
rewrite the log-likelihood as
\begin{equation}
-l(\theta )=\sum_{j=0}^{2}n_{j}\log \bigl[\, p(j;\theta )\bigr], \label{l_w}
+ l(\theta)=\sum_{j=0}^2n_j\log \bigl[\, p(j;\theta)\bigr], \label{l_w}
\end{equation}%
-where $n_{j}$ is the number $x_{i}$ being equal to~$j$. Still, the Laplace
-approximation must be used to approximate $p(j;\theta )$, but now only for
-$j=0,1,2$, as opposed to $j = 1,\dots n$, as above. For large~$n$, this can give
+where $n_j$ is the number of $x_i$ equal to $j$. Still, the Laplace
+approximation must be used to approximate $p(j;\theta)$, but now only for
+$j=0,1,2$, as opposed to $i=1,\ldots,n$, as above. For large $n$, this can give
large savings.
To implement the log-likelihood (\ref{l_w}) in \scAR, you must organize your
@@ -2186,9 +2200,9 @@
\begin{lstlisting}
!! set_multinomial_weights(w);
\end{lstlisting}
-in the \texttt{PARAMETER\_SECTION}. The variable \texttt{w} is a vector (with
-indexes starting at~1) containing the weights, so in our case,
-$w=(n_{0},n_{1},n_{2})$.
+ in the \texttt{PARAMETER\_SECTION}. The variable \texttt{w} is a vector (with
+ indexes starting at~1) containing the weights, so in our case,
+ $w=(n_0,n_1,n_2)$.
\end{itemize}
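The algebra behind the weighting trick is just the equivalence of the two forms of the log-likelihood, which a few lines of Python make concrete (illustration only; the hypothetical `logp` dictionary stands in for the Laplace-approximated $\log p(j;\theta)$):

```python
import numpy as np
from collections import Counter

def loglik_full(x, logp):
    """Naive form: one log p(x_i; theta) term per observation."""
    return sum(logp[xi] for xi in x)

def loglik_weighted(x, logp):
    """Weighted form: sum_j n_j * log p(j; theta) over distinct values j."""
    counts = Counter(x)                      # n_j = number of x_i equal to j
    return sum(n * logp[j] for j, n in counts.items())

# With x_i in {0, 1, 2} the two forms agree, but the weighted one needs
# log p(j) for only three values of j instead of n observations:
logp = {0: np.log(0.5), 1: np.log(0.3), 2: np.log(0.2)}
x = [0, 0, 1, 2, 2, 2, 1, 0]
assert np.isclose(loglik_full(x, logp), loglik_weighted(x, logp))
```

In \scAR\ the expensive step is each Laplace approximation of $p(j;\theta)$, so reducing $n$ evaluations to three is where the savings come from.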
See the files
@@ -2204,20 +2218,20 @@
$s=1,\ldots,S-1$, define $P_s=P(y\leq y^{(s)})$ and $\kappa_s=\log
[P_s/(1-P_s)]$. To allow $\kappa_s$ to depend on covariates specific to the
$i^{\textrm{th}}$ observation ($i=1,\ldots,n$), we introduce a
-disturbance~$\eta_i$ of~$\kappa_s$:
+disturbance $\eta_i$ of $\kappa_s$:
\[
- P(y_i\leq y^{(s)}) =
- \frac{\exp(\kappa_s-\eta_i)}
- {1+\exp(\kappa _{s}-\eta _{i})}, \qquad s=1,\ldots,S-1.
+P(y_i\leq y^{(s)}) =
+\frac{\exp(\kappa_s-\eta_i)}
+{1+\exp(\kappa_s-\eta_i)}, \qquad s=1,\ldots,S-1,
\]
with
\[
- \eta_i = \mathbf{X}_i\mathbf{\beta}+u_{j_i},
+\eta_i = \mathbf{X}_i\mathbf{\beta}+u_{j_i},
\]
where $\mathbf{X}_i$ and $\mathbf{\beta}$ play the same role as in earlier
-examples. %Example~1-3
+examples. %Example 1-3
The $u_j$ ($j=1,\ldots,q$) are independent $N(0,\sigma^2)$ variables, and $j_i$
-is the latent variable class of individual~$i$.
+is the latent variable class of individual $i$.
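The cumulative logits determine the cell probabilities $P(y_i=y^{(s)})$ by differencing, with the convention $P(y_i\leq y^{(S)})=1$. A Python illustration (hypothetical names, not \scAR\ code):

```python
import numpy as np

def category_probs(kappa, eta):
    """Cell probabilities P(y_i = y^(s)) from an ordinal logit model:
    P(y_i <= y^(s)) = logistic(kappa_s - eta_i) for s = 1..S-1,
    and P(y_i <= y^(S)) = 1; cell probabilities are the differences."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(kappa) - eta)))  # length S-1
    cum = np.append(cum, 1.0)                               # last category
    return np.diff(np.concatenate(([0.0], cum)))

p = category_probs([-1.0, 1.0], eta=0.0)
assert np.isclose(p.sum(), 1.0) and np.all(p >= 0)
```

Increasing cut-points $\kappa_1<\cdots<\kappa_{S-1}$ guarantee non-negative cell probabilities; in practice this ordering must be enforced in the parameterization.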
See the files
\href{http://otter-rsch.com/admbre/examples/socatt/socatt.html}{here}.
@@ -2233,19 +2247,19 @@
Stochastic volatility models are used in mathematical finance to describe the
evolution of asset returns, which typically exhibit changing variances over
time. As an illustration, we use a time series of daily pound/dollar exchange
-rates~$\{z_t\}$ from the period 01/10/81 to 28/6/85, previously analyzed
+rates $\{z_t\}$ from the period 01/10/81 to 28/6/85, previously analyzed
by~\citeasnoun{harv:ruiz:shep:1994}. The series of interest are the daily
-mean-corrected returns~$\{y_t\}$, given by the transformation
+mean-corrected returns $\{y_t\}$, given by the transformation
\[
- y_t = \log z_t - \log z_{t-1} - n^{-1}\sum_{i=1}^n(\log z_t-\log z_{t-1}).
+y_t = \log z_t - \log z_{t-1} - n^{-1}\sum_{i=1}^n(\log z_t-\log z_{t-1}).
\]
-The stochastic volatility model allows the variance of~$y_t$ to vary smoothly
+The stochastic volatility model allows the variance of $y_t$ to vary smoothly
with time. This is achieved by assuming that $y_t\sim N(\mu,\sigma_t^2)$, where
-$\sigma_t^2=\exp(\mu_x+x_t)$. The smoothly varying component~$x_t$ follows the
+$\sigma_t^2=\exp(\mu_x+x_t)$. The smoothly varying component $x_t$ follows the
autoregression
\[
- x_t = \beta x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0,\sigma^2).
+x_t = \beta x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0,\sigma^2).
\]
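Before fitting such a model, it can be useful to simulate from it to see how the latent process drives the changing variance. A Python sketch (illustration only; for simplicity the latent process is started at $x_0=0$ rather than drawn from its stationary distribution):

```python
import numpy as np

def simulate_sv(n, mu, mu_x, beta, sigma, rng=None):
    """Simulate the stochastic volatility model:
    x_t = beta*x_{t-1} + eps_t with eps_t ~ N(0, sigma^2),
    and returns y_t ~ N(mu, exp(mu_x + x_t))."""
    rng = rng or np.random.default_rng(0)
    x = np.zeros(n)                          # latent log-volatility process
    for t in range(1, n):
        x[t] = beta * x[t - 1] + rng.normal(0.0, sigma)
    y = rng.normal(mu, np.sqrt(np.exp(mu_x + x)))   # sd_t = sqrt(sigma_t^2)
    return x, y

x, y = simulate_sv(500, mu=0.0, mu_x=-1.0, beta=0.9, sigma=0.3)
assert len(y) == 500
```

With $\beta$ close to one, the simulated series shows the familiar volatility clustering seen in exchange-rate returns.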
The vector of hyper-parameters for this model is thus
@@ -2258,59 +2272,62 @@
\subsubsection{Model description}
-\citeasnoun{zege:1988} analyzed a time series of monthly numbers of
-poliomyelitis cases during the period 1970--1983 in the U.S. We make a
+A time series of monthly numbers of poliomyelitis cases during the period
+1970--1983 in the U.S.\ was analyzed by~\citeasnoun{zege:1988}. We make a
comparison to the performance of the Monte Carlo Newton-Raphson method, as
reported in~\citeasnoun{kuk:chen:1999}. We adopt their model formulation.
-Let $y_{i}$ denote the number of polio cases in the $i^{\textrm{th}}$ period
-$(i=1,\ldots,168)$. It is assumed that the distribution of~$y_{i}$ is governed
-by a latent stationary AR(1) process~$\{u_i\}$ satisfying
+Let $y_i$ denote the number of polio cases in the $i^{\textrm{th}}$ period
+$(i=1,\ldots,168)$. It is assumed that the distribution of $y_i$ is governed
+by a latent stationary AR(1) process $\{u_i\}$ satisfying
\[
- u_i = \rho u_{i-1} + \varepsilon_i,
+u_i = \rho u_{i-1} + \varepsilon_i,
\]
where the $\varepsilon_i\sim N(0,\sigma^2)$. % variables.
To account for trend and seasonality, the following covariate vector is
introduced:
\[
- \mathbf{x}_i = \Bigg(
- 1,
- \frac{i}{1000},
- \cos\left(\frac{2\pi}{12}i\right),
- \sin\left(\frac{2\pi}{12}i\right),
- \cos\left(\frac{2\pi}{6}i\right),
- \sin\left(\frac{2\pi}{6}i\right)
- \Bigg).
+\mathbf{x}_i = \Bigg(
+1,
+\frac{i}{1000},
+\cos\left(\frac{2\pi}{12}i\right),
+\sin\left(\frac{2\pi}{12}i\right),
+\cos\left(\frac{2\pi}{6}i\right),
+\sin\left(\frac{2\pi}{6}i\right)
+\Bigg).
\]
Conditionally on the latent process $\{u_i\}$, the counts $y_i$ are
independently Poisson distributed with intensity
\[
- \lambda_i=\exp(\mathbf{x}_i{}'\mathbf{\beta}+u_i).
+\lambda_i=\exp(\mathbf{x}_i{}'\mathbf{\beta}+u_i).
\]
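The design vector combines an intercept, a scaled linear trend, and annual plus semi-annual harmonics. A Python illustration of how it would be built (hypothetical function name, not \scAR\ code):

```python
import numpy as np

def polio_covariates(i):
    """Covariate vector x_i for period i: intercept, scaled trend, and
    annual (period 12) plus semi-annual (period 6) seasonal terms."""
    return np.array([
        1.0,
        i / 1000.0,
        np.cos(2 * np.pi * i / 12), np.sin(2 * np.pi * i / 12),
        np.cos(2 * np.pi * i / 6),  np.sin(2 * np.pi * i / 6),
    ])

# One row per month, i = 1..168:
X = np.array([polio_covariates(i) for i in range(1, 169)])
assert X.shape == (168, 6)

# Conditional on the latent AR(1) value u_i, the Poisson intensity is
# lam_i = np.exp(X[i] @ beta + u[i]) for a given coefficient vector beta.
```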
\subsubsection{Results}
Estimates of hyper-parameters are shown in Table~\ref{tab:hyper-estimates-2}.
-\begin{table}[h]
-\begin{center}
- \footnotesize
- \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrrrrr}
- \hline
- ~ & $\beta_1$ & $\beta_2$ & $\beta_3$ & $\beta_4$
- & $\beta_5$ & $\beta_6$ & $\rho$ & $\sigma$\\
- \hline\\[-16pt]
- \scAR\ & 0.242 & -3.81 & 0.162 & -0.482
- & 0.413 & -0.0109 & 0.627 & 0.538 \\
- Std.\ dev. & 0.270 & 2.76 & 0.150 & 0.160
- & 0.130 & 0.1300 & 0.190 & 0.150 \\
- \citeasnoun{kuk:chen:1999} & 0.244 & -3.82 & 0.162 & -0.478
- & 0.413 & -0.0109 & 0.665 & 0.519 \\
- \hline
- \end{tabular}
-\end{center}
-\caption{Hyper-parameter estimates: polio data set.}
-\label{tab:hyper-estimates-2}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular}{@{\vrule height 12pt depth 6pt width0pt} lrrrrrrrr}
+ \hline
+ ~
+ & $\beta_1$ & $\beta_2$ & $\beta_3$ & $\beta_4$
+ & $\beta_5$ & $\beta_6$ & $\rho$ & $\sigma$ \\
+ \hline\\[-16pt]
+ \scAR\
+ & $0.242$ & $-3.81$ & $0.162$ & $-0.482$
+ & $0.413$ & $-0.0109$ & $0.627$ & $ 0.538$ \\
+ Std.\ dev.
+ & $0.270$ & $ 2.76$ & $0.150$ & $ 0.160$
+ & $0.130$ & $ 0.1300$ & $0.190$ & $ 0.150$ \\
+      \citeasnoun{kuk:chen:1999}
+ & $0.244$ & $-3.82$ & $0.162$ & $-0.478$
+ & $0.413$ & $-0.0109$ & $0.665$ & $ 0.519$ \\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{Hyper-parameter estimates: polio data set.}
+ \label{tab:hyper-estimates-2}
\end{table}
We note that % not
@@ -2328,7 +2345,7 @@
The multilevel Rasch model can be implemented using random effects in \scAB. As
an example, we use data on the responses of 2042~soldiers to a total of 19~items
-(questions), taken from~\cite{doran2007estimating} %Doran et al.\ (2007).
+(questions), taken from~\cite{doran2007estimating}.
This illustrates the use of crossed random effects in \scAB. Furthermore, it is
shown how the model can easily be generalized in \scAB. These more general
models cannot be fitted with standard \scGLMM\ software, such as ``lmer'' in~R.
@@ -2347,19 +2364,19 @@
implemented.
\item The assignment operator for \texttt{dvariable} behaves differently. The
code
-\begin{lstlisting}
- dvariable y = 1;
- dvariable x = y;
-\end{lstlisting}
+ \begin{lstlisting}
+ dvariable y = 1;
+ dvariable x = y;
+ \end{lstlisting}
will make \texttt{x} and \texttt{y} point to the same memory location (shallow
copy) in \scAR. Hence, changing the value of \texttt{x} automatically changes
\texttt{y}. Under \scAB, on the other hand, \texttt{x} and \texttt{y} will
refer to different memory locations (deep copy). If you want to perform a deep
copy in \scAR\ you should write:
-\begin{lstlisting}
- dvariable y = 1;
- dvariable x;
- x = y;
+ \begin{lstlisting}
+ dvariable y = 1;
+ dvariable x;
+ x = y;
\end{lstlisting}
For vector and matrix objects \scAB\ and \scAR\ behave identically in that
a shallow copy is used.
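The shallow/deep distinction is analogous to views versus copies in other numerical environments. For readers more familiar with Python, a loose numpy analogy (this is an analogy only, not ADMB code):

```python
import numpy as np

# Shallow copy: a basic slice is a view, so writing through the view
# changes the original (like 'dvariable x = y;' in ADMB-RE):
a = np.array([1.0, 2.0, 3.0])
b = a[:]              # view of a's memory
b[0] = 99.0
assert a[0] == 99.0   # a sees the change

# Deep copy: .copy() owns separate memory (like declaring x first,
# then assigning 'x = y;' in ADMB-RE):
c = a.copy()
c[0] = -1.0
assert a[0] == 99.0   # a is unaffected
```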
@@ -2376,50 +2393,67 @@
\end{lstlisting}
Those options that are specific to \scAR\ are printed after the line ``Random
effects options if applicable.'' See Table~\ref{tab:command-line-options}.
-\begin{table}[h]
-\begin{center}
-\begin{tabular*}{.95\textwidth}%
-{@{\vrule height 14pt depth 10pt width0pt}@{\extracolsep{1em}} l
- p{.8\textwidth} }
-\hline
-\textbf{Option}& \textbf{Explanation}\\[-3pt]
-\hline
-\hline
-\texttt{-nr N} & maximum number of Newton-Raphson steps\\
-\texttt{-imaxfn N}
-& maximum number of evals in quasi-Newton inner optimization\\
-\texttt{-is N}
-& set importance sampling size to \texttt{N} for random effects\\
-\texttt{-isf N}
-& set importance sampling size funnel blocks to \texttt{N} for random effects\\
-\texttt{-isdiag} & print importance sampling diagnostics\\
-\texttt{-hybrid} & do hybrid Monte Carlo version of \scMCMC\\
-\texttt{-hbf} & set the hybrid bounded flag for bounded parameters\\
-\texttt{-hyeps} & mean step size for hybrid Monte Carlo\\
-\texttt{-hynstep} & number of steps for hybrid Monte Carlo\\
-\texttt{-noinit} & do not initialize random effects before inner optimzation\\
-\texttt{-ndi N} & set maximum number of separable calls\\
-\texttt{-ndb N}
-& set number of blocks for derivatives for random effects (reduces temporary
-file sizes)\\
-\texttt{-ddnr}
-& use high-precision Newton-Raphson for inner optimization for banded
-\mbox{Hessian} case \textit{only}, even if implemented\\
-\texttt{-nrdbg} & verbose reporting for debugging Newton-Raphson\\
-\texttt{-mm N} & do minimax optimization\\
-\texttt{-shess} & use sparse Hessian structure inner optimzation\\
-\hline
-\texttt{-l1 N} & set the size of buffer \texttt{f1b2list1} to~\texttt{N} \\
-\texttt{-l2 N} & set the size of buffer \texttt{f1b2list12} to~\texttt{N} \\
-\texttt{-l3 N} & set the size of buffer \texttt{f1b2list13} to~\texttt{N} \\
-\texttt{-nl1 N} & set the size of buffer \texttt{nf1b2list1} to~\texttt{N} \\
-\texttt{-nl2 N} & set the size of buffer \texttt{nf1b2list12} to~\texttt{N}\\
-\texttt{-nl3 N}
-& set the size of buffer \texttt{nf1b2list13} to~\texttt{N} \\\hline
-\end{tabular*}
-\end{center}
-\caption{Command line options.}
-\label{tab:command-line-options}
+\begin{table}[htbp]
+ \begin{center}
+ \begin{tabular*}{.95\textwidth}%
+ {@{\vrule height 14pt depth 10pt width0pt}@{\extracolsep{1em}} l
+ p{.8\textwidth}}
+ \hline
+ \textbf{Option}
+ & \textbf{Explanation}\\[-3pt]
+ \hline
+ \texttt{-nr N}
+ & maximum number of Newton-Raphson steps\\
+ \texttt{-imaxfn N}
+ & maximum number of evals in quasi-Newton inner optimization\\
+ \texttt{-is N}
+ & set importance sampling size to \texttt{N} for random effects\\
+ \texttt{-isf N}
+ & set importance sampling size funnel blocks to \texttt{N} for random
+ effects\\
+ \texttt{-isdiag}
+ & print importance sampling diagnostics\\
+ \texttt{-hybrid}
+ & do hybrid Monte Carlo version of \scMCMC\\
+ \texttt{-hbf}
+ & set the hybrid bounded flag for bounded parameters\\
+ \texttt{-hyeps}
+ & mean step size for hybrid Monte Carlo\\
+ \texttt{-hynstep}
+ & number of steps for hybrid Monte Carlo\\
+ \texttt{-noinit}
+      & do not initialize random effects before inner optimization\\
+ \texttt{-ndi N}
+ & set maximum number of separable calls\\
+ \texttt{-ndb N}
+ & set number of blocks for derivatives for random effects (reduces
+ temporary file sizes)\\
+ \texttt{-ddnr}
+ & use high-precision Newton-Raphson for inner optimization for banded
+ \mbox{Hessian} case \textit{only}, even if implemented\\
+ \texttt{-nrdbg}
+ & verbose reporting for debugging Newton-Raphson\\
+ \texttt{-mm N}
+ & do minimax optimization\\
+ \texttt{-shess}
+      & use sparse Hessian structure in inner optimization\\
+ \hline
+ \texttt{-l1 N}
+ & set size of buffer \texttt{f1b2list1} to~\texttt{N}\\
+ \texttt{-l2 N}
+ & set size of buffer \texttt{f1b2list12} to~\texttt{N}\\
+ \texttt{-l3 N}
+ & set size of buffer \texttt{f1b2list13} to~\texttt{N}\\
+ \texttt{-nl1 N}
+ & set size of buffer \texttt{nf1b2list1} to~\texttt{N}\\
+ \texttt{-nl2 N}
+ & set size of buffer \texttt{nf1b2list12} to~\texttt{N}\\
+ \texttt{-nl3 N}
+ & set size of buffer \texttt{nf1b2list13} to~\texttt{N}\\\hline
+ \end{tabular*}
+ \end{center}
+ \caption{Command line options.}
+ \label{tab:command-line-options}
\end{table}
The options in the last section (the sections are separated by horizontal bars)
are not printed, but can still be used (see earlier).
@@ -2432,24 +2466,24 @@
To compile \texttt{model.tpl} in a \textsc{DOS}/Linux terminal window, type
\begin{code}
- admb [-s] [-re] model
+ admb [-r] [-s] model
\end{code}
where the options
\par
-\begin{tabular}{@{\texttt} l l }
- -s & yields the ``safe'' version of the \textsc{exe} file\\
- -re & is used to invoke the random effect module
+\begin{tabular}{@{\texttt} l l}
+    -r & is used to invoke the random effects module \\
+ -s & yields the ``safe'' version of the executable file\\
\end{tabular}
\medskip
\noindent There are two stages of compilation:
\begin{itemize}
- \item Proprocessor: \texttt{tpl2cpp} or \texttt{tpl2rem}
- \item \cplus\ compiler (Borland, Visual \cplus, etc.)
+ \item Translate TPL to \cplus: \texttt{tpl2cpp} or \texttt{tpl2rem}
+ \item Build executable from \cplus\ code (using GCC, Visual \cplus, etc.)
\end{itemize}
\begin{center}
-\includegraphics[width=11cm]{compiling-diagram.png}
+ \includegraphics[width=11cm]{compiling-diagram}
\end{center}
\hskip-2pc\includegraphics[width=18cm]{ADMBprim.pdf}%17