Revision 1000 trunk/docs/manuals/admbre/admbre.tex

\item Hyperparameters (variance components, etc.) estimated by maximum
likelihood.
\item Marginal likelihood evaluated by the Laplace approximation, (adaptive)
importance sampling or Gauss--Hermite integration.
\item Exact derivatives calculated using Automatic Differentiation.

...
\item \textit{Compilation}: You turn your model into an executable program
using a \cplus\ compiler (which you need to install separately).
\item \textit{Platforms}: Windows, Linux and Mac.
\end{itemize}

\subsection{How to obtain \scAR}

...
(using ``Task Manager'' under Windows, and the command \texttt{top} under Linux)
to ensure that you do not exceed the available memory on your computer.
\subsection{Exploiting special model structure}
If your model has special structure, such as grouped or nested random effects,
state-space structure, crossed random effects or a general Markov structure,
you will benefit greatly from using the techniques described in
Section~\ref{separability} below. In this case, the memory options in
Table~\ref{tab:temporaryfiles} are less relevant (although sometimes useful);
instead, memory use can be controlled with the classical \ADM~command line
options \texttt{-cbs}, \texttt{-gbs}, etc.
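As an illustration only (the executable name \texttt{mymodel} and the buffer
sizes are hypothetical), these options are supplied when the program is run:
\begin{lstlisting}
$ mymodel -cbs 100000000 -gbs 100000000
\end{lstlisting}
Here \texttt{-cbs} and \texttt{-gbs} set the sizes of \ADM's internal
temporary-file buffers.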
\subsection{Limited memory Newton optimization}
\index{limited memory quasi-Newton}

...
is done using the command line argument \texttt{-ilmn N}, where \texttt{N} is
the number of steps to keep. Typically, $\texttt{N} = 5$ is a good choice.
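As an illustration (assuming your compiled model is called \texttt{mymodel},
a hypothetical name), the option is passed on the command line:
\begin{lstlisting}
$ mymodel -ilmn 5
\end{lstlisting}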
\chapter{Exploiting special structure (Separability)}
\label{separability}

A model is said to be ``separable'' if the likelihood can be written
as a product of terms, each involving only a small number of random effects.
Not all models are separable, and for small toy examples (fewer than 50~random
effects, say), we do not need to care about separability. For larger models,
however, separability matters, both for reducing memory requirements and for
reducing computation time. Examples of separable models are
\begin{itemize} 
1269  1284 
\item Grouped or nested random effects 
1270  1285 
\item Statespace models 
1271  1286 
\item Crossed random effects 
1272  1287 
\item Latent Markov random fields 
1273  1288 
\end{itemize} 
The presence of separability allows \scAB\ to calculate the ``Hessian'' matrix
very efficiently. The Hessian~$H$ is defined as
the (negative) Fisher information matrix (inverse covariance matrix) of the
posterior distribution of the random~effects, and is a key component of the
Laplace approximation.
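To make the connection concrete, a standard statement of the Laplace
approximation (a sketch in the notation above, with $u$ the random effects,
$\hat u$ their posterior mode, $q$ the number of random effects, and
$\ell(u) = \log p(y \mid u) + \log p(u)$ the joint log-density) is
\begin{equation}
\int \exp\left\{\ell(u)\right\}\,du
  \approx (2\pi)^{q/2}\,\det(H)^{-1/2}\exp\left\{\ell(\hat u)\right\},
\qquad H = -\ell''(\hat u),
\end{equation}
where the left-hand side is the marginal likelihood (with hyperparameters
suppressed).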

1285  1294  
How do we inform \scAR\ that the model is separable? We define
\texttt{SEPARABLE\_FUNCTION}s in the \texttt{PROCEDURE\_SECTION} to specify
the individual terms in the product that defines the likelihood function.
Typically, a \texttt{SEPARABLE\_FUNCTION} is invoked many times, with a small
subset of the random effects each time.
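As a minimal sketch (the names \texttt{g}, \texttt{u}, \texttt{mu},
\texttt{sigma} and \texttt{g\_i} are hypothetical, and every parameter used
inside the function is passed as an argument), a grouped model with one random
effect per observation could split its likelihood like this:
\begin{lstlisting}
PROCEDURE_SECTION
  int i;
  for (i=1;i<=n;i++)
    g_i(u(i),mu,sigma,i);      // one likelihood term per random effect

SEPARABLE_FUNCTION void g_i(const dvariable& ui,const dvariable& mu,const dvariable& sigma,int i)
  g += 0.5*square(ui);                               // N(0,1) contribution from u(i)
  g += 0.5*square((y(i)-mu-ui)/sigma) + log(sigma);  // data contribution
\end{lstlisting}
Here \texttt{g} is the objective function value (a negative log-likelihood),
so each call contributes one factor of the likelihood product in negative-log
form.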
For separable models, the Hessian is a sparse matrix, which means that it
contains mostly zeros. Sparsity can be exploited by \scAB\ when manipulating
the matrix $H$, such as when calculating its determinant. The actual sparsity
pattern depends on the model type:
\begin{itemize}
\item \textit{Grouped or nested random effects:} $H$ is block diagonal.
\item \textit{State-space models:} $H$ is a banded matrix with a narrow band.
...
\item \textit{Latent Markov random fields:} often banded, but with a wide
band.
\end{itemize}
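To illustrate the block diagonal case: for grouped random effects with
$q$~groups, the log-posterior splits into a sum of per-group terms, so
\begin{equation}
H = \left(\begin{array}{ccc}
H_1 &        & \\
    & \ddots & \\
    &        & H_q
\end{array}\right),
\end{equation}
where each block~$H_i$ involves only the random effects of group~$i$, and the
determinant factors as $\det(H) = \prod_{i=1}^{q} \det(H_i)$.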
For block diagonal and banded $H$, \scAR\ will automatically detect the
structure from the \texttt{SEPARABLE\_FUNCTION} specification, and will print
out a message such as:
\begin{lstlisting} 
1297  1313 
Block diagonal Hessian (Block size = 3) 