% Revision 1000

  \item Hyper-parameters (variance components, etc.) estimated by maximum
  likelihood.

  \item Marginal likelihood evaluated by the Laplace approximation, (adaptive)
  importance sampling, or Gauss-Hermite integration.


  \item Exact derivatives calculated using Automatic Differentiation.

......
  \item \textit{Compilation}: You turn your model into an executable program
  using a \cplus\ compiler (which you need to install separately).

  \item \textit{Platforms}: Windows, Linux, and Mac.

\end{itemize}
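As a sketch of the Laplace approximation mentioned above (the notation here is generic, not taken from this manual): writing $\theta$ for the hyper-parameters, $u$ for the $d$ random effects, and $y$ for the data, the marginal likelihood is the integral
\[
  L(\theta) \;=\; \int p(y \mid u,\theta)\, p(u \mid \theta)\, du
  \;\approx\; (2\pi)^{d/2} \det(H)^{-1/2}\, p(y \mid \hat{u},\theta)\, p(\hat{u} \mid \theta),
\]
where $\hat{u}$ maximizes $\log\left[p(y \mid u,\theta)\, p(u \mid \theta)\right]$ over $u$, and $H$ is the negative Hessian of this log-integrand evaluated at $\hat{u}$.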

\subsection{How to obtain \scAR}

......
(using ``Task Manager'' under Windows, and the command \texttt{top} under Linux)
to ensure that you do not exceed the available memory on your computer.


\subsection{Exploiting special model structure}
If your model has special structure, such as
grouped or nested random effects, state-space structure,
crossed random effects, or a general Markov structure,
you will benefit greatly from using the techniques
described in Section~\ref{separability} below.
In this case the memory options in Table~\ref{tab:temporary-files}
are less relevant (although sometimes useful), and instead the
memory use can be controlled with the classical \ADM\ command line options
\texttt{-cbs}, \texttt{-gbs}, etc.
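As an illustration (the executable name \texttt{model} and the buffer sizes below are placeholders, not taken from this manual), these options are supplied on the command line when running the compiled model:
\begin{lstlisting}
  model -cbs 200000000 -gbs 200000000
\end{lstlisting}
Here both buffers are set to roughly 200~MB; consult the \ADM\ documentation for the exact meaning of each option.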

\subsection{Limited memory Newton optimization}
\index{limited memory quasi-Newton}

......
is done using the command line argument \texttt{-ilmn N}, where \texttt{N} is
the number of steps to keep. Typically, $\texttt{N} = 5$ is a good choice.
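For example, to run the model with limited memory Newton optimization keeping the 5 most recent steps (the executable name \texttt{model} is a placeholder):
\begin{lstlisting}
  model -ilmn 5
\end{lstlisting}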

\chapter{Exploiting special structure (Separability)}
\label{separability}


A model is said to be ``separable'' if the likelihood can be written
as a product of terms, each involving only a small number of random effects.
Not all models are separable, and for small toy examples (fewer than 50~random
effects, say), we do not need to care about separability. For larger models,
however, you need to care about separability both to reduce memory requirements
and to reduce computation time. Examples of separable models are
\begin{itemize}
  \item Grouped or nested random effects
  \item State-space models
  \item Crossed random effects
  \item Latent Markov random fields
\end{itemize}
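For instance, for grouped random effects with independent groups (a generic sketch, not this manual's notation), the joint likelihood factors as
\[
  p(y, u \mid \theta) \;=\; \prod_{i=1}^{q} p(y_i \mid u_i, \theta)\, p(u_i \mid \theta),
\]
where group~$i$'s term involves only the small block $u_i$ of random effects, so each factor can be handled separately.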

The presence of separability allows \scAB\ to calculate the ``Hessian'' matrix
very efficiently. The Hessian~$H$ is defined as
the (negative) Fisher information matrix (inverse covariance matrix) of the
posterior distribution of the random~effects, and is a key component of the
Laplace approximation.
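In symbols (a generic sketch), with $u$ the random effects, $y$ the data, and $\hat{u}$ the posterior mode,
\[
  H \;=\; -\,\frac{\partial^2}{\partial u\, \partial u^{\mathsf{T}}}
  \log p(u \mid y) \bigg|_{u = \hat{u}} .
\]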


How do we inform \scAR\ that the model is separable? We define \texttt{SEPARABLE\_FUNCTION}s
in the \texttt{PROCEDURE\_SECTION} to specify the individual terms in the
product that defines the likelihood function. Typically, a \texttt{SEPARABLE\_FUNCTION}
is invoked many times, with a small subset of the random effects each time.
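A schematic sketch of this pattern (the names \texttt{g\_cluster}, \texttt{u}, \texttt{M}, and the objective function \texttt{g} are placeholders, not taken from a real example in this manual):
\begin{lstlisting}
PROCEDURE_SECTION
  int i;
  for (i=1;i<=M;i++)
    g_cluster(u(i),i);     // one call per group; touches only u(i)

SEPARABLE_FUNCTION void g_cluster(const dvariable& ui, int i)
  g += 0.5*square(ui);     // -log prior for u(i), standard normal
  // ... add the negative log-likelihood of group i's data here ...
\end{lstlisting}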

For separable models the Hessian is a sparse matrix, which means that it contains
mostly zeros. Sparsity can be exploited by \scAB\ when manipulating the matrix $H$,
such as when calculating its determinant. The actual sparsity pattern depends on
the model type:

\begin{itemize}
  \item \textit{Grouped or nested random effects:} $H$ is block diagonal.
  \item \textit{State-space models:} $H$ is a banded matrix with a narrow band.

......
  \item \textit{Latent Markov random fields:} often banded, but with a wide
  band.
\end{itemize}
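For example, with three independent groups the block diagonal case looks like
\[
  H \;=\;
  \begin{pmatrix}
    H_1 & 0 & 0 \\
    0 & H_2 & 0 \\
    0 & 0 & H_3
  \end{pmatrix},
  \qquad
  \det(H) \;=\; \prod_{i=1}^{3} \det(H_i),
\]
so determinants and linear solves reduce to small per-block computations.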

For block diagonal and banded $H$, \scAR\ will automatically detect the structure
from the \texttt{SEPARABLE\_FUNCTION} specification, and will print out a message
such as:
\begin{lstlisting}
  Block diagonal Hessian (Block size = 3)

\end{lstlisting}

... This diff was truncated because it exceeds the maximum size that can be displayed.