% $Id: autodif.tex 207 2011-12-18 18:11:16Z jsibert $
\documentclass{admbmanual}
\makeatletter\@twosidefalse\makeatother
\hypersetup{urlcolor=black}
% Some local definitions.
%
% Do not use a blank line between the question and the answer.
\newcommand{\question}[1]{\bigskip\noindent{\bf #1?}\par\medskip\noindent}
\newcommand\nr{{Newton-Raphson}}
% Some legacy shortcuts.
\newcommand\pd[2]{\frac{\partial #1}{\partial #2}}
\newcommand\bs{$\backslash$}
\newcommand\adtypes{\texttt{dvariable}, \texttt{dvar\_array},
\texttt{dvar\_matrix}, \texttt{dvector}, and \texttt{dmatrix}}
% Set the plot symbol size for PicTeX pictures.
\renewcommand\setplotsymbol{\fiverm .}
% Make a table with commands followed by their descriptions.
\newenvironment{commandtable}
{
\par
\begin{tabular}%
{@{\vrule height 12pt depth7pt width0pt}@{\tt-}l @{\rm\quad} l}
}
% Use a & to separate command from description. End line with a \\
{
\end{tabular}
\medskip
}
% Make a table with days of the week followed by up to 10 columns of numbers
% vertically aligned on their decimal points. Wanted it in place, so don't use
% the table environment.
\newenvironment{daysdecimaltable}
{
%\begin{table}[!h]
\par
\medskip
\begin{tabular}{l @{\qquad} *{10}{r@{}l} }
}
% Usage: (have to put decimal in, because some numbers don't have them.
% Monday & 25&.2 & -41 & & 433 & .12 & 10 & .11 & 23& \\
% Tuesday & 34&.21 & 356 & & 23 & & 23 & .1 & 5& \\
{
\end{tabular}
\bigskip
\par
% \end{table}
}
% Title definition.
\title{%
\largetitlepart{AUTODIF}
\smalltitlepart{%
A \cplus\ Array Language Extension\\
with Automatic Differentiation\\
for Use in Nonlinear Modeling and Statistics}
\vspace{4ex}\textsf{\textit{Version 10.0~~(2011-01-18)}}\vspace{3ex}
}
% Author definition.
\author{\textsf{\textit{David Fournier}}}
% Should be U/lc.
\manualname{Autodif}
% We want an index.
\makeindex
% Start the thing.
\begin{document}
% We want a title page.
\maketitle
~\vfill
\noindent ADMB Foundation, Honolulu.\\\\
\noindent This is the manual for AUTODIF version 10.0.\\\\
\noindent Copyright \copyright\ 1991, 1992, 1993, 1994, 1996, 2000, 2008, 2011
David Fournier\\\\
\noindent The latest edition of the manual is available at:\\
\url{http://admb-project.org/documentation/manuals/admb-user-manuals}
% We want a table of contents.
\tableofcontents
\chapter{Introduction}
\label{ch:introduction}
\X{dvariable@\texttt{dvariable}}
\X{dvar\_vector@\texttt{dvar\_vector}}
\question{What is \scAD}
\scAD\ is an array language extension to \cplus\ which allows the user to
automatically calculate the partial derivatives of a function of many
independent variables by simply redeclaring the usual C floating point types
doubles and arrays of doubles to be the corresponding \scAD\ type:
\texttt{dvariable}, and \texttt{dvar\_vector}. The user can write the code for
the function using any valid \cplus\ syntax. \scAD\ does not impose any
restrictions on allowable code constructions. Using \scAD, the programmer can
manipulate vectors and matrices as single objects in complicated mathematical
calculations, and at the same time, obtain derivatives of the resulting function
with respect to the independent variables.
When calculating derivatives, there is no restriction on calling user-supplied
functions, which may themselves call other functions---so long as the types have
been redefined to be the corresponding \scAD\ types. Since \cplus\ is upwardly
compatible with \textsc{ansi C}, existing numerical routines written in C can be
easily modified to use \scAD.
\question{Who should use \scAD}
Anyone who would like to have the derivatives of a complicated function computed
automatically without any additional programming effort will find \scAD\ useful.
It is particularly useful for optimization problems involving a differentiable
function of many independent variables. Such problems occur, for example, in
nonlinear statistical modeling, i.e., in nonlinear parameter estimation, in
sensitivity analysis, and in finding roots of systems of equations.
\question{Why is automatic differentiation better than estimating the
derivatives by finite differences}
\X{automatic differentiation}%
There are two reasons why the \scAD\ method is better than finite differences
for calculating partial derivatives.
\XX{accuracy}{of derivative calculations}
\begin{enumerate}
\item With \scAD, the derivatives are calculated to the limit of machine
accuracy, that is, to the same degree of accuracy as the function itself. The
derivatives are obtained as accurately as they would be if an analytical
expression for the derivatives were evaluated on the computer. With finite
differences, the accuracy of the derivative calculations seldom approaches
that of the limit of the machine accuracy for calculating the derivatives.
Consequently, optimization schemes based on automatic differentiation tend to
be more stable and perform better than those that use finite difference
approximations.
\item To obtain estimates of the partial derivatives by finite difference
approximations, the amount of calculation necessary is proportional to the
number of independent variables times the amount of time needed to calculate
the function itself. With \scAD, the amount of calculation is less than 5
times the amount of calculation needed to calculate the function itself
\cite{griewankcorliss1991}. In particular, the computing time does not depend
on the number of independent variables. This makes \scAD\ particularly
attractive for calculating derivatives for functions with many independent
variables.
\end{enumerate}
\XX{speed}{of calculations}
\question{Is it necessary to learn \cplus\ before one can use \scAD}
Any experienced C programmer can learn to use \scAD\ in a few hours. While it is
necessary to have a \cplus\ compiler to use \scAD, only a small number of extra
concepts besides those used in the C language need to be learned. This is due to
the fact that \cplus\ is essentially a superset of C, so valid
\textsc{ansi}-compliant C code needs only a very small amount of modification to
be converted to valid \cplus\ code.
We feel that the best way to get a feel for how \scAD\ is used is to study a
series of examples. These examples have been chosen to illustrate the use of
\scAD\ on problems in different fields from engineering to statistical analysis.
\question{How does automatic differentiation work}
Every computer program for calculating a function value, no matter how
complicated, can be broken down into a series of binary and unary mathematical
operations. Automatic differentiation is simply the chain rule from elementary
calculus applied to this series. The following simple example illustrates the
chain rule:
\begin{align}\label{eq:introduction-01}
\nonumber u&=x^2+y^2\\
v&=x+3y\\
\nonumber f&=\sin(u)+\exp(v)
\end{align}
We want to calculate the derivatives of the function $f$ with respect to the
variables~$x$ and~$y$. By the chain rule,
\begin{equation*}
\pd fx=\pd fu\pd ux + \pd fv\pd vx\qquad\qquad
\pd fy=\pd fu\pd uy + \pd fv\pd vy
\end{equation*}
Now
\begin{gather}\label{eq:introduction-02}
\pd fu=\cos(u) \qquad\qquad\qquad\qquad \pd fv=\exp(v)\\[6pt]
\pd ux=2x \qquad \pd vx=1 \qquad\qquad \pd uy=2y\qquad\pd vy=3
\end{gather}
\noindent such that
\begin{equation*}
\pd fx=\cos(u)*2x+\exp(v)*1\qquad\qquad\pd fy=\cos(u)*2y+\exp(v)*3
\end{equation*}
That is really all there is to it! Your program is broken down into a series of
simple operations like these so that the derivatives can be calculated. The
important point is that no matter how complicated the mathematical expressions
are, they can be thought of as consisting of a long series of binary and unary
operations.
\question{What is adjoint (precompiled) derivative code and why is it so
important} While the reverse mode of automatic differentiation is extremely
efficient in terms of the number of calculations required to compute
derivatives, the method requires a large amount of temporary storage. Each
arithmetic operation used to calculate the original function generates 28--30
bytes of temporary storage. Consider the fact that inverting a 100-by-100 matrix
requires $10^6$ operations. This would generate 28--30 megabytes of temporary
storage requirements. Clearly, it would be impractical to carry out large
calculations under these conditions. \X{precompiled derivative code}
The use of precompiled derivative code combines the best of two worlds, so to
speak. The small number of arithmetic operations required by the reverse mode
for calculating derivatives is retained while, at the same time, the amount of
temporary storage required is greatly reduced. The amount of storage required
for inverting a 100-by-100 matrix inverse is reduced to about 500K. Even better,
the amount of temporary storage requirement generated by adding two matrices is
about 32 bytes, no matter how large the matrices are.
Precompiled derivative code means that the actual derivative calculations are
written in \cplus\ functions. Following object oriented design considerations,
the precompiled code is completely ``encapsulated'' so that it is invisible to
the user. This means that it makes no difference to the user whether or not a
particular function has a precompiled derivative code component written for it.
At the user level, everything looks the same---except that performance is
greatly enhanced. The use of precompiled derivative code produces the ultimate
in reuseable code. For example, writing good code for the derivative of the
inverse of a matrix is not a trivial procedure. However, once it has been
written, any user of \scAD\ can take advantage of it simply by taking the
inverse of a matrix object with code like
\begin{lstlisting}
bbM1=inv(M); // M1 and M are dvar_matrix objects
\end{lstlisting}
\scAD\ combines array and matrix classes with the use of precompiled derivative
code for common array and matrix calculations and assignments to produce a
superior environment for efficient derivative calculations.
\section{A simple example of a derivative calculation}
While \scAD\ has been supplied with all the computational power described above,
it has been designed so that, at the user level, it appears like enhanced C
code, or perhaps as enhanced C code with an additional array language. Before
illustrating some of \scAD's array language facilities, we will begin with a
simple C~code example.
The following C code for the function \texttt{fcomp}
calculates the sum of squares
\begin{equation*}
\sum_{i=0}^{n-1} (x_i-1)^2
\end{equation*}
This code has been written in C without using any \cplus\ or \scAD\ extensions,
to show how C code can be modified to use \scAD\ for calculating derivatives.
\begin{lstlisting}
#include
double fcomp(int n,double * x)
{
double z;
double tmp;
int i;
z=0;
for (i=0;i} has been included. The file
\texttt{fvar.hpp} contains information about \scAD\ for the \cplus\ compiler.
\item The array of variables \texttt{x} has been redefined to be of type
\texttt{dvar\_vector}. This \scAD\ type is like an array of doubles for which
derivative information is collected. Besides providing for the calculation of
derivatives, the use of the \texttt{dvar\_vector} class for \texttt{x}
provides an additional improvement. With the original C code, there was no way
of determining in the subroutine \texttt{fcomp} whether or not enough memory
had actually been allocated to the array~\texttt{x}. \scAD\ provides optional
index bounds checking for all container classes, such as
\texttt{dvar\_arrays}. This greatly speeds up program development.
\item The variables \texttt{z} and \texttt{tmp} have been declared to be of
type \texttt{dvariable}. The \texttt{dvariable} type is the fundamental type
for automatic differentiation. It corresponds to the C floating point type
\texttt{double}.
\item The function \texttt{value} has been included in the line
\texttt{return(value(z));} The function \texttt{value()} returns the numerical
part of an \texttt{dvariable}. This is necessary to convert the
\texttt{dvariable} to a \texttt{double} so that the value can be returned and
used in subsequent calculations (such as in a function minimizer) without
generating any more derivative calculations.
\item The range of the index of the \texttt{for} loop has been changed to
begin at 1 and include \texttt{n}. While this change is not strictly necessary
for this example, \scAD\ allows the user to declare arrays that have arbitrary
valid index ranges. This facility often contributes to the production of
simpler looking code.
\end{enumerate}
\section{Interfacing with the rest of the program}
The form of the interface of the function \texttt{fcomp} with the rest of the
program will depend on the user's application. This section illustrates two
applications. The first example shows how to simply calculate the derivatives so
they can be used for any other purpose whatsoever. The second example shows how
to invoke a function minimizer, supplied with \scAD, to find the minimum of the
function \texttt{fcomp}. In any event, it is necessary to identify the
independent variables for \scAD. (These are the variables for which the partial
derivatives are calculated.) The independent variables are identified by
declaring them to be of type \texttt{independent\_variables}.
\X{independent variables}
The declaration \texttt{independent\_variables x(1,nvar)} creates an array of 20
independent variables and identifies the independent variables for \scAD. An
object of type \texttt{gradient\_structure} must be declared and remain in scope
until all the derivative calculations have been finished.
\XX{minimization}{of a function}
\X{gradient\_structure@\texttt{gradient\_structure}}
\begin{lstlisting}
#include
#include
double fcomp(dvar_vector); // Function prototype declaration
void main()
{
double f;
int i;
int nvar=20;
independent_variables x(1,nvar); // Identify the independent variables
dvector g(1,nvar); // Holds the vector of partial derivatives (gradient)
gradient_structure gs; // Must declare this structure to manage
// derivative calculations
f=fcomp(nvar,x);
gradcalc(nvar,g); // The derivatives are calculated
cout <<" The gradient vector is\n"<
#include
double fcomp(dvar_vector); // Function prototype declaration
void main()
{
int nvar=20; // This is the number of independent variables
independent_variables x(1,nvar); // these are the independent variables
double f; // This is the dependent variable
dvector g(1,nvar); // Holds the vector of partial derivatives (gradient)
fmm fmc(nvar); // Create structure to manage minimization
// Expansion of BEGIN_MINIMIZATION(nvar,x,f,g,fmc) macro
gradient_structure gs;
while (fmc.ireturn >= 0) // Begin loop for minimization
{
fmc.fmin(f,x,g); // Calls the minimization routine
if (fmc.ireturn > 0) // Loop for evaluating the function and
{ // derivatives
// End of the Expansion of BEGIN_MINIMIZATION macro
f=fcomp(nvar,x);
// Expansion of END_MINIMIZATION(nvar,g) macro
gradcalc(nvar,g);
}
}
// End of the Expansion of END_MINIMIZATION macro
cout << " The minimizing values are\n" << x << "\n"; // Print out answer
}
\end{lstlisting}
The minimization routine is controlled by the object \texttt{fmc} of type
\texttt{fmm}. Different aspects of the minimization---such as the maximum number
of function evaluations before stopping, convergence criteria, and printing
options---can be adjusted by the user through this object. See
Chapter~\ref{ch:function-minimizing}, on the function minimizers, for more
details. The flow through the loops is controlled by the
\texttt{integer fmc.ireturn}.
If you wish to use the conjugate gradient function minimizer instead of the
quasi-Newton function minimizer, you should declare the object \texttt{fmc} to
be of type \texttt{fmmc} rather than type \texttt{fmm}. Nothing else needs to be
changed.
\XX{function minimizer}{conjugate gradient}
\X{conjugate gradient function minimizer}
\section{Using \scAD's built in\br vector and matrix calculations}
\XX{overloading}{of functions in \cplus}
An important feature of \cplus\ is the ability to overload of arithmetic
operations, i.e., extending their definition to user-defined types. For example,
the plus operation (\texttt{+}) can be extended to user-defined string objects,
so if \texttt{string1} and \texttt{string2} are two string objects, then
\begin{lstlisting}
string1 + string2
\end{lstlisting}
produces a string object that contains the concatenation of \texttt{string1} and
\texttt{string2}. \scAD\ uses this feature to extend the operations of addition,
subtraction, multiplication, and division to various combinations of matrices,
vectors, and numbers. As a result, the matrices and vectors can be employed as
objects in complex mathematical calculations while obtaining derivatives. This
greatly simplifies and speeds up code development. Addition and multiplication
of vectors and matrices appear in many engineering and statistical applications.
\XX{bounds checking}{for container objects}
\X{safe arrays}
This example introduces two new types: the \texttt{dvector} and the
\texttt{dmatrix}. These are the constant analogues of the \texttt{dvar\_vector}
and \texttt{dvar\_matrix} types. They are ``safe array'' objects that contain
their own size information, and for which optional array bounds checking is
provided. The memory for all these objects is managed by \scAD\ in a fashion
that is transparent to the user. \scAD\ can combine these constant and variable
objects automatically so, for example, if \texttt{A} is a \texttt{dmatrix} and
\texttt{M} is a \texttt{dvar\_matrix}, the expression \texttt{A*M} in the user
code will produce an \texttt{dvar\_matrix} that is the product of \texttt{A} and
\texttt{M}. At the same time, the array bounds on \texttt{A} and \texttt{M} will
be checked to ensure that the operation is legal for these two objects.
\X{dvector@\texttt{dvector}}
\X{dvar\_vector@\texttt{dvar\_vector}}
\X{dmatrix@\texttt{dmatrix}}
\X{dvar\_matrix@\texttt{dvar\_matrix}}
To illustrate these vector and matrix operations, we have taken the following
example from~\cite{griewank1988}. The use of a cubic equation of state yields
the Hemholtz energy of a mixed fluid of unit volume at the absolute temperature
$T$ as
\begin{equation*}
f(x)=RT\sum_{i=1}^n x_i \log\left(\frac{x_i}{1-b^Tx}\right)
-\frac{x^TAx}{\sqrt{8}b^Tx}
\,\log\left(%
\frac{1+\big(1+\sqrt{2}\big)b^Tx}%
{1+\big(1-\sqrt{2}\big)b^Tx}\right)
\end{equation*}
where $R$ is the universal gas constant and
\begin{equation*}
0\le x,b \in {\mbox{\bf R}}^n\qquad
A=A^T\in {\mbox{\bf R}}^{n\times n}
\end{equation*}
During the simulation of an oil reservoir, this function and its derivatives
must be calculated at thousands of points in space and time.
In this example, there are two new applications of the overloaded multiplication
operator (\texttt{*}). $b^Tx$ denotes the dot product of the $n$-dimensional
vector~$b$ with the $n$-dimensional vector~$x$, $Ax$~denotes the multiplication
of the $n$-dimensional vector~$x$ by the $n\times n$~matrix~$A$, and $x^T Ax$ is
the result of the dot product of the vector~$x$ with the vector~$Ax$. Here is
the code for the function that calculates the Hemholtz energy using the
overloaded operators.
\input hem_func.cpp
\X{overloaded operators}
All the vector and matrix multiplications are carried out using the overloaded
operators \texttt{*}, \texttt{+}, and \texttt{-}, which enable the code to be
developed extremely quickly. Notice that multiplication~(using \texttt{*}) can
be carried out for any combination of \texttt{dvector}, \texttt{dmatrix},
\texttt{dvar\_vector}, and \texttt{dvar\_matrix}---so long as the operation
makes sense mathematically. \scAD\ can combine the constant types
\texttt{double}, \texttt{dmatrix}, and \texttt{dvector} with the variable types
\texttt{dvariable}, \texttt{dvar\_vector}, and \texttt{dvar\_matrix} in a manner
transparent to the user.
Here is the code that calls the function \texttt{hemholtz\_energy}.
\input hemholtz.cpp
\section{The log-likelihood function for a\br multivariate normal distribution}
As a further example of how to use \scAD's vector and matrix calculations,
consider the problem of estimating the parameters for a multivariate normal
model when the means and covariance matrix depend on a vector of common
parameters $\Theta$. Let $Y$ be a vector of observations $(y_1, y_2,\ldots,y_n)$
that satisfies the relationship
\begin{equation*}
Y = A(\Theta)X + \epsilon
\end{equation*}
where $X$ is a vector $(x_1, x_2,\ldots,x_m)$ of controls, $A(\Theta)$ is a
$n\times m$ matrix depending on a parameter vector
$\Theta=(\theta_1,\ldots,\theta_p)$, and $\epsilon=(\epsilon_1,
\epsilon_2,\ldots,\epsilon_n)$ is a random vector with a multivariate normal
distribution, with mean vector $0$ and covariance matrix $\Sigma(\Theta)$.
The log-likelihood function for the parameters $\Theta$ is given by
\begin{equation*}
-0.5 \ln\Big(\mbox{\rm det}\big(\Sigma(\Theta)\big)\Big)-
0.5\big(Y-A(\Theta)X\big)^T
\,\Sigma(\Theta)^{-1}
\big(Y-A(\Theta)X\big)
\end{equation*}
The maximum likelihood estimates for the parameters are found by maximizing the
log-likelihood function with respect to $\Theta$. This is a nonlinear
optimization problem. The best methods for solving such a problem require the
derivatives of the log-likelihood function with respect to the parameter vector
$\Theta$. Using \scAD's vector and matrix classes, the code for calculating the
log-likelihood function and obtaining the derivatives can be written as follows.
(Actually, there are even more efficient methods of coding these calculations
with \scAD.)
\XX{multivariate normal distribution}{log-likelihood function}
\begin{lstlisting}
dvariable log_likelihood(dvector x,dvar_vector Y,dvar_matrix Sigma,
dvar_matrix A)
// It is assumed that Sigma and A have already been calculated by
// other routines in terms of the parameters Theta
{
dvariable f;
dvar_vector diff=Y-A*x;
f=-0.5*log(det(Sigma))-.5*diff*inv(Sigma)*diff;
return f;
}
\end{lstlisting}
When this code is executed, the information necessary for calculating the
derivatives is automatically generated without further intervention by the user.
In addition, optional array bounds checking is provided to ensure that all
vector and matrix objects have the right dimensions for the operations
performed.
\question{What are ragged arrays} Traditional programming languages often used
for numerically intensive applications---such as \textsc{fortran}---only support
``flat'' data structures such as matrices. The natural data structures for many
applications simply do not come in this simple form. \scAD\ has extended these
simple forms to ragged matrices and ragged 3-dimensional arrays. A ragged matrix
is a 2-dimensional array where the rows of the array can contain a different
number of elements and have different valid index ranges. A ragged 3-dimensional
array is made up of matrices of different sizes, and these matrices can
themselves be ragged matrices.
Such ragged objects can be used to create many useful data structures. For
example, the ``weights'' in a feed-forward neural network can be best described
by a ragged 3-dimensional array. This can be used to provide an extremely
compact description of the neural network. See Chapter~\ref{ch:neural-networks},
on neural networks, for more details.
Since \cplus\ and \scAD\ are extensible, there is in principle no limit to the
complexity of the data structures that can be developed. This means that you
will never hit a ``dead end'' when developing an application.
%xx \htmlnewfile
\chapter{Getting Started}
\section{System requirements: PC implementations}
\X{system requirements}
\X{hardware}
\scAD\ is designed to operate on 80286, 80386, and 80486-based microcomputers
with a full 640 Kb main memory running under the \textsc{ms-dos} operating
system version 3.2 or higher. A hard disk is not required. However, \scAD\ may
generate several Mb of gradient information to be stored on an external device.
A hard disk and a \textsc{ram} disk are highly recommended for greatest speed.
The \scAD\ libraries have been compiled with either in-line code for a numeric
coprocessor or with code to emulate a numeric coprocessor. A numeric coprocessor
is also highly recommended for greatest speed. The optimum configuration would
include a numeric coprocessor, plus a hard disk and \textsc{ram} disk with at
least 2 Mb free memory, for storing the temporary files created by \scAD. (Large
applications may require even more disk memory.) \scAD\ appears to be compatible
with most popular extended memory managers and multitasking systems.
\X{libraries}
The \scAD\ system consists of a header file \texttt{fvar.hpp} and two or more
libraries. There is a safe library, which provides bounds checking and
initialization of arrays. This library should be used for code development. In
addition, there is an optimized library for faster execution of production
programs. Check the \texttt{read.me} file on your \scAD\ distribution disk for
the names and number of libraries supplied with your version of \scAD.
\section{System requirements: other implementations}
\scAD\ has been ported to a number of other platforms. The necessary information
for each specific platform should be contained in a \texttt{read.me} file.
\section{Installation}
The simplest way to install the \scAD\ system is to place the header file in the
directory where your compiler normally searches for header files, and the
libraries in the directory where your linker normally searches for libraries. If
you choose this option, there is no need to specify additional paths for the
compiler and linker to search for header files and libraries. Alternatively, you
may wish to install the \scAD\ system in its own directory. In this case, you
will need to tell the compiler and linker where to look for the header file and
libraries.
An installation program is included on the distribution disk. This program is
invoked by typing \texttt{a:install} at the \textsc{dos} prompt (assuming the
distribution disk is in drive \texttt{a:}). You will then be asked for the drive
of the source disk (in this case, \texttt{a:}) and the drive where you want to
install \scAD. Next, you will be asked for the directories in which to install
the \scAD\ header file, libraries, and examples. If your \cplus\ compiler is
located in \texttt{c:\bs cc}, you might want to specify \texttt{c:} as the
destination drive and \texttt{\bs cc\bs include} and \texttt{\bs cc\bs lib} as
the directories into which to install the include file and library files,
respectively. If you press \texttt{Enter} for any of these directories, the
corresponding files will be installed in a directory called \texttt{AUTODIF} on
the destination drive.
\textit{Be sure to read the} \texttt{read.me} \textit{file for last minute
changes and compiler-specific information.}
\section{Setting up the Borland compilers for \scAD}
\X{Borland}
\XX{IDE@\textsc{ide}}{Borland}
\X{memory model}
To compile \scAD\ applications with the integrated development environment
(\textsc{ide}), the following options must be selected within \textsc{ide}:
\begin{itemize}
\item Select {\bf O}ptions {\bf C}ompiler {\bf C}ode Generation. Choose {\bf
L}arge memory model. (Optional: select Pre-c{\bf o}mpiled headers if you
wish to speed up compilation.) Select Mo{\bf r}e for Advanced Code generation.
Select 8028{\bf 7} to generate direct 80287 inline code or {\bf E}mulation to
use numeric coprocessor emulation. (\textit{Optional:} Select {\bf D}ebug info
in \textsc{obj}s if you intend to use the debugger.)
\item (\textit{Optional:} If you intend to generate an overlay executable,
select {\bf O}ptions {\bf C}ompiler {\bf E}ntry/Exit Code \textsc{dos}
{\bf o}verlay.)
\item Select {\bf O}ptions {\bf D}irectories. If you have not stored your
\scAD\ header file and library files in the default directories, enter under
{\bf I}nclude Directories the path where you have stored the \scAD\ header
file (\texttt{fvar.hpp}), and under {\bf L}ibrary Directories the path where
you have stored the \scAD\ libraries.
\item Select {\bf P}roject {\bf O}pen to create a project file. Include your
source files and the \scAD\ library that you wish to use.
\end{itemize}
\X{make}
To compile \scAD\ applications using a makefile, the \texttt{bcc} or
\texttt{bccx} command-line compilers and \texttt{tlink} or \texttt{tlinkx}
linkers must be used.
\XX{compiler options}{Borland}
\X{memory model}
You should invoke the following compiler and linker options (see the file
\texttt{makefile} supplied with the examples):
\begin{commandtable}
I & followed by the path of your \scAD\ include file.\\
ml & to specify the large memory model. \\
L & followed by the path of your \scAD\ libraries.\\
c & to suppress linking after compilation.\\
v & if you want to use the Borland debugger.\\
f287 & to use inline 80287 instructions.\\
\end{commandtable}
\XX{linker options}{Borland}
You should invoke the \texttt{/c} and \texttt{/v} linker options if you are
using debugging.
\XX{linking}{Borland linkers}
The following files must be linked in the following order (see the file
\texttt{makefile} supplied with the examples):
\begin{enumerate}
\item \texttt{c0l.obj} startup module for the large memory model.
\item Your source code \texttt{.obj} files.
\item \texttt{graphics.lib}, if you have a graphic application.
\item \texttt{ado7.lib} or \texttt{ade.lib}.
\item \texttt{fp87.lib} for the inline floating-point library or
\texttt{emu.lib} for the emulation library.
\item \texttt{mathl.lib}, the math library for the large memory model.
\item \texttt{cl.lib}, the run-time library for the large memory model.
\end{enumerate}
\section{Setting up the Zortech compilers for \scAD}
\X{Zortech}
\XX{IDE@\textsc{ide}}{Zortech}
To compile \scAD\ applications with the integrated development environment
\textsc{ide}), the following options must be selected within \textsc{ide}:
\begin{itemize}
\item Select {\bf C}ompile Com{\bf p}ile Options. Select {\bf M}emory Models
L{\bf a}rge to select the large memory model. Select Object {\bf C}ode Inline
808{\bf 7} to generate direct inline 8087 code.
\item Append the directories where you have stored the \scAD\ header file and
the \scAD\ libraries to the environment strings \texttt{INCLUDE} and
\texttt{LIB}, as discussed in the compiler documentation.
\end{itemize}
\XX{compiler options}{Zortech}
\X{memory model}
You should invoke the following compiler and linker options (see the file
\texttt{makefile} supplied with the examples):
\begin{commandtable}
I & followed by the path of your \scAD\ include file.\\
ml & to specify the large memory model.\\
c & to suppress linking after compilation.\\
g & if you want to use the Zortech debugger.\\
f & if you want to generate in-line 8087 code.\\
\end{commandtable}
\section{Examples}
\X{examples}
Many of the examples in this manual are included on the distribution
%disketted fixed
as a ``self-expanding'' archive program \texttt{examples.exe}. When this program
is executed, it expands into several files. The installation program will
install the examples in the directory you specify. A \texttt{makefile} is
included to simplify compilation of the examples. Read the file
\texttt{examples.doc} for a more complete description of the examples.
%xx \htmlnewfile
\chapter{The \scAD\ Classes}
\X{\fontindexentry{tt}{double}}
\X{\fontindexentry{tt}{float}}
\X{\fontindexentry{tt}{dvariable}}
\X{\fontindexentry{tt}{dvector}}
\X{\fontindexentry{tt}{dvar\_vector}}
\X{constant objects}
\X{container classes}
\X{variable objects}
\X{\fontindexentry{tt}{dmatrix}}
\X{\fontindexentry{tt}{dvar\_matrix}}
\X{\fontindexentry{tt}{d3array}}
\X{\fontindexentry{tt}{dvar3\_array}}
\X{\fontindexentry{tt}{long int}}
\X{\fontindexentry{tt}{ivector}}
\X{\fontindexentry{tt}{lvector}}
\X{\fontindexentry{tt}{imatrix}}
\X{\fontindexentry{tt}{lmatrix}}
\X{\fontindexentry{tt}{int}}
The \scAD\ floating point container classes have been designed to facilitate the
calculation of derivatives. As in differential calculus, they reflect the main
dichotomy between constant objects for which no derivative calculations are
required, and variable objects for which derivative information is generated.
Both constant and variable objects can be further classified either as single
objects (a number), or 1-dimensional arrays, 2-dimensional arrays, and so on.
The constant and variable versions of each array type (container class) have
been designed to appear as similar as possible, so that identical code can be
written for calculations involving either constant or variable objects. See
Table~\ref{tab:types}.
\begin{table}[!h]
\begin{center}
\begin{tabular}{@{\vrule height 12pt depth7pt width0pt} l | c | c }
\hline
& \bf Constant Type & \bf Variable Type \\
\hline
Single element&\texttt{double}, \texttt{float}&\texttt{dvariable} \\
1-dimensional array&\texttt{dvector}&\texttt{dvar\_vector} \\
2-dimensional array&\texttt{dmatrix}&\texttt{dvar\_matrix} \\
3-dimensional array&\texttt{d3array}&\texttt{dvar3\_array} \\[3pt]
\hline
\end{tabular}
\end{center}
\emptycaption
\label{tab:types}
\end{table}
\scAD\ also has 1- and 2-dimensional container classes for integers. See
Table~\ref{tab:integer-container-classes}.
\begin{table}[!h]
\begin{center}
\begin{tabular}{@{\vrule height 12pt depth7pt width0pt} l | l }
\hline
Single element&\texttt{int}, \texttt{long int} \\
1-dimensional array&\texttt{ivector}, \texttt{lvector} \\
2-dimensional array&\texttt{imatrix}, \texttt{lmatrix} \\
\hline
\end{tabular}
\end{center}
\emptycaption
\label{tab:integer-container-classes}
\end{table}
The \texttt{ivector} and \texttt{lvector} classes are arrays of integers of type
\texttt{int} and \texttt{long int}. The \texttt{imatrix} and \texttt{lmatrix}
classes are 2-dimensional arrays of integers of type \texttt{int} and
\texttt{long~int}.
\section{The \texttt{dvariable} class}
Objects of the \texttt{dvariable} class are like numbers (variables). When they
are used in calculations, derivative information is generated. The value of a
\texttt{dvariable} is an object of type \texttt{double}. All the standard C
arithmetic and logical functions have been defined for type \texttt{dvariable}.
A \texttt{dvariable~z} is created by this declaration:
\begin{lstlisting}
dvariable z;
\end{lstlisting}
Since the \texttt{dvariable} class is designed to appear to the user like a
\texttt{double}, it is not necessary to discuss many of the operations that can
be performed. They appear to be simply the same as those performed on an object
of type \texttt{double}. There is, however, an important difference between the
behavior of a \texttt{dvariable} and that of a \texttt{double} under a copy
operation. A copy occurs, for example, when an object is passed as an argument
to a function (so long as the function has not been declared to take a
reference). Suppose that the function \texttt{func} has been declared
\X{copy operation}
\begin{lstlisting}
void func(dvariable)
\end{lstlisting}
so that \texttt{func} is a function that takes one argument of type
\texttt{dvariable}, and a portion of the user's code looks like
\begin{lstlisting}
dvariable v;
// ...
func(v);
\end{lstlisting}
\XX{constructor}{copy initializer}
\X{\fontindexentry{tt}{dvariable(dvariable\&)}}
When the function \texttt{func(v)} is called, the copy initializer constructor
will be called first, in order to construct a copy of \texttt{v} and place that
on the stack before passing control to \texttt{func}. For previous versions of
\scAD\ (before Version 1.05), the copy initializer performed a shallow
copy---that is it passed a pointer to the component of the \texttt{dvariable}
containing the \texttt{double}. As a result, the \texttt{dvariable} in the
function \texttt{func} and the \texttt{dvariable v} in the calling routine both
point to the same \texttt{double}, so changes to the value of the
\texttt{dvariable} in \texttt{func} will affect \texttt{v} even though
\texttt{v} was passed by value. To get better compatibility with the behavior of
C code (and the behavior of a \texttt{double}) for Version 1.05 or later, the
\texttt{dvariable} copy initializer constructor has been modified so that it
produces a distinct copy. This could cause a problem for old \scAD\ code. If a
\texttt{dvariable} has been passed by value to a function and its value modified
in the body of the function, it used to be the case that the value of the
\texttt{dvariable} in the calling function would be changed as well. This is no
longer the case. Care must be taken to modify old code where necessary.
\XX{copy operation}{shallow}
%
% \XX{distinct copy}{using assignment operator = to create}
% \XX{distinct copy}{of a dvariable}
% This section applies only to versions of \scAD\ with
% version number less than 1.05.
% For older versions of \scAD\ (version number less than 1.05)
% when it is necessary to create a distinct copy of a \texttt{dvariable}
% the assignment operator~``\texttt{=}'' should be used.
% To create a distinct copy of \texttt{v} whose value is not affected by
% the function \texttt{func} the preceding code could be modified to
% \begin{lstlisting}
% dvariable v;
% dvariable vcopy;
% vcopy=v; // make a distinct copy of v to pass to the function func
% // ...
% func(vcopy);
% \end{lstlisting}
% \noindent
% {\bf It is important to realize that the following simpler code will not
% accomplish the objective of creating a distinct copy of \texttt{v}.}
% \begin{lstlisting}
% dvariable v; // Not the right way to make a distinct copy
% dvariable vcopy=v; // This will NOT make a distinct copy
% // ...
% func(vcopy);
% \end{lstlisting}
% \X{initialization}\XX{copy initializer constructor}{for a dvariable}
% \noindent The problem is that while the code
% \texttt{dvariable vcopy=v;} appears to be an assignment, it is in fact
% interpreted by the compiler as an initialization
% and will cause the copy initializer constructor to be invoked.
% Thus a distinct copy of v is not created and the problem persists.
%
Questions about distinct copies and the form of the copy initializer also arise
with container classes, such as the \scAD\ vector and matrix classes. This is
discussed below for the \texttt{dvar\_vector} class.
There is one other operation on \texttt{dvariable}s that has no counterpart for
\texttt{double}s: the
\X{\fontindexentry{tt}{value} function}
\texttt{value} function:
\begin{lstlisting}
double value(dvariable&);
\end{lstlisting}
\noindent This function returns a \texttt{double}, which is the value of the
\texttt{dvariable}. The main use of this function is to pass to some other part
of the user's program the value of a some computation for which derivative
information has been calculated. That way, the value can be used without
creating a derivative calculation associated either with the passing of the
value or its subsequent use in other parts of the program. It is not necessary
to use \texttt{value}s for stream output of \texttt{dvariable}s, because the
stream operators \texttt{<{}<} and \texttt{>{}>} have been overloaded to accept
these objects. The value function should be used sparingly. In particular, it
must not be used in the middle of calculations for which the derivatives are
desired. The following example will produce an error:
\XX{stream operators}{\fontindexentry{tt}{<{}<}, \texttt{>{}>}}
\XX{operators}{\fontindexentry{tt}{<{}<}, \texttt{>{}>}}
\XX{stream input and output}{of \texttt{dvariables}}
\begin{lstlisting}
dvariable x,y,z;
double a;
// ... This code is incorrect
y=x*x;
a=value(y);
cout << " y = " << a << "\n";
z=sin(a); // This is where an error is introduced into the derivatives
cout << " z = " << z << "\n"; This will print out the value of z
\end{lstlisting}
The following correct version of the code will produce exactly the same output
and the derivative calculations will be correct:
\begin{lstlisting}
dvariable x,y,z;
double a;
// ... This code is correct
y=x*x;
cout << " y = " << y << "\n";
z=sin(y); // Use y instead of a at this point
cout << " z = " << z << "\n"; This will print out the value of z
\end{lstlisting}
It is possible to create an array \texttt{u} of 10 \texttt{dvariables} by using
the declaration \XX{array}{of \texttt{dvariables}}
\begin{lstlisting}
dvariable u[10]; // creates an array of 10 dvariables indexed from 0..9
\end{lstlisting}
However, this is not the recommended construction. Instead, the
\texttt{dvar\_vector} class should be used as in the declaration
\begin{lstlisting}
dvar\_vector u(0,9); // creates a dvar_vector of size 10 indexed from 0..9
\end{lstlisting}
This construction generates more efficient code and produces a ``safe'' object
with optional bounds checking. The \texttt{dvariable u[10]} construction may be
useful for converting pre-existing C code where the declaration \texttt{double}
or \texttt{float} is modified to \texttt{dvariable}.
\XX{\fontindexentry{tt}{float}}{no corresponding \scAD\ type}
There is at present no \scAD\ type that corresponds to the C~type
\texttt{float}. All \scAD\ variable types contain a \texttt{double}.
\section{The \texttt{dvector} and \texttt{dvar\_vector} classes}
Since the \texttt{dvector} and \texttt{dvar\_vector} classes are intended to
appear identical to the user (with the exception that derivative information is
generated for the \texttt{dvar\_vector} class), we shall discuss the use of
these classes simultaneously.
\X{safe arrays}
\XX{maximum size}{of a \texttt{dvar\_vector}}
The \texttt{dvar\_vector} class appears to the user as a vector (array) of
objects of type \texttt{dvariable}. Bounds checking for array and matrix objects
can be implemented---or not---by linking the user's program with the appropriate
\scAD\ library. The index for a \texttt{dvector} or a \texttt{dvar\_vector} is
of type \texttt{int} (a signed integer), so permissible array bounds are
$-32,768$ to $32,767$ for $16$-bit \textsc{dos} versions, where an object of
type \texttt{int} occupies two bytes. The maximum number of elements that can be
contained in a \texttt{dvar\_vector} is $8191$ in $16$-bit \textsc{dos} versions
of \scAD.
\section{Creating \texttt{dvector}s and \texttt{dvar\_vector}s}
Several declarations can be used to create a \texttt{dvector} or
\texttt{dvar\_vector} object:
\begin{lstlisting}
dvector v(1,n); // Creates a dvector with array bounds 1..n
dvector v=w; // Creates a dvector and initializes it with w
dvector w("{1.2,4,-5.23e-1}"); // Creates a dvector with array bounds 1..3
// and initializes it with 1.2, 4, -5.23e-1.
dvector w("mystuff.dat"); //Creates a dvector whose contents are
// read in from the file mystuff.dat. The data in
// mystuff.dat must be invalid numeric format
\end{lstlisting}
The code
\begin{lstlisting}
dvector v=w;
\end{lstlisting}
creates a \texttt{dvector v} by using the ``copy initializer''
\texttt{dvector(dvector\&)} constructor. This constructor performs a shallow
copy, so after this line of code is executed, \texttt{v} and \texttt{w} will
both point to the same area of memory where the array is contained. The same
type of constructor is invoked when a \texttt{dvector} or \texttt{dvar\_vector}
object is passed by value to a function. For example, suppose that a function
\texttt{users\_function} has been defined where the function prototype is
\begin{lstlisting}
void users_function(dvar_vector v);
\end{lstlisting}
and that the following statement appears somewhere in the user's code:
\begin{lstlisting}
dvar_vector w(1,n);
. . .
users_function(w);
. . .
\end{lstlisting}
To pass the \texttt{dvar\_vector w} to \texttt{users\_function}, a copy of
\texttt{w} is made and put on the stack. The copy initializer constructor is
called by the compiler to do this. The copy initializer uses a ``shallow'' copy
to create the copy of \texttt{w}, which is passed to the function.
A \texttt{dvector} contains a pointer that points to the area of memory where
the elements of the \texttt{dvector} are contained. With a shallow copy, the
value of this pointer is simply passed to the new copy of the \texttt{dvector
w}. As a result, both \texttt{w} in the calling routine and the copy of
\texttt{w} in the function \texttt{users\_function} point to the same area of
memory. So, changes made to this \texttt{dvector} in \texttt{users\_function}
will change the values of the corresponding entries of~\texttt{w} in the calling
routine, even though the \texttt{dvector} was passed by value. This can create a
problem if the user wants to operate on the \texttt{dvector} in
\texttt{users\_function} without changing the value of \texttt{w} in the calling
routine.
For ordinary C language objects, changes within a function to an argument passed
by value to the function do not affect the object passed. This is not the case
for the \scAD\ container classes \texttt{dvariable}, \texttt{dvector},
\texttt{dvar\_vector}, \texttt{dmatrix}, or \texttt{dvar\_matrix}, or
higher-dimensional arrays. If you wish to pass a ``copy'' of an object to a
function such that the original object is not affected by changes to the passed
object, you must explicitly make a distinct copy of the object by using the
assignment operator (\texttt{=}), as in this example.
\XX{AVOID}{using a shallow copy when a distinct object is desired}
\begin{lstlisting}
dvar_vector v(1,n); // Want to pass a copy of v to the function func
dvar_vector w(1,n); // This will be the distinct copy of v
w=v; // This creates the distinct copy
users_function(w); // Can now call the function
\end{lstlisting}
The user must avoid the simpler \cplus\ construction of declaring an initialized
object. Such a declaration will cause the copy initializer to be invoked, so a
shallow copy will be performed.
\begin{lstlisting}
dvar_vector v(1,n);
dvar_vector w=v; // This uses the copy initializer so that w is
// a shallow copy of v
users_function(w); // Changes to w in the function will cause changes
// in v
\end{lstlisting}
It is possible to create a \texttt{dvector} by explicitly naming the data it
will contain: \XX{constructor}{\fontindexentry{tt}{dvector}}
\begin{lstlisting}
dvector w("{12,24,56 78 91.4 23.455e+2}");
\end{lstlisting}
The data must be enclosed in double quotes and begin and end with braces
(\texttt{\{{}\}}). The numbers must be in valid numeric format and be separated
by blanks or commas. It is also possible to create a \texttt{dvector} using the
contents of a file. The declaration
\begin{lstlisting}
dvector w("mystuff.dat");
\end{lstlisting}
will create a \texttt{dvector} \texttt{w} that contains the contents of the file
\texttt{mystuff.dat}. It is not necessary to know how many numbers are contained
in the file to use this declaration. An error message will be generated if the
file does not exist, the file is empty, or there is non-numeric data in it. The
data in the file must be separated by spaces. This constructor will create a
\texttt{dvector} whose minimum valid index is~1.
It is also possible to create a \texttt{dvector} or a \texttt{dvar\_vector} by
reading in a column of data from a file. Suppose the file \texttt{``work.prn''}
contains the data
\begin{daysdecimaltable}
Monday & 25&.12 & -41& & 433&.12 & 10&.11 & 23& \\
Tuesday & 34&.21 & 356& & 23& & 23&.1 & 5& \\
Wednesday & 10& & 3&.13e-4 & 43&.11 & 3&.23 & 4& \\
Thursday & 12&.1 & 53&.13 & 453&.1 & -5&.13 & 4&.1 \\
\end{daysdecimaltable}
The declaration
\begin{lstlisting}
dvar_vector u("work.prn",3);
\end{lstlisting}
will create a \texttt{dvar\_vector} whose elements are the third column of the
file \texttt{work.prn}. That~is,
\begin{lstlisting}
u = (\ts-41,\ts356,\ts3.13e-4,\ts53.13\ts)
\end{lstlisting}
The entries of the file must be separated by blanks and all the entries in the
column being read must be in valid numeric format. This constructor will create
a \texttt{dvector} whose minimum valid index is~1.
% \section{Avoiding arithmetic statements in the initialization of
% \texttt{dvariable} objects}
% {\bf This section only applies to version of \scAD\ with
% version number less than 1.05}
%
% Including statements like
% \begin{lstlisting}
% dvariable x=a*b;
% \end{lstlisting}
% \noindent in your code can introduce errors.
% This is due to the use of shallow copies in \scAD\ combined with the
% the method used to allocate memory for objects the results containing
% %use of a circular queue of temporary objects for containing
% intermediate calculations. If \texttt{a} and \texttt{b} are
% \texttt{dvariables} then the operation \texttt{a*b} is stored in a temporary
% variable. The shallow copy which is then used to initialize
% the variable \texttt{x} will leave \texttt{x} pointing to the area
% of memory allocated to the temporary variable. Later on this
% memory location may be overwritten leading to incorrect
% results in the program. To avoid this type of error the proper
% construction is to split up this statement into two statements.
% \begin{lstlisting}
% dvariable x;
% x=a*b;
% \end{lstlisting}
% \XX{AVOID} {Initializing a \texttt{dvariable} with an arithmetic expression}
% \noindent The statement \texttt{x=a*b} performs a real copy so that
% \texttt{x} does not share the same memory location as the temporary
% variable.
% This restriction applies only to objects of type \texttt{dvariable}.
% It is perfectly all right to use arithmetic statements in
% the initialization of variable array objects. For example
% \begin{lstlisting}
% dvar_matrix M1=M*trans(M);
% \end{lstlisting}
% \noindent where \texttt{M} is a \texttt{dvar\_matrix} or
% \begin{lstlisting}
% dvar_vector v= 10*u;
% \end{lstlisting}
% \noindent where \texttt{u} is a \texttt{dvar\_vector}.
% resizable
Since this version of \scAD\ does not feature resizable arrays, the only way
that you can place the product of two \texttt{dvar\_matrices}~\texttt{M1}
and~\texttt{M2} in a \texttt{dvar\_matrix}~\texttt{M3} without knowing or
calculating the size of the product is to use the construction
\begin{lstlisting}
dvar_matrix M3=M1*M2;
\end{lstlisting}
so the copy initializer constructor will create an object with the correct size.
Of course, it is possible to calculate the size of the product and explicitly
declare the \texttt{dvar\_vector~M3} by using the \texttt{dvar\_matrix} access
functions to obtain the row and column dimensions of the matrices being
multiplied, as in the code
\begin{lstlisting}
dvar\_matrix M3(M1.rowmin(),M1.rowmax(),M2.colmin(),M2.colmax());
M3=M1*M2;
\end{lstlisting}
but this solution lacks elegance.
\section{The \texttt{dvector} and \texttt{dvar\_vector} access functions}
Most of the components of the \scAD\ classes are private, so they cannot be
accessed directly by the user. To allow the user to access the relevant elements
of the classes, public class member access functions are provided:
\begin{lstlisting}
int indexmin(); //Returns the minimum allowable index for an array
int indexmax(); //Returns the maximum allowable index for an array
int size(); //Returns the size of an array
int shift(int min); //Changes the allowable indices to min and min+size-1
\end{lstlisting}
These functions can be used to determine the size and valid index ranges of a
vector object that has been passed to a user's function. Thus, it is not
necessary to explicitly pass such information when designing a function that
uses a vector object. The information is already passed with the object itself.
If \texttt{v} is a \texttt{dvector}, the function \texttt{v.indexmin()} will
return the minimum valid index for \texttt{v}. The \texttt{shift} function
allows the valid index range for a \texttt{dvector} or \texttt{dvar\_vector} to
be changed. As an example of a function that acts on a vector objects, here is a
listing for a function that overloads the multiplication operator (\texttt{*});
\texttt{v * w} will be the dot product (also sometimes called the ``scalar
product'') of the \texttt{dvectors}~\texttt{v} and~\texttt{w}:
\begin{lstlisting}
double operator * (dvector& t1, dvector& t2)
{
// Check to see if vectors have the same valid index bounds
if (t1.indexmin() != t2.indexmin() || t1.indexmax() != t2.indexmax())
{
cerr << "Index bounds do not match in double operator * "
"(dvector&,dvector&)\n";
exit(1); // Stop if vectors are not compatible
}
double tmp;
tmp=0;
for (int i=t1.indexmin(); i<=t1.indexmax(); i++)
{
tmp+=t1[i]*t2[i];
}
return(tmp);
}
\end{lstlisting}
\XX{AVOID}{only creating a version of a function that takes a variable argument}
If you write your own functions, it is a good idea to provide both a constant
and variable version at the time of writing. This is especially true if you
write a variable version of the routine. Suppose that the user has written a
function \texttt{void userfun(dvar\_matrix)} that takes a \texttt{dvar\_matrix}
argument, but has neglected to make a version \texttt{void userfun(dmatrix)}
that takes a \texttt{dmatrix} argument. At some later time, the user wants to
use the function \texttt{userfun} with an argument of type \texttt{dmatrix},
such as in the following code segment.
\begin{lstlisting}
{
dmatrix M(1,5,1,5);
userfun(M); // M will be converted to a dvar_matrix
// ...
}
\end{lstlisting}
Since there is no version of \texttt{userfun} that accepts an argument of type
\texttt{dmatrix}, the \cplus\ compiler will convert \texttt{M} to an object of
type \texttt{dvar\_matrix}, by invoking the constructor for a
\texttt{dvar\_matrix(dmatrix\&)}. This is probably not what the user intends and
will be disastrous if an object of type \texttt{gradient\_structure} is not in
scope at the time.
\XX{dot product}{of vectors}
The corresponding variable version of the dot product for \texttt{dvar\_vectors}
would look like
\begin{lstlisting}
dvariable operator * (dvar_vector& t1, dvar_vector& t2)
{
if (t1.indexmin() != t2.indexmin() || t1.indexmax() != t2.indexmax())
{
cerr << "Index bounds do not match in dvariable operator * "
"(dvar_vector&,dvar_vector&)\n";
exit(1);
}
RETURN_ARRAYS_INCREMENT();
dvariable tmp;
tmp=0;
for (int i=t1.indexmin(); i<=t1.indexmax(); i++)
{
tmp+=t1[i]*t2[i];
}
RETURN_ARRAYS_DECREMENT();
return(tmp);
}
\end{lstlisting}
\X{\fontindexentry{tt}{RETURN\_ARRAYS\_INCREMENT}}
\X{\fontindexentry{tt}{RETURN\_ARRAYS\_DECREMENT}}
The appearance of the statements
\begin{lstlisting}
RETURN_ARRAYS_INCREMENT();
\end{lstlisting}
and
\begin{lstlisting}
RETURN_ARRAYS_DECREMENT();
\end{lstlisting}
requires some explanation. \scAD\ employs a stack of queues to hold intermediate
arithmetic results for variable objects. To ensure that these queues are not
overflowed, you should always increment the queue structure when entering a
function that returns a variable object, and decrement the queue structure when
leaving the function. If you fail to increment the queue structure and a queue
overflow occurs, the program will continue to operate, but the arithmetic will
be corrupted. If you fail to decrement the queue structure when leaving the
function, you may eventually increment past the end of the stack of queues,
producing an error message
\begin{lstlisting}
Overflow in RETURN_ARRAYS stack---Increase NUM_RETURN_ARRAYS
There may be a RETURN_ARRAYS_INCREMENT()
which is not matched by a RETURN_ARRAYS_DECREMENT()
\end{lstlisting}
Simply include the increment and decrement statements in every function you
create that returns a variable object and everything will be fine. It is not
necessary to include the increment and decrement statements in a function that
does not return a variable type, because such a function cannot be used in an
arithmetic statement. \textit{You should never include the}
\texttt{RETURN\_ARRAYS\_INCREMENT()} \textit{and}
\texttt{RETURN\_ARRAYS\_DECREMENT()}
\textit{functions in a function that returns a void or constant type, such as}
\texttt{double}, \texttt{dvector}, \textit{or} \texttt{dmatrix}. The reason is
that such a function might be called when there is no object from the class
\texttt{gradient\_structure} in scope. Calling
\texttt{RETURN\_ARRAYS\_INCREMENT()} or \texttt{RETURN\_ARRAYS\_DE\-CRE\-MENT()}
when there is no object of type \texttt{gradient\_structure} in scope could
cause a serious program error.
\XX{AVOID}{calling \texttt{RETURN\_ARRAYS\_INCREMENT()}}
\XX{AVOID}{calling \texttt{RETURN\_ARRAYS\_DECREMENT()}}
\section{Accessing array elements}
\X{operators \texttt{()} and \texttt{[]}}
\XX{access functions}{array elements}
Both the \texttt{()} and the \texttt{[]} operators can be used to access
elements of arrays. There is absolutely no difference between them. If
\texttt{v} is a \texttt{dvector}, the $\texttt{i}^\textrm{th}$ element of
\texttt{v} can be accessed by writing either \texttt{v[i]} or \texttt{v(i)}.
Optional array bounds checking is provided for both operators. Array bounds
checking is implemented or turned off by linking with the appropriate library.
No changes in the user's code are required. Linking with the safe libraries
enables array bounds checking, while linking with the optimized libraries
disables array bounds checking.
\XX{libraries}{safe}
\XX{libraries}{optimized}
\section{Creating column vectors and row vectors}
It is important to recognize that there is a difference in \scAD\ between a
vector object and a matrix object that has either 1 row or 1 column. This means
that a vector object is neither a row vector nor a column vector, it is simply a
vector object. We have adopted this treatment of vector objects so that it is
possible to multiply vector objects such as \texttt{u} and \texttt{v} simply by
writing \texttt{u*v}. If vector objects were the same as matrices with 1~column,
it would be necessary to write \texttt{trans(u)*v} to multiply two vectors. Here
\texttt{trans} denotes the transpose of a \texttt{matrix} object.
\X{\fontindexentry{tt}{column\_vector function}}
\X{\fontindexentry{tt}{row\_vector} function}
\XX{constructor}{\fontindexentry{tt}{dmatrix}}
Sometimes it is useful to be able to treat vector objects as though they were
matrices with one row or column. The functions \texttt{column\_vector} and
\texttt{row\_vector} enable this to be done. The function
\begin{lstlisting}
dmatrix column_vector(dvector& v);
\end{lstlisting}
will convert \texttt{v} into a matrix with 1 column. The valid column index is
from~1 to~1. The valid row index bounds are the same as the valid index bounds
for \texttt{v}. The function
\begin{lstlisting}
dmatrix row_vector(dvector& v);
\end{lstlisting}
will convert \texttt{v} into a matrix with 1 row. The valid row index is from~1
to~1. The valid column index bounds are the same as the valid index bounds for
\texttt{v}.
As an example of how to use these functions, consider the problem of computing
the outer product of two vectors. The outer product of two vectors $(u_i)$ and
$(v_j)$ is the matrix $w_{ij}$, where $w_{ij}=u_iv_j$. The statement
\begin{lstlisting}
dmatrix w=column_matrix(v)*row_matrix(u);
\end{lstlisting}
will compute the outer product of the \texttt{dvectors} \texttt{u} and
\texttt{v}. Of course, you can also use the function \texttt{outer\_prod},
supplied with the \scAD\ libraries, to accomplish the same thing:
\begin{lstlisting}
dmatrix w=outer_prod(u,v);
\end{lstlisting}
\XX{operators}{\fontindexentry{tt}{<{}<}}
Another use for the \texttt{column\_vector} function is to print out a vector
one element to a line. The overloaded operator \texttt{<{}<} will write out a
vector object all on one line. This is useful if, for example, you want to put a
vector object into a row in a spreadsheet, but it can be a problem for some
editors that cannot read such long lines in a file. The code
\begin{lstlisting}
cout << column_vector(v);
\end{lstlisting}
will print out the vector \texttt{v} with one entry per line.
\section{The \texttt{dmatrix} and \texttt{dvar\_matrix} classes}
The \texttt{dmatrix} and \texttt{dvar\_matrix} classes are implemented,
respectively, as arrays of \texttt{dvector} and \texttt{dvar\_vectors}. If
\texttt{M} is a \texttt{dmatrix}, then \texttt{M[j]} or \texttt{M(j)} is a
\texttt{dvector} whose elements are the $\texttt{j}^\textrm{th}$ row of
\texttt{M}. Since each row is a \texttt{dvector} that contains its own size
(shape) information, it is a simple matter to construct \texttt{dmatrix}
objects---so-called ``ragged'' matrices---that have a different numbers of rows
in each column. Of course, these are not matrices in the usual sense in which
the word is used, but they can be useful container objects for some
applications. Therefore, we have extended the \texttt{dmatrix} and
\texttt{dvar\_matrix} classes to include them. As with the \texttt{dvector and
dvar\_vector} classes, \texttt{dmatrix and dvar\_matrix} classes appear
identical to the user, with the exception that derivative information is
collected for the \texttt{dvar\_matrix class}. An object of type
\texttt{dvar\_matrix} is created by using the declarations
\begin{lstlisting}
dmatrix M(lbr,lur,lbc,luc); // Creates a (lur-lbr+1)$\times$(luc-lbc+1) matrix
dmatrix M1=M; // M has already been defined. M1 is initialized
// by M.
dmatrix M(lbr,lur,ivec_lb,ivec_ub); // Creates a ``ragged'' dmatrix with a
// variable number of rows in each column
dmatrix M("filename"); // Creates a dmatrix whose elements are the
// contents of the file ``filename''
dmatrix M( "{ 1.5,-2.0,3.0,3.5 }"
"{ 2.0,14.0,5.1,2.2 }" );// creates and initializes a dmatrix
\end{lstlisting}
The constructor
\begin{lstlisting}
dmatrix(lbr,lur,ivec_lb,ivec_ub)
\end{lstlisting}
creates a \texttt{dmatrix} with \texttt{lur-lbr+1} rows. The
$\texttt{i}^\textrm{th}$ row is a \texttt{dvector} of size $\texttt{ivec\_ub[i]}
- \texttt{ivec\_lb[i]+1}$ with a range of valid indices running from
\texttt{ivec\_lb[i]} to \texttt{ivec\_ub[i]}. \texttt{ivec\_lb} and
\texttt{ivec\_ub} are instances of the class \texttt{ivector}, that is, vectors
of integers. They must have valid indices ranging from \texttt{lbr} to
\texttt{lur}. For example, suppose we wish to make a ragged
\texttt{dvar\_matrix} with 4 rows for which the valid row index bounds should be
from~2 to~5. The declaration should be
\begin{lstlisting}
dvar_matrix M(2,5,ivec_lb,ivec_ub);
\end{lstlisting}
Suppose the desired valid row indices are 1 to 6 for row 1, 0 to 3 for row 2, 1
to 2 for row~4, and~3 to~9 for row~5. The legal bounds on the \texttt{ivector}s
\texttt{ivec\_lb} and \texttt{ivec\_ub} must be equal to the legal column bounds
on \texttt{M}. We can create such \texttt{ivector} objects with the declaration
\begin{lstlisting}
ivector ivec_lb(2,5);
ivector ivec_ub(2,5);
\end{lstlisting}
The correct values for \texttt{ivec\_lb} and \texttt{ivec\_ub} can be inserted
for example by using the ``fill'' function
\begin{lstlisting}
ivec_lb.fill("{1,0,1,3}");
ivec_ub.fill("{6,3,2,9}");
\end{lstlisting}
Alternatively, the desired \texttt{ivector} objects can be created with the
declarations
\begin{lstlisting}
ivector ivec_lb("{1,0,1,3}");
ivector ivec_ub("{6,3,2,9}");
\end{lstlisting}
These declarations will create \texttt{ivector} objects whose legal bounds go
from~1 to~4. To change the valid index bounds to the desired values from~2 to~5,
the class member function \texttt{shift()} can be used:
\begin{lstlisting}
ivec_lb.shift(2); // legal bounds will be from 2 to 5
ivec_ub.shift(2); // legal bounds will be from 2 to 5
\end{lstlisting}
As is the case for vector objects, the copy initializer for matrix objects
performs a light copy. To create a distinct new matrix object, you must use the
assignment operator (\texttt{=}).
The declaration
\begin{lstlisting}
dmatrix M("filename");
\end{lstlisting}
will create a \texttt{dmatrix} \texttt{M} whose elements are the contents of the
file \texttt{filename}. All entries in the file must be in valid numeric format.
All entries must be separated by blanks or commas. Each line in the file becomes
a line of the matrix. Ragged matrices are supported, so the files may have
different number of entries in each line. For the \textsc{dos} version of \scAD,
no line may contain more than 6550 elements.
To create a \texttt{dmatrix} and initialize it with data, a declaration of the
form
\begin{lstlisting}
dmatrix W( "{ 1.5,-2.0,3.0,3.5 }"
"{ 2.0,14.0,5.1,2.2 }" );
\end{lstlisting}
can be used.
This declaration will create a dmatrix with~2~rows and~4~columns. The minimum
valid index for the rows and columns is set equal to~1.
\section{The \texttt{dmatrix} access functions}
\XX{\fontindexentry{tt}{dmatrix}}{access functions}
\XX{access functions}{for \texttt{dmatrix} and \texttt{dvar\_matrix} class}
The components of all the \scAD\ classes are private, which means that they
cannot be accessed directly by the user. Public class member access functions
are provided to allow the user to access the relevant elements of the classes:
\begin{lstlisting}
int rowmin(); //Returns the minimum allowable row index for a matrix
int rowmax(); //Returns the maximum allowable row index for a matrix
int colmin(); //Returns the minimum allowable column index for a matrix
int colmax(); //Returns the maximum allowable column index for a matrix
int rowsize(); //Returns the number of rows in an array
int colsize(); //Returns the number of columns in an array
void colshift(int&); //Changes the range of valid indices for the columns
void rowshift(int&); //Changes the range of valid indices for the rows
\end{lstlisting}
\X{\fontindexentry{tt}{rowmax()}}
\X{\fontindexentry{tt}{colmin()}}
\X{\fontindexentry{tt}{colmax()}}
\X{\fontindexentry{tt}{rowsize()}}
\X{\fontindexentry{tt}{colsize()}}
\X{\fontindexentry{tt}{colshift(int\&)}}
\X{\fontindexentry{tt}{rowshift(int\&)}}
\section{The 3-dimensional arrays\br \texttt{d3array} and \texttt{dvar3\_array}}
\X{\fontindexentry{tt}{d3array} and \texttt{dvar3\_array} classes}
Three-dimensional arrays have been implemented in this version of \scAD\ mainly
as containers for collections of 2-dimensional arrays. If \texttt{u} is a
\texttt{d3array} then \texttt{u[1]} or \texttt{u(1)} is a dmatrix (provided
that~1 is a valid index for \texttt{u}). We shall refer to the object obtained
by fixing the first index of a 3-dimensional array as a ``slice.'' An element of
a 3-dimensional array is determined by picking a slice, a row, and a column.
\XX{constructor}{\fontindexentry{tt}{d3array}} \XX{slice}{of a \texttt{d3array}}
An object of type \texttt{dvar3\_array} is created by using the declarations
\begin{lstlisting}
dvar_3darray M(ls,us,lr,ur,lc,uc); // Creates a (us-ls+1)*(ur-lr+1)*(uc-lc+1)
// dvar_3darray
dvar_3darray M1=M; // M has already been defined. M1 is
// initialized by M.
d3array(ls,us,ivec_lr,ivec_ur,ivec_lc,ivec_uc); // Creates a ragged d3array
// with a variable size dmatrices in each slice.
\end{lstlisting}
\section{The \texttt{d3array} access functions}
\XX{access functions}{for a \texttt{d3array}}
\XX{\fontindexentry{tt}{d3array}}{access functions}
\begin{lstlisting}
int slicemin(); //Returns the minimum allowable slice index for a d3array
int slicemax(); //Returns the maximum allowable slice index for a d3array
int rowmin(); //Returns the minimum allowable row index for a d3array
int rowmax(); //Returns the maximum allowable row index for a d3array
int colmin(); //Returns the minimum allowable column index for a d3array
int colmax(); //Returns the maximum allowable column index for a d3array
int slicesize(); //Returns the number of slices in a d3array
int rowsize(); //Returns the number of rows in an d3array
int colsize(); //Returns the number of columns in an array
void sliceshift(int&); //Changes the range of valid indices for the slices
void colshift(int&); //Changes the range of valid indices for the columns
void rowshift(int&); //Changes the range of valid indices for the rows
\end{lstlisting}
\X{\fontindexentry{tt}{slicemin()}}
\X{\fontindexentry{tt}{slicemax()}}
\X{\fontindexentry{tt}{slicesize()}}
\X{\fontindexentry{tt}{rowmin()}}
\X{\fontindexentry{tt}{rowmax()}}
\X{\fontindexentry{tt}{colmin()}}
\X{\fontindexentry{tt}{colmax()}}
\X{\fontindexentry{tt}{rowsize()}}
\X{\fontindexentry{tt}{colsize()}}
\X{\fontindexentry{tt}{colshift(int\&)}}
\X{\fontindexentry{tt}{rowshift(int\&)}}
\X{\fontindexentry{tt}{slicehift(int\&)}}
\XX{conversion}{between between integer and floating point objects}
\section{Converting between the integer\br and floating point container objects}
It is often useful to be able to convert a floating point object into an integer
object, or an integer object into a floating point object. (See, for instance,
the example on creating a bootstrap sample, where a \texttt{dvector} object of
random numbers is converted into an \texttt{ivector} object of integer indices.)
It is possible to convert between vector objects as follows:
\begin{lstlisting}
ivector <--> dvector
lvector <--> dvector
lvector <--> ivector
\end{lstlisting}
These conversions may be made explicit, as in the following code fragment:
\begin{lstlisting}
dvector u(1,10);
ivector iv(1,10);
// ...
iv = ivector(u); // iv will be filled with the integer parts of the
// components of u
\end{lstlisting}
or the conversion may be ``implicit,'' that is, supplied by the compiler.
\X{implicit conversion}
\XX{conversion}{implicit}
\begin{lstlisting}
dvector u(1,10);
// ...
ivector iv = u; // iv will be filled with the integer parts of the
// components of u
\end{lstlisting}
In the second example, the \texttt{ivector} \texttt{iv} will have minimum and
maximum valid indices determined by the minimum and maximum valid indices of the
\texttt{dvector}~\texttt{u}.
%xx \htmlnewfile
\chapter{Operations and Functions}
\section{The operators $+$ and $-$}
\X{operators $+$ and $-$}
\XX{vector}{component-wise sum of}
\X{addition of vectors}
\XX{optimizing performance}{using the best operators for a calculation}
Since \scAD\ supplies a large number of operators and functions, there is often
more than one way to carry out a given calculation. While all equivalent methods
will produce the same answer, one method may produce more efficient code. For
example, the two lines of~code
\begin{lstlisting}
A=A+B;
//
A+=B;
\end{lstlisting}
will both add the object \texttt{A} to the object \texttt{B} and store the
result in \texttt{A}. (This will work whether \texttt{A} and \texttt{B} are
numbers, vectors, or matrices.) However, the code produced by \texttt{A+=B} is
more efficient in all cases. It is both faster and generates less temporary data
to be stored.
While the type of object produced by an arithmetic operation depends on whether
the arguments are constants (\texttt{double}, \texttt{dvector} or
\texttt{dmatrix}) or variables (\texttt{dvariable}, \texttt{dvar\_vector}, or
\texttt{dvar\_matrix}), the sort of operation defined by the operation does not.
The symbol~\texttt{+} between two vector objects will denote the component-wise
sum of these vectors and produce a vector object corresponding to their sum.
This is true regardless of whether one or both objects are constant or variable
objects. We denote this by writing
\begin{lstlisting}
vector_object = vector_object + vector_object // vector sum
\end{lstlisting}
We shall refer to the ``output'' object on the left side of the equals sign
($=$) by $z$ if it is a number, $z_i$ if it is a vector and $z_{ij}$ if it is a
matrix---in order to express the components of the operation. We shall refer to
the first input object (the one on the right side of the equals sign) by $x$,
$x_i$, and $x_{ij}$, and the second input object---if there is one---by $y$,
$y_i$, and $y_{ij}$. For any operation, the property of being a variable is
dominant in that if either of the input objects is a variable, then the output
object is a variable.
\begin{lstlisting}
vector_object = vector_object + vector_object // vector sum
\end{lstlisting}
\afterlistingthing{$z_i=x_i+y_i$}
\begin{lstlisting}
vector_object = number + vector_object // add number to vector
\end{lstlisting}
\afterlistingthing{$z_i=x+y_i$}
\begin{lstlisting}
vector_object = vector_object + number // add number to vector
\end{lstlisting}
\afterlistingthing{$z_i=x_i+y$}
\begin{lstlisting}
matrix_object = matrix_object + matrix_object // matrix sum
\end{lstlisting}
\afterlistingthing{$z_{ij}=x_{ij}+y_{ij}$}
\begin{lstlisting}
matrix_object = matrix_object + number // matrix sum
\end{lstlisting}
\afterlistingthing{$z_{ij}=x_{ij}+y$}
\XX{vector}{component-wise difference of}\X{subtraction of vectors}
\begin{lstlisting}
vector_object = vector_object - vector_object // vector difference
\end{lstlisting}
\afterlistingthing{$z_i=z_i-y_i$}
\XX{vector}{subtracting a number from a}
\begin{lstlisting}
vector_object = vector_object - number // subtract number from a vector
\end{lstlisting}
\afterlistingthing{$z_i=x_i-y$}
\begin{lstlisting}
matrix_object = matrix_object - number // matrix difference
\end{lstlisting}
\afterlistingthing{$z_{ij}=x_{ij}-y$}
\XX{matrix}{subtracting a number from a}
\begin{lstlisting}
matrix_object = matrix_object - matrix_object // matrix difference
\end{lstlisting}
\afterlistingthing{$z_{ij}=x_{ij}-y_{ij}$}
\section{The operators \ttpluseq, \ttminuseq, \ttmultiplyeq, and \ttdivideeq}
\X{operators \ttpluseq, \ttminuseq, \ttmultiplyeq, and \ttdivideeq}
These operators generate faster code than the corresponding operators defined
above, since their use avoids both the necessity for creating a temporary object
to hold the result, and the extra overhead associated with assigning the values
contained in this temporary object to the left-hand side of the expression.
\begin{lstlisting}
vector_object += vector_object // vector sum
\end{lstlisting}
\afterlistingthing{$x_i \pluseq y_i$}
\begin{lstlisting}
vector_object += number // add number to vector
\end{lstlisting}
\afterlistingthing{$x_i \pluseq d$}
\begin{lstlisting}
vector_object -= vector_object // vector difference
\end{lstlisting}
\afterlistingthing{$x_i \minuseq y_i$}
\begin{lstlisting}
vector_object -= number // subtract number from vector
\end{lstlisting}
\afterlistingthing{$x_i \minuseq d$}
\begin{lstlisting}
vector_object /= number // divide vector by number
\end{lstlisting}
\afterlistingthing{$x_i \divideeq d$}
\begin{lstlisting}
vector_object *= number // multiply vector by number
\end{lstlisting}
\afterlistingthing{$x_i \multiplyeq d$}
\begin{lstlisting}
matrix_object += matrix_object // matrix sum
\end{lstlisting}
\afterlistingthing{$x_{ij} \pluseq y_{ij}$}
\begin{lstlisting}
matrix_object += number // add number to matrix
\end{lstlisting}
\afterlistingthing{$x_{ij} \pluseq d$}
\begin{lstlisting}
matrix_object -= matrix_object // matrix subtraction
\end{lstlisting}
\afterlistingthing{$x_{ij} \minuseq y_{ij}$}
\begin{lstlisting}
matrix_object -= number // subtract number from matrix
\end{lstlisting}
\afterlistingthing{$x_{ij} \minuseq d$}
\section{The operators $*$ and $/$}
\X{$/$}
\X{operators $*$ and $/$}
\XX{vector}{dot product of two}
\X{dot product}
\begin{lstlisting}
number = vector_object * vector_object // vector dot product
\end{lstlisting}
\afterlistingthing{$z=\sum_i x_iy_i$}
\XX{vector}{multiplying a number by a}\X{scalar product}
\begin{lstlisting}
vector_object = number * vector_object // scalar product
\end{lstlisting}
\afterlistingthing{$z_i=x*y_i$}
\begin{lstlisting}
vector_object = vector_object * number // scalar product
\end{lstlisting}
\afterlistingthing{$z_i=x_i*y$}
\XX{vector}{multiplying a matrix by a}
\XX{matrix}{multiplying a vector times a}
\begin{lstlisting}
vector_object = vector_object * matrix_object // vector times matrix
\end{lstlisting}
\afterlistingthing{$z_j=\sum_i x_i*y_{ij}$}
\XX{vector}{multiplying a matrix times a}
\XX{matrix}{multiplying a vector by a}
\begin{lstlisting}
vector_object = matrix_object * vector_object // matrix times vector
\end{lstlisting}
\afterlistingthing{$z_i=\sum_j x_{ij}*y_{j}$}
\XX{matrix}{multiplication of two}
\X{matrix multiplication}
\begin{lstlisting}
matrix_object = matrix_object * matrix_object // matrix times matrix
\end{lstlisting}
\afterlistingthing{$z_{ij}=\sum_k x_{ik}*y_{kj}$}
\XX{matrix}{multiplication by a number}
\begin{lstlisting}
matrix_object = number * matrix_object // scalar times matrix
\end{lstlisting}
\afterlistingthing{$z_{ij}= x*y_{ij}$}
An additional vector-matrix operation is the outer product of two vectors.
\XX{vector}{outer product of two}
\XX{outer product}{of two vectors}
\begin{lstlisting}
matrix_object = outer_prod(vector_object,vector_object)
// outer product of two vectors
\end{lstlisting}
\afterlistingthing{$z_{ij}= x_i*y_j$}
\XX{vector}{dividing by a number}
\begin{lstlisting}
vector_object = vector_object / number // divide a vector by a number
\end{lstlisting}
\afterlistingthing{$z_i=x_i/y$}
\XX{vector}{dividing a number by}
\begin{lstlisting}
vector_object = number / vector_object // divide a number by a vector elements
\end{lstlisting}
\afterlistingthing{$z_i=x/y_i$}
\XX{matrix}{dividing by a number}
\begin{lstlisting}
matrix_object = matrix_object / number // divide a matrix by a number
\end{lstlisting}
\afterlistingthing{$z_{ij}=x_{ij}/y$}
\section{The concatenation of vector objects}
\XX{concatenation}{of vector objects} \XX{vector objects}{concatenation}
The \texttt{\&} operator has been overloaded to produce the concatenation of two
vector objects:
\begin{lstlisting}
vector_object=vector_object & vector_object
\end{lstlisting}
If \texttt{x}=$(x_r,\ldots,x_n)$ and \texttt{y}=$(y_s,\ldots,y_m)$, then
\texttt{x \& y}= $(x_r,\ldots,x_n,y_s,\ldots,y_m)$ and the minimum valid index
for \texttt{x \& y} is equal to the minimum valid index for~\texttt{x}.
Since in the present version of \scAD\ arrays are not resizable, it is not
possible to write something like \X{resizable arrays}
\begin{lstlisting}
x = x & y;
\end{lstlisting}
where \texttt{x} and \texttt{y} are \texttt{dvectors}. Instead, you must do
something like
\begin{lstlisting}
dvector z = x & y;
\end{lstlisting}
Future versions of \scAD\ will incorporate resizable arrays.
\section{Element-wise operations}
There are several operations familiar to users of spreadsheets that do not
appear as often in classical mathematical calculations. For example, spreadsheet
users often wish to multiply one column in a spreadsheet by the corresponding
elements of another column. Spread sheet users might find it much more natural
to define the product of matrices as an element-wise operation such as
\begin{equation*}
z_{ij}= x_{ij}*y_{ij}
\end{equation*}
The ``classical'' mathematical definition for the matrix product has been
assigned to the overloaded multiplication operator (\texttt{*}), so large
mathematical formulas involving vector and matrix operations can be written in a
concise notation. Typically, spreadsheet-type calculations are not so
complicated and do not suffer so much from being forced to adopt a
``function-style'' of notation.
Since addition and subtraction are already defined in an element-wise manner, it
is only necessary to define element-wise operations for multiplication and
division. We have named these functions \texttt{elem\_prod} and
\texttt{elem\_div}.
\XX{vector}{element-wise product of}
\XX{\fontindexentry{tt}{elem\_prod}}{element-wise product}
\begin{lstlisting}
vector_object = elem_prod(vector_object,vector_object) // element-wise multiply
\end{lstlisting}
\afterlistingthing{$z_i= x_i*y_i$}
\XX{vector}{element-wise division}
\XX{\fontindexentry{tt}{elem\_div}}{element-wise division}
\begin{lstlisting}
vector_object = elem_div(vector_object,vector_object) // element-wise divide
\end{lstlisting}
\afterlistingthing{$z_i= x_i/y_i$}
\XX{matrix}{element-wise product of}
\begin{lstlisting}
matrix_object = elem_prod(matrix_object,matrix_object) // element-wise multiply
\end{lstlisting}
\afterlistingthing{$z_{ij}= x_{ij}*y_{ij}$}
\XX{matrix}{element-wise division}
\begin{lstlisting}
matrix_object = elem_div(matrix_object,matrix_object) // element-wise divide
\end{lstlisting}
\afterlistingthing{$z_{ij}= x_{ij}/y_{ij}$}
\section{The identity matrix function \texttt{identity\_matrix}}
\XX{matrix}{the identity matrix function}
\begin{lstlisting}
matrix_object = identity_matrix(int min,int max)
\end{lstlisting}
\afterlistingthing{creates a square identity matrix with minimum valid index
\texttt{min} and maximum valid index~\texttt{max}.}
\section{The operations \texttt{det}, \texttt{inv}, \texttt{norm},
\texttt{norm2}, \br \texttt{min}, \texttt{max}, \texttt{sum}, \texttt{rowsum},
and \texttt{colsum}}
\subsubsection{The determinant of a matrix object}
\XX{matrix}{the determinant of}
\X{determinant}
\X{det}
(The matrix must be square: that is, the number of rows must equal the number of
columns.)
\begin{lstlisting}
matrix_object = det(matrix_object)
\end{lstlisting}
\subsubsection{The inverse of a matrix object}
(The matrix must be square: that is, the number of rows must equal the number of
columns.)
\XX{matrix}{the inverse of}
\X{inverse}
\X{inv@\texttt{inv}}
\begin{lstlisting}
matrix_object = inv(matrix_object)
\end{lstlisting}
\subsubsection{The norm of a vector object}
\XX{vector}{the norm of}
\XX{vector}{the norm squared of}
\XX{matrix}{the norm of}
\X{\fontindexentry{tt}{norm}}
\begin{lstlisting}
number = norm(vector_object)
\end{lstlisting}
\afterlistingthing{$z=\sqrt{\sum_{i} x_{i}^2} $}
\bigskip
\subsubsection{The norm squared of a vector\_object}
\XX{matrix}{the norm squared of}\X{\fontindexentry{tt}{norm2}}
\begin{lstlisting}
number = norm2(vector_object)
\end{lstlisting}
\afterlistingthing{$z=\sum_{i} x_{i}^2 $}
\bigskip
\subsubsection{The norm of a matrix\_object}
\begin{lstlisting}
number = norm(matrix_object)
\end{lstlisting}
\afterlistingthing{$z=\sqrt{\sum_{ij} x_{ij}^2} $}
\bigskip
\subsubsection{The norm squared of a matrix\_object}
\begin{lstlisting}
number = norm2(matrix_object)
\end{lstlisting}
\afterlistingthing{$z=\sum_{ij} x_{ij}^2 $}
\bigskip
\subsubsection{The sum over the elements of a vector object}
\XX{vector}{sum over the elements of}
\XX{\fontindexentry{tt}{sum}}{operation on a vector}
\begin{lstlisting}
number = sum(vector_object)
\end{lstlisting}
\afterlistingthing{$z=\sum_i x_i $}
\bigskip
\subsubsection{The row sums of a matrix object}
\XX{matrix}{\fontindexentry{tt}{rowsum of}}
\XX{\fontindexentry{tt}{rowsum}}{operation on a matrix}
\begin{lstlisting}
vector = rowsum(matrix_object)
\end{lstlisting}
\afterlistingthing{$z_i=\sum_j x_{ij}$}
\bigskip
\subsubsection{The column sums of a matrix object}
\XX{matrix}{\fontindexentry{tt}{colsum} of}
\XX{\fontindexentry{tt}{colsum}}{operation on a matrix}
\begin{lstlisting}
vector = colsum(matrix_object)
\end{lstlisting}
\afterlistingthing{$z_j=\sum_i x_{ij}$}
\bigskip
\subsubsection{The minimum element of a vector object}
\XX{vector}{minimum element of}
\XX{\fontindexentry{tt}{min}}{operation on a vector}
\begin{lstlisting}
number = min(vector_object)
\end{lstlisting}
\XX{vector}{maximum element of}
\XX{\fontindexentry{tt}{max}}{operation on a vector}
\bigskip
\subsubsection{The maximum element of a vector object}
\begin{lstlisting}
number = max(vector_object)
\end{lstlisting}
\bigskip
\section{Eigenvalues and eigenvectors\br of a symmetric matrix}
\XX{eigenvalues}{not differentiable}
\XX{eigenvectors}{not differentiable}
While we have included eigenvalue and eigenvector routines for both constant and
variable matrix objects, you should be aware that, in general, the eigenvectors
and eigenvalues are not differentiable functions of the variables determining
the matrix.
The eigenvalues of a symmetric matrix
\XX{eigenvalues}{of a symmetric matrix}
\begin{lstlisting}
vector_object = eigenvalues(matrix_object)
\end{lstlisting}
are returned in a vector. It is the user's responsibility to ensure that the
matrix is actually symmetric. The routine symmetrizes the matrix so that the
eigenvalues returned are actually those for the symmetrized matrix.
The eigenvectors of a symmetric matrix:
\XX{eigenvectors}{of a symmetric matrix}
\begin{lstlisting}
matrix_object = eigenvectors(matrix_object)
\end{lstlisting}
are returned in a matrix. It is the user's responsibility to ensure that the
matrix is actually symmetric. The routine symmetrizes the matrix so that the
eigenvectors returned are actually those for the symmetrized matrix. The
eigenvectors are located in the columns of the matrix. The $i^\textrm{th}$
eigenvalue returned by the function \texttt{eigenvalues} corresponds to the
$i^\textrm{th}$ eigenvector returned by the function \texttt{eigenvector}.
\X{Choleski decomposition of a symmetric matrix}
\XX{symmetric matrix}{Choleski decomposition}
\section{The Choleski decomposition of a\br positive definite symmetric matrix}
For a positive definite symmetric matrix \texttt{S}, the Choleski decomposition
of \texttt{S} is a lower triangular matrix \texttt{T} satisfying the
relationship \texttt{S=T*trans(T)}. If \texttt{S} is a (positive definite
symmetric) matrix object and \texttt{T} is a matrix object, the line of code
\begin{lstlisting}
T=choleski_decomp(S);
\end{lstlisting}
will calculate the Choleski decomposition of \texttt{S} and put it into
\texttt{T}.
\X{solving a system of linear equations}
\X{\fontindexentry{tt}{solve} function}
\XX{optimizing performance}{using the best operators for a calculation}
\section{Solving a system of linear equations}
If \texttt{y} is a vector and \texttt{M} is an invertible matrix, then finding a
vector \texttt{x} such that
\begin{lstlisting}
x=inv(M)*y
\end{lstlisting}
will be referred to as solving the system of linear equations determined by
\texttt{y} and \texttt{M}. Of course, it is possible to use the \texttt{inv}
function to accomplish this task, but it is much more efficient to use the
\texttt{solve} function.
\begin{lstlisting}
vector x=solve(M,y); // x will satisfy x=inv(M)*y;
\end{lstlisting}
It turns out that it is a simple matter to calculate the determinant of the
matrix \texttt{M} at the same time as the system of linear equations is solved.
Since this is useful in multivariate analysis, we have also included a function
that returns the determinant at the same time as the system of equations is
solved. To avoid floating point overflow or underflow when working with large
matrices, the logarithm of the absolute value of the determinant, together with
the sign of the determinant, are returned. The constant form of the
\texttt{solve} function is
\begin{lstlisting}
double ln_det;
double sign;
dvector x=solve(M,y,ln_det,sign);
\end{lstlisting}
while the variable form is
\begin{lstlisting}
dvariable ln_det;
dvariable sign;
dvar_vector x=solve(M,y,ln_det,sign);
\end{lstlisting}
The \texttt{solve} function is useful for calculating the log-likelihood
function for a multivariate normal distribution. Such a log-likelihood function
involves a calculation similar to
\begin{lstlisting}
l = -.5*log(det(S)) -.5*y*inv(S)*y
\end{lstlisting}
where \texttt{S} is a matrix object and \texttt{y} is a vector object. It is
much more efficient to carry out this calculation using the \texttt{solve}
function. The following code illustrates the calculations for variable objects:
\XX{multivariate normal distribution}%
{calculation of the log-likelihood function for}
\begin{lstlisting}
dvariable ln_det;
dvariable sign;
dvariable l;
dvar_vector tmp=solve(M,y,ln_det,sign);
l=-.5*ln_det-y*tmp;
\end{lstlisting}
\section{Methods for filling arrays and matrices}
\X{filling arrays and matrices}
While it is always possible to fill vectors and matrices by using loops and
filling them element by element, this is tedious and prone to error. To simplify
this task, a selection of methods for filling vectors and matrices with random
numbers, or a specified sequence of numbers, is available. There are also
methods for filling row and columns of matrices with vectors. In this section
the symbol \texttt{vector} can refer to either a \texttt{dvector} or a
\texttt{dvar\_vector}, while the symbol \texttt{matrix} can refer to either a
\texttt{dmatrix} or a \texttt{dvar\_matrix}.
\XX{vector}{\fontindexentry{tt}{fill}}
\XX{\fontindexentry{tt}{fill}}{filling a vector}
\begin{lstlisting}
void vector::fill("{m,n,...,}")
\end{lstlisting}
\afterlistingthing{fills a vector with a sequence of the form \texttt{n},
\texttt{m},$\ldots$ The number of elements in the string must match the size
of the vector.}
\XX{vector}{\fontindexentry{tt}{fill\_seqadd}}
\XX{\fontindexentry{tt}{fill\_seqadd}}{filling a vector}
\begin{lstlisting}
void vector::fill_seqadd(double& base, double& offset)
\end{lstlisting}
\afterlistingthing{fills a vector with a sequence of the form
\texttt{base}, \texttt{base+offset}, \texttt{base+2*offset},$\ldots$}
For example, if \texttt{v} is a \texttt{dvector} created by the statement
\begin{lstlisting}
dvector v(0,4);
\end{lstlisting}
then the statement
\begin{lstlisting}
v.fill_seqadd(-1,.5);
\end{lstlisting}
will fill \texttt{v} with the numbers $(-1.0,-0.5,0.0,0.5,1.0)$.
\XX{matrix}{\fontindexentry{tt}{rowfill\_seqadd}}
\XX{\fontindexentry{tt}{rowfill\_seqadd}}{filling a matrix}
\begin{lstlisting}
void matrix::rowfill_seqadd(int& i,double& base, double& offset)
\end{lstlisting}
\afterlistingthing{fills row \texttt{i} of a matrix with a sequence of the form
\texttt{base}, \texttt{base+offset}, \texttt{base+2*offset},$\ldots$}
\XX{matrix}{\fontindexentry{tt}{colfill\_seqadd}}
\XX{\fontindexentry{tt}{colfill\_seqadd}}{filling a matrix}
\begin{lstlisting}
void matrix::colfill_seqadd(int& j,double& base, double& offset)
\end{lstlisting}
\afterlistingthing{fills column \texttt{j} of a matrix with a sequence of the
form \texttt{base}, \texttt{base+offset}, \texttt{base+2*offset},$\ldots$}
\XX{matrix}{\fontindexentry{tt}{colfill}}
\XX{\fontindexentry{tt}{colfill}}{filling a matrix column with a vector}
\XX{matrix}{\fontindexentry{tt}{rowfill}}
\XX{\fontindexentry{tt}{rowfill}}{filling a matrix row with a vector}
\begin{lstlisting}
void matrix::colfill(int& j,vector&)
\end{lstlisting}
\afterlistingthing{fills the $\texttt{j}^\textrm{th}$ column of a matrix with a
vector.}
\begin{lstlisting}
void matrix::rowfill(int& i,vector&)
\end{lstlisting}
\afterlistingthing{fills the $\texttt{i}^\textrm{th}$ row of a matrix with a
vector.}
In this section, a uniformly distributed random number is assumed to have a
uniform distribution on $[0,1]$. A normally distributed random number is assumed
to have mean $0$ and variance~$1$. A binomially distributed random number is
assumed to have a parameter~$p$, where $1$ is returned with probability~$p$,
and~$0$ is returned with probability $1-p$. A multinomially distributed random
variable is assumed to have a vector of parameters $P$, where $i$ is returned
with probability~$p_i$. If the components of $P$ do not sum to $1$, the vector
will be normalized so that the components \textit{do} sum to~$1$.
\XX{vector}{\fontindexentry{tt}{fill\_randu}}
\XX{\fontindexentry{tt}{fill\_randu}}{filling a vector with random numbers}
\begin{lstlisting}
void vector::fill_randu(long int& n)
\end{lstlisting}
\afterlistingthing{fills a vector with a sequence of uniformly distributed
random numbers. The \texttt{long int n} is a seed for the random number
generator. Changing \texttt{n} will produce a different sequence of random
numbers.}
\XX{matrix}{\fontindexentry{tt}{colfill\_randu}}
\XX{\fontindexentry{tt}{colfill\_randu}}{filling a matrix with random numbers}
\begin{lstlisting}
void matrix::colfill_randu(int& j,long int& n)
\end{lstlisting}
\afterlistingthing{fills column \texttt{j} of a matrix with a sequence of
uniformly distributed random numbers. The \texttt{long int n} is a seed for
the random number generator. Changing \texttt{n} will produce a different
sequence of random numbers.}
\XX{matrix}{\fontindexentry{tt}{rowfill\_randu}}
\XX{\fontindexentry{tt}{rowfill\_randu}}{filling a matrix with random numbers}
\begin{lstlisting}
void matrix::rowfill_randu(int& i,long int& n)
\end{lstlisting}
\afterlistingthing{fills row \texttt{i} of a matrix with a sequence of uniformly
distributed random numbers.}
\XX{vector}{\fontindexentry{tt}{fill\_randbi}}
\XX{\fontindexentry{tt}{fill\_randbi}}{filling a vector with random numbers}
\begin{lstlisting}
void vector::fill_randbi(long int& n, double& p)
\end{lstlisting}
\afterlistingthing{fills a vector with a sequence random numbers from a binomial
distribution.}
\XX{vector}{\fontindexentry{tt}{fill\_randn}}
\XX{\fontindexentry{tt}{fill\_randn}}{filling a vector with random numbers}
\begin{lstlisting}
void vector::fill_randn(long int& n)
\end{lstlisting}
\afterlistingthing{fills a vector with a sequence of normally distributed random
numbers.}
\XX{matrix}{\fontindexentry{tt}{rowfill\_randn}}
\XX{\fontindexentry{tt}{rowfill\_randn}}{filling a matrix with random numbers}
\begin{lstlisting}
void matrix::colfill_randn(int& j,long int& n)
\end{lstlisting}
\afterlistingthing{fills column \texttt{j} of a matrix with a sequence of
normally distributed random numbers.}
\begin{lstlisting}
void matrix::rowfill_randn(int& i,long int& n)
\end{lstlisting}
\afterlistingthing{fills row \texttt{i} of a matrix with a sequence of normally
distributed random numbers.}
\XX{vector}{\fontindexentry{tt}{fill\_multinomial}}
\XX{\fontindexentry{tt}{fill\_multinomial}}%
{filling a vector with random numbers}
\begin{lstlisting}
void vector::fill_multinomial(long int& n, dvector& p)
\end{lstlisting}
\afterlistingthing{fills a vector with a sequence random numbers from a
multinomial distribution. The parameter~\texttt{p} is a \texttt{dvector} such
that \texttt{p[}$i$\texttt{]} is the probability of returning $i$. The
elements of \texttt{p} must sum to $1$.}
\section{Methods for extracting from arrays and matrices}
\X{extracting data from arrays and matrices}
\XX{matrix}{\fontindexentry{tt}{extract\_column}}
\XX{\fontindexentry{tt}{extract\_column}}{from a matrix}
\begin{lstlisting}
vector extract_column(matrix& M,int& j)
\end{lstlisting}
\afterlistingthing{extracts a column from a matrix and puts it into a vector.}
\begin{lstlisting}
vector extract_row(matrix& M,int& i)
\end{lstlisting}
\afterlistingthing{extracts a row from a matrix and puts it into a vector.}
\XX{\fontindexentry{tt}{extract\_diagonal}}{from a matrix}
\begin{lstlisting}
vector extract_diagonal(matrix& M)
\end{lstlisting}
\afterlistingthing{extracts the diagonal elements from a matrix and puts them
into a vector.}
\X{operator \texttt{()}}
The function call operator (\texttt{()}) has been overloaded in two ways to
provide for the extraction of a subvector. X{vector}{extracting a subvector}
\X{extracting a subvector}
\XX{matrix}{\fontindexentry{tt}{extract\_row}}
\XX{\fontindexentry{tt}{extract\_row}}{from a matrix}
\XX{vector}{function call \texttt{()} to extract subvector}
\begin{lstlisting}
vector(ivector&)
\end{lstlisting}
An \texttt{ivector} object is used to specify the elements of the vector to be
chosen. If \texttt{u} and \texttt{v} are \texttt{dvectors} and \texttt{i} is an
\texttt{ivector}, then the construction
\begin{lstlisting}
dvector u = v(i);
\end{lstlisting}
will extract the members of \texttt{v} indexed by \texttt{i} and put them in the
\texttt{dvector u}. The size of~\texttt{u} is equal to the size of~\texttt{i}.
The \texttt{dvector u} will have minimum valid index and maximum valid index
equal to the minimum valid index and maximum valid index of \texttt{i}. The size
of~\texttt{i} can be larger than the size of~\texttt{v}, in which case some
elements of \texttt{v} must be repeated. The elements of the \texttt{ivector i}
must lie in the valid index range for \texttt{v}.
If \texttt{v} is a \texttt{dvector} and \texttt{i1} and \texttt{i2} are two
integers,
\begin{lstlisting}
u(i1,i2)
\end{lstlisting}
is a \texttt{dvector}, which is a subvector of \texttt{v} (provided, of course,
that \texttt{i1} and \texttt{i2} are valid indices for \texttt{v}). Subvectors
can appear on both the left and right-hand side of an assignment:
\begin{lstlisting}
dvector u(1,20);
dvector v(1,19);
v = 2.0; // assigns the value 2 to all elements of v
u(1,19) = v; // assigns the value 2 to elements 1 through 19 of u
\end{lstlisting}
\X{operator $++$}
\X{operator $-{}-$}
\XX{operator $++$}{use with subvectors}
\XX{operator $-{}-$}{use with subvectors}
\XX{operator $++$}{for \texttt{dvectors}}
\XX{operator $-{}-$}{for \texttt{dvectors}}
In the above example, suppose that we wanted to assign the vector \texttt{v} to
elements 2 through 20 of the vector \texttt{u}. To do this, we must first ensure
that they have the same valid index ranges. The operators \texttt{++} and
\texttt{-{}-}, when applied to vector objects, increment and decrement the index
ranges by~1. See the code fragment below:
\begin{lstlisting}
dvector u(1,20);
dvector v(1,19);
v = 2.0; // assigns the value 2 to all elements of v
--u(2,20) = v; // assigns the value 2 to elements 2 through 20 of u
u(2,20) = ++v; // assigns the value 2 to elements 2 through 20 of u
\end{lstlisting}
It is important to realize that from the point of view of the vector \texttt{v},
both of the above assignments have the same effect. It will have elements 2
through 20 set equal to~2. The difference is the side effects on the vector
\texttt{v}. The use of subvectors, and increment and decrement operations, can
be used to remove loops from the code. Note that
\XX{subvectors}{using to remove loops from code}
\begin{lstlisting}
dvector x(1,n)
dvector y(1,n)
dvector z(1,n)
for (int i=2;i<=n;i++)
{
x(i)=y(i-1)*z(i-1);
}
\end{lstlisting}
can be written as
\begin{lstlisting}
dvector x(1,n)
dvector y(1,n)
dvector z(1,n)
x(2,n)=++elem_prod(y(1,n-1),z(1,n-1)); // elem_prod is element-wise
// multiplication of vectors
\end{lstlisting}
\section{Making a bootstrap sample}
X{bootstrap sample}{code for}
As an example of how one can use some of these operations, consider the problem
of creating bootstrap samples from a set of data. Suppose we wish to create a
random sample (with replacement) of the original data set. Assume that the
original data is contained in a \texttt{dvector} object named \texttt{data}. It
is not necessary that the new samples be the same size as the original sample,
so we shall allow the size of the bootstrap sample to be a variable. The code
for the function \texttt{bootstrap} follows:
\begin{lstlisting}
dvector bootstrap(dvector& data, int& sample_size, long int seed)
{
// data contains the data which is used to make the bootstrap sample
// sample_size is the size of the desired bootstrap sample
// seed is a seed for the random number generator
dvector tmp(1,sample_size);
tmp.fill_randu(seed); // 0{}>}}
\XX{vector}{I/O operations}
Input and output methods for the \scAD\ class objects have been implemented by
overloading of the \cplus\ stream I/O operators \texttt{<{}<} and \texttt{>{}>}
for all \scAD\ classes. The so-called \textit{iostream} package, introduced in
AT\&T~Release 2.0 of \cplus, is fully supported. For a detailed discussion of
\cplus\ stream I/O, you should consult your \cplus\ manual (and the appropriate
header files). In this section, we discuss extension of stream I/O to \scAD\
class objects. Formatted input and output uses the standard \cplus\ stream
classes \texttt{istream}, \texttt{ostream}, \texttt{ifstream}, and
\texttt{ofstream}. In addition, unformatted (``binary'') file I/O is implemented
through two additional stream classes \texttt{uostream} and \texttt{uistream}.
\section{Formatted stream I/O}
\XX{input and output}{formatted}
\X{formatting}
\XX{manipulators}{\fontindexentry{tt}{setw}}
\XX{manipulators}{\fontindexentry{tt}{setfixex}}
\XX{manipulators}{\fontindexentry{tt}{setscientific}}
\XX{manipulators}{\fontindexentry{tt}{setprecision}}
Variables of type \adtypes\ may be input or output using the standard stream
operators, and formatted using the standard manipulators. A two additional
manipulators are provided to simplify selection of either fixed point or
``scientific'' (e-format) notation. The use of the manipulators \texttt{setw()}
(set width), \texttt{setprecision()}, \texttt{setfixed}, \texttt{setscientific},
and \texttt{endl} is illustrated below.
\begin{lstlisting}
#include
#include
main()
{
int n = 30;
int m = 5;
gradient_structure gs;
dvar_matrix x(1, n, 1, m);
.
. // code to calculate x
.
// e format with default width and precision
cout << setscientific << x << endl;
// fixed point format
cout << setw(13) << setprecision(3) << setfixed << x << endl;
.
.
.
}
\end{lstlisting}
The first output statement will produce 30 lines with five numbers per line in
``scientific'' (e-format) separated by one or more blanks. The second output
statement will produce a table with 30 rows and 5 columns, with each member of
\texttt{x} printed with a fixed decimal place, with 3 significant figures in a
field 13 bytes wide. If the manipulators are omitted, the output will appear
with the default width, precision, and format specified by the compiler, and
with each number separated by one space.
The following standard manipulators are also supported:
\XX{manipulators}{\fontindexentry{tt}{setf}}
\begin{lstlisting}
setf(f, ios::floatfield); // to select fixed or scientific
setf(b, ios::basefield); // to base of number
setf(a, ios::adjustfield); // to select justification of fields
// where
// f may be ios::fixed or ios::scientific
// b may be ios::dec, ios::oct or ios::hex
// a may be ios::left, ios::right or ios::internal
\end{lstlisting}
%Some compilers to not yet support \texttt{iostream} package. The \scAD\ system
%includes the overloaded function \texttt{fform( )} which mimics the
%older style \cplus\ formatting function \texttt{form(const char* ... )}.
%The function prototypes for \texttt{fform( )} are:
%\begin{lstlisting}
%char* fform(const char*, dvariable&);
%char* fform(const char*, dvar_vector&);
%char* fform(const char*, dvar_matrix&);
%char* fform(const char*, dvector&);
%char* fform(const char*, dmatrix&);
%\end{lstlisting}
%\noindent The first argument is any valid C format string as described
%in the documentation for \texttt{printf}. The second
%argument is the object to be printed.
%The function \texttt{fform( )} is noticeably slower than the \texttt{iostream}
%package and the size of the object it can print is limited by
%by the size of an internal buffer. It offers no known advantages
%over the \texttt{iostream} package and is only provided to enable some
%control over the format of the output.
%{\bf There are incompatibilities between the new iostream package and the
%older stream package. Users are cautioned against trying to use both.}
\section{Error checking}
\X{I/O error checking}
\X{operator \texttt{!}}
\XX{operator \texttt{!}}{I/O error checking and}
Although the stream operators \texttt{>{}>} and \texttt{<{}<} make it very easy
to perform input or output on the \scAD\ container classes, it is important to
remember that they do not perform any error checking. While a complete
discussion of error states for the stream classes is beyond the scope of this
manual, a brief discussion of the use of the operator~\texttt{!} for I/O error
checking is included. If \texttt{infile} is a stream object, then after any I/O
operation involving \texttt{infile}, \texttt{!infile} will be true if the
operation has failed and false if the operation has succeeded. For example:
\begin{lstlisting}
dvector x(1,10);
ifstream infile("data"); // associate the file data with the ifstream
// object infile
if (!infile) // Check if the file has been successfully opened
{
cerr << "Error trying to open file data\n";
exit(1); // User has decided to stop on this error condition
}
infile >> x ; // read data into the dvector x;
if (!infile) // Check if the input operation was successful
{
cerr << "Error trying to read dvector x from file data\n";
exit(1); // User has decided to stop on this error condition
}
\end{lstlisting}
If you neglect to put in any error checking for I/O operations, your program may
very well carry on blissfully, even though the I/O operation has failed. The
symptoms of this failure can be quite confusing and waste a lot of valuable
program development time.
\section{Unformatted stream I/O}
\XX{input and output}{unformatted}
\XX{input and output}{binary}
It is often useful to store intermediate calculations in an external file.
Unformatted (``binary'') file I/O is the most rapid and space-efficient method
for such operations. It is supported through two additional stream classes
\texttt{uostream} and \texttt{uistream}. These two classes are derived classes
from the \texttt{ofstream} and \texttt{ifstream} classes in the
\textit{iostream} package.
\section{An example of input and output\br for \scAD\ classes}
\XX{input and output}{example}
The following listing illustrates the main features of the \scAD\ system I/O
support. It also illustrates the fill operations and \cplus\ scoping rules to
automatically destroy some variables.
\input io_examp.cpp
%xx \htmlnewfile
\chapter{Temporary Files}
\X{\fontindexentry{tt}{gradfil1.tmp}}
\X{\fontindexentry{tt}{gradfil2.tmp}}
\X{\fontindexentry{tt}{cmpdiff.tmp}}
\XX{temporary files}{\fontindexentry{tt}{gradfil1.tmp}}
\XX{temporary files}{\fontindexentry{tt}{gradfil2.tmp}}
\XX{temporary files}{\fontindexentry{tt}{cmpdiff.tmp}}
\XX{temporary files}{for saving derivative information}
The \scAD\ system saves on a stack information required to calculate the
derivatives during the evaluation of the function. The amount of information
depends on the number of variables for which derivative information is required,
and on the specific nature of the computations. It is difficult to predict
generally how much information will be generated. If the length of this stack
exceeds a preset amount, it is stored on the disk in one or two unformatted
temporary gradient files, \texttt{gradfil1.tmp} and \texttt{gradfil2.tmp}. There
is also derivative information from ``precompiled'' derivative calculations,
which is stored in the file \texttt{cmpdiff.tmp}.
\XX{\fontindexentry{sc}{ram} disk}{and temporary files}
\XX{temporary files}{use of the \textsc{dos} environment string \textsc{tmp}}
\XX{temporary files}{use of the \textsc{dos} environment string \textsc{tmp}1}
\XX{temporary files}{default directory for}
\XX{environment string}{specifying directory for temporary files}
The location of these files is controlled by the \textsc{dos} environment
strings, \texttt{TMP} and \texttt{TMP1}. The file \texttt{gradfil1.tmp} will be
created in the directory indicated by \texttt{TMP}, while the file
\texttt{cmpdiff.tmp} will be created in the directory indicated by
\texttt{TMP1}. If either or both of these strings is undefined, the current
directory will be used to create the files. Greatest speed will be achieved if
the temporary files are located on a \textsc{ram} disk. Assuming that your
\textsc{ram} disk is drive \texttt{E:}, type \texttt{set tmp=e:} at the
\textsc{dos} prompt. If \texttt{gradfil1.tmp} becomes too large for your
\textsc{ram} disk (or whatever device is specified by the \texttt{TMP}
environment string), the rest of the file will be stored in
\texttt{gradfil2.tmp} on the device referred to by the \textsc{dos} environment
string \texttt{TMP1}. For example, if you have typed \texttt{set tmp=e:} and
\texttt{set tmp1=c:\bs temp}, the first part of the gradient information will be
stored on \texttt{e:} (presumably a \textsc{ram} disk), while the second part
will be stored in the directory \texttt{c:\bs temp}. If the directory does not
exist, \scAD\ will display an error message and retire. If either the
\texttt{TMP} or \texttt{TMP1} environment string does not exist, \scAD\ will
create \texttt{gradfil1.tmp} or \texttt{gradfil2.tmp} in the current directory.
\XX{temporary files}{effect of abnormal program termination on}
The temporary gradient files are normally deleted when objects of class
\texttt{gradient\_structure} go out of scope. Abnormal termination of the
program (such as by pressing \texttt{Ctrl-Break} or use of the derivative
checker) will cause some temporary gradient files to remain in your file system.
These files may become quite large---several~Mb or more, so it is a good idea to
remove them.
%xx \htmlnewfile
\chapter{Global Variables}
\section{Adjusting the \scAD\ system global variables}
\XX{global variables}{access functions for}
\X{\fontindexentry{tt}{gradient\_structure}}
\X{controlling memory allocation}
The \scAD\ system declares a number of global variables that control memory
allocation for the \scAD\ structures. These global variables can be changed by
the use of ``access functions'' prior to declaring an object of type
\texttt{gradient\_structure}. For most applications, the default values of these
global variables should suffice. You will not need to worry about them until you
start building larger applications.
\bigskip
\XX{\fontindexentry{tt}{NUM\_RETURN\_ARRAYS}}{default value}
\begin{lstlisting}
int NUM_RETURN_ARRAYS = 10;
\end{lstlisting}
This global variable determines the maximum allowable depth of nesting of
functions that return \scAD\ variable types (see the discussion about the
\texttt{RETURN\_ARRAYS\_INCREMENT()} and \texttt{RETURN\_ARRAYS\_DECREMENT()}
instructions). The default value of 10 will suffice for almost all applications
and most users will not need to concern themselves with this variable.
To increase the number to, say, 15, you should put the instruction
\X{\fontindexentry{tt}{RETURN\_ARRAYS\_INCREMENT}}
\X{\fontindexentry{tt}{RETURN\_ARRAYS\_DECREMENT}}
\XX{error messages}{\fontindexentry{tt}%
{Overflow in RETURN\_ARRAYS stack. Increase NUM\_RETURN\_ARRAYS}}
\begin{lstlisting}
gradient_structure::set_NUM_RETURN_ARRAYS(15);
\end{lstlisting}
into your program before an object of type \texttt{gradient\_structure} is
declared to manage the derivative calculations. If you exceed the maximum
allowable depth of nesting, the message
\begin{lstlisting}
Overflow in RETURN_ARRAYS stack---Increase NUM_RETURN_ARRAYS
\end{lstlisting}
will appear. This message can also occur if you have put a
\texttt{RETURN\_ARRAYS\_INCREMENT} that is not matched by a
\texttt{RETURN\_ARRAYS\_DECREMENT} into one of your functions.
\bigskip
\X{\fontindexentry{tt}{GRADSTACK\_BUFFER\_SIZE}}
\begin{lstlisting}
long int GRADSTACK_BUFFER_SIZE = 2200;
\end{lstlisting}
This global variable determines the number of entries contained in the buffer
that, in turn, contains the information necessary for calculating derivatives.
For historical reasons, the actual amount of memory (in bytes) reserved for the
buffer is equal to the value of \texttt{GRADSTACK\_BUFFER\_SIZE} multiplied by
the size (in bytes) of an \scAD\ structure, \texttt{grad\_\-stack\_\-entry}. For
16-bit \textsc{dos} compilers, the size of \texttt{grad\_stack\_entry} is
28~bytes. The default value of \texttt{GRADSTACK\_BUFFER\_SIZE} is 2200 bytes,
which reserves a buffer of~$28*2200=61600$~bytes. Since for 16-bit \textsc{dos},
versions of \scAD\ buffers are limited in size to a maximum of 64K, 2200 is
about the largest size that can be used.
If you make the value smaller (say, 500), it will free up some memory (about
50K), but the program will execute more slowly, because it must store data on
the hard disk more often. So, this is a desperation move in a situation where
you must find some more memory for your program. To decrease the number to 500,
you should put the instruction
\begin{lstlisting}
gradient_structure::set_GRAD_STACK_BUFFER_SIZE(500);
\end{lstlisting}
into your program before declaring a \texttt{gradient\_structure} object.
\bigskip
\X{\fontindexentry{tt}{CMPDIF\_BUFFER\_SIZE}}
\X{\fontindexentry{tt}{set\_CMPDIF\_BUFFER\_SIZE}}
\begin{lstlisting}
long int CMPDIF_BUFFER_SIZE = 32000L;
\end{lstlisting}
This global variable determines the size in bytes of the buffer used to contain
the information generated by the ``precompiled'' derivative code. To change this
variable, the instruction
\begin{lstlisting}
static void gradient_structure::set_CMPDIF_BUFFER_SIZE(long int i);
\end{lstlisting}
is used.
\bigskip
\X{\fontindexentry{tt}{RETURN\_ARRAYS\_SIZE}}
\begin{lstlisting}
int RETURN_ARRAYS_SIZE = 50;
\end{lstlisting}
This global variable controls the amount of complexity that one line of
arithmetic can have. The present default value should be large enough for any
conceivable purpose. It is recommended that the user leave it alone.
\bigskip
\X{\fontindexentry{tt}{MAX\_NVAR\_OFFSET}}
\XX{independent variables}{maximum number of}
\begin{lstlisting}
int MAX_NVAR_OFFSET = 200;
\end{lstlisting}
This global variable determines the maximum number of independent variables that
can be used. In can be increased, for example, to 500, by including the
instruction
\begin{lstlisting}
gradient_structure::set_MAX_NVAR_OFFSET(500);
\end{lstlisting}
into your program before declaring a \texttt{gradient\_structure} object. If you
have more independent variables than the value of \texttt{MAX\_NVAR\_OFFSET},
the error message
\begin{lstlisting}
You need to increase the global variable MAX_NVAR_OFFSET to xxx
\end{lstlisting}
will appear and execution will be halted.
\XX{error messages} {\fontindexentry{tt}%
{You need to increase the global variable MAX\_NVAR\_OFFSET to xxx}}
\bigskip
\X{\fontindexentry{tt}{ARRAY\_MEMBLOCK\_SIZE}}
\begin{lstlisting}
unsigned long ARRAY_MEMBLOCK_SIZE=100000L;
\end{lstlisting}
This global variable determines the maximum amount of memory (in bytes)
available for the \scAD\ variable type container class objects
(\texttt{dvar\_vector}, \texttt{dvar\_matrix}, etc.). It should be set to the
largest single amount of memory required when an object of type
\texttt{gradient\_structure} is declared. An \scAD\ variable array object
requires about 8 bytes for each element of the array. A \texttt{dvar\_vector}
object of size 500 will require slightly more than 4000 bytes of memory. A
\texttt{dvar\_matrix} object with 20~rows and 30~columns would require $ 20
\times 30 \times 8 = 4800$ bytes of memory. To provide the maximum amount of
memory for other purposes, you can reduce the size of
\texttt{ARRAY\_MEMBLOCK\_SIZE} until an error message telling you to increase
\texttt{ARRAY\_MEMBLOCK\_SIZE} appears.
\XX{error messages}{\fontindexentry{tt}%
{Need to increase ARRAY\_MEMBLOCK\_SIZE parameter}}
\X{\fontindexentry{tt}{MAX\_DLINKS}}
\begin{lstlisting}
unsigned MAX_DLINKS = 1000;
\end{lstlisting}
The access function
\XX{\fontindexentry{tt}{dvariable}}%
{controlling the maximum number of with \texttt{set\_MAX\_DLINKS}}
\XX{\fontindexentry{tt}%
{set\_MAX\_DLINKS}}{setting the maximum number of dvariable objects}
\begin{lstlisting}
gradient_structure::set_MAX_DLINKS(int i);
\end{lstlisting}
determines the maximum number of \texttt{dvariable} objects that can be in scope
at one time. The default value is 1000, which should be enough for most
purposes. If the value is exceeded, the message
\XX{error messages}{\fontindexentry{tt}%
{Need to increase the maximum number of dlinks}}
\begin{lstlisting}
Need to increase the maximum number of dlinks
\end{lstlisting}
will appear and program execution will be terminated.
\subsubsection{Other access functions}
The complete list of access functions for the \scAD\ system global variables is:
\XX{access functions}{for global objects}
\begin{lstlisting}
static void set_NUM_RETURN_ARRAYS(int i);
static void set_ARRAY_MEMBLOCK_SIZE(unsigned long i);
static void set_GRADSTACK_BUFFER_SIZE(long int i);
static void set_MAX_NVAR_OFFSET(unsigned int i);
static void set_MAX_DLINKS(int i);
static void set_CMPDIF_BUFFER_SIZE(long int i);
\end{lstlisting}
These functions may not be used while an object of type
\texttt{gradient\_structure} is in scope. Doing so will produce an error message
and program execution will be halted.
\section{Stack size}
\XX{stack}{allocating sufficient space for the}
The size of the stack is usually controlled by a compiler-specific global
variable. The default stack size is generally too small for \scAD. It is
recommended that the stack size be increased to 15000 or 20000 bytes. The \scAD\
library routines have been compiled without the option of checking for stack
overflow, so if you do not reserve enough area for the stack, your program will
probably crash the system without any warning, or simply behave in some strange
manner.
\XX{stack}{checking for overflow of the}
Stack size is set by inserting an appropriate line in the code prior to the
\texttt{main()} block. The following examples set the stack length to 15000
bytes. Refer to your compiler documentation for more details.
\X{\fontindexentry{tt}{\_stklen}}
\X{Borland}
\XX{stack}{setting the stack size for Borland \string\cplus}
For the Borland compiler, the appropriate syntax is:
\begin{lstlisting}
extern unsigned _stklen = 15000;
main()
{
// ... your code
}
\end{lstlisting}
\X{\fontindexentry{tt}{\_stack}}
\X{Zortech}
\XX{stack}{setting the stack size for Zortech \string\cplus}
\noindent For the Zortech compiler, the appropriate syntax is:
\begin{lstlisting}
unsigned int _stack = 15000;
main()
{
// ... your code
}
\end{lstlisting}
%xx \htmlnewfile
\chapter{The \scAD\ Function\br Minimizing Routines}
\label{ch:function-minimizing}
\XX{minimization of functions}{quasi-Newton method}
\XX{minimization of functions}{conjugate gradient method}
\X{conjugate gradient method}
\X{quasi-Newton method}
\scAD\ is supplied with both a quasi-Newton function minimizer and a conjugate
gradient function minimizer. A quasi-Newton function minimizer is an effective
method for minimizing smooth functions of up to one or two hundred parameters.
At some point, the calculations involved in updating the inverse Hessian
approximation (on the order of $n^3$) and the memory requirements for the
inverse Hessian approximation (with $n(n+1)/2$ floating point elements), where
$n$ is the number of parameters, become prohibitive. The conjugate gradient
method, supplied with \scAD, should then be used.
\XX{minimization of functions}{user interface}
We have provided the user with two types of interfaces to the function minimizer
routines. The first approach provides the user with the maximum amount of
encapsulation of the minimization routine. It effectively allows the user to
say, ``Here is my function. Minimize it.'' The disadvantage of this approach is
some loss of flexibility in managing memory and controlling the minimization
process. To provide maximum flexibility, we also provide a lower-level interface
that allows the user to exercise complete control over the minimization process.
\X{minimize function}
The fully encapsulated minimization procedure is invoked by calling the function
\texttt{minimize}. The following code invokes the \texttt{minimize} routine to
find the minimum for the user's function \texttt{userfun}.
\input minimize.cpp
\XX{\fontindexentry{tt}{independent\_variables}}{declaration of}
\XX{\fontindexentry{tt}{independent\_variables}}{initialization of}
In this example, it is the user's responsibility to determine the number of
independent variables \texttt{nvar} and to declare the array of type
\texttt{independent\_variables} that holds the independent variables. The
elements of the \texttt{independent\_variables} array \texttt{x} are initialized
to zero. If other initial values are desired, the user must initialize the array
as well. An example of a typical simple function that could be minimized by this
code is the code for the function
\begin{equation*}
\big(x[1]-1\big)^2 + \sum_{i=1}^{i=0)
{
fmc.fmin(f,x,g);
if (fmc.ireturn > 0)
{
f=userfun(x,...) // The users function
gradcalc(nvar,g);
}
}
\end{lstlisting}
The \texttt{dvector} objects \texttt{g} and \texttt{x} must have minimum valid
indices equal to~1 and maximum valid indices equal to the number of independent
variables in the minimization.
\X{\fontindexentry{tt}{BEGIN\_MINIMIZATION}}
\X{\fontindexentry{tt}{END\_MINIMIZATION}}
Since many parts of the code sequence are independent of the particular
application, the code segments for calling the function minimizer and evaluating
the derivatives has been made into a macro. Using the macros,
\texttt{BEGIN\_MINIMIZATION(nvar,f,x,g,fmc)} and
\texttt{END\_MINIMIZATION(nvar,g)}, the above code can be written as
\begin{lstlisting}
double f;
independent_variables x(1,nvar);
dvector g(1,nvar);
fmm fmc(nvar); // Use this declaration for the quasi-Newton function minimizer
// fmmc fmc(nvar); // Use this declaration for the conjugate-gradient function
//minimizer
BEGIN_MINIMIZATION(nvar,f,x,g,fmc)
f=userfun(x,...) // The users function
END_MINIMIZATION(nvar,g)
\end{lstlisting}
The macro \texttt{BEGIN\_MINIMIZATION} takes five arguments: the number of
independent variables, the name of the object of type \texttt{double} that will
contain the value of the function being minimized, the independent variables,
the gradient vector, and the name of the function-minimizing control structure.
The macro \texttt{END\_MINIMIZATION} takes two arguments: the number of
independent variables and the gradient vector.
\XX{\fontindexentry{tt}{BEGIN\_MINIMIZATION}}{number of arguments}
\XX{\fontindexentry{tt}{END\_MINIMIZATION}}{number of arguments}
\XX{\fontindexentry{tt}{fmm} class}{controlling minimization with}
\XX{\fontindexentry{tt}{fmmc} class}{controlling minimization with}
Notice that even when using the macros, it is the responsibility of the user to
declare the name of the object of type \texttt{double}, \texttt{f}, which will
contain the value of the function being minimized, the
\texttt{independent\_variables x} and the \texttt{dvector g}, as well as the
\texttt{fmm} object \texttt{fmc}. The quasi-Newton function minimizer
\texttt{fmin} is controlled by, and is a member function of, the class
\texttt{fmm}. The constructor for the class \texttt{fmm} takes \texttt{nvar},
the number of variables, as an argument and creates the structure \texttt{fmc},
which controls the minimization. The user can control the operation of the
function minimizer by changing certain values of the \texttt{fmm} structure
\texttt{fmc}. The conjugate gradient function minimizer \texttt{fmin} is a
member function of the class \texttt{fmmc}. The only difference in calling the
quasi-Newton or conjugate-gradient function minimizers is in the declaration of
an object of type \texttt{fmm} for the quasi-Newton method, and of type
\texttt{fmmc} for the conjugate-gradient method. The public members of this
structure are given below.
\XX{\fontindexentry{tt}{fmm} class}{\fontindexentry{tt}{maxfn}}
\XX{\fontindexentry{tt}{fmm} class}{\fontindexentry{tt}{min\_improve}}
\XX{\fontindexentry{tt}{fmm} class}{\fontindexentry{tt}{crit}}
\XX{\fontindexentry{tt}{fmm} class}{\fontindexentry{tt}{imax}}
\XX{\fontindexentry{tt}{fmm} class}{\fontindexentry{tt}{iprint}}
\XX{\fontindexentry{tt}{fmm} class}{\fontindexentry{tt}{scroll\_flag}}
\XX{\fontindexentry{tt}{fmm} class}{\fontindexentry{tt}{ireturn}}
\begin{lstlisting}
class fmm
{
public:
// control variables:
long maxfn; // maximum number of function evaluations
// default value: maxfn = 500
double min_improve; // if the function doesn't decrease by at least this
// amount over 10 iterations then quit.
// default value: min_improve=1.e-6
double crit; // gradient convergence criterion; fmin terminates if
// maximum absolute value of the
// gradient component is less than crit.
// default value: crit = 1e-4
long imax; // maximum number of function evaluations in linear
// search without improving the current estimate
// before fmin terminates default value : imax = 30
long iprint; // number of iterations between screen displays of
// current parameter estimates default value: iprint = 1
long scroll_flag; // set to 1 to scroll display on screen;
// 0 to overwrite screen
// default value: scroll_flag = 1
int ireturn; // used to control the loop containing fmin and the
// user's function
}
\end{lstlisting}
There are five criteria by which the function minimizer may decide that
convergence has been achieved.
\X{convergence criteria for function minimizer}
\XX{function minimizer}{convergence criteria for}
\begin{enumerate}
\item If the magnitude of all the gradient components is less than
\texttt{crit} (default value 1.0\e{-4}).
\item If the maximum number of function evaluations, \texttt{maxfn}, is
exceeded (default value 500).
\item If the function value does not decrease by more than
\texttt{min\_improve} over 10 iterations (default value 1.0\e{-6}).
\item If the function minimizer tries to evaluate the function more than
\texttt{imax} times within one linear search (default value 30).
\item If the function minimizer detects that the user has pressed the `Q' or
`q' key.
\end{enumerate}
Here is an example of the use of these switches:
\begin{lstlisting}
dvector g(1,nvar);
dvector x(1,nvar);
double f;
fmm fmc(nvar);
fmc.maxfn=1000; // Maximum number of function evaluations is 1000
fmc.iprint=10; // Current best estimate will be printed out every
// ten iterations
fmc.crit=.01; // Convergence if derivatives are all smaller in
// magnitude than .01
fmc.min_improve=0.0; // This convergence criterion won't be used
fmc.imax=0; // This convergence criterion won't be used
gradient_structure gs;
while (fmc.ireturn >=0)
{
fmc.fmin(f,x,g);
if (fmc.ireturn > 0)
{
f=userfun(x,...) // The users function
gradcalc(nvar,g);
}
}
\end{lstlisting}
The function prototype for \texttt{fmin} is:
\begin{lstlisting}[escapechar=\%]
void fmm::fmin(double& f, independent_variables & x, dvector & g);
/*--------------------------------------------------------------------------*
* Purpose: quasi-Newton numerical minimization *
* f : current value of function to be minimized *
* x : current values of the variables over which the minimization *
* occurs; instance of class independent variables with elements *
* from 1 to n *
* g : current value of the gradient as returned by the function *
* gradcalc( ... ); instance of class dvector with elements *
* from 1 to n *
*--------------------------------------------------------------------------*/
\end{lstlisting}
The execution of \texttt{fmin} may be terminated by pressing the `Q' key on the
keyboard. There may be a slight delay between the keypress and the termination
of \texttt{fmin}.
\section{Putting bounds on parameters}
It is often necessary to put bounds on the values that a parameter can take. For
example, if the expression $1/y$ is to be evaluated in a program, an arithmetic
error will occur if $y=0$. Similarly, $e^x$ will probably produce an error if
$x=100000$.
\X{\fontindexentry{tt}{boundp}}
\XX{bounding functions}{\fontindexentry{tt}{boundp}}
The function \texttt{boundp} is supplied with \scAD\ to simplify the placing of
bounds on parameters.
\begin{lstlisting}
dvariable boundp( dvariable x, double fmin, double fmax, dvariable& fpen)
{
dvariable t,y;
t=fmin + (fmax-fmin)*(sin(x*1.570795)+1)/2;
if (x < 0.00001)
{
fpen+=(x-0.00001)*(x-0.00001);
}
if (x > 0.9999)
{
fpen+=(x-0.9999)*(x-0.9999);
}
if (x < -1)
{
fpen+=1000*(x+1)*(x+1);
}
if (x > 1)
{
fpen+=1000*(x-1)*(x-1);
}
return(t);
}
\end{lstlisting}
The function \texttt{boundp} constrains the variable to lie between the values
\texttt{fmin} and \texttt{fmax}. Both the values \texttt{fmin} and \texttt{fmax}
can actually be attained, so if you want a variable to be greater than zero, the
code
\begin{lstlisting}
y=boundp(x,0.0,1.0,fpen);
\end{lstlisting}
will not do the job. Instead, you can accomplish this in one of two ways. You
can use a quadratic transformation if you really don't care how large \texttt{x}
can become.
\begin{lstlisting}
y=x*x;
\end{lstlisting}
You can make a decision about what is meant by ``close to zero'' for your
application---in this case, 0.00001:
\begin{lstlisting}
y=boundp(x,0.00001,1.0,fpen);
\end{lstlisting}
\XX{penalty function}{use wish bounding functions}
The penalty function \texttt{fpen} is included in the \texttt{boundp} routine,
because otherwise, the function would have a zero derivative at its lower and
upper bounds. This causes the derivative with respect to the bounded parameter
to be equal to zero. The result is that if the minimization routine is started
with a parameter value already at the upper or lower bound, then this parameter
will never be changed. The penalty shifts the position of the critical point
slightly within the bounds. Of course, if the initial parameter estimate happens
to coincide exactly with this critical point, the derivative will be zero, but
this is very unlikely.
If you neglect to add the penalty function to the function being minimized, the
minimization routine will still work. The only problem is that you run the risk
of one of the bounded parameters getting ``stuck'' at its upper or lower bound.
\XX{penalty function}{getting the sign correct}
The penalty should be added to the function that you wish to minimize. If you
wish to maximize the function (in which case you are going to change the sign of
the function before passing it to the function minimizer), then you should
subtract the penalty. \textit{If you get the sign of the penalty wrong, the
routines will not work properly.}
\X{\fontindexentry{tt}{boundpin}}
\XX{bounding functions}{\fontindexentry{tt}{boundpin}}
\XX{bounding functions}{inverse function for}
To begin the minimization, you generally know the initial value of the
constrained parameter~\texttt{y}, and you need to find the value of~\texttt{x}
that will produce this value, so you can initialize the proper component of the
vector of \texttt{independent\_variables} with it. The function
\texttt{boundpin}, which is the inverse function of \texttt{boundp}, will yield
the desired value for~\texttt{x}.
\begin{lstlisting}
// double boundpin(double y, double fmin, double fmax)
x=boundpin(y,fmin,fmax);
\end{lstlisting}
\section{The derivative checker}
\XX{derivative checker}{checking correctness of derivatives with}
It may happen that the derivatives being passed to \texttt{fmin} become
incorrect for some values of the parameters. Two possible reasons for this are
1) an error in the internal derivative calculations of \scAD\ itself or 2) the
use of a user's function that is not differentiable at the point in question.
The former cause becomes increasingly less likely as experience with \scAD\
continues and any remaining bugs are removed. The latter can occur because when
constructing a large complicated function, it is always possible (and not
uncommon, in our experience) to add some non-differentiable components as an
oversight.
If the function becomes nondifferentiable at some point or the derivative
calculations are incorrect, the function minimizer will begin to experience
difficulty in making progress. To help identify derivative errors or cases of
non-differentiability when they occur, the \scAD\ system includes a interactive
derivative checker, which compares the value of the derivatives calculated by
\texttt{gradcalc()} with finite difference approximations. The derivative
checker can be invoked while the function minimizer is in progress by pressing
the `C' key on the keyboard.
\XX{derivative checker}{invoking from the function minimizer}
The derivative checker operates in two modes. In the default mode, the
derivative is checked for one independent variable at a time. In this mode, you
will be asked to supply the index of the variable for which you want to check
the derivative. Alternatively, if you want to check all derivatives, enter $0$
when prompted for the index of the variable to check. Next, you will be asked to
enter the step size to use for the finite difference approximation. Enter a
number which will alter the value of the of the variables to be checked in the
fifth or sixth significant digit. The derivative checker will then tabulate the
values of the function, the variable being checked, the analytical derivative
(from \texttt{gradcalc()}), the finite difference approximation, and the index
of the variable being checked. The derivative checker may be interrupted by
pressing the `X' key on the keyboard. On completion, the derivative checker
exits to the operating system.
For many models, the finite approximations to the derivatives may change
radically as the step size used in the derivative approximation is changed. How
then can one be sure that the analytical derivatives are correct or incorrect?
The following ``rule of thumb'' seems to work fairly well. First, identify a
gradient component that seems to be incorrect. Pick this component as the
independent variable whose derivative is to be checked. Evaluate the finite
difference approximation for different step sizes, with the step sizes differing
by a factor of about 3. Typical step sizes might be \texttt{1.e-4},
\texttt{3.e-5}, \texttt{1.e-5}, \texttt{3.e-6},$\ldots$ Now, if the finite
difference approximations stay almost the same for two consecutive step sizes,
but are very different from the analytical approximations, this is a good
indication that the analytical derivatives are incorrect.
Frequent use of the derivative checker is advisable during development of a new
application, particularly after changes that modify the objective function.
\XX{derivative checker}{invoking explicitly}
The derivative checker may also be invoked explicitly. Its function prototype is
\begin{lstlisting}[escapechar=\%]
void derch(double & f, independent_variables & x, dvector & g,
const int n, int & ireturn);
/*--------------------------------------------------------------------------*
* Purpose: compare analytical derivative calculation with finite *
* difference approximation *
* Input: *
* f : current value of function *
* x : current values of the parameters of the function; instance of *
* class independent_variables with elements ranging from 1 to n *
* g : current value of the gradient as returned by the function *
* gradcalc( ... ); instance of class dvector with elements from *
* 1 to n *
* n : number of variables *
* ireturn : return status; set ireturn = 3 prior to first call to derch() *
* Output: *
* ireturn : return status; ireturn = 4 or 5 on return from derch() *
*--------------------------------------------------------------------------*/
\end{lstlisting}
The following code fragment will display the values of the function, the
variable being checked, the analytical derivative (from \texttt{gradcalc()}),
the finite difference approximation, and the index of the variable being
checked.
\begin{lstlisting}
#include
#include
// prototype for user supplied code to compute f as a function of x
void f_comp( double & f, independent_variables & x, ... );
void main()
{
int n = ...; // number of variables in function
independent_variables x(n);
set_x( ... ); // user supplied code to set the values of x
dvector g(1,n);
gradient_structure gs;
double f = 0;
int derch_return = 3;
f_comp(f, x, ... ); // first call to get value of function for current x
while (derch_return >=3)
{
derch(f, x, g, n, derch_return);
{
f_comp(f, x, ... ); // subsequent calls for with values of x
// varied by the step size within derch()
gradcalc(n, g);
}
}
}
\end{lstlisting}
%xx \htmlnewfile
\chapter{Statistical Functions}
The principal purpose of \scAD\ is for constructing computer programs that use
mathematical derivatives. One such application is in nonlinear parameter
estimation, that is, the construction of nonlinear statistical models. It is
convenient to use simulations that produce data with known statistical
properties, in order to analyze the behavior of such models. To facilitate this
process, \scAD\ is furnished with a small number of statistical functions that
can be applied to both constant and variable vector objects.
\X{statistical functions}
\X{simulation models}
\X{nonlinear-parameter estimation}
\section{Filling vectors with random numbers}
\XX{random numbers}{filling vectors with}
\XX{filling vectors} {with random numbers}
In this section, a uniformly distributed random number is assumed to have a
uniform distribution on~$[0,1]$. A normally distributed random number is assumed
to have mean~$0$ and variance~$1$. A binomially distributed random number is
assumed to have a parameter~$p$, where~1 is returned with probability $p$, and
$0$ is returned with probability $1-p$. The \texttt{long~int\&~n} is the random
number seed specified by the user.
\X{\fontindexentry{tt}{fill\_randu}}
\X{\fontindexentry{tt}{fill\_randn}}
\X{\fontindexentry{tt}{fill\_randbi}}
\X{\fontindexentry{tt}{fill\_multinomial}}
\begin{lstlisting}
dvector::fill_randu(long int& n) // Fill a dvector from a uniform
// distribution on 0,1
dvector::fill_randn(long int& n) // Fill a dvector from a standard normal
// distribution
dvector::fill_randbi(int& n,double p) // Fill a dvector from a binomial
//distribution A 1 is returned with probability p
// A 0 is returned with probability 1-p
dvector::fill_multinomial(int& n,dvector p)// Fill a dvector from a
// multinomial distribution The distribution is determined by
// the dvector p. An i is returned with probability p(i) if the
// components of p sum to 1. Otherwise p is normalized so that
// its components sum to 1.
ivector::fill_multinomial(int& n,dvector p)// Fill a dvector from a
// multinomial distribution The distribution is determined by
// the dvector p. An i is returned with probability p(i) if the
// components of p sum to 1. Otherwise p is normalized so that
// its components sum to 1.
\end{lstlisting}
The \texttt{fill\_multinomial} routine expects to receive a \texttt{dvector} of
parameters, that is, $P=(p_1,\ldots,p_n)$. The routine fills the elements of a
\texttt{dvector} with integers whose values range from $1$ to~$n$, where $i$
occurs with probability~$p_i$. If the components of $P$ do not sum to $1$, they
will be normalized so that the components do sum to~$1$.
As an example of how to use these functions, consider the following code
segment, which generates random numbers from a normal distribution contaminated
with a small proportion of large errors. This kind of distribution is useful for
studying ``robust'' parameter estimation procedures. Robust estimation
procedures are intended to be insensitive to such deviations from the main
distribution.
\XX{random number generator}{example of the use of}
\X{robust estimation}
Assume that the main distribution has mean~20 and standard deviation~2, while
the contaminating distribution has mean~20 and standard deviation~6. We want the
probability that the random numbers belong to the main distribution to be 0.90
and the probability that they belong to the contaminating distribution to
be~0.10. We shall generate a random vector of 100 such random numbers.
\begin{lstlisting}
// ...
dvector rn(1,100);
{
dvector w(1,100);
rn.fill_randn(212); // fills rn with numbers drawn from
// a normal distribution
w.fill_randbi(1521,0.90); // fills w with numbers drawn from
// a binomial distribution
for (int i=1; i<=100;i++)
{
if (w(i)==1) // This condition will occur with probability 0.90
{
rn(i)=2*rn(i)+20; // standard deviation 2 and mean 20
}
else// This condition will occur with probability 0.10
{
rn(i)=6*rn(i)+20; // standard deviation 6 and mean 20
}
}
} // Now the dvector w goes out of scope and the memory it used
// is released...
\end{lstlisting}
\section{Generating a sample from a mixture\br of $n^2$ normal distributions}
\X{mixture of normal distributions}
\XX{multinomial fill}{of an ivector object}
This example extends the previous example to a mixture of more than two
distributions. Two interesting aspects of this example are the use of the
multinomial fill of an \texttt{ivector} object and the use of the resulting
\texttt{ivector} object in a general selection of the form
\texttt{dvector(ivector)}.
\X{operator \texttt{()}}
\XX{vector}{extracting a subvector}
\X{extracting a subvector}
\XX{vector}{function call \texttt{()} to extract subvector}
\begin{lstlisting}
int nsample; // nsample is the sample size to be generated
int n; // n is the number of groups in the mixture
dvector p(1,ngroups); // p contains the mixture proportions;
dvector mean(1,ngroups); // mean contains the mean for each group
dvector sd(1,ngroups); // sd contains the standard deviations for each group
dvector sample(1,nsample); // sample will contain the simulated data
ivector choose(1,nsample); // this will determine which group in the mixture
// each observation came from
//...
// somehow get the mixture proportions into p, the means into mean,
// and the standard deviations into sd
// ...
choose.fill_multinomial(1011,p);
sample.fill_randn(211); // fill the sample with standard normal deviates
sample=elem_prod(sd(choose),sample)+mean(choose); //This
// creates the mixture of normal distributions
\end{lstlisting}
Notice that the $i^\textrm{th}$ element of \texttt{choose} will be equal to $j$
with probability $p_j$, so each element in the sample will have a probability
$p_j$ of having \texttt{mean($j$)} as its mean and \texttt{sd($j$)} as its
standard deviation. This means that each element of \texttt{sample} will have a
probability $p_j$ of belonging to the $j^\textrm{th}$ group of normal
distributions. It follows that the elements of \texttt{sample} have the desired
mixture distribution.
\section{Functions that provide summary statistics}
\XX{\fontindexentry{tt}{mean}}{mean of a vector object}
\XX{\fontindexentry{tt}{std\_dev}}{standard deviation a vector object}
The functions
\begin{lstlisting}
number mean(vector_object);
number std_dev(vector_object);
\end{lstlisting}
\noindent return the mean and standard deviation of a \texttt{dvector} or
\texttt{dvar\_vector}.
%xx \htmlnewfile
\chapter{Robust Nonlinear Regression}
\label{ch:robust-nonlinear-regression}
Many authors have indicated that the usual method of least-squares estimation of
parameters is too sensitive to the existence of a small number of large errors,
so-called outliers, in the data being analyzed. Although an extensive theory of
robust estimators that are not so sensitive to such outliers has developed, many
people still use standard statistical packages based on least-squares estimation
for statistical problems such as multiple regression. With the \scAD\ system, it
is a simple matter to use superior robust-regression estimation procedures on
both linear and nonlinear statistical models. We should emphasize that we are
not experts in the field of robust regression and we make no claims of
optimality or near optimality for these methods. The methods in this chapter
seem to work well and they have been designed so that they can be easily
extended to large nonlinear regression problems with thousands of observations,
and at least hundreds (and perhaps thousands) of parameters. (In 16-bit
\textsc{dos} versions of \scAD, the 64K barrier may limit the size of the
problem that can be considered.)
\XX{least-squares}{parameter estimation}
\XX{robust estimation}{in nonlinear parameter estimation}
\XX{parameter estimation}{least-squares}
\XX{parameter estimation}{robust-nonlinear}
Our purpose in this chapter is to present an estimation procedure that performs
almost as well as the usual least-squares estimator when the model errors are
normally distributed, and that performs much better than least squares when the
model errors contain some large outliers. With such an estimator, you don't have
to worry about whether the use of robust techniques is justified. If your model
errors are exactly normally distributed, you have lost very little, and if they
are not normally distributed, you may have gained a lot.
The reader who is not interested in the derivation of the formulas in this
section can skip to
Section~\ref{sec:using-routines}, %xx \number\mychapno.2 check this!!
where the methods for using the routines are discussed. The necessary functions
are supplied with the \scAD\ libraries.
\section{Statistical theory for robust regression}
Assume that we have $n$ data points, $Y_i$, $i=1,\ldots,n$. The $Y_i$ are
assumed to have been produced by the process
\begin{equation}
\label{eq:robust-nonlinear-regression-00}% \eqno(\number\mychapno.0)
Y_i=f({\mbox{\boldmath $x$}}_i;\Theta)+\epsilon_i
\end{equation}
where for each $i$, ${\mbox{\boldmath $x$}}_i$ is a known vector, $\Theta$ is an
unknown vector of parameters, and the~$\epsilon_i$ are random variables. We want
to get estimates for the parameters~$\Theta$. Some common special cases are:
\begin{description}
\item[Multiple regression:]
for each $i$,
${\mbox{\boldmath $x$}}_i =(x_{i1},\ldots,x_{ir})$ is an $r$-vector and
$\Theta =(\theta_{0},\theta_{1},\ldots,\theta_{r})$ is an $(r+1)$-vector. The
functional relationship is
\begin{equation*}
f({\mbox{\boldmath $x$}}_i;\Theta)
=\theta_0+x_{i1}\theta_1+x_{i2}\theta_2+\cdots+x_{ir}\theta_r
\end{equation*}
\item[Polynomial regression:]
for each $i$, ${\mbox{\boldmath $x$}}_i =(x_i)$ is a $1$-vector and
$\Theta =(\theta_{0},\theta_{1},\ldots,\theta_{r})$ is an $(r+1)$-vector. The
functional relationship is
\begin{equation*}
f({\mbox{\boldmath $x$}}_i;\Theta)
=\theta_0+x_i\theta_1+x_i^2\theta_2+\cdots+x_i^r\theta_r
\end{equation*}
\item[A nonlinear model:]
For a simple nonlinear model, we shall consider the case where
${\mbox{\boldmath $x$}}_i$ is a 1-dimensional vector and $\Theta$ is a
1-dimensional vector whose single component is denoted by $b$. The functional
relationship is
\begin{equation}
\label{eq:robust-nonlinear-regression-01} % \eqno(\number\mychapno.1)
f({\mbox{\boldmath $x$}}_i;b)=bx_i+b^2
\end{equation}
\end{description}
The form of the estimation procedure that should be used to estimate the
parameter vector~$\Theta$ depends on the assumptions that are made about the
nature of the random variables~$\epsilon_i$. Assume that the~$\epsilon_i$ are
independent identically distributed random variables with a probability density
function~$\phi(u;\Lambda)$, where~$\Lambda$ is a vector of parameters whose
components may be known or unknown.
The joint probability density function for the $(Y_1,\ldots,Y_n)$ is given by
\begin{equation}
\label{eq:robust-nonlinear-regression-02} %\eqno(\number\mychapno.2)
\prod_{i=1}^n \phi\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta);\Lambda\big)
\end{equation}
Taking the logarithm of
equation~(\ref{eq:robust-nonlinear-regression-02}) %(\number\mychapno.2)
gives the expression
\begin{equation}
\label{eq:robust-nonlinear-regression-03} %\eqno(\number\mychapno.3)
\sum_{i=1}^n \log
\Big(\phi\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta);\Lambda\big)\,\Big)
\end{equation}
Considered as a function of the parameters $\Theta$ and $\Lambda$,
expression~(\ref{eq:robust-nonlinear-regression-03}) % $(\number\mychapno.3)$
is known as the log-likelihood function. The maximum likelihood estimates for
the parameters $\Theta$ and $\Lambda$ are the values of parameters that maximize
the likelihood or its equivalent, the log-likelihood function.
If we assume that the $\epsilon_i$ are normally distributed with mean~$0$ and
standard deviation~$\sigma$, then
\begin{equation*}
\phi(u;\Lambda)= \frac{1}{\sqrt{2\pi}\sigma}
\,\exp\left(-\frac{u^2}{2\sigma^2}\right)
\end{equation*}
and the log-likelihood function becomes
\begin{equation}
\label{eq:robust-nonlinear-regression-04} % \eqno(\number\mychapno.4)
-n\log(\sigma)-\frac{1}{2\sigma^2}
\sum_{i=1}^n\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big)^2
\end{equation}
In this case, $\Lambda=(\sigma)$ is a 1-dimensional vector.
The maximum likelihood estimates $\widehat\Theta$ for the parameters $\Theta$
that maximize
equation~(\ref{eq:robust-nonlinear-regression-04}) %(\number\mychapno.4)
are the same as the those values of the parameters that minimize
\begin{equation}
\label{eq:robust-nonlinear-regression-05} % \eqno(\number\mychapno.5)
\sum_{i=1}^n\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big)^2
\end{equation}
Since these parameter estimates minimize the squared difference between the
predicted and observed values, they are known as the least-squares estimates.
For normally distributed model errors, the maximum likelihood estimates and the
least-squares estimates coincide. Since the
expression~(\ref{eq:robust-nonlinear-regression-05}) % (\number\mychapno.5)
does not depend on~$\sigma$, the maximum-likelihood estimate for~$\Theta$ does
not depend on the estimate for~$\sigma$. This is a special property shared by
the normal distribution and some other distributions. It is does not hold in
general. The maximum likelihood estimate~$\hat\sigma$ for the standard
deviation~$\sigma$ is given by
\begin{equation}
\label{eq:robust-nonlinear-regression-06} % \eqno(\number\mychapno.6)
\hat\sigma=\sqrt{\frac{1}{n}
\sum_{i=1}^n\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big)^2}
\end{equation}
\XX{maximum likelihood estimation}{for robust regression}
The reason that the normal least-squares estimation is so sensitive to outliers
is that the normal distribution has a small ``tail,'' so the probability of
occurrence of an event drops off very quickly as one moves away from the mean a
distance of a few standard deviations. One approach to develop more robust
estimation methods is to use distributions with fatter tails. A distribution
with a fatter tail than the normal distribution's is the double exponential
distribution whose probability density function (assuming the mean is~0) is
given by:
\X{two-sided exponential distribution}
\begin{equation}
\label{eq:robust-nonlinear-regression-07} % \eqno(\number\mychapno.7)
\frac{1}{2\sigma}
\exp\left|-\frac{u}{\sigma}\right|
\end{equation}
The corresponding log-likelihood function is
\begin{equation}
\label{eq:robust-nonlinear-regression-08} % \eqno(\number\mychapno.8)
-n\log(\sigma)-\frac{1}{\sigma}
\sum_{i=1}^n\big|Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big|
\end{equation}
The values of the parameters $\Theta$ that maximize
equation~(\ref{eq:robust-nonlinear-regression-08}) % (\number\mychapno.8)
are the same as the values that minimize the sum
\begin{equation}
\label{eq:robust-nonlinear-regression-09} % \eqno(\number\mychapno.9)
\sum_{i=1}^n\big|Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big|
\end{equation}
Since equation~(\ref{eq:robust-nonlinear-regression-09}) % (\number\mychapno.9)
is the total absolute deviation between the predicted and observed values, these
estimates are known as the minimum total absolute deviation estimates for the
parameters.
While the minimum total absolute deviation estimates are fairly insensitive to a
small number of large model errors, they do not perform very well when the model
errors are normally distributed. To develop a maximum likelihood estimation
procedure that is less sensitive to outliers, but that also performs well when
the model errors are normally distributed, we have employed a distribution that
is a mixture of a normal distribution and another distribution with a fatter
tail. The probability density functions of the fat-tailed distribution is given
by:
\begin{equation}
\label{eq:robust-nonlinear-regression-10} % \eqno(\number\mychapno.10)
\frac{\sqrt{2}}{\pi\sigma e}\left\{1+\frac{x^4}{{(\sigma e)}^4}\right\}^{-1}
\end{equation}
The extra parameter $e$ has been added to the log-likelihood
function~(\ref{eq:robust-nonlinear-regression-10}) % $(\number\mychapno.10)
to adjust the ``spread'' of the fat-tailed distribution. The value of $e$ has
been set equal to~3. A mixture of two random variables with mixing proportion
$p$ is formed by choosing one of the random variables with probability~$p$ and
the other random variable with probability~$1-p$. If the probability density
functions of the mixture components are $\psi(u)$ and $\chi(u)$, then the
probability density function of the mixture is $p\psi(u)+(1-p)\chi(u)$. We shall
assume that the model errors $\epsilon_i$ are a mixture of the
distribution~(\ref{eq:robust-nonlinear-regression-10}) % (\number\mychapno.10)
with probability~0.05 and a normal distribution with probability~0.95. The
quantity $0.05$ can be interpreted as the proportion of observations
contaminated by large errors.
The corresponding log-likelihood function for the observations is
\begin{align}
\label{eq:robust-nonlinear-regression-11} % \eqno(\number\mychapno.11)
\nonumber -n\ln(\sigma)+\sum_{i=1}^n \log n
& \Bigg[
(1-p)\exp\bigg\{-\frac{\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big)^2}
{2\sigma^2}\bigg\}\\
& \quad + \frac{2p}{\sqrt{\pi} e}
\bigg\{1+\frac{\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big)^4}
{(e\sigma)^4}\bigg\}^{-1}
\Bigg]
\end{align}
The parameter estimates $\widetilde\Theta$ and $\tilde\sigma$ for $\Theta$ and
$\sigma$, obtained by maximizing
equation~(\ref{eq:robust-nonlinear-regression-05}), % $(\number\mychapno.5)$
will be referred to as the ``robust-mixture estimator'' for the parameters. For
the robust-mixture estimator, the estimate for $\Theta$ is not independent of
the estimate for $\sigma$, so to estimate the parameters, the entire
expression~(\ref{eq:robust-nonlinear-regression-11}) % (\number\mychapno.11)
must be maximized for $\Theta$ and $\sigma$ simultaneously.
While it is true that the robust parameter estimates are the values of the
parameters that maximize equation~(\ref{eq:robust-nonlinear-regression-11}), it
is possible to reformulate the problem slightly so it is posed in terms of
parameters more easily estimated. Such reparameterizations are often useful in
nonlinear optimization. Consider the log-likelihood
function~(\ref{eq:robust-nonlinear-regression-11}). % (\number\mychapno.11)
If~$p$ is set equal to zero, the log-likelihood reduces to the normal case,
expression~(\ref{eq:robust-nonlinear-regression-04}). So, the value of the
parameter~$\sigma$ that would maximize
equation~(\ref{eq:robust-nonlinear-regression-11}) when~$p=0$ is~$\hat\sigma$,
given by
equation~(\ref{eq:robust-nonlinear-regression-06}). % (\number\mychapno.6)
Define a new parameter $\alpha$ by $\alpha=\sigma/\hat\sigma$ such that
$\sigma=\alpha\hat\sigma$. Replace the parameter~$\sigma$ in
equation~(\ref{eq:robust-nonlinear-regression-11}) % (\number\mychapno.11)
by the parameter~$\alpha\hat\sigma$. The log-likelihood function now takes the
form
\begin{align}
\label{eq:robust-nonlinear-regression-12} % \eqno(\number\mychapno.12)
\nonumber
n\ln(\alpha\hat\sigma)-\sum_{i=1}^n \log
& \Bigg[
(1-p)\exp\bigg\{-\frac{\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big)^2}
{2(\alpha\hat\sigma)^2}\bigg\}\\
&\quad + \frac{2p}{\sqrt{\pi}e}
\bigg\{1+\frac{\big(Y_i-f({\mbox{\boldmath $x$}}_i;\Theta)\big)^4}
{(e\alpha\hat\sigma)^4}\bigg\}^{-1}
\Bigg]
\end{align}
The parameter~$\alpha$ can be interpreted as an estimate of how much of the
residuals in the model fit are contributed by the large outliers generated by
the contaminating distribution. If~$\alpha$ is close to~$1$, most of the
contribution to the residuals comes from the ``ordinary'' errors associated with
the normal distribution. If~$\alpha$ is close to~$0$, most of the contribution
to the residuals comes from the large outliers associated with the fat-tailed
distribution. We expect that the maximum likelihood estimate~$\tilde \alpha$
for~$\alpha$ will lie between~$0$ and~$1$. The parameter~$\alpha$ can also be
interpreted as a ``cutoff'' switch. As~$\alpha$ is made smaller, the influence
of the larger residuals on the parameter estimates will be reduced. This is much
like looking at the data and removing the data points that fit the model badly.
\X{outliers}{in robust regression}
The code for the log-likelihood
function~(\ref{eq:robust-nonlinear-regression-12}) % \number\mychapno.12
can be written down very concisely using \scAD's vector operations. Suppose that
the observations $Y_i$ are contained in a \texttt{dvector} called \texttt{obs},
and the predicted values $f({\mbox{\boldmath $x$}}_i;\Theta)$ are contained in a
\texttt{dvar\_vector} called \texttt{pred}. The vector of residuals $Y_i
-f({\mbox{\boldmath $x$}}_i;\Theta)$ and the vector of squared residuals can be
calculated in two statements:
\begin{lstlisting}
dvar_vector diff = obs-pred; // These are the residuals
dvar_vector diff2 = pow(diff,2); // These are the squared residuals
v_hat = mean(diff2);
\end{lstlisting}
The estimate of the variance is given by the mean of the vector of squared
residuals. If $\texttt{b}=2p/\sqrt{\pi}e$ and $\texttt{a2}=2\alpha$, the code
for equation~(\ref{eq:robust-nonlinear-regression-12}) % \number\mychapno.12
can be written as
\begin{lstlisting}
dvariable log_likelihood = 0.5*diff.size()*log(a2*v_hat)
-sum(log((1.-pcon)*exp(-diff2/(2*a2*v_hat))
+ b/(1.+pow(diff2/(a2*v_hat),2))));
\end{lstlisting}
\XX{vector operations}{example of}
\XX{vector operations}{to reduce the amount of temporary storage required}
While the combination of scalars and vectors in statements like \texttt{1.+v}
can seem a bit strange at first, one quickly gets used ot it. The main thing to
consider is that some combinations of operations are not associative, so it may
be necessary to include parentheses to ensure that the operations are carried
out in the correct order. As is often the case, using vector operations instead
of writing the code in terms of the components of the vector objects will
greatly reduce the amount of temporary storage required.
As the examples below illustrate, the parameter $\alpha$ is often not
well-determined by the data. Rather than estimating this parameter directly, we
feel that the best way to do robust regression is probably to use a fixed value
for $\alpha$. We recommend the value~$0.7$ for fitting the model, although we
have no strong reason for preferring this particular value.
\section{Using the \scAD\ robust\br nonlinear regression routines}
\label{sec:using-routines}
The \scAD\ system permits the user to obtain these robust regression estimates
in a simple fashion. Suppose that the observations $Y_i$ are contained in a
\texttt{dvector} called \texttt{OBS\_Y} and that the values predicted by the
model for these observations have been computed and are contained in a
\texttt{dvar\_array} called \texttt{PRED\_Y}. To compute the value of the
log-likelihood
function~(\ref{eq:robust-nonlinear-regression-05}) % (\number\mychapno.5)
with a fixed value of $\alpha$, the routine \texttt{robust\_regression} is used.
The function prototype for this routine is:
\begin{lstlisting}
dvariable robust_regression(dvector& OBS_Y,dvar_vector& PRED_Y,double& alpha);
\end{lstlisting}
The function \texttt{robust\_regression} returns minus the value of the
log-likelihood function. (The sign has been changed for the function minimizer.)
\begin{figure}
\centering\hskip1pt
\input graph.tex
\emptycaption{}
\label{fig:robust-nonlinear-regression-01} % \number\mychapno.1
\end{figure}
To estimate the parameter $\alpha$ as well, the parameter must be passed as a
\texttt{dvariable}. The function prototype is
\begin{lstlisting}[escapechar=\%]
dvariable %
%robust_regression(dvector& OBS_Y,dvar_vector& PRED_Y,dvariable& alpha);
\end{lstlisting}
\XX{bounding functions}{\fontindexentry{tt}{boundp}}
\XX{bounding functions}{\fontindexentry{tt}{boundpin}}
For the routine \texttt{robust\_regression} to work properly when the parameter
$\alpha$ is being estimated, it is necessary to put bounds on the values that
$\alpha$ can assume. This is done by using the \texttt{boundp} function supplied
with \scAD. The parameter $\alpha$ must be restricted such that $0\alpha>0.4$ and another group for $0.3>\alpha>0.1$. Constructing such a
table could be a possible way to search for ``interesting'' structure in the
outliers in a large nonlinear model.
\begin{table}[!h]
\begin{tabular}{c @{\qquad}*{9}{r}}
\hline\\[-11pt]
\hline\\[-9.0pt]
$\alpha$&0.9&0.8&0.7&0.6&0.5&0.4&0.3&0.2&0.1 \\
\cline{2-10}\\[-9.0pt]
0.9&0.0&\\%&0.3&1.0&1.9&2.7&3.4&11.6&14.0&15.9\cr
0.8&0.3&0.0&\\%0.6&1.6&2.4&3.1&11.9&14.3&16.2\cr
0.7&1.0&0.6&0.0&\\%0.9&1.7&2.4&12.5&14.9&16.9\cr
0.6&1.9&1.6&0.9&0.0&\\%0.7&1.5&13.4&15.8&17.7\cr
0.5&2.7&2.4&1.7&0.7&0.0&\\%0.7&14.1&16.5&18.4\cr
0.4&3.4&3.1&2.4&1.5&0.7&0.0&\\%14.7&17.1&19.0\cr
0.3&11.6&11.9&12.5&13.4&14.1&14.7&0.0&\\%2.4&4.3\cr
0.2&14.0&14.3&14.9&15.8&16.5&17.1&2.4&0.0&\\%1.9\cr
0.1&15.9&16.2&16.9&17.7&18.4&19.0&4.3&1.9&0.0\\
%0.9&0.000&0.311&1.007&1.940&2.733&3.478&11.614&14.041&15.985\cr
%0.&0.311&0.000&0.696&1.629&2.423&3.171&11.905&14.332&16.277\cr
%0.&1.007&0.696&0.000&0.934&1.729&2.487&12.557&14.984&16.927\cr
%0.&1.940&1.629&0.934&0.000&0.799&1.574&13.424&15.848&17.790\cr
%0.&2.733&2.423&1.729&0.799&0.000&0.790&14.132&16.553&18.492\cr
%0.&3.478&3.171&2.487&1.574&0.790&0.000&14.723&17.136&19.070\cr
%0.&11.614&11.905&12.557&13.424&14.132&14.723&0.000&2.427&4.371\cr
%0.&14.041&14.332&14.984&15.848&16.553&17.136&2.427&0.000&1.944\cr
%0.&15.985&16.277&16.927&17.790&18.492&19.070&4.371&1.944&0.000\cr
\hline
\end{tabular}
\caption{Euclidean distance between the vectors of residuals for the
robust-mixture estimator for different values of $\alpha$.}
\label{tab:robust-nonlinear-regression-02}
\end{table}
%xx \htmlnewfile
\chapter{Problems with More than\br One Dependent Variable}
\label{ch:problems-more-dependent-variable}
\X{several dependent variables}
\XX{several dependent variables}{use of the operator \texttt{<{}<}}
\section{Using more than one dependent variable}
'Till now, all the problems considered have had only one dependent variable.
Many problems of interest, such as the problem of solving a system of nonlinear
equations, require the computation of derivatives with respect to more than one
dependent variable.
The operator \texttt{<{}<} has been overloaded to implement more than one
dependent variable. The user simply writes something like
\begin{lstlisting}
u << -exp(x(1))+y; // u is a dependent variable
\end{lstlisting}
where \texttt{u} is a \texttt{dvariable} and will become a dependent variable
for which derivatives can be calculated. The operator \texttt{<{}<} acts like
the equals sign (\texttt{=}), with the additional property that \texttt{u} is
made a dependent variable. Of course, it is possible to use \texttt{u} in later
calculations, such~as
\begin{lstlisting}
u << -exp(x(1))+y; // u is a dependent variable
// ...
v << u*u; // v is a dependent variable
\end{lstlisting}
The process of declaring dependent variables in this way is global---that is, it
does not have to be done in one function. The dependent variables are ordered in
the order in which they are encountered when the code is executed.
The default maximum number of dependent variables (which is at least 10) depends
on the implementation of \scAD\ you are using. If desired, it can be increased
by using the static member function
\XX{several dependent variables}{default number of}
\XX{dependent variables}{\fontindexentry{tt}%
{set\_NUM\_DEPENDENT\_VARIABLES(int i);}}
\X{\fontindexentry{tt}{set\_NUM\_DEPENDENT\_VARIABLES(int i);}}
\begin{lstlisting}
gradient_structure::set_NUM_DEPENDENT_VARIABLES(int i);
\end{lstlisting}
For example, to set the maximum number of dependent variables to~25, you would
put the lines
\begin{lstlisting}
gradient_structure::set_NUM_DEPENDENT_VARIABLES(25);
// ... must occur before the gradient_structure declaration
gradient_structure gs;
\end{lstlisting}
into your code. The derivatives are calculated by calling the function
\begin{lstlisting}
jacobcalc(int\& nvar,dmatrix\& jac)
\end{lstlisting}
The derivatives are put into the \texttt{dmatrix} called \texttt{jac}. The
minimum valid row index of \texttt{jac} must be~1, while the maximum valid row
index must be greater than or equal to the number of dependent variables. Each
row of \texttt{jac} must have a minimum valid index of~1 and a maximum valid
index greater than or equal to \texttt{nvar}, where \texttt{nvar} is the number
of independent variables.
\X{\fontindexentry{tt}{gradient\_structure}}
\X{\fontindexentry{tt}{jacobcalc}}
\XX{finding roots of equations}{Newton-Raphson method}
\XX{Newton-Raphson method}{roots of equations}
\XX{Newton-Raphson method}{example of}
\section {Finding roots of systems of\br equations with Newton-Raphson}
\label{sec:finding-roots}
Suppose that you wish to solve the nonlinear system of equations
\begin{align}\label{eq:problems-more-dependent-variable-01}
\nonumber %\eqno(\number\mychapno.1)
f_1(x_1,\ldots,x_n)&=0\\ \nonumber
f_2(x_1,\ldots,x_n)&=0\\
f_2(x_1,\ldots,x_n)&=0\\ \nonumber
& \,\,\,\vdots\\ \nonumber
f_n(x_1,\ldots,x_n)&=0\\ \nonumber
\end{align}
In general, this is a very difficult problem. However, under suitable regularity
conditions, if you can find an initial estimate for $(x_1,\ldots,x_n)$ that is
close enough, then the iterative Newton-Raphson technique will efficiently
converge to the solution. Let $X$ denote the vector $(x_1,\ldots,x_n)$. Let
\begin{equation*}
J(X)=\frac{\partial f_i(X)}{\partial x_j}
\end{equation*}
be the Jacobian matrix. Then to first order,
\begin{equation}
\label{eq:problems-more-dependent-variable-02} % \eqno(\number\mychapno.2)
f(X+\delta)=f(X)+J(X)\delta,
\end{equation}
where $f(X)$ denotes the vector $f_1(X),\ldots,f_n(X)$ and $J(X)\delta$ denotes
the multiplication of the vector $\delta$ by the matrix $J(X)$. Equating
(\ref{eq:problems-more-dependent-variable-02}) %(\number\mychapno.2)
to zero and solving for $\delta$ yields
\begin{equation}
\label{eq:problems-more-dependent-variable-03} %\eqno(\number\mychapno.3)
\delta=-J(X)^{-1}f(X)
\end{equation}
Now we proceed iteratively. If $X^{(r)}$ is the current estimate for $X$, let
\begin{equation*}
X^{(r+1)}=X^{(r)} +\delta
\end{equation*}
be the next estimate where $\delta$ is defined
by~(\ref{eq:problems-more-dependent-variable-03}). % (\number\mychapno.3).
\section{Implementation of the Newton-Raphson technique}
This example shows the \nr\ technique for two equations.
\input newt.cpp
We have chosen a rather ``degenerate'' example where the Jacobian matrix is
almost singular at the solution. The two functions are
\begin{align*}
f_1(x_1,x_2)&=100x_1^2+x_2^2-1 \\[6pt]
f_2(x_1,x_2)&=100x_1^2+x_2^2-1+.0001-.0001x_1^4+.01x_2^2
\end{align*}
The calculation of the Jacobian and the Newton-Raphson ``update'' step can be
accomplished with two lines of code:
\begin{lstlisting}
jacobcalc(nvar,jacobian)
x-=inv(jacobian)*f
\end{lstlisting}
The iteration continues until the norm of the \texttt{dvector} called \texttt{f}
is less than 1.0\e{-12}.
The only variable types occur in the user's function \texttt{function}. All the
arithmetic for the Newton-Raphson update is done with vectors and matrices that
are of constant type. The \texttt{value} function is used to turn the
\texttt{dvar\_vector} called \texttt{f} in \texttt{function} into a
\texttt{dvector} object, so it can be returned to the calling routine.
%xx \htmlnewfile
\chapter{A Complete Nonlinear Parameter Estimation Program}
\label{nonlinear-estimation}
A major difference between linear and nonlinear statistical modeling is the
inherent instability of nonlinear models. It seems to be an empirical fact that
for any reasonably complicated model, if you simply write correct naive computer
code for estimating the parameters, the program will not work. By ``work,'' we
mean you need to get the information you want out of the data you are trying to
analyze. In general, there are two reasons for this behavior:
\begin{enumerate}
\item The function being calculated is numerically unstable, so a zero divide
or a floating point overflow will occur. For example, calculate~$1/x$ when~$x$
has the value~$0$, or calculate~$\exp(x)$ when~$x$ has the value~$90000$.
\item The ``correct'' values of the parameter estimates occur at a local
minimum of the objective function, not at a global minimum.
\end{enumerate}
The problem we consider in this chapter exhibits both kinds of pathology
mentioned above. We will describe the techniques necessary for producing a
stable, controllable nonlinear parameter estimation routine for solving this
problem. The code for this example is included on the \scAD\ distribution disk.
\section{Finite mixture problems}
\label{sec:finite-mixture}
Consider a set of~$m$~random variables~$X_j$, indexed by~$j$, with probability
density functions~$F_j(x)$. A new random variable~$Y$ can be created from
the~$X_j$ as follows. A realization of~$Y$ is produced by first picking one of
the~$X_j$ at random, with probability~$p_j$, and then taking a realization of
the resulting~$X_j$. The random variable~$Y$ is a finite mixture of the~$X_j$,
with mixing proportions $p_j$. The probability density function for~$Y_j$ is
given by~$\sum_{j=1}^m p_jF_j(x)$. The particular finite mixture that we shall
address is a mixture of normal distributions. Effective techniques for
estimating parameters for this problem were first considered by
Hasselblad~\cite{hasselblad1966}.
Assume that $X_j$ is normally distributed with mean~$\mu_j$ and standard
deviation~$\sigma_j$. The probability density function for~$X_j$ is given by
\begin{equation*}
\frac{1}{\sqrt{2\pi}\sigma_j}\,
\exp\left(\, \frac{-(x-\mu_j)^2}{2\sigma_j^2}\right)
\end{equation*}
while the probability density function for the mixture~$Y$ is given by:
\begin{equation*}
\sum_{j=1}^m \frac{p_j}{\sqrt{2\pi}\sigma_j}
\,\exp\left(\frac{-(x-\mu_j)^2}{2\sigma_j^2}\right)
\end{equation*}
Now suppose that we are given a vector $(y_1,\ldots,y_n)$ of $n$ realizations of
the random variable~$Y$. We wish to use the information in the~$y_i$ (and any
other auxiliary information that may be at our disposal) to estimate the
parameters~$p_j$,~$\mu_j$, and~$\sigma_j$.
In general, there are many kinds of auxiliary information that may be present in
such mixture problems. For example, we may or may not know the number of groups
present in the mixture. We may know some or all of the means,~$\mu_j$, or we may
have some knowledge of the approximate value of some or all of the~$\mu_j$. We
may know the value of some or all of the~$\sigma_j$, or we may know that all
the~$\sigma_j$ have the same value. A general routine for estimating the
parameters in mixture problems should be able, as much as possible, to
incorporate auxiliary information about the parameters into the estimation
process.
The maximum likelihood estimates for the parameters are found by finding the
correct local maximum of the log-likelihood function
\begin{equation}\label{eq:nonlinear-estimation-01} % \eqno(\number\mychapno.1
\max_{p,\mu,\sigma}
\Bigg\{
\sum_{i=1}^n \,\log
\Bigg[\,
\sum_{j=1}^m \frac{p_j}{\sqrt{2\pi}\sigma_j}
\,\exp
\left(
\frac{-(y_i-\mu_j)^2}{2\sigma_j^2}
\right)\
\Bigg]
\Bigg\}
\end{equation}
subject to any constraints on the parameters that may be appropriate for the
problem.
For mixtures of normal distributions, the correct solution never occurs at the
global maximum of the log-likelihood function when there is more than one group
in the mixture. This is due to the fact that it is always possible to make the
log-likelihood function as large as desired, by setting the mean of one of the
groups equal to the value of an observation and letting the corresponding
standard deviation parameter tend to zero. The value of the probability density
function for this group will then tend to infinity at the point in question. It
follows that to find the correct solution, the values of the standard deviations
should be bounded away from zero.
The following code calculates the likelihood function corresponding to
equation~(\ref{eq:nonlinear-estimation-01}). Notice that none of the constraints
that we intend to place on permissible parameter values appears in the code. The
constraint conditions will be implemented elsewhere. Avoiding them at this stage
produces a clean piece of code that can easily be seen to correspond to
equation~(\ref{eq:nonlinear-estimation-01}). The constant terms $1/\sqrt{2\pi}$
have been ignored.
\input like.cpp
The term \texttt{1.e-20} is put in to improve the numerical stability of the
program. The routine \texttt{log\_likelihood} is intended to be used with a
function-minimizing routine to estimate the parameters. In the course of such a
process, the function minimizer invariably will pick some parameters that fit
certain observations extremely badly. An extremely bad fit is characterized by
the value of \texttt{diff} being large for all groups. As a result, due to
floating point underflow,
$\exp(-\texttt{diff}*\texttt{diff}/(2.*\texttt{v(j)}))$ can be equal to zero for
all groups, so \texttt{sum} has the value zero. Taking the logarithm of~$0$
would cause a floating point exception or error. Adding the small value
\texttt{1.e-20} allows the program to carry on gracefully.
It should be noted that Hasselblad grouped the observed data into size classes
and analyzed the grouped data. This can easily be done---and should be done---if
the number of observations is large, because the analysis of the grouped data
can be much faster.
\section{Putting bounds on parameter values}
\X{\fontindexentry{tt}{boundp}}
\XX{bounding functions}{\fontindexentry{tt}{boundp}}
\X{penalty function}
The function minimizer routine deals with a vector \texttt{x} of numbers. The
likelihood calculation routine \texttt{log\_likelihood} deals with vectors of
proportions, means, and standard deviations. We need a routine to ``put'' the
$x$s into the model.
\input reset.cpp
\X{\fontindexentry{tt}{boundp}}
The function \texttt{reset} not only puts the $x$s into the parameters of the
function \texttt{like\_fun}, it also determines what parameters are active
(being estimated at the time), by examining the \texttt{ivector control}, and
puts constraints on the values of the parameters through the function
\texttt{boundp}. The function
\begin{lstlisting}
dvariable boundp(dvariable x,double fmin,double fmax,dvariable& fpen)
\end{lstlisting}
\X{penalty function}
bounds the variable \texttt{x} to the closed interval
$[$\texttt{fmin, fmax}$]$.
The term \texttt{fpen} is a penalty term that modifies the behavior of the
function. It must be added to the function being minimized. \textit{Note that
this means that if you want to maximize a function, you must change the sign
of the function before adding} \texttt{fpen} \textit{to it!} The function
\texttt{reset} supplies complete control over what parameters are active,
through the vector \texttt{control}. For example, if \texttt{control(1)} is
equal to~1 while \texttt{control(2)} and \texttt{control(3)} are equal to~0,
then only the proportions at age will be estimated, while the means and standard
deviations will be kept fixed. The \texttt{dvector}s called \texttt{mumin} and
\texttt{mumax} determine lower and upper bounds on the means, while the
\texttt{dvector}s \texttt{sdmin} and \texttt{sdmax} determine lower and upper
bounds on the standard deviations.
Suppose that you want the standard deviations to be the same for all the groups.
This can be accomplished easily by modifying the last part of \texttt{reset} to
\begin{lstlisting}
if (control(3)>0) // control(3) determines whether the sd's are active
{
for (int j=1;j<=ngroups;j++)
{
sd(j)=boundp(x(ii),sdmin(j),sdmax(j),zpen); //Note that ii is not
// incremented
}
ii++; // Now increment ii
}
\end{lstlisting}
Another modification would be to enable individual mean or standard deviations
to be fixed or active.
\section{Getting the initial \texttt{x} vector}
\X{\fontindexentry{tt}{boundpin}}
\XX{bounding functions}{\fontindexentry{tt}{boundpin}}
It should be clear that the vector \texttt{x} is a transitory object associated
with the minimization routine. Without the function \texttt{reset}, it is not
even clear to what model parameter a particular component of \texttt{x}
corresponds or what value of the parameter it produces. For this reason, it does
not makes sense to save the values of \texttt{x} during or after the analysis.
It is the model parameters \texttt{p}, \texttt{mu}, and \texttt{sd} that should
be saved. However, this raises a question. Given the values of the model
parameters, the values of the control vector, and the values of the bounds on
the parameters, what should the initial values of the \texttt{x} vector be? This
problem is solved by the functions \texttt{boundpin} and \texttt{xinit}. \input
xinit.cpp
The control structure of \texttt{xinit} exactly parallels that of
\texttt{reset}, so no matter what switch combinations or bounds are used, the
correct values will be put into the components of~\texttt{x}. The function
\begin{lstlisting}
double boundpin(double x,double fmin,double fmax)
\end{lstlisting}
is the inverse of the function \texttt{boundp}. Notice that the control vector,
\texttt{control}, determines how many active parameters there are.
The job of the function \texttt{nvarcal} is to use this information to calculate
the number of active parameters.
\input nvarcal.cpp
The three functions \texttt{reset}, \texttt{xinit}, and \texttt{nvarcal} all
share a common control structure and make it possible to coordinate the various
user options in the estimation routine.
Here is the code for the main routine in the mixture analysis program. The data
are assumed to be in a file named \texttt{mixture.dat}. The initial values for
the parameters are assumed to be in a file named \texttt{mixture.par}. \input
mixture.cpp
The job of the function \texttt{fcomp} is to put the \texttt{x} values into the
model by calling the function \texttt{reset}, and then to evaluate the
log-likelihood by calling the function \texttt{log\_likelihood}.
\input fcomp_m.cpp
Notice that while the parameterization of the mixture proportions,~\texttt{p},
has restricted the proportions to lie between~$0$ and~$1$, the proportions have
not been restricted, so they sum to~1. The function \texttt{normalize\_p}
restricts the~\texttt{p} so that they sum to~$1$.
\input normaliz.cpp
\X{penalty function}
The penalty $1000.*\log(\texttt{psum})*log(\texttt{psum})$ has been added to the
function being minimized, to remove the degeneracy in the parameterization of
the~\texttt{p}. Without this penalty, the minimizing value would be independent
of the value of \texttt{psum}. Including the penalty ensures that the minimum
value will occur for the value $\texttt{psum}=1$.
\section{Saving the parameter estimates}
\scAD\ uses the same memory locations for the derivative calculations as are
used for holding the variable objects. As a consequence, after the derivatives
have been calculated, the current value of the parameter estimates has been
lost. In order to save the results after the minimization has been carried out,
it is necessary to calculate the parameters one more time and print them into a
file. The simplest way to do this is to use the code for \texttt{fcomp} that
already exists, and to put a switch into \texttt{fcomp} so that saving the
parameters is enabled. The parameters are written into the file
\texttt{mixture.par} by the routine \texttt{save\_pars}.
\input savepar.cpp
%xx \htmlnewfile
\chapter{A Neural Network}\label{ch:neural-networks}
\X{neural net}
\XX{neural net}{feed-forward}
This example illustrates the use of \scAD\thinspace's 2- and 3-dimensional
ragged arrays.
\section{Description of the neural network}
\label{sec:description-nn}
An $n$-layer feed-forward neural network can be viewed as a process that, given
a set of $m_1$ inputs $(x_{11},\ldots,x_{1m_1})$ at layer $1$, produces a set of
$m_n$ outputs $(x_{n1},\ldots,x_{nm_n})$ at layer $n$. Level~$1$ is referred to
as the ``input layer.'' Level~$n$ is referred to as the ``output layer.''
Level~$2$ through Level~$n-1$ are referred to as ``hidden layers.'' The
quantities $x_{ij}$ are referred to as ``nodes.''
\XX{neural net}{input layer}
\XX{neural net}{output layer}
\XX{neural net}{nodes}
In a feed-forward neural network, the values of the nodes in each layer are
determined by the values in the previous layer via the relationship
\begin{equation}\label{eq:neural-network-01} % \eqno(\number\mychapno.1)
x_{i+1,j}=\Phi\left(\sum_{k=1}^{m_{i-1}}w_{ijk}x_{ik}+b_{i+1,j}\right)
\end{equation}
The function $\Phi$ is referred to as a ``squashing function.'' To the best of
our knowledge, the particular form of $\Phi$ does not seem to be especially
important, so long as it is a continuously differentiable monotone increasing
function that maps $(-\infty,\infty)$ into some desired bounded set $(a,b)$. The
bounded set is often of the form $(-0.5,0.5)$ or~$(0,1)$. The $w_{ijk}$ are
referred to as ``weights,'' while the $b_{ij}$ are called ``bias terms.''
\XX{neural net}{squashing function}
\XX{neural net}{bias terms}
The shape or topology of the neural net is completely determined by the number
of levels, $n$, and the number of nodes in each level, $m_i$. Assume that the
$m_i$ are contained in an \texttt{ivector} object \texttt{num\_nodes}.
\section{Implementation of the neural network}
\X{2-dimensional ragged array}
The $x_{ij}$ can be viewed as a 2-dimensional ragged array with $n$~rows and
$m_i$~elements in the $i^\textrm{th}$~row. Such an array can be created by the
declaration
\X{ragged matrix}
\begin{lstlisting}
dmatrix x(1,num_levels,1,num_nodes); // these are the nodes
\end{lstlisting}
The bias terms $b_{ij}$ can also be viewed as a ragged matrix \texttt{B} with
$n-1$ rows. The declaration for~\texttt{B} requires a slight modification of the
\texttt{ivector} called \texttt{num\_nodes} to reflect the fact that \texttt{B}
does not need to contain row~1. The legal row index values for \texttt{B} should
be from~2 to~$n$.
\begin{lstlisting}
ivector iv(2,num_levels);
for (int i=2;i<=num_levels;i++)
{
iv(i)=num_nodes(i); // Put the desired elements of num_nodes into iv
}
// Now declare B
dmatrix B(2,num_levels,1,iv);
\end{lstlisting}
For each $i$, $1\le i> num_learn; // read in the number of examples to be used
// Loop through the training data
for (int k=1;k<= num_learn; k++)
{
infile >> x[1];
if (!infile)
{
cerr << "Error reading x[1] from file learning.smp\n";
exit(1);
}
infile >> correct_response;
if (!infile)
{
cerr << "Error reading correct_response from file learning.smp\n";
exit(1);
}
for (int i=1;i0$) for a point,
then that point is considered unclassified.
A typical input consists of the pair of coordinates of a point on the spiral
$(0.0618,0.1901)$ together with the triple $(0.5,-0.5,-0.5)$, which is the
desired output for this point, since it lies on the first spiral arm.
We trained neural nets of increasing complexity until a design was obtained that
had enough parameters to classify all the points in the learning sets correctly.
The neural net we chose has 6 layers with 2 input nodes, 8 nodes in each of the
4 hidden layers, and 3 output nodes. This neural net has 267 parameters to be
estimated.
\X{\fontindexentry{tt}{GRADFIL1.TMP}}
Training the neural net required about 20 hours on a 20 MHz 386 computer. The
file \texttt{GRADFIL1.TMP}, which stores the temporary information needed to
calculate the gradient, was about 3 MB in size. It was stored on a \textsc{ram}
disk. At any time, it is possible to stop the program by typing `q.' The current
parameter estimates are printed into the file \texttt{EST.PAR}. To restart the
learning process, you must first copy the file \texttt{EST.PAR} into the file
\texttt{WEIGHTS.PAR}. Then it is possible to carry out training sessions during
periods when the computer is not required.
\begin{figure}[h]
\centering\hskip1pt
\input grph6.tex
\emptycaption{}
\label{fig:description-nn-02} % \number\mychapno.2
\end{figure}
Figure~\ref{fig:description-nn-02} % \number\mychapno.2
illustrates how the neural net has classified the points in the disk. The points
in \texttt{A}, \texttt{B}, and \texttt{C} have been classified as being
associated with the corresponding spiral. The small points in \texttt{D} are
unclassified. The large points in \texttt{D} are the training set. The
unclassified points appear to mark fairly regular boundaries between the
spirals. Our solution does not seem to exhibit the ``bumps and gaps'' reported
by Touretsky and Pomerlau for the double spiral solution. Perhaps this is due to
the superiority of our minimization procedure over the method they employed. It
may also be due to the use of more parameters, combined with a small quadratic
penalty function on the parameters to introduce stability into the model. In any
event, we seem to have obtained a nice solution to the problem.
\bibliographystyle{plain}
\bibliography{autodif}
\printindex % Make the index here.
\end{document}