Now it may happen, due to a combination of poor conditioning of the
linear operator and unfortunate choices of starting guess of
the model vector
, that the denominator of the right-hand side
of (cgparameter) may be very small or vanish to numerical
accuracy. Similar circumstances can arise in the nonlinear
least-squares problem in cases of sparse or irregularly sampled data.
When such circumstances arise in practice, it may be necessary to
regularize the method by adding an additional constraint equation
to the least-squares functional (Fofs). Regularization is a
well-known technique often associated with the names of
Tikhonov and Arsénine (1976),
Levenberg (1944), and Marquardt (1963, 1970), among others.
The regularization constraint usually takes the form of a quadratic
functional of the model vector. One typical choice is
where
(often
called the damping parameter) is some small positive constant
and
is some value of the model vector that the final
solution should not deviate from too much. Other typical choices
of regularization constraint are based on differentials of the model
taking the form
, where
might be
either a simple gradient or a Laplacian of the model - assuming that
is some simple physical quantity. If
is a more
complicated vector of model parameters which is not easily given a simple
physical interpretation, then some other choice of regularization constraint
might be needed.
It is preferable to avoid using regularization if possible, because such techniques tend to modify the entire spectrum of the operator to be inverted and therefore tend to degrade resolution.
For the linear least-squares problem, I can carry the analysis
further to understand how the iteraton scheme modifies the model
estimate at each step. Assuming that the linear operator is
a matrix of dimensions
, where m and n are usually
not equal, it is helpful to use
and its transpose
to form a square, symmetric matrix and the associated eigenvalue
problem
0 & LL^T & 0
_q _q =
_q_q _q .
Assuming that the m-vectors and the n-vectors
are normalized to unity, the singular value decomposition of the
matrix
may now be written in terms of these eigenfunctions
as
L = _q=1^r _q _q _q^T.
The sum is taken over r terms, where r is the rank of .
Now, each model estimate may be expanded in terms of the
appropriate eigenfunctions according to
s^(k) = _q=1^r _q^(k) _q,
where the s are the expansion coefficients for the
kth iteration. Similarly, the data vector may also be expanded as
d = _q=1^r _q _q,
where the s are constant coefficients. Both the model vector and
the data vector expansion might in addition include a term from the null
space of
, but for simplicity I will ignore this possibility
for the present purposes.
With these definitions of the various coefficients, I find that equation (cgparameter) becomes
^(k+1) = _q=1^r _q^2(_q-_q_q^(k))^2
_q=1^r _q^4(_q-_q_q^(k))^2,
and the update equation for the s becomes
_q^(k+1) = _q^(k) + ^(k+1)_q(_q-_q_q^(k)). It follows from (alphaupdate) that the iteration process has converged when
_q^(k) = _q_q. As the coefficients approach convergence, I discover that the denominator of (Ggexpansion) can become quite small. If I assume that the eigenvectors with the largest eigenvalues converge most quickly, then after some number of iterations the main contributions to the denominator will be from the terms associated with the smallest eigenvalues, and these contributions are proportional to the fourth power of these small eigenvalues. If this happens, some type of regularization may be required to obtain useful results.
Consider the simplest type of regularization involving a model vector constraint so that the modified objective function becomes:
F_(s) = _i=1^m [d_i-N_i(s)]^2 + _j=1^n (s_j-s_j)^2. The expansion of the constraint vector is given by
B<>s = _q=1^r ¯_q _q. The equation for the modified model update is then given by
_q^(k+1) = _q^(k) + ^(k+1)[_q_q +
¯_q - (_q^2 + )_q^(k)].
Clearly, the modified iteration sequence for has converged
when
_q^(k) = _q_q + ¯_q_q^2 + ,
showing that the limiting relation is just a linear combination of
the starting value and the result for pure least-squares as seen
in (alphalimit). If the damping parameter is small, these two
values will of course be quite close. But, in general, for any
nonzero value of the damping parameter, it is inevitable that the
resolution will suffer due to the fact that the coefficients cannot
approach their optimum value (alphalimit) but are actually
constrained away from it by the regularization procedure. This is why
regularization should be avoided, or at least minimized, as much as possible.