Next: SUMMARY AND SUGGESTIONS FOR Up: Berryman: Nonlinear least squares Previous: NONLINEAR OUTPUT LEAST-SQUARES METHOD

REGULARIZATION

Now it may happen, due to a combination of poor conditioning of the linear operator ${\bf L}$ and unfortunate choices of starting guess of the model vector ${\bf s}$ , that the denominator of the right-hand side of (cgparameter) may be very small or vanish to numerical accuracy. Similar circumstances can arise in the nonlinear least-squares problem in cases of sparse or irregularly sampled data. When such circumstances arise in practice, it may be necessary to regularize the method by adding an additional constraint equation to the least-squares functional (Fofs). Regularization is a well-known technique often associated with the names of Tikhonov and Arsénine (1976), Levenberg (1944), and Marquardt (1963, 1970), among others.

The regularization constraint usually takes the form of a quadratic functional of the model vector. One typical choice is $\mu({\bf s}-\bar{\bf s})^T({\bf s}-\bar{\bf s})$ where $\mu$ (often called the damping parameter) is some small positive constant and $\bar{\bf s}$ is some value of the model vector that the final solution should not deviate from too much. Other typical choices of regularization constraint are based on differentials of the model taking the form $({\bf Ds})^T({\bf Ds})$ , where ${\bf Ds}$ might be either a simple gradient or a Laplacian of the model - assuming that ${\bf s}$ is some simple physical quantity. If ${\bf s}$ is a more complicated vector of model parameters which is not easily given a simple physical interpretation, then some other choice of regularization constraint might be needed.

It is preferable to avoid using regularization if possible, because such techniques tend to modify the entire spectrum of the operator to be inverted and therefore tend to degrade resolution.

For the linear least-squares problem, I can carry the analysis further to understand how the iteraton scheme modifies the model estimate at each step. Assuming that the linear operator ${\bf L}$ is a matrix of dimensions $m\times n$ , where m and n are usually not equal, it is helpful to use ${\bf L}$ and its transpose ${\bf L}^T$ to form a square, symmetric matrix and the associated eigenvalue problem

0 & LL^T & 0 _q _q = _q_q _q . Assuming that the m-vectors $\psi_q$ and the n-vectors $\phi_q$ are normalized to unity, the singular value decomposition of the matrix ${\bf L}$ may now be written in terms of these eigenfunctions as

L = _q=1^r _q _q _q^T. The sum is taken over r terms, where r is the rank of ${\bf L}$ .

Now, each model estimate ${\bf s}^{(k)}$ may be expanded in terms of the appropriate eigenfunctions according to

s^(k) = _q=1^r _q^(k) _q, where the $\alpha_q^{(k)}$ s are the expansion coefficients for the kth iteration. Similarly, the data vector may also be expanded as

d = _q=1^r _q _q, where the $\delta_q$ s are constant coefficients. Both the model vector and the data vector expansion might in addition include a term from the null space of ${\bf L}$ , but for simplicity I will ignore this possibility for the present purposes.

With these definitions of the various coefficients, I find that equation (cgparameter) becomes

^(k+1) = _q=1^r _q^2(_q-_q_q^(k))^2 _q=1^r _q^4(_q-_q_q^(k))^2, and the update equation for the $\alpha_q$ s becomes

_q^(k+1) = _q^(k) + ^(k+1)_q(_q-_q_q^(k)). It follows from (alphaupdate) that the iteration process has converged when

_q^(k) = _q_q. As the coefficients approach convergence, I discover that the denominator of (Ggexpansion) can become quite small. If I assume that the eigenvectors with the largest eigenvalues converge most quickly, then after some number of iterations the main contributions to the denominator will be from the terms associated with the smallest eigenvalues, and these contributions are proportional to the fourth power of these small eigenvalues. If this happens, some type of regularization may be required to obtain useful results.

Consider the simplest type of regularization involving a model vector constraint so that the modified objective function becomes:

F_(s) = _i=1^m [d_i-N_i(s)]^2 + _j=1^n (s_j-s_j)^2. The expansion of the constraint vector is given by

B<>s = _q=1^r ¯_q _q. The equation for the modified model update is then given by

_q^(k+1) = _q^(k) + ^(k+1)[_q_q + ¯_q - (_q^2 + )_q^(k)]. Clearly, the modified iteration sequence for $\alpha_q$ has converged when

_q^(k) = _q_q + ¯_q_q^2 + , showing that the limiting relation is just a linear combination of the starting value and the result for pure least-squares as seen in (alphalimit). If the damping parameter $\mu$ is small, these two values will of course be quite close. But, in general, for any nonzero value of the damping parameter, it is inevitable that the resolution will suffer due to the fact that the coefficients cannot approach their optimum value (alphalimit) but are actually constrained away from it by the regularization procedure. This is why regularization should be avoided, or at least minimized, as much as possible.

Next: SUMMARY AND SUGGESTIONS FOR Up: Berryman: Nonlinear least squares Previous: NONLINEAR OUTPUT LEAST-SQUARES METHOD

Stanford Exploration Project
11/12/1997