In an abstract setting, let $d_i$ for $i = 1, \dots, m$ be a set of $m$ measurements. These data may be organized into a data $m$-vector $\mathbf{d}$. Let $\mathbf{s}$ be an $n$-vector containing the $n$ parameters for the model/solution space to be constructed. Then, let $N_i$ be a (possibly nonlinear) function of the model parameter vector $\mathbf{s}$ such that $d_i = N_i(\mathbf{s})$ in infinite-precision forward modeling of the physical problem of interest. If the data $d_i$ and the functions $N_i$ are known, then the inversion problem is to find an approximate $\mathbf{s}$ that satisfies the data in some sense (i.e., has as small a data misfit as possible).
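To fix ideas before proceeding, the following minimal numerical sketch (not part of the original development; the exponential-decay model and the names \texttt{forward}, \texttt{s\_true}, and \texttt{d} are illustrative assumptions) sets up data $d_i$, a model vector $\mathbf{s}$, and forward functions $N_i$ in exactly this abstract form.
\begin{verbatim}
import numpy as np

m, n = 20, 2                          # number of measurements and model parameters
t = np.linspace(0.0, 5.0, m)          # hypothetical acquisition coordinates

def forward(s):
    """N(s): maps the n-vector of model parameters to the m-vector of
    predicted data; here a toy (nonlinear) exponential-decay model."""
    amplitude, rate = s
    return amplitude * np.exp(-rate * t)

s_true = np.array([2.0, 0.7])         # model that generated the observations
d = forward(s_true)                   # data m-vector, d_i = N_i(s_true)
\end{verbatim}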
If life were simple, the operators $N_i$ would be analytically invertible, so one might think it possible to write the solution to the inverse problem as $\mathbf{s} = N_i^{-1}(d_i)$. However, the result is certainly a strange-looking formula: while it is perfectly sensible that the domain of each function $N_i$ is a set of $n$ model parameters and that the range is a single datum, the postulated inverse function, producing multiple model outputs from a single-datum input, is clearly not sensible. In most cases, there will simply not be sufficient information contained in a single datum to determine multiple model parameters.
For strictly linear problems, there might exist an appropriate operator $X$ acting on the full data $m$-vector such that $X\mathbf{d}$ produces the model $n$-vector. However, even for linear problems, the existence of such an operator is not guaranteed, since the sizes $m$ and $n$ of the two vector spaces will generally differ. This difficult situation leads to the introduction of pseudoinverses in linear problems, or to the methods described below for nonlinear problems.
So, instead of using some hypothetical analytical approach, the desired solution to the inverse problem will make simultaneous use of all (or at least most) of the data while determining some sort of optimal fit to a chosen set of model parameters. This situation is common in data analysis and often leads to the formulation of least-squares methods for this type of inversion problem.
An approach to this problem based on nonlinear least-squares inversion considers the nonlinear function
\begin{equation}
F(\mathbf{s}) = \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}) \right]^2 ,
\label{Fofs}
\end{equation}
which is simply the sum of the squares of the data residuals $r_i = d_i - N_i(\mathbf{s})$.
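In code, the residuals and the misfit (\ref{Fofs}) might be sketched as follows (a hedged illustration; the function names and argument conventions are mine, with \texttt{forward} standing for the forward functions $N_i$).
\begin{verbatim}
import numpy as np

def residuals(d, s, forward):
    """r_i = d_i - N_i(s): data residual vector for a candidate model s."""
    return np.asarray(d) - forward(s)

def misfit(d, s, forward):
    """F(s) = sum_i [d_i - N_i(s)]^2: the least-squares error function."""
    r = residuals(d, s, forward)
    return float(r @ r)
\end{verbatim}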
I formulate an ``evolution'' equation for the model vector $\mathbf{s}(\tau)$, where $\tau$ is an evolution parameter, treated as a continuous generalization of an iteration number. In particular, I must ultimately discretize this parameter in my final numerical method, in which case the particular values of $\tau$ will be exactly the iteration numbers. But for purposes of explaining the method, it will prove useful to treat $\tau$ initially as a continuous parameter.
The thrust of the nonlinear least-squares method is to produce a model that minimizes the least-squares error function $F(\mathbf{s})$. One way to guarantee [see, for example, Jeffrey and Rosner (1986) and Lu and Berryman (1991)] that a local minimum is achieved is to pose the evolution equation for $\mathbf{s}(\tau)$ in a way that guarantees the value of $F(\mathbf{s})$ decreases monotonically as $\tau$ increases (or at each iteration step for the discretized problem).
Taking the derivative of $F(\mathbf{s})$ with respect to the evolution parameter, the chain rule gives
\begin{equation}
\frac{d F(\mathbf{s})}{d\tau} = -2 \sum_{j=1}^n \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}) \right] \frac{\partial N_i}{\partial s_j} \frac{d s_j}{d\tau} .
\label{derivativeofF}
\end{equation}
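The derivative (\ref{derivativeofF}) involves the partial derivatives $\partial N_i / \partial s_j$. When these are not available analytically, a finite-difference Jacobian is one possible stand-in; the sketch below is an assumption of mine rather than part of the method itself.
\begin{verbatim}
import numpy as np

def jacobian(forward, s, eps=1.0e-6):
    """Forward-difference estimate of the m x n Jacobian J_ij = dN_i/ds_j."""
    s = np.asarray(s, dtype=float)
    base = np.asarray(forward(s))
    J = np.empty((base.size, s.size))
    for j in range(s.size):
        ds = np.zeros_like(s)
        ds[j] = eps
        J[:, j] = (np.asarray(forward(s + ds)) - base) / eps
    return J
\end{verbatim}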
It is desired that the evolution equation for $\mathbf{s}$ be chosen to guarantee that (\ref{derivativeofF}) is nonpositive. It is easy to see that (\ref{derivativeofF}) will always be negative or zero if the evolution equations for the model parameters $s_j$ are chosen to be
\begin{equation}
\frac{d s_j}{d\tau} = \lambda(\tau) \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}) \right] \frac{\partial N_i}{\partial s_j} ,
\label{evolutioneqn}
\end{equation}
where $\lambda(\tau)$ is a positive parameter to be determined.
Then,
\begin{equation}
\frac{d F(\mathbf{s})}{d\tau} = -2 \lambda(\tau) \sum_{j=1}^n \left[ \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}) \right] \frac{\partial N_i}{\partial s_j} \right]^2 ,
\label{derivativeofF2}
\end{equation}
so each term in the sum over $j$ in (\ref{derivativeofF2}) is a squared quantity and must be positive or vanish identically. The choice (\ref{evolutioneqn}) of evolution then clearly guarantees the desired monotonically decreasing behavior of the total squared error functional as $\tau$ increases.
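As a quick numerical sanity check of this monotone-decrease property, the following sketch (reusing the hypothetical \texttt{forward}, \texttt{d}, \texttt{misfit}, and \texttt{jacobian} helpers from the earlier sketches; the starting model and the small value of $\lambda$ are arbitrary) takes one small step along the direction of (\ref{evolutioneqn}) and confirms that the misfit does not increase.
\begin{verbatim}
import numpy as np

s = np.array([1.0, 1.0])                 # arbitrary starting model
r = d - forward(s)                       # residuals d_i - N_i(s)
g = jacobian(forward, s).T @ r           # direction of (evolutioneqn) at s
lam = 1.0e-3                             # deliberately small positive lambda

# For a sufficiently small step, the squared error cannot increase.
assert misfit(d, s + lam * g, forward) <= misfit(d, s, forward)
\end{verbatim}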
To implement this procedure, a choice of discretization must be made together with a choice for $\lambda$. The obvious choice of discretization for the evolution equation (\ref{evolutioneqn}) is
\begin{equation}
s_j(k+1) - s_j(k) \equiv \Delta s_j = \lambda(k+1) \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}(k)) \right] \left. \frac{\partial N_i(\mathbf{s})}{\partial s_j} \right|_{\mathbf{s} = \mathbf{s}(k)} ,
\label{discreteevolution}
\end{equation}
where I have taken the finite step size in the evolution parameter to be unity, so that the values of $\tau$ coincide with the iteration numbers. In the continuous evolution problem, the infinitesimal changes in the evolution parameter guarantee similarly infinitesimal changes in the model vector $\mathbf{s}$, and therefore the choice of $\lambda$ is largely arbitrary. In contrast, for the discretized problem, the change in $\mathbf{s}$ at each step is finite, and care must be taken not to violate the desired condition that the least-squares functional should decrease at each step. Such violations may occur if the step size is taken too large.
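In matrix form, the sum on the right-hand side of (\ref{discreteevolution}) is just $J^{\mathsf T}\mathbf{r}$, with $J_{ij} = \partial N_i/\partial s_j$ and $r_i = d_i - N_i(\mathbf{s}(k))$; a small sketch (helper names assumed, reusing the hypothetical \texttt{jacobian} above) might read:
\begin{verbatim}
import numpy as np

def update_direction(d, s, forward, jac):
    """Unscaled update of (discreteevolution): g_j = sum_i [d_i - N_i(s)] dN_i/ds_j,
    i.e. g = J^T r at the current model; the full step is Delta s = lambda * g."""
    r = np.asarray(d) - forward(s)       # residual vector at s(k)
    J = jac(forward, s)                  # Jacobian dN_i/ds_j at s(k)
    return J.T @ r
\end{verbatim}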
Reconsidering (\ref{Fofs}), I find that, by keeping only those terms proportional to the first and second powers of the components of $\Delta\mathbf{s}$, I have
\begin{eqnarray}
F(\mathbf{s}(k+1)) & = & \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}(k) + \Delta\mathbf{s}) \right]^2 \nonumber \\
& \simeq & \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}(k)) \right]^2
 - 2 \sum_{j=1}^n \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}(k)) \right] \frac{\partial N_i}{\partial s_j} \Delta s_j
 + \sum_{i=1}^m \left[ \sum_{j=1}^n \frac{\partial N_i}{\partial s_j} \Delta s_j \right]^2 .
\label{discreteFofs}
\end{eqnarray}
After substituting (\ref{discreteevolution}), the parameter $\lambda$ can now be chosen so that the right-hand side of (\ref{discreteFofs}) decreases as much as possible at each step of the iteration scheme. The optimum choice is easily shown to be
\begin{equation}
\lambda(k+1) = \frac{\displaystyle \sum_{j=1}^n \left[ \sum_{i=1}^m \left[ d_i - N_i(\mathbf{s}(k)) \right] \frac{\partial N_i}{\partial s_j} \right]^2}
{\displaystyle \sum_{j,j'} \left[ \sum_{i'=1}^m \left[ d_{i'} - N_{i'}(\mathbf{s}(k)) \right] \frac{\partial N_{i'}}{\partial s_j} \right]
 \left[ \sum_{i=1}^m \frac{\partial N_i}{\partial s_j} \frac{\partial N_i}{\partial s_{j'}} \right]
 \left[ \sum_{i''=1}^m \left[ d_{i''} - N_{i''}(\mathbf{s}(k)) \right] \frac{\partial N_{i''}}{\partial s_{j'}} \right]} ,
\label{lambda}
\end{equation}
since this choice minimizes the right-hand side of (\ref{discreteFofs}).
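Writing $g = J^{\mathsf T}\mathbf{r}$ with $J_{ij} = \partial N_i/\partial s_j$ evaluated at $\mathbf{s}(k)$, the numerator of (\ref{lambda}) is $\|g\|^2$ and the denominator is $g^{\mathsf T} J^{\mathsf T} J\, g = \|J g\|^2$, so a single iteration might be sketched as follows (an illustration under the assumption that \texttt{forward}, \texttt{jac}, and \texttt{d} follow the earlier hypothetical helpers; the name \texttt{step} is mine).
\begin{verbatim}
import numpy as np

def step(d, s, forward, jac):
    """One iteration s(k) -> s(k+1): Delta s = lam * g, with g = J^T r and the
    optimal scaling lam = ||g||^2 / ||J g||^2 from equation (lambda)."""
    r = np.asarray(d) - forward(s)       # residuals at the current model
    J = jac(forward, s)                  # Jacobian dN_i/ds_j at the current model
    g = J.T @ r                          # update direction of (discreteevolution)
    Jg = J @ g
    lam = float(g @ g) / float(Jg @ Jg)  # optimal scaling of (lambda)
    return s + lam * g
\end{verbatim}
Iterating this step drives the misfit $F(\mathbf{s})$ down at each iteration as long as the quadratic expansion (\ref{discreteFofs}) remains a reasonable approximation.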
It will prove enlightening to compare the procedure just presented with the well-known method of conjugate gradients (Hestenes and Stiefel, 1952; Fomel, 1996) for a linear operator $L$ such that $\mathbf{d} = L\mathbf{s}$. Then, model updates are obtained using
\begin{equation}
\mathbf{s}^{(k+1)} = \mathbf{s}^{(k)} + \alpha^{(k+1)} \mathbf{u}^{(k+1)} ,
\end{equation}
where, for the discrete iterative problem, I put the iteration numbers in superscripts. The new vector $\mathbf{u}^{(k+1)}$ is the (somehow) known update direction in the model space and $\alpha^{(k+1)}$ is a parameter used to optimize the step size. The updated residual vector is again given by
\begin{equation}
\mathbf{r}^{(k+1)} = \mathbf{d} - L \mathbf{s}^{(k+1)} = \mathbf{r}^{(k)} - \alpha^{(k+1)} L \mathbf{u}^{(k+1)} .
\end{equation}
The magnitude of the residual vector is easily shown to decrease most at each step of the iteration sequence if the optimization parameter satisfies
\begin{equation}
\alpha^{(k+1)} = \frac{\left( \mathbf{r}^{(k)}, L \mathbf{u}^{(k+1)} \right)}{\left\| L \mathbf{u}^{(k+1)} \right\|^2} .
\label{cgparameter}
\end{equation}
Equation (\ref{cgparameter}) is exactly analogous to the formula (\ref{lambda}) obtained in the nonlinear least-squares problem if the components of the residual are given by
\begin{equation}
r_i^{(k)} = d_i - N_i(\mathbf{s}^{(k)}) ,
\end{equation}
the matrix elements of the linear operator are
\begin{equation}
(L)_{ij} = L_{ij} = \left. \frac{\partial N_i}{\partial s_j} \right|_{\mathbf{s} = \mathbf{s}^{(k)}} ,
\end{equation}
and the update direction vector satisfies
\begin{equation}
\alpha^{(k+1)} \mathbf{u}^{(k+1)} = \Delta \mathbf{s} ,
\label{update}
\end{equation}
where the components of the right-hand side of (\ref{update}) are given by (\ref{evolutioneqn}) evaluated at $\mathbf{s} = \mathbf{s}^{(k)}$. With this identification, it becomes clear that the nonlinear least-squares method outlined above is one natural generalization of the well-known conjugate gradients technique.
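To make the stated correspondence concrete, the following self-contained check (random arrays standing in for the linearized operator and the residual; everything here is illustrative) computes the step size (\ref{cgparameter}) with $L = J$ and $\mathbf{u} = J^{\mathsf T}\mathbf{r}$ and confirms that it coincides with the optimal $\lambda$ of (\ref{lambda}).
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((20, 2))       # stands in for L_ij = dN_i/ds_j at s(k)
r = rng.standard_normal(20)            # stands in for r_i = d_i - N_i(s(k))

u = J.T @ r                            # update direction, as in (update)
Ju = J @ u
alpha = float(r @ Ju) / float(Ju @ Ju) # step size from (cgparameter)
lam = float(u @ u) / float(Ju @ Ju)    # optimal lambda from (lambda)

print(np.isclose(alpha, lam))          # True, since (r, J J^T r) = ||J^T r||^2
\end{verbatim}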