Minimizing the Huber misfit function of equation (2) is not a linear problem: a particular point in the residual can oscillate between the quadratic and linear regions of the Huber function as the solution is updated.
In this chapter, a quasi-Newton method is chosen. The quasi-Newton method is an iterative process in which the solution to the problem is updated as follows:

$$\mathbf{m}_{k+1} = \mathbf{m}_k - \alpha_k\,\mathbf{H}_k^{-1}\nabla f(\mathbf{m}_k), \qquad (3)$$

where $\mathbf{m}_k$ is the current solution, $\alpha_k$ the step length, $\nabla f(\mathbf{m}_k)$ the gradient of the misfit function, and $\mathbf{H}_k$ the approximate Hessian at iteration $k$.
Quasi-Newton methods approximate the Hessian without explicitly computing it. The approximation is re-estimated and improved at each iteration of the descent method (Gill et al., 1986). Quasi-Newton methods can be equivalent to conjugate-gradient methods on a quadratic function, and they are often more robust than conjugate-gradient methods on general nonlinear problems. They also tend to require fewer iterations (Gill et al., 1986). However, one major drawback of quasi-Newton methods is that they require storing the approximate Hessian in memory, which can be troublesome for large-scale problems.
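For reference, a standard way to write this re-estimation is the BFGS update of the approximate Hessian (Nocedal, 1980); the notation below is a common textbook form and may differ from that of the appendix:

$$\mathbf{H}_{k+1} = \mathbf{H}_k - \frac{\mathbf{H}_k \mathbf{s}_k \mathbf{s}_k^T \mathbf{H}_k}{\mathbf{s}_k^T \mathbf{H}_k \mathbf{s}_k} + \frac{\mathbf{y}_k \mathbf{y}_k^T}{\mathbf{y}_k^T \mathbf{s}_k}, \qquad \mathbf{s}_k = \mathbf{m}_{k+1} - \mathbf{m}_k, \quad \mathbf{y}_k = \nabla f(\mathbf{m}_{k+1}) - \nabla f(\mathbf{m}_k).$$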
In the appendix, a computationally effective method that does not require storing the Hessian is presented. This method is called limited-memory BFGS (L-BFGS) (Nocedal, 1980). With this technique, only a limited number of solution-step and gradient-step vectors are stored as the iterations proceed. At each new iteration, these vectors are combined to form the approximate Hessian (see the appendix for details). If all the solution-step and gradient-step vectors are kept in memory at every iteration, the L-BFGS method becomes equivalent to the BFGS update (Nocedal, 1980). With the L-BFGS method, the required storage is reduced compared to BFGS, which makes the method affordable for image estimation.
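To make the limited-memory idea concrete, the following sketch implements the standard L-BFGS two-loop recursion (Nocedal, 1980), which applies the implicit inverse-Hessian approximation to the gradient using only the stored step pairs; the function and variable names are illustrative, not taken from the appendix.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list, gamma=1.0):
    """Two-loop recursion: compute d = -H_k^{-1} grad without ever
    forming a matrix, from the last m stored pairs
    s_i = m_{i+1} - m_i (solution steps) and
    y_i = g_{i+1} - g_i (gradient steps)."""
    q = grad.copy()
    cache = []
    # First loop: newest pair to oldest.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        cache.append((rho, alpha))
    # Apply the (scaled) initial inverse Hessian H_0 = gamma * I.
    r = gamma * q
    # Second loop: oldest pair to newest.
    for (s, y), (rho, alpha) in zip(zip(s_list, y_list), reversed(cache)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r  # descent direction
```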
One computational burden is the line-search algorithm that ensures a sufficient decrease of the misfit function. This burden comes from the need to evaluate the misfit function many times before finding a good step length $\alpha_k$. The value $\alpha_k = 1$ is tested first (Liu and Nocedal, 1989), thus saving substantial computational time.
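As an illustration of this strategy, here is a minimal backtracking line search that tries the unit step first and shrinks it until a sufficient-decrease (Armijo) condition holds; the actual algorithm analyzed by Liu and Nocedal (1989) enforces the stronger Wolfe conditions, so this is only a simplified sketch.

```python
import numpy as np

def line_search(f, grad, m, d, c1=1e-4, shrink=0.5, max_tries=30):
    """Backtracking line search along the descent direction d."""
    f0 = f(m)
    slope = np.dot(grad(m), d)  # directional derivative, negative for descent
    alpha = 1.0                 # the unit step is tested first
    for _ in range(max_tries):
        # Sufficient-decrease (Armijo) condition.
        if f(m + alpha * d) <= f0 + c1 * alpha * slope:
            break
        alpha *= shrink
    return alpha
```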
Another source of improvement is the scaling of the approximate Hessian at each iteration (Liu and Nocedal, 1989). This scaling greatly improves the performance of the quasi-Newton method. Liu and Nocedal (1989) show on numerical examples for large-scale problems that the L-BFGS method with the scaling of the Hessian and an appropriate line-search algorithm (see the appendix for details) is usually faster than conjugate-gradient using the Polak-Ribière formula (Kelley, 1999).
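The scaling referred to here is commonly taken as $\gamma_k = \mathbf{s}_{k-1}^T \mathbf{y}_{k-1} / \mathbf{y}_{k-1}^T \mathbf{y}_{k-1}$, used as the initial inverse Hessian $\gamma_k \mathbf{I}$ in the two-loop recursion (Liu and Nocedal, 1989); a one-line sketch, feeding the `gamma` argument of the recursion above:

```python
import numpy as np

def initial_scaling(s, y):
    """Scaling gamma_k = s'y / y'y from the most recent step pair,
    used as H_0 = gamma * I in the two-loop recursion."""
    return np.dot(s, y) / np.dot(y, y)

# e.g. gamma = initial_scaling(s_list[-1], y_list[-1])
```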
One problem with using a quasi-Newton method, however, is that the Huber function is not twice continuously differentiable, an assumption that is at the heart of the convergence properties of the L-BFGS method. Nonetheless, the L-BFGS update requires the computation of the gradient only (see the appendix).
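To see why the smoothness assumption fails, consider a generic form of the Huber function (the exact definition in equation (2) may use a different normalization); its first derivative is continuous, but its second derivative jumps at the threshold:

```python
import numpy as np

def huber(r, eps):
    """Quadratic for |r| <= eps, linear beyond: C^1 but not C^2,
    since the curvature drops from 1/eps to 0 at |r| = eps."""
    return np.where(np.abs(r) <= eps,
                    0.5 * r**2 / eps,
                    np.abs(r) - 0.5 * eps)

def huber_grad(r, eps):
    """Continuous first derivative of the Huber function."""
    return np.where(np.abs(r) <= eps, r / eps, np.sign(r))
```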
Furthermore, given that the approximate Hessian is not an exact representation of the true one, the violations of this smoothness assumption are expected to be mild. The examples in the next section show that the method nevertheless gives satisfying results and is robust to outliers, as expected.
In this section, I have proposed an efficient algorithm that minimizes any misfit function using the Huber norm. This algorithm is a limited-memory BFGS technique that saves computational time and memory by (1) limiting the number of vectors kept in memory for the update of the Hessian, (2) testing the default value $\alpha_k = 1$ for the step length first, and (3) scaling the Hessian at each iteration.
In the next section, this algorithm is tested on a geophysical problem, and the Huber norm is compared with the $\ell_2$ norm with and without regularization.