Next: The Iteratively Reweighted Least
Up: IRLS and the Huber
Previous: IRLS and the Huber
In the work reported here, I use
a hybrid l1-l2 error measure proposed by Huber Huber (1973):
data:image/s3,"s3://crabby-images/3fca7/3fca775bf3b3d5e2434d5c6cf2dbcb8327599355" alt="\begin{displaymath}
M_{\epsilon}(r) =
\left\{
\begin{array}
{cc}
\frac{r^2}{2\ep...
...frac{\epsilon}{2}, & \epsilon < \vert r\vert \end{array}\right.\end{displaymath}"
I will call
the Huber misfit function
,or Huber function for short (Figure 1). Note that the Huber function is smooth near
zero residual, and weights small residuals by the mean square. It is reasonable
to suppose that the Huber function, while maintaining robustness against large residuals,
is easier to minimize than l1. The parameter
, which controls the limit
between l1 and l2, is called the Huber threshold.
huber
Figure 1 Error measure proposed by Huber Huber (1973).
The upper part above is the l1 norm, while the lower part is the l2 norm.
|
| data:image/s3,"s3://crabby-images/ec746/ec746ae7fb6ac1450ec9be6b00746d6869853c2f" alt="huber" |
The problem is this: given some observed data
, we want to find the
best model
that explains the data via the operator
.
This may be posed in terms of an inverse problem leading to the minimization of the function
| data:image/s3,"s3://crabby-images/0b3e7/0b3e7cb082bb3f1290f8ec11d2c6180bc3ffee89" alt="\begin{displaymath}
f(\bold{m})= E(\bold{Hm}-\bold{d}_{\bf obs}),\end{displaymath}" |
(1) |
where E is an error measure function. As discussed above, I will use the Huber
function and try to minimize
| data:image/s3,"s3://crabby-images/5124b/5124b13ac7472ef966fa569e201aa7f0fe552f7f" alt="\begin{displaymath}
f(\bold{m}) = H_{\epsilon}(\bold{Hm}-\bold{d}_{\bf obs}).\end{displaymath}" |
(2) |
We seek to find the stationary point
of the function f.
This solution point satisfies
. This is a nonlinear
system of equations, and from Taylor expansion we have
data:image/s3,"s3://crabby-images/f51d0/f51d09089243049050bc760e8f62f27043d47f1e" alt="\begin{displaymath}
\bold{f}'(\bold{m+\delta m})\simeq \bold{f}'(\bold{m})+\bold{f}''(\bold{m})\delta \bold{m}\end{displaymath}"
if
is sufficiently small. The Newton's method consists in finding
such that
data:image/s3,"s3://crabby-images/e01c8/e01c821c4b73f664e2fa93b04937fcb2c134a3ae" alt="\begin{displaymath}
\bold{H} \delta \bold{m} = -\bold{f}'(\bold{m}) \mbox{ with } \bold{H}=\bold{f}''(\bold{m}),\end{displaymath}"
and computing the next iterate by
| data:image/s3,"s3://crabby-images/2915f/2915f6cf8ad47a8977e30ebdccab6fcee7206e05" alt="\begin{displaymath}
\bold{m}_{n+1}=\bold{m}_{n}+\alpha_n\delta \bold{m}_n\end{displaymath}" |
(3) |
where
is a steplength given by a Line Search algorithm. The general process of the program is then
- 1.
- compute the gradient
data:image/s3,"s3://crabby-images/649bb/649bb156af650e08e21909d4f2fad10408cef13a" alt="$\bold{f}'(\bold{m})$"
- 2.
- compute
- 3.
- compute
using a line search
- 4.
- update the solution
- 5.
- update the Hessian
- 6.
- go to 1.
Because the Huber function is not twice continuously differentiable, the Hessian is not
computed directly but approximated using a limited Memory BFGS update Guitton (2000),
as proposed by Nocedal (1980) and Liu and Nocedal (1989).
I have implemented for the line search a More and Thuente More and Thuente (1994) algorithm,
which ensures sufficient decrease for the function f and obeys curvature conditions
(the so-called Wolfe conditions, Kelley (1999)), thus guaranteeing that a quasi-Newton
update is possible.
The steplength
is always tried first, saving significant
computation. These choices lead to good performances for both convergence
rate and computation cost. I call ``Huber solver'' the algorithm detailed above when
used with the Huber norm.
Next: The Iteratively Reweighted Least
Up: IRLS and the Huber
Previous: IRLS and the Huber
Stanford Exploration Project
4/27/2000