next up previous print clean
Next: conclusion Up: Guitton: Huber solver Previous: L-BFGS update

Solving the Huber problem with a quasi-Newton method

The Huber norm Huber (1973, 1981) is a hybrid l1-l2 measure. We expect to find the minimum of the function using a quasi-Newton method with a L-BFGS update of the Hessian Guitton and Symes (1999). The Huber norm is
\begin{eqnarray}
f(\bold{x})&=&\vert\bold{Ax}-\bold{m}\vert _{Huber}, \nonumber ...
 ...{r}\vert _{Huber},\\  &=& \sum_{i=1}^N M_{\epsilon}(r_i),\nonumber\end{eqnarray}
(12)
where
(13)
$\epsilon$ commands the limit between an l1 or l2 treatment of the residual; we call it the Huber threshold and it must be given by the user. The gradient of the objective function is given by
\begin{displaymath}
\bold{\nabla} f(\bold{x})=\bold{A}^T(\bold{Ax}-\bold{m})^\epsilon_{-\epsilon},\end{displaymath} (14)
where ${\bf z}^\epsilon_{-\epsilon}$ is the vector whose ith component is

\begin{displaymath}
z_i=max\{-\epsilon,min\{\epsilon,z_i\}\}.\end{displaymath}

Because the Huber function is not twice continuously differentiable, it does not satisfy the three necessary conditions that guarantee the convergence to a minimum. However, we only need to compute the gradient for the BFGS update of the Hessian. Furthermore, given that the approximated Hessian is certainly a vague approximation of the real one (Symes, 1999, Personal communication), the violation of the initial conditions is mild. In addition, results Guitton (2000) show that this method converges to a minimum. Li (1995) shows that the Huber function has a unique minimizer for any meaningful choice of $\epsilon$. Indeed, if the l1 problem $f(\bold{x}) = \vert\bold{Ax}-\bold{m}\vert _1$ has multiple solutions Tarantola (1987), then the Huber problem, provided that $\epsilon$ is small enough, also has multiple solutions. This is annoying since we want to find a global minimum for the problem using quasi-Newton updates. In practice, however, it seems that

\begin{displaymath}
\epsilon = \frac{max\vert\bold{d}\vert}{100}\end{displaymath}

is a good choice for the threshold Darche (1989). The threshold being set properly, the Huber function has mathematical properties that allow the use of quasi-Newton methods. We can now define an efficient algorithm in order to solve the Huber problem:

Algorithm2

1.
Choose $\bold{x}_0$ and the threshold $\epsilon$. Set k=0
2.
Compute $\nabla f(\bold{x}_k)$ using equation 14
3.
Compute $\bold{d}_k=-\bold{B}_k\nabla f(\bold{x}_k)$ using a L-BFGS update (Algorithm 1, step 3)
4.
Compute the step $\lambda_k$ using a MoreThuente line search ($\lambda_k=1$ tried first)
5.
Update the solution $\bold{x}_{k+1}=\bold{x}_k+\lambda_k\bold{d}_k$
6.
Go to step 2
This algorithm will converge to the minimizer $\bold{x}^*$, as proven by Liu and Nocedal (1989).
next up previous print clean
Next: conclusion Up: Guitton: Huber solver Previous: L-BFGS update
Stanford Exploration Project
4/27/2000