Next: conclusion Up: Guitton: Huber solver Previous: L-BFGS update

Solving the Huber problem with a quasi-Newton method

The Huber norm Huber (1973, 1981) is a hybrid l¹-l² measure. We expect to find the minimum of the function using a quasi-Newton method with a L-BFGS update of the Hessian Guitton and Symes (1999). The Huber norm is

	$\begin{eqnarray} f(\bold{x})&=&\vert\bold{Ax}-\bold{m}\vert _{Huber}, \nonumber ... ...{r}\vert _{Huber},\\ &=& \sum_{i=1}^N M_{\epsilon}(r_i),\nonumber\end{eqnarray}$
		(12)

where

(13)

$\epsilon$ commands the limit between an l¹ or l² treatment of the residual; we call it the Huber threshold and it must be given by the user. The gradient of the objective function is given by

$\begin{displaymath} \bold{\nabla} f(\bold{x})=\bold{A}^T(\bold{Ax}-\bold{m})^\epsilon_{-\epsilon},\end{displaymath}$

(14)

where ${\bf z}^\epsilon_{-\epsilon}$ is the vector whose ith component is

$\begin{displaymath} z_i=max\{-\epsilon,min\{\epsilon,z_i\}\}.\end{displaymath}$

Because the Huber function is not twice continuously differentiable, it does not satisfy the three necessary conditions that guarantee the convergence to a minimum. However, we only need to compute the gradient for the BFGS update of the Hessian. Furthermore, given that the approximated Hessian is certainly a vague approximation of the real one (Symes, 1999, Personal communication), the violation of the initial conditions is mild. In addition, results Guitton (2000) show that this method converges to a minimum. Li (1995) shows that the Huber function has a unique minimizer for any meaningful choice of $\epsilon$ . Indeed, if the l¹ problem $f(\bold{x}) = \vert\bold{Ax}-\bold{m}\vert _1$ has multiple solutions Tarantola (1987), then the Huber problem, provided that $\epsilon$ is small enough, also has multiple solutions. This is annoying since we want to find a global minimum for the problem using quasi-Newton updates. In practice, however, it seems that

$\begin{displaymath} \epsilon = \frac{max\vert\bold{d}\vert}{100}\end{displaymath}$

is a good choice for the threshold Darche (1989). The threshold being set properly, the Huber function has mathematical properties that allow the use of quasi-Newton methods. We can now define an efficient algorithm in order to solve the Huber problem:

Algorithm2

1.: Choose $\bold{x}_0$ and the threshold $\epsilon$ . Set k=0
2.: Compute $\nabla f(\bold{x}_k)$ using equation 14
3.: Compute $\bold{d}_k=-\bold{B}_k\nabla f(\bold{x}_k)$ using a L-BFGS update (Algorithm 1, step 3)
4.: Compute the step $\lambda_k$ using a MoreThuente line search ( $\lambda_k=1$ tried first)
5.: Update the solution $\bold{x}_{k+1}=\bold{x}_k+\lambda_k\bold{d}_k$
6.: Go to step 2

This algorithm will converge to the minimizer $\bold{x}^*$ , as proven by Liu and Nocedal (1989).

Next: conclusion Up: Guitton: Huber solver Previous: L-BFGS update

Stanford Exploration Project
4/27/2000