For the sake of completeness, I give the updating formulas of the Hessian as presented by Nocedal (1980). We first define

\begin{displaymath}
\bold{\rho}_i = 1/\bold{y}_i^T\bold{s}_i \quad \mbox{and} \quad \bold{v}_i = I - \bold{\rho}_i\bold{y}_i\bold{s}_i^T.
\end{displaymath}

In addition, we pose $\bold{s}_k = \bold{x}_{k+1}-\bold{x}_k$ and $\bold{y}_k = \nabla f(\bold{x}_{k+1})-\nabla f(\bold{x}_k)$. As described above, when k, the number of iterations, obeys $k+1 \leq m$, where m is the storage limit, we have the usual BFGS update

\begin{displaymath}
\bold{B}_{k+1} = \bold{v}_k^T\bold{B}_k\bold{v}_k + \rho_k\bold{s}_k\bold{s}_k^T.
\end{displaymath}
For k+1 > m we have the special limited-memory update

\begin{eqnarray}
\bold{B}_{k+1} &=& \bold{v}_k^T\bold{v}_{k-1}^T\cdots\bold{v}_{k-m+1}^T\,\bold{B}_{k+1}^0\,\bold{v}_{k-m+1}\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
&& +\,\rho_{k-m+1}\,\bold{v}_k^T\cdots\bold{v}_{k-m+2}^T\,\bold{s}_{k-m+1}\bold{s}_{k-m+1}^T\,\bold{v}_{k-m+2}\cdots\bold{v}_k \nonumber\\
&& +\,\cdots \nonumber\\
&& +\,\rho_k\bold{s}_k\bold{s}_k^T. \qquad (5)
\end{eqnarray}
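As a quick numerical sanity check of this recursive update (a sketch I add here, not from the original; the toy Hessian H and all variable names are my own choices), the following Python/NumPy snippet applies the BFGS-style recursion $\bold{B} \leftarrow \bold{v}^T\bold{B}\bold{v} + \rho\,\bold{s}\bold{s}^T$ for a few stored pairs with $\bold{y}_i^T\bold{s}_i > 0$ and checks that the result remains symmetric positive definite:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Toy quadratic Hessian; setting y_i = H s_i guarantees y_i^T s_i > 0.
H = np.diag(np.arange(1.0, n + 1))

# Start from a symmetric positive definite B0 (here the identity).
B = np.eye(n)

for _ in range(3):
    s = rng.standard_normal(n)
    y = H @ s                                  # curvature condition y^T s > 0 holds
    rho = 1.0 / (y @ s)
    v = np.eye(n) - rho * np.outer(y, s)       # v_i = I - rho_i y_i s_i^T
    B = v.T @ B @ v + rho * np.outer(s, s)     # recursive BFGS-style update

eigvals = np.linalg.eigvalsh((B + B.T) / 2)
print(eigvals.min() > 0)                       # stays spd: True
```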
It is easy to show that the special updated Hessian is also spd. The L-BFGS algorithm is then
Algorithm 1

1. Choose $\bold{x}_0$, m, $0 < \mu < \nu < 1$, and a symmetric positive definite $\bold{B}_0$. Set k=0.

2. Compute
\begin{eqnarray}
\bold{d}_k &=& -\bold{B}_k\nabla f(\bold{x}_k), \qquad (6)\\
\bold{x}_{k+1} &=& \bold{x}_k + \lambda_k\bold{d}_k, \qquad (7)
\end{eqnarray}
where $\lambda_k$ verifies the Wolfe conditions (Moré and Thuente, 1994):
\begin{eqnarray}
f(\bold{x}_k+\lambda_k\bold{d}_k) &\leq& f(\bold{x}_k) + \mu\lambda_k\nabla f(\bold{x}_k)^T\bold{d}_k, \qquad (8)\\
\vert\nabla f(\bold{x}_k+\lambda_k\bold{d}_k)^T\bold{d}_k\vert &\leq& \nu\vert\nabla f(\bold{x}_k)^T\bold{d}_k\vert. \qquad (9)
\end{eqnarray}
We always try the steplength $\lambda_k=1$ first.

3. Let $\hat{m} = \min\{k, m-1\}$. Check if $\bold{y}_k^T\bold{s}_k > 0$.
- If no: set $\bold{B}_{k+1} = \bold{I}$ (a steepest-descent step) and delete the stored pairs $(\bold{s}_i,\bold{y}_i)$.
- If yes: update $\bold{B}_k$ $\hat{m}+1$ times using the pairs $(\bold{s}_i,\bold{y}_i)$, $i = k-\hat{m},\ldots,k$, i.e., let
\begin{eqnarray}
\bold{B}_{k+1} &=& \bold{v}_k^T\bold{v}_{k-1}^T\cdots\bold{v}_{k-\hat{m}}^T\,\bold{B}_{k+1}^0\,\bold{v}_{k-\hat{m}}\cdots\bold{v}_{k-1}\bold{v}_k \nonumber\\
&& +\,\rho_{k-\hat{m}}\,\bold{v}_k^T\cdots\bold{v}_{k-\hat{m}+1}^T\,\bold{s}_{k-\hat{m}}\bold{s}_{k-\hat{m}}^T\,\bold{v}_{k-\hat{m}+1}\cdots\bold{v}_k \nonumber\\
&& +\,\cdots \nonumber\\
&& +\,\rho_k\bold{s}_k\bold{s}_k^T. \qquad (10)
\end{eqnarray}

4. Set k := k+1 and go to 2.
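The Wolfe tests of step 2 can be sketched as a small Python check (a sketch I add; the function name `satisfies_wolfe` and the toy objective are my own, and the parameter defaults are merely common choices):

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, lam, mu=1e-4, nu=0.9):
    """Check the sufficient-decrease and curvature conditions
    for a step length lam, with 0 < mu < nu < 1."""
    g0 = grad(x) @ d                  # directional derivative at x (negative for descent)
    x_new = x + lam * d
    sufficient_decrease = f(x_new) <= f(x) + mu * lam * g0   # condition (8)
    curvature = abs(grad(x_new) @ d) <= nu * abs(g0)         # condition (9)
    return sufficient_decrease and curvature

# Tiny usage example on f(x) = x^T x with a steepest-descent direction.
f = lambda x: x @ x
grad = lambda x: 2.0 * x
x = np.array([1.0, -2.0])
d = -grad(x)
print(satisfies_wolfe(f, grad, x, d, lam=0.5))   # exact minimizer along d: True
print(satisfies_wolfe(f, grad, x, d, lam=10.0))  # overshoots badly: False
```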
The update $\bold{B}_{k+1}$ is not formed explicitly; instead, we compute the product $\bold{B}_k\nabla f(\bold{x}_k)$ directly from the stored pairs with an iterative formula (Nocedal, 1980).
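Nocedal's iterative formula is the well-known two-loop recursion. The sketch below (my own code and naming, not from the original text) computes $\bold{B}\bold{g}$ from the stored pairs and verifies it against the explicitly formed matrix of the recursive update, assuming the unscaled case $\bold{B}^0 = \gamma I$ with $\gamma = 1$:

```python
import numpy as np

def two_loop(g, s_list, y_list, gamma=1.0):
    """Compute B g (inverse-Hessian approximation times gradient) from
    stored (s_i, y_i) pairs without forming B (Nocedal, 1980)."""
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):  # newest to oldest
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    r = gamma * q                                                # r = B0 q, B0 = gamma I
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s                                         # oldest to newest
    return r

# Verify against the explicitly formed matrix of the recursive update.
rng = np.random.default_rng(1)
n = 5
H = np.diag(np.arange(1.0, n + 1))          # toy quadratic Hessian
s_list = [rng.standard_normal(n) for _ in range(3)]
y_list = [H @ s for s in s_list]            # guarantees y^T s > 0

B = np.eye(n)
for s, y in zip(s_list, y_list):
    rho = 1.0 / (y @ s)
    v = np.eye(n) - rho * np.outer(y, s)
    B = v.T @ B @ v + rho * np.outer(s, s)

g = rng.standard_normal(n)
print(np.allclose(two_loop(g, s_list, y_list), B @ g))   # True
```

The recursion costs only O(mn) per gradient, which is what makes the limited-memory method practical for large n.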
Liu and Nocedal (1989) propose scaling the initial symmetric positive definite $\bold{B}_0$ at each iteration:

\begin{displaymath}
\bold{B}_k^0 = \frac{\bold{y}_k^T\bold{s}_k}{\parallel\bold{y}_k\parallel^2}\,\bold{B}_0. \qquad (11)
\end{displaymath}
This scaling greatly improves the performance of the method. Liu and Nocedal (1989) show that for large-scale problems the storage limit m has little effect on the method's performance. A common choice is m=5 (this is the default in my implementation as well).
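The scaling of equation (11) amounts to multiplying the initial matrix by the scalar $\gamma_k = \bold{y}_k^T\bold{s}_k/\|\bold{y}_k\|^2$. A minimal sketch (the toy Hessian H and the variable names are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
H = np.diag(np.arange(1.0, n + 1))   # toy Hessian with eigenvalues 1..5
s = rng.standard_normal(n)
y = H @ s                            # gradient difference for a quadratic

gamma = (y @ s) / (y @ y)            # scaling factor of equation (11)
B0_scaled = gamma * np.eye(n)        # scaled initial inverse-Hessian guess

# gamma lies between the extreme inverse eigenvalues of H, so the
# scaled identity has a spectrum comparable to the true inverse Hessian.
print(1 / 5 <= gamma <= 1 / 1)       # True
```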
Conditions (8) and (9) are satisfied if an appropriate line-search algorithm is used. I programmed a Moré-Thuente line search (Moré and Thuente, 1994), which ensures a sufficient decrease of the objective function (equation 8) and obeys the curvature condition of equation (9). This program is not described here. In practice, the initial guess $\bold{B}_0$ for the Hessian can be the identity matrix $\bold{I}$; it may then be scaled at each iteration as proposed above.
Liu and Nocedal (1989) prove that the L-BFGS algorithm converges to a local minimizer and that the sequence of iterates converges R-linearly (recall that the usual BFGS method achieves q-superlinear convergence, which is better).
Stanford Exploration Project
4/27/2000