[1]
[1] (
)
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
et al.
[1]
Proof:
![]() |
(1) |
S T V W min LS [1]LS[#1] [1]LS[#1] Int Bdy k_- k_+ k_s i(r - t) i(r - t) i(z - t)
ABSTRACTResolution matrices are useful in seismic tomography because they allow us to evaluate the information content of reconstructed images. Techniques based on the multiplicity of equivalent exact formulas that may be used to define the resolution matrices have been used previously by the author to design algorithms that avoid the need for any singular value decomposition of the ray-path matrix. An explicit procedure is presented for computing both model and data resolution matrices using Arnoldi's algorithm for iterative inversion in seismic tomography. Arnoldi's method differs from the Lanczos scheme by including explicit reorthogonalization of basis vectors. Some convenient notation is introduced to permit ready comparison of Arnoldi's method with the Lanczos approach. Arnoldi's method requires greater storage of basis vectors but completely overcomes the lack of basis vector orthogonality, which is the major practical limitation of the Lanczos method. |
INTRODUCTION
In an earlier contribution to SEP-77 (Berryman, 1993), I discussed general methods of constructing resolution matrices for iterative inverses and applied these techniques to both Lanczos and Paige-Saunders (LSQR) algorithms (Lanczos, 1950; Paige and Saunders, 1982). The comment was made that those methods could be applied to a wide variety of iterative inversion methods. To illustrate the general utility of the methods, I show how the methods can be applied to Arnoldi's algorithm for iterative inversion. This algorithm may be viewed as a variant of the Lanczos algorithm, but with reorthogonalization of the basis vectors explicitly built in. I will first review the Lanczos method and then present the Arnoldi approach using a similar notation. Hopefully the similarities and differences of the two techniques will then become clear.
REVIEW OF RESOLUTION MATRICES
Discussions of resolution played a central role in the classic paper of Aki
et al. (1977) on large scale seismic tomography. Such discussions
continue to play a key role in the interpretation of inversion results in
seismology. For example, in their recent review article -- which also
includes an extensive discussion of resolution in seismic inverse problems --
Evans and Achauer (1993) state that ``
evaluating the whole resolution
matrix, not just its diagonal elements, is a required part of interpreting [the
reconstructed model].''
Indeed, any inverse method based on incomplete or imperfect data should be
expected to produce a filtered or blurred image of the object or region
of interest. Apparent resolution is therefore an important figure of merit for
practical imaging and inversion schemes, since the user of such schemes will
eventually want to know what the smallest object is that can be distinguished.
Judgments concerning reliability of features observed in the
reconstructed image depend strongly on our understanding of inherent
resolution capabilities of the inverse method used to produce the image.
Therefore, a convenient quantitative measure of pointwise or cellwise model
reliability is certainly helpful and perhaps essential for subsequent
interpretation.
For completeness, I will review the main ideas about resolution matrices. For linear inversion problems, resolution matrices can be understood most easily by considering a matrix equation of the form
= ,
and first asking the question: Given matrix and data vector ,what model vector solves this equation? When the matrix is
square and invertible,
the answer to the question is relatively easy:
,with
being the usual matrix inverse of .However, it often happens in geophysical inversion problems that is
not square, or not invertible even if it is square. In these situations,
the least-squares method is often used, resulting in the normal equations
^T= ^T,
which can often be solved approximately for since the normal matrix
is square and symmetric -- although it may still be singular.
It proves convenient now to introduce an approximate inverse
called the
Moore-Penrose pseudoinverse (Moore, 1920; Penrose, 1955a).
This generalized inverse is the unique matrix that satisfies the four
conditions:
,
,
,and
.Although other choices for the approximate inverse are known [for example, see
Rao (1965)], I will restrict discussion here to this best known approximate
inverse. Then, after multiplying (Mst) on the left by
,the result is
^= ^.
If it were true that
(the identity matrix),
then I would have solved the inversion problem exactly, and also have perfect
model resolution. But it is precisely in those problems for which no such
inverse exists that the following analysis is needed. Define the matrix
coefficient of in (MMsMt) to be the resolution matrix
^.
The deviations of from the identity matrix , i.e.,
the components of the difference matrix
, determine
the degree of distrust appropriate for the components of the solution
vector that are most poorly resolved.
For definiteness, consider the seismic tomography problem (see Figure 1):
is an
ray-path matrix,
is a data m-vector of first arrival traveltimes, and is the model n-vector for (possibly rectangular) cells of constant slowness
(inverse velocity). I seek the slownesses given the
measured traveltimes in and the estimates of the ray paths between
source and receiver locations contained in the matrix [see
Berryman (1990) and references therein]. Then, the resolution matrix
defined in
(RMM) is the model resolution, since the slowness vector is the desired
model of acoustic wave slowness. A data resolution matrix may also be defined.
First, multiply (MMsMt) on the left by so
^= = ^,
and then compare (MMMsMMt) to (Mst), noting that the matrix product
multiplying should equal the
identity matrix if the approximate inverse
is a true inverse.
Again, deviations of this matrix from the identity provide information about
the degree to which the computed solution makes use of all the data in
. Thus, the data resolution matrix (Jackson, 1972; Wiggins, 1972)
is defined by
_data ^, while the resolution matrix defined previously in (RMM) is the model resolution (Backus and Gilbert, 1968; Jackson, 1972)
_model ^. Furthermore, for seismic inversion, I must also consider mathematial nonlinearities involved in the process of finding a ray-path matrix that is consistent with the model .For the present purposes, assume that and are the final (and mutually consistent) products of an iterative algorithm (Berryman, 1990). Then, the question of resolution needs to be studied carefully in order to explore fully the range of possible solutions resulting from inherent nonuniqueness of the inverse problem.
The significance of these two resolution matrices may be understood by considering the singular value decomposition (SVD) of the matrix ,given by
= _i=1^r _i _i_i^T,
where the m-vectors
and n-vectors
are the eigenvectors
of determined by
and
and the
s are the eigenvalues.
The eigenvectors are also assumed to satisfy orthonormality
conditions
and
.The rank of (the number of nonzero eigenvalues) has a value
. The Moore-Penrose pseudoinverse is then known to be given
by
^= _i=1^r _i^-1_i_i^T, so the resolution matrices are written explicitly in terms of sums of the outer products of the eigenvectors as
_data = _i=1^r _i_i^T. When displayed in this form, it is clear that the resolution matrices simply express the completeness of the resolved model or data spaces respectively. They are projection operators on the span of the resolved parts of the model and data vector spaces.
COMPUTING RESOLUTION
Now it is important to recognize that, although the resolution matrices have generally been defined by equations (Rd) and (Rm) -- or by (vv) and (uu) (which implicitly assume that a singular value decomposition has been performed), it may nevertheless be possible to compute these matrices in other ways. Of particular importance is the computation of an effective inverse matrix generated by an iterative inversion procedure.
To establish the plausibility of computing resolution without singular value
decomposition, first consider a simple pedagogical example that would
not be ideal for computations.
For convenience define
to be a parameter having the significance of a
continuous iteration number and let
be the current approximation
to the pseudoinverse
.Then the current value of the approximate solution vector
is given by
and the unresolved part of
the data vector is clearly given by the difference vector
. The length of this vector is
a scalar measure of the unresolved portion of the data. By designing the
iterative inversion scheme to decrease the length of this vector
progressively as
, the derivative of its square with respect
to the continuous iteration number
is given by
^T(-^T^T)(-)
= -
^T^T^T(-)-
^T(-^T^T).
A sufficient condition for the traveltime data resolution to improve
continuously as
is then [see Lu and Berryman (1990)]
the equation of motion for given by
= ^T(-),
where
is some arbitrary scalar that determines the rate of
convergence. It follows by construction that the right hand side of
(firstderivative) is always negative or zero. Thus, the nonnegative
length
is a continuously decreasing function of
the iteration parameter
as long as
has the equation of
motion given by (eom). Clearly, the right hand side of (eom)
vanishes if and only if the approximate inverse matrix satisfies
^T() = ^T,
which is equivalent to the normal equations of least-squares,
so
as expected (Penrose, 1955b).
Defining effective resolution matrices
and
,then it follows from (eom) that
_data =
^T(-_data).
Assuming initial conditions for the integration are given by
and
, it is easy to see
that the solutions of these equations will be symmetric matrices as desired.
This approach becomes an iterative procedure when the equations are
solved numerically by discretizing the independent parameter
and
stepping from one discrete value of
to the next.
The procedure just outlined establishes that iterative procedures for computing resolution matrices are certainly possible. However, this particular method is less than ideal because it would be computationally intensive. So next consider a general procedure that could be applied to many iterative methods that are used in practice.
To find a more efficient approach that is also inherently symmetric, first analyze the SVD of the normal matrix, i.e.,
^T= _i=1^r _i_i_i^T_j=1^r _j _j_j^T = _i=1^r _i^2_i_i^T. Then, it is easily shown that
(^T)= _i=1^r _i_i^T = _model, and similarly that an alternative is given by
^T(^T)^= _i=1^r _i_i^T = _model, a formula that is automatically symmetric. The data resolution is given by the various alternative forms
^= ^T(^T)^= (^T) = _j=1^r _j_j^T = _data.
Iterative methods such as the method of Lanczos (1950) for solving
(Mst) in the
least-squares sense often compute the pseudoinverse -- not of itself
but rather -- of the normal matrix
. The appearance of the
pseudoinverse
of the normal matrix in these
alternative expressions for the resolution matrices
(Rmagain)-(Rdagain) then provides both a
motivation and a clue to constructing methods for computing resolution with
such iterative methods. Thus, in the following discussion, I am able to
show how these various equivalent formulas for the resolution matrices may be
used efficiently, in the course of a routine computation whose primary goal is
to find an approximate solution to by iterative methods.
TRIDIAGONALIZATION METHOD OF LANCZOS
Consider the linear inversion problem of the form ,where is the unknown.
For a crosshole seismic tomography problem (Berryman, 1990),
is an
ray-path matrix, is an m-vector of
first arrival traveltimes, and is the model slowness (inverse velocity)
vector.
The method of Lanczos (1950) solves
this problem by introducing a sequence of orthonormal vectors
through a process of tridiagonalization. To obtain the minimum norm
least-squares solution of , the method may be applied
to the normal equations (normaleqns),
since they have the form with square symmetric matrix
, unknown vector , and data vector
.
Lanczos algorithm
With
and
,the Lanczos algorithm is a projection procedure equivalent to the following:
[^(1)(^(1))^T+^(2)(^(2)
)^T]^(1) = ^(1),
and, for
,
[^(k-1)(^(k-1))^T+^(k)(^(k)
)^T+^(k+1)(^(k+1))^T]
^(k) = ^(k).
By simple induction on the recursion formulas, all the basis vectors are ideally orthonormal and therefore satisfy
. Of course, these
conditions are not satisfied exactly for finite precision calculations --
leading to some practical difficulties that will not be
discussed here for lack of space (Parlett, 1980).
For the application to traveltime tomography, the pertinent constants are
defined by
,
,and
. The equations (lanczos1)-(lanczos3)
determine a tridiagonal system of the form
^(k+1)_k^TN_k+1+
_k_k = ^T_k for 2kr,
where the unit vector
is all zeroes except for a one in the kth
position. The tridiagonal matrix of coefficients is defined by
_k = D_1 & N_2 & & &
N_2 & D_2 & N_3 & &
& N_3 & D_3 & N_4 &
& & & &
& & & N_k & D_k
for 2k r,
and where the matrix
is composed of the resulting orthonormal vectors.
In practical implementations of the algorithm, the constants Nk+1
are generally found through the normalization condition implicit in
(lanczos3).
Assuming infinite precision, the process stops when k=r (the rank of the
matrix) because then
(or is numerically negligible).
It follows from (tridiagonal) that this tridiagonalization process results
in the identity
^T= _r_r_r^T.
Since
is invertible,
the Moore-Penrose inverse (Penrose, 1955a) of the
normal matrix is given by
The solution to the least-squares inversion problem may therefore be
written as
.The first column of the
inverse of
(the only one needed) may be found using an elementary
recursion relation.
Lanczos resolution
Since the Lanczos algorithm directly produces a sequence of orthonormal vectors in the model space, it is straightforward to see that the model resolution matrix for this method is
_model = ^T(^T)^=
_r_r^T,
which is also clearly symmetric. I compute the data resolution matrix
using the fact that
together with (Adagger), so that
_data = _r(_r)^-1_r^T^T. Both resolution matrices are symmetric if the full Lanczos inverse is computed.
Assuming only that orthogonality has been maintained among the vectors
[a nontrivial assumption for realistic applications
(Parlett, 1980)],
the matrix
is just the span of the model space associated with
the ray-path matrix . Thus, (modelreslanczos) is understood
intuitively as the result one obtains by finding a new orthonormal set of
basis vectors for the span of the resolution matrix. In other methods
such as the Paige-Saunders LSQR algorithm (Paige and Saunders, 1982), it is
also possible to obtain a corresponding formula for the data resolution.
However, for the Lanczos algorithm, the data resolution must be computed
indirectly from the available information using (datareslanczos).
ARNOLDI INVERSION WITH REORTHOGONALIZATION
Once again, consider the linear inversion problem of the form
, where is the unknown.
For a crosshole seismic tomography problem (Berryman, 1990),
is an
ray-path matrix, is an m-vector of
first arrival traveltimes, and is the model slowness (inverse velocity)
vector.
The method of Arnoldi (1951) solves
this problem by introducing a sequence of orthonormal vectors
through a process similar to Lanczos tridiagonalization, but with all vectors
reorthogonalized against all previous
for i < k. To obtain the minimum norm
least-squares solution of , the method may be applied
to the normal equations (normaleqns),
since they have the form with square symmetric matrix
, unknown vector , and data vector
.
Arnoldi algorithm
With
and
,the Arnoldi algorithm is a projection procedure equivalent to the following:
[^(1)(^(1))^T+^(2)(^(2)
)^T]^(1) = ^(1),
and, for
,
[_i=1^k+1^(i)(^(i))^T]
^(k) = ^(k).
By construction, all the basis vectors satisfy
,so to numerical accuracy the vectors are orthonormal and furthermore
they would each be identical to the corresponding vectors appearing in the Lanczos process if the computations could be performed
with infinite precision. Results would be identical to the results
of Lanczos method in this idealized situation.
In fact, (arnoldi1) and (arnoldi2) are identical to (lanczos1) and (lanczos2),
while (arnoldi3) is identical to (lanczos3) for k=2. Differences only begin
to arise for
.
For the application to traveltime tomography, the pertinent constants are
defined by
,
,and
and
(except that
).
Note that N1 is exactly the same as the constant of the same name
in the Lanczos algorithm. The equations (arnoldi1)-(arnoldi3)
determine a system of the form
_k+1_k+1 = ^T_k for 2kr,
where the upper Hessenberg matrix (
) of coefficients is defined by
_k+1 = d_1 & n_1,2 & n_1,3 & n_1,4 & & n_1,k
n_2,1 & d_2 & n_2,3 & n_2,4 & & n_2,k
& n_3,2 & d_3 & n_3,4 & &
& & n_4,3 & & &
& & & & d_k-1 & n_k-1,k
& & & & n_k,k-1 & d_k
& & & & & n_k+1,k
for 2k r,
and where the matrix
is composed of the resulting orthonormal vectors.
Assuming infinite precision, the process stops when k=r (the rank of the
matrix) because then
(or is numerically negligible).
Thus, the bottom row of the Hessenberg matrix
is full of zeroes,
and therefore may be truncated to create the square (
) matrix
which contains only the first r rows of
.Then, it follows from (hessenberg) that this process results
in the identity
^T= _r_r_r^T.
Since
is invertible,
the Moore-Penrose inverse (Penrose, 1955a) of the
normal matrix is given by
The solution to the least-squares inversion problem may therefore be
written as
.The first column of the
inverse of
is the only one needed and may be found using
an elementary recursion relation.
Arnoldi resolution
Like the Lanczos algorithm, the Arnoldi algorithm produces a sequence of orthonormal vectors in the model space. It is therefore straightforward to see that the model resolution matrix for this method is
_model = ^T(^T)^=
_r_r^T,
which is also clearly symmetric. In analogy with the results for the Lanczos
algorithm, I can again compute the data resolution matrix using the fact that
together with (AdaggerH), so
_data = _r(_r)^-1_r^T^T. Both resolution matrices are symmetric if the full Arnoldi inverse is computed.
By design, orthogonality is maintained among the vectors
,so the matrix
is just the span of the model space associated with
the ray-path matrix . Thus, (modelresarnoldi) is understood
intuitively as the result one obtains by finding a new orthonormal set of
basis vectors for the span of the resolution matrix.
However, for the Arnoldi algorithm as for the Lanczos algorithm, the data
resolution must be computed indirectly from the available information using
(dataresarnoldi).
DISCUSSION
If computations could be performed with infinite precision, the Lanczos and Arnoldi algorithms would be the same. To see that this is so, consider the definition of Arnoldi matrix element ni,j:
n_i,j (^(i))^T^T^(j).
Since the normal matrix
is symmetric, it follows immediately
that ni,j = nj,i, at least in principle. For example, the Lanczos
algorithm computes N2 = n2,1 and then uses the identity based on the
symmetric nature of the normal matrix to make the substitution n1,2 = N2.
The remaining matrix elements in the first row of the Hessenberg matrix
(hessenbergmatrix) then vanish
,again in principle. In practice, these elements do not quite vanish
to working precision. By retaining these nonzero matrix elements in the
algorithm, Arnoldi provides an algorithmically efficient method of dealing
with the nonorthogonality of the basis vectors that would be generated by
using the pure Lanczos method without any form of reorthogonalization. The
price that must be paid for this improvement is the increase in storage
needed to accommodate all the basis vectors, which must be retained to
compute the far off-diagonal matrix elements used in the reorthogonalization.
ACKNOWLEDGMENTS
I thank H. Tal-Ezer for introducing me to the Arnoldi algorithm.
REFERENCES