Working on an optimization problem, we begin from residuals between theory and practice. These residuals can be scaled to make new optimization residuals before we start minimizing their energy. What scaling should we use? The scaling can be a simple weighting function or a filter. A filter is simply a weighting function in Fourier space.
The basic idea of common sense, which also comes to us as results proven by Gauss or from the theory of statistical signal processing, is this: the optimization residuals should be roughly of equal scale. This makes sense because squaring magnifies scale; anything small will be ignored while anything large will dominate. Scaling optimization residuals to a common range makes them all equally influential on the final solution. Not only should optimization residuals be of like scale in physical space, they should be of like scale in Fourier space, eigenvector space, or any other space we might use to represent them. This implies that the optimization residuals should be uncorrelated. If the optimization residuals were correlated, they would have a spectrum that was not white, and not white means components of differing sizes in Fourier space. Residuals should be the same size as one another in physical space, and likewise in Fourier space. Thus the optimization residuals should be orthogonal and of unit scale, much as Fourier components and eigenvectors are orthonormal.
Let us approach the problem backwards. Suppose we have two random variables that we take to be the ideal optimization residuals $x_1$ and $x_2$. In reality there may be only a few, or there may be trillions. In the language of statistics, the optimization residuals are expected to have zero mean, an idea that is formalized by writing $E(x_1)=0$ and $E(x_2)=0$. Likewise these ideal optimization residuals have equal energy, $E(x_1^2)=1$ and $E(x_2^2)=1$. Finally, these two optimization residuals are uncorrelated, a condition written as $E(x_1 x_2)=0$. The expectation symbol $E(\cdot)$ is like a summation over many instances of the random variable.
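To make the expectation concrete, here is a small numerical sketch (not part of the text's own examples) that approximates $E(\cdot)$ by averaging over many computer-generated instances of $x_1$ and $x_2$. NumPy, the random seed, and the sample size are illustrative assumptions.

```python
import numpy as np

# Approximate the expectations E() by sample averages over many instances
# of two ideal optimization residuals x1 and x2 (illustrative sketch).
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.standard_normal(n)            # zero mean, unit variance
x2 = rng.standard_normal(n)            # independent of x1

print(x1.mean(), x2.mean())            # E(x1) ~ 0,   E(x2) ~ 0
print((x1**2).mean(), (x2**2).mean())  # E(x1^2) ~ 1, E(x2^2) ~ 1
print((x1 * x2).mean())                # E(x1 x2) ~ 0
```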
Now suppose there exists a transformation $\mathbf{B}$ from these ideal optimization residuals to two experimental residuals $y_1$ and $y_2$, say
$$
\mathbf{y} \;=\; \mathbf{B}\,\mathbf{x} \tag{51}
$$

where, written out in components,

$$
\begin{align}
y_1 \;&=\; B_{11}\, x_1 \;+\; B_{12}\, x_2 \tag{52} \\
y_2 \;&=\; B_{21}\, x_1 \;+\; B_{22}\, x_2 \tag{53}
\end{align}
$$
The experimental residuals $y_1$ and $y_2$ are generally correlated and of unequal energies. Their covariance follows from the statistics of the ideal residuals:

$$
\begin{align}
\boldsymbol{\sigma} \;&=\; E(\mathbf{y}\,\mathbf{y}^{\mathsf T}) \tag{54} \\
 &=\; E(\mathbf{B}\,\mathbf{x}\,\mathbf{x}^{\mathsf T}\,\mathbf{B}^{\mathsf T}) \tag{55} \\
 &=\; \mathbf{B}\, E(\mathbf{x}\,\mathbf{x}^{\mathsf T})\,\mathbf{B}^{\mathsf T} \tag{56} \\
 &=\; \mathbf{B}\,\mathbf{B}^{\mathsf T} \tag{57}
\end{align}
$$
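As a numerical check on equations (54)-(57), the following sketch generates many instances of the ideal residuals, forms $\mathbf{y}=\mathbf{B}\,\mathbf{x}$, and compares the sample covariance with $\mathbf{B}\,\mathbf{B}^{\mathsf T}$. The particular matrix $\mathbf{B}$ and the use of NumPy are illustrative assumptions.

```python
import numpy as np

# Check that the sample covariance of y = B x approaches sigma = B B^T.
rng = np.random.default_rng(1)
B = np.array([[2.0, 0.5],
              [0.3, 1.0]])             # an arbitrary illustrative B
n = 200_000
x = rng.standard_normal((2, n))        # columns are instances of (x1, x2)
y = B @ x                              # experimental residuals

sigma_sample = (y @ y.T) / n           # approximates E(y y^T)
sigma_theory = B @ B.T
print(sigma_sample)
print(sigma_theory)                    # the two nearly agree
```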
Given a matrix $\boldsymbol{\sigma}$, there is a simple, well-known method called the Cholesky factorization that will factor $\boldsymbol{\sigma}$ into two parts like $\mathbf{B}$ and $\mathbf{B}^{\mathsf T}$. The method creates for us either an upper or a lower triangular matrix (our choice) for $\mathbf{B}$. You can easily reinvent the Cholesky method if you multiply the symbols for two triangular matrices like $\mathbf{B}$ and $\mathbf{B}^{\mathsf T}$ and notice the procedure that works backwards from $\boldsymbol{\sigma}$ to $\mathbf{B}$. The experimenter seeks not $\mathbf{B}$, however, but its inverse $\mathbf{B}^{-1}$, the matrix that takes us from the experimental residuals $\mathbf{y}$ to the ideal optimization residuals $\mathbf{x}$ that are uncorrelated and of equal energies. The Cholesky factorization costs $N^3$ computations, which is about the same as the cost of inverting $\boldsymbol{\sigma}$ or $\mathbf{B}$. For geophysical maps and other functions on Cartesian spaces, the Prediction Error Filter (PEF) accomplishes the same general goal and has the advantage that we have already learned how to perform the operation using operators instead of matrices.
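The whitening step can be sketched with NumPy's Cholesky routine; the numbers in $\boldsymbol{\sigma}$ below are illustrative, and the triangular solve stands in for applying $\mathbf{B}^{-1}$ without forming the inverse explicitly.

```python
import numpy as np

# Factor sigma = B B^T with B lower triangular, then apply B^{-1} to the
# experimental residuals y to recover residuals that are uncorrelated
# and of unit energy (a minimal sketch, not the book's operator code).
rng = np.random.default_rng(2)
sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])         # an illustrative covariance matrix

B = np.linalg.cholesky(sigma)          # lower triangular, sigma = B @ B.T

# Simulate correlated experimental residuals with covariance sigma.
n = 200_000
y = B @ rng.standard_normal((2, n))

# Whiten: solve B x = y rather than explicitly inverting B.
x = np.linalg.solve(B, y)
print((x @ x.T) / n)                   # approximately the identity matrix
```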
The matrix $\boldsymbol{\sigma} = E(\mathbf{y}\,\mathbf{y}^{\mathsf T})$ is the multivariate spectrum of the experimental residuals $\mathbf{y}$.
If I were a little stronger at analysis (or rhetoric), I would tell you that the optimizer's preconditioned variable is the statistician's IID (Independent, Identically Distributed) random variable.
For stationary (statistically constant) signals and images, $\mathbf{B}^{-1}$ is the model-space PEF.
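For readers who want to see the stationary case in miniature, here is a least-squares sketch of a one-dimensional prediction-error filter. It is not the book's operator implementation; the AR(2) test signal and the filter length are arbitrary illustrative choices. The prediction error comes out roughly white, which is exactly the role of $\mathbf{B}^{-1}$ above.

```python
import numpy as np

# Estimate a 1-D prediction-error filter by least squares: predict each
# sample from its k predecessors; the prediction error is roughly white.
rng = np.random.default_rng(3)
n, k = 50_000, 3

# A correlated (non-white) stationary signal: an AR(2) process.
s = np.zeros(n)
for t in range(2, n):
    s[t] = 1.5 * s[t-1] - 0.7 * s[t-2] + rng.standard_normal()

# Regression: each row holds the k past samples, target is the current one.
X = np.column_stack([s[k-j-1:n-j-1] for j in range(k)])
target = s[k:]
a, *_ = np.linalg.lstsq(X, target, rcond=None)

e = target - X @ a                     # prediction error
# Autocorrelation at nonzero lags is small relative to lag 0: roughly white.
print([np.dot(e[:-lag], e[lag:]) / np.dot(e, e) for lag in (1, 2, 3)])
```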
Echo soundings and interval velocities have statistical properties that change with depth. There $\mathbf{B}^{-1}$ is a diagonal weighting matrix (perhaps before or after a PEF).
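A sketch of this nonstationary case follows. The depth-dependent scale and the running-RMS window are illustrative assumptions, not a prescription from the text; dividing each residual by a smoothed estimate of its local magnitude plays the role of the diagonal weighting matrix.

```python
import numpy as np

# Balance residuals whose scale grows with depth by a diagonal weighting
# built from a windowed RMS of the residual itself (illustrative sketch).
rng = np.random.default_rng(4)
n = 5_000
depth_scale = np.linspace(0.2, 5.0, n)         # energy grows with depth
r = depth_scale * rng.standard_normal(n)       # raw residual

half = 200                                      # smoothing half-width (a choice)
window = np.ones(2 * half + 1) / (2 * half + 1)
rms = np.sqrt(np.convolve(r**2, window, mode="same"))

w = 1.0 / (rms + 1e-12)                         # diagonal weights
r_balanced = w * r                              # roughly unit scale at all depths
# Interior samples near the shallow and deep ends both come out close to 1.
print(r_balanced[500:1500].std(), r_balanced[-1500:-500].std())
```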