Next: A FORMAL DEFINITION FOR Up: Preconditioning Previous: THEORY OF UNDERDETERMINED LEAST-SQUARES

SCALING THE ADJOINT

First I remind you of a rarely used little bit of mathematical notation. Given a vector $\bold m$ with components (m₁,m₂,m₃), the notation ${\bf diag\ }\bold m$ means

$\begin{displaymath} {\bf diag\ }\bold m \quad =\quad \left[ \begin{array} {ccc... ... & 0 & 0 \\ 0 &m_2 & 0 \\ 0 & 0 & m_3 \end{array} \right]\end{displaymath}$

(38)

Given the usual linearized fitting goal between data space and model space, $\bold d \approx \bold F \bold m$ ,the simplest image of the model space results from application of the adjoint operator $\hat \bold m = \bold F' \bold d$ .Unless $\bold F$ has no physical units, however, the physical units of $\hat \bold m$ do not match those of $\bold m$ ,so we need a scaling factor. The theoretical solution $\bold m_{\rm theor} = (\bold F'\bold F)^{-1}\bold F'\bold d$ suggests that the scaling units should be those of $(\bold F'\bold F)^{-1}$ .We could probe the operator $\bold F$ or its adjoint with white noise or a zero-frequency input. Bill Symes suggests we probe with the data $\bold d$ because it has the spectrum of interest. He proposes we make our image with $\hat \bold m = \bold W^2 \bold F'\bold d$ where we choose the weighting function to be

$\begin{displaymath} \bold W^2 \quad =\quad { {\bf diag\ } {\bold F' \bold d} \over {\bf diag\ } {\bold F'\bold F\bold F'\bold d} }\end{displaymath}$ (39)

which obviously has the correct physical units. (The mathematical function ${\bf diag}$ takes a vector and lies it along the diagonal of a square matrix.) The weight $\bold W^2$ can be thought of as a diagonal matrix containing the ratio of two images. A problem with the choice (39) is that the denominator might vanish or might even be negative. The way to stabilize any ratio is suggested at the beginning of Chapter ; that is, we revise the ratio a/b to

$\begin{displaymath} \bold W^2 \quad =\quad { {\bf diag} < ab \gt \over {\bf diag} < b^2 + \epsilon^2 \gt }\end{displaymath}$ (40)

where $\epsilon$ is a parameter to be chosen, and the angle braces indicate the possible need for local smoothing.

To go beyond the scaled adjoint we can use $\bold W$ as a preconditioner. To use $\bold W$ as a preconditioner we define implicitly a new set of variables $\bold p$ by the substitution $\bold m=\bold W\bold p$ .Then $\bold d \approx \bold F\bold m=\bold F\bold W\bold p$ .To find $\bold p$ instead of $\bold m$ ,we do CD iteration with the operator $\bold F\bold W$ instead of with $\bold F$ .As usual, the first step of the iteration is to use the adjoint of $\bold d\approx \bold F\bold W\bold p$ to form the image $\hat\bold p=(\bold F\bold W)'\bold d$ .At the end of the iterations, we convert from $\bold p$ back to $\bold m$ with $\bold m=\bold W\bold p$ .The result after the first iteration $\hat\bold m=\bold W\hat\bold p=\bold W(\bold F\bold W)'\bold d=\bold W^2\bold F'\bold d$ turns out to be the same as Symes scaling.

By (39), $\bold W$ has physical units inverse to $\bold F$ .Thus the transformation $\bold F\bold W$ has no units so the $\bold p$ variables have physical units of data space. Experimentalists might enjoy seeing the solution $\bold p$ with its data units more than viewing the solution $\bold m$ with its more theoretical model units.

The theoretical solution for underdetermined systems $\bold m =\bold F' (\bold F \bold F')^{-1} \bold d$ suggests an alternate approach using instead $\hat{\bold m} =\bold F' \bold W_d^2 \bold d$ .A possibility for $\bold W_d^2$ is

$\begin{displaymath} \bold W_d^2 \quad =\quad { {\bf diag\ } {\bold d} \over {\bf diag\ } {\bold F\bold F'\bold d} }\end{displaymath}$ (41)

Experience tells me that a broader methodology is needed. Appropriate scaling is required in both data space and model space. We need something that includes a weight for each space, $\bold W_m$ and $\bold W_d$ where $\hat \bold m = \bold W_m \bold F'\bold W_d \bold d$ .

I have a useful practical example (stacking in v(z) media) in another of my electronic books (BEI), where I found both $\bold W_m$ and $\bold W_d$ by iterative guessing. But I don't know how to give you a simple strategy that is not iterative. Either this is a major unsolved opportunity for a theorist, or I'll need to write down my iterative guess.

The PhD thesis of James Rickett experiments extensively with data space and model space weighting functions in the context of seismic velocity estimation.

Next: A FORMAL DEFINITION FOR Up: Preconditioning Previous: THEORY OF UNDERDETERMINED LEAST-SQUARES

Stanford Exploration Project
4/27/2004