
Matrix approach to missing data


Customarily, we have referred to data by the symbol $\bold d$. Now that we are dividing the data space into two parts, known and unknown (or missing), we will refer to this complete space as the model (or map) space $\bold m$.

There are 15 data points in Figures [*]-[*]. Of the 15, 4 are known and 11 are missing. Denote the known by k and the missing by u. Then the sequence of missing and known is (u,u,u,u,k,u,k,k,k,u,u,u,u,u,u). Because I cannot print $15\times 15$ matrices, please allow me instead to describe a data space of 6 values ($m_1, m_2, m_3, m_4, m_5, m_6$) in which only $m_2$ and $m_3$ are known, arranged as (u,k,k,u,u,u).
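To make the known/missing bookkeeping concrete, here is a small Python sketch that turns a pattern written as a string of `k` and `u` characters into index lists. The string encoding and variable names are illustrative assumptions. Indices are zero-based, so index 1 corresponds to $m_2$.

```python
# Known/unknown layout of the small 6-point example: (u,k,k,u,u,u).
pattern = "ukkuuu"

# Boolean mask: True where the value is known (k), False where it is missing (u).
known = [c == "k" for c in pattern]

# Zero-based indices of the known and of the missing values.
known_idx = [i for i, is_known in enumerate(known) if is_known]
missing_idx = [i for i, is_known in enumerate(known) if not is_known]

print(known_idx)    # [1, 2]        -> m2, m3
print(missing_idx)  # [0, 3, 4, 5]  -> m1, m4, m5, m6

# The 15-point layout from the figures: 4 known, 11 missing.
big = "uuuukukkkuuuuuu"
print(big.count("k"), big.count("u"))  # 4 11
```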

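Our approach is to minimize the energy in the residual $\bold r = \bold F\bold m$, which is the filtered map (model). Taking $\bold F$ to be convolution with a three-term filter $(a_1, a_2, a_3)$, we state the fitting goals $\bold 0\approx \bold F\bold m$ as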
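 \begin{displaymath}
\left[ \begin{array}{c}
 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0
\end{array} \right]
\quad \approx \quad \bold r \quad = \quad
\left[ \begin{array}{cccccc}
 a_1 & 0   & 0   & 0   & 0   & 0   \\
 a_2 & a_1 & 0   & 0   & 0   & 0   \\
 a_3 & a_2 & a_1 & 0   & 0   & 0   \\
 0   & a_3 & a_2 & a_1 & 0   & 0   \\
 0   & 0   & a_3 & a_2 & a_1 & 0   \\
 0   & 0   & 0   & a_3 & a_2 & a_1 \\
 0   & 0   & 0   & 0   & a_3 & a_2 \\
 0   & 0   & 0   & 0   & 0   & a_3
\end{array} \right]
\left[ \begin{array}{c}
 m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \\ m_6
\end{array} \right]
\qquad (1)
\end{displaymath}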
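The operator $\bold F$ in (1) is transient (full) convolution: column $j$ holds the filter shifted down by $j$ rows, so a 6-point model and a three-term filter give an $8\times 6$ matrix. The NumPy sketch below builds that matrix and checks it against `np.convolve` in its default full mode. The helper name `convolution_matrix` and the filter $(1,-2,1)$ are hypothetical choices for illustration.

```python
import numpy as np

def convolution_matrix(a, n):
    """Return the (n + len(a) - 1) x n matrix of transient (full) convolution
    with filter a, so that convolution_matrix(a, n) @ m == np.convolve(a, m)."""
    na = len(a)
    F = np.zeros((n + na - 1, n))
    for j in range(n):
        F[j:j + na, j] = a          # column j: the filter shifted down by j rows
    return F

a = np.array([1.0, -2.0, 1.0])      # hypothetical filter (a1, a2, a3)
F = convolution_matrix(a, 6)
print(F.shape)                      # (8, 6)

m = np.arange(1.0, 7.0)             # any 6-point model
assert np.allclose(F @ m, np.convolve(a, m))
```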
We rearrange these fitting goals, bringing the columns that multiply the known data values ($m_2$ and $m_3$) to the left side, which gives $\bold y =-\bold F_k \bold m_k \approx \bold F_u \bold m_u$:
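 \begin{displaymath}
\left[ \begin{array}{c}
 y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8
\end{array} \right]
\quad = \quad -
\left[ \begin{array}{cc}
 0   & 0   \\
 a_1 & 0   \\
 a_2 & a_1 \\
 a_3 & a_2 \\
 0   & a_3 \\
 0   & 0   \\
 0   & 0   \\
 0   & 0
\end{array} \right]
\left[ \begin{array}{c}
 m_2 \\ m_3
\end{array} \right]
\quad \approx \quad
\left[ \begin{array}{cccc}
 a_1 & 0   & 0   & 0   \\
 a_2 & 0   & 0   & 0   \\
 a_3 & 0   & 0   & 0   \\
 0   & a_1 & 0   & 0   \\
 0   & a_2 & a_1 & 0   \\
 0   & a_3 & a_2 & a_1 \\
 0   & 0   & a_3 & a_2 \\
 0   & 0   & 0   & a_3
\end{array} \right]
\left[ \begin{array}{c}
 m_1 \\ m_4 \\ m_5 \\ m_6
\end{array} \right]
\qquad (2)
\end{displaymath}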
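The rearrangement in (2) is just a regrouping of the columns of $\bold F$: $\bold F_k$ keeps the columns that multiply the known values and $\bold F_u$ keeps the rest. The NumPy sketch below does this with a boolean mask. The filter $(1,-2,1)$, the known values $m_2=3$, $m_3=5$, and the trial values of $\bold m_u$ are all hypothetical.

```python
import numpy as np

# Transient-convolution matrix for a hypothetical filter (a1, a2, a3) = (1, -2, 1).
a = np.array([1.0, -2.0, 1.0])
n = 6
F = np.zeros((n + len(a) - 1, n))
for j in range(n):
    F[j:j + len(a), j] = a

known = np.array([False, True, True, False, False, False])  # (u,k,k,u,u,u)
F_k = F[:, known]      # columns multiplying m2, m3          -> 8 x 2
F_u = F[:, ~known]     # columns multiplying m1, m4, m5, m6  -> 8 x 4

m_k = np.array([3.0, 5.0])   # hypothetical known values of m2, m3
y = -F_k @ m_k               # left-hand side of (2)
print(F_k.shape, F_u.shape)  # (8, 2) (8, 4)

# Regrouping columns does not change the product: F m = F_k m_k + F_u m_u.
m_u = np.array([1.0, 2.0, 4.0, 8.0])   # arbitrary values for m1, m4, m5, m6
m = np.zeros(n)
m[known] = m_k
m[~known] = m_u
assert np.allclose(F @ m, F_k @ m_k + F_u @ m_u)
```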
This is the familiar form of an overdetermined system of equations, $\bold y \approx \bold F_u \bold m_u$, which we could solve for $\bold m_u$ as illustrated earlier by conjugate directions, or by any of a wide variety of well-known methods.
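As one concrete solver, the sketch below applies textbook conjugate-gradient least squares (CGLS) to $\bold y \approx \bold F_u \bold m_u$ and checks the result against NumPy's `np.linalg.lstsq`. CGLS here is a stand-in for the conjugate-direction method mentioned above, not the book's own code. The filter $(1,-2,1)$ and the known values are hypothetical, and the stopping tolerance `tol` is an arbitrary choice.

```python
import numpy as np

def cgls(A, y, niter, tol=1e-24):
    """Minimize |y - A x|^2 by conjugate-gradient least squares, starting at x = 0."""
    x = np.zeros(A.shape[1])
    r = y.copy()                  # data-space residual y - A x
    s = A.T @ r                   # negative gradient in model space
    p = s.copy()                  # current search direction
    gamma = s @ s
    for _ in range(niter):
        if gamma <= tol:          # gradient has vanished: converged
            break
        q = A @ p
        alpha = gamma / (q @ q)   # step length along p
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

# Problem setup: hypothetical filter and known values, layout (u,k,k,u,u,u).
a = np.array([1.0, -2.0, 1.0])
n = 6
F = np.zeros((n + len(a) - 1, n))
for j in range(n):
    F[j:j + len(a), j] = a
known = np.array([False, True, True, False, False, False])
F_k, F_u = F[:, known], F[:, ~known]
y = -F_k @ np.array([3.0, 5.0])

# In exact arithmetic CG needs at most as many steps as unknowns (4 here);
# a few extra iterations guard against rounding.
m_u = cgls(F_u, y, niter=10)
m_ref = np.linalg.lstsq(F_u, y, rcond=None)[0]
print(m_u)
```

With far more missing than known values, the same call simply gets a wider $\bold F_u$; only the column selection changes.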

The trouble with this matrix approach is that it is awkward to program the partitioning of the operator into the known and missing parts, particularly if the application of the operator uses arcane techniques, such as those used by the fast-Fourier-transform operator or various numerical approximations to differential or partial differential operators that depend on regular data sampling. Even for the modest convolution operator, we already have a library of convolution programs that handle a variety of end effects, and it would be much nicer to use the library as it is rather than recode it for all possible geometrical arrangements of missing data values.
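One way to keep an existing convolution routine unchanged is never to form $\bold F_u$ explicitly. Instead, scatter $\bold m_u$ into a full-length model with zeros at the known positions and convolve. For the adjoint, cross-correlate and keep only the entries at the missing positions. The NumPy sketch below illustrates this idea, assuming `np.convolve` stands in for a library routine and using a hypothetical filter $(1,-2,1)$. It checks both products against the explicitly partitioned matrix.

```python
import numpy as np

a = np.array([1.0, -2.0, 1.0])                              # hypothetical filter
known = np.array([False, True, True, False, False, False])  # (u,k,k,u,u,u)

def apply_Fu(m_u):
    """F_u m_u via the unmodified full-convolution routine."""
    m = np.zeros(known.size)
    m[~known] = m_u               # zeros at the known positions
    return np.convolve(a, m)      # same call one would use for the full F

def apply_Fu_adjoint(r):
    """F_u^T r: adjoint of full convolution is cross-correlation ('valid'
    part), restricted to the missing positions."""
    return np.correlate(r, a, mode="valid")[~known]

# Compare with the explicitly partitioned matrix.
F = np.zeros((known.size + a.size - 1, known.size))
for j in range(known.size):
    F[j:j + a.size, j] = a
F_u = F[:, ~known]

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
r = rng.standard_normal(8)
assert np.allclose(apply_Fu(x), F_u @ x)
assert np.allclose(apply_Fu_adjoint(r), F_u.T @ r)
```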

Note: Here I take the main goal to be the clarity of the code, not the efficiency or accuracy of the solution. So, if your problem consumes too many resources, and if you have many more known points than missing ones, maybe you should fit $\bold y \approx \bold F_u \bold m_u$ and ignore the suggestions below.


Stanford Exploration Project
4/27/2004