The missing data problem is probably the simplest one to understand and
the easiest one in which to interpret results.
We begin by binning our data onto a regular mesh.
For the operator in fitting goals (2) we will use a selector
matrix, which is '1' at locations where we have data and '0' at unknown locations.
As an example, let's try to interpolate
a day's worth of data
collected by SeaBeam (Figure 1), which measures
water depth under and to the side of a ship (Claerbout, 1999).
Figure 1. Depth of the ocean under ship tracks.
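The binning step and the selector matrix described above can be sketched as follows. This is a minimal illustration, not the processing actually applied to the SeaBeam data: the function name `bin_to_mesh`, the cell-averaging rule, and the tiny three-sounding data set are all assumptions made for the example. The returned `mask` holds the diagonal of the selector matrix.

```python
import numpy as np

def bin_to_mesh(x, y, depth, nx, ny, extent):
    """Average scattered soundings into an ny-by-nx mesh; return (mesh, mask)."""
    xmin, xmax, ymin, ymax = extent
    # Map each sounding to a cell index, clipping points on the far edge.
    ix = np.clip(((x - xmin) / (xmax - xmin) * nx).astype(int), 0, nx - 1)
    iy = np.clip(((y - ymin) / (ymax - ymin) * ny).astype(int), 0, ny - 1)
    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    np.add.at(total, (iy, ix), depth)   # sum of depths falling in each cell
    np.add.at(count, (iy, ix), 1.0)     # number of soundings in each cell
    known = count > 0                   # selector diagonal: 1 = data, 0 = unknown
    mesh = np.zeros((ny, nx))
    mesh[known] = total[known] / count[known]
    return mesh, known.astype(float)

# Hypothetical soundings: two fall in the same cell, one in another.
x = np.array([0.10, 0.15, 0.90])
y = np.array([0.10, 0.12, 0.80])
d = np.array([10.0, 20.0, 30.0])
mesh, mask = bin_to_mesh(x, y, d, nx=4, ny=4, extent=(0.0, 1.0, 0.0, 1.0))
```

Applying the selector matrix to a model then amounts to the element-wise product `mask * model`.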
Figure 2 shows the result of estimating a PEF from the
known data locations and then using it to interpolate the entire mesh.
Note how the solution has a lower spatial frequency as we move away from
the recorded data. In addition, the original tracks of the ship are still
clearly visible.
Figure 2. Result of using a PEF to interpolate Figure 1.
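The two-stage procedure (estimate a PEF from the known data, then use it to fill the mesh) can be illustrated with a one-dimensional analog. This sketch makes several assumptions: it works in 1-D rather than on the 2-D SeaBeam mesh, it uses dense matrices and `numpy.linalg.lstsq` rather than an iterative solver, and the names `estimate_pef` and `fill_missing` are hypothetical.

```python
import numpy as np

def estimate_pef(x, known, nlag):
    """Least-squares PEF (1, a_1, ..., a_nlag) using only fully known windows."""
    rows, rhs = [], []
    for t in range(nlag, len(x)):
        if known[t - nlag:t + 1].all():
            rows.append(x[t - nlag:t][::-1])   # x[t-1], ..., x[t-nlag]
            rhs.append(-x[t])                  # so that x[t] + sum a_k x[t-k] ~ 0
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.concatenate(([1.0], a))

def fill_missing(x, known, pef):
    """Choose the unknown samples that minimize the PEF output energy."""
    n, nlag = len(x), len(pef) - 1
    A = np.zeros((n - nlag, n))                # convolution with the PEF
    for r in range(n - nlag):
        A[r, r:r + nlag + 1] = pef[::-1]
    # Split A m into known and unknown parts: A_u m_u ~ -A_k m_k
    mu, *_ = np.linalg.lstsq(A[:, ~known], -A[:, known] @ x[known], rcond=None)
    m = x.copy()
    m[~known] = mu
    return m

# Synthetic test: a sinusoid with a six-sample gap.
t = np.arange(60)
truth = np.cos(0.3 * t)
known = np.ones(60, dtype=bool)
known[25:31] = False
data = np.where(known, truth, 0.0)
pef = estimate_pef(data, known, nlag=2)
filled = fill_missing(data, known, pef)
```

A pure sinusoid is perfectly predictable by a three-term PEF, so the gap is recovered exactly here; real data are not, which is why the interpolated values drift toward a smooth, low-variance answer away from the recorded tracks.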
If we look at histograms of the known data and our estimated data (Figure 3), we can see the effect of the PEF. The histogram of the known data has a nice Gaussian shape. The predicted data are much less Gaussian, with a much lower variance. We want the estimated data to have the same statistical properties as the known data (for a Gaussian distribution this means matching the mean and variance).
Figure 3. Histogram for the known data (solid lines) and the estimated data ('*'). Note the dissimilar shapes.
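A comparison of this kind can be sketched in a few lines. The function name `compare_distributions` and the synthetic mesh below are assumptions for illustration only; the synthetic "estimated" values are simply drawn with a smaller spread to mimic the effect described above.

```python
import numpy as np

def compare_distributions(mesh, mask, nbins=30):
    """Histogram known and estimated cells on common bins; return summary stats."""
    known = mesh[mask > 0]
    estimated = mesh[mask == 0]
    edges = np.linspace(mesh.min(), mesh.max(), nbins + 1)
    h_known, _ = np.histogram(known, bins=edges, density=True)
    h_est, _ = np.histogram(estimated, bins=edges, density=True)
    stats = {
        "known": (known.mean(), known.var()),
        "estimated": (estimated.mean(), estimated.var()),
    }
    return edges, h_known, h_est, stats

# Synthetic stand-in: known cells ~ N(0, 1), estimated cells with lower variance.
rng = np.random.default_rng(0)
mask = (rng.random((50, 50)) < 0.3).astype(float)
mesh = np.where(mask > 0,
                rng.standard_normal((50, 50)),
                0.3 * rng.standard_normal((50, 50)))
edges, h_known, h_est, stats = compare_distributions(mesh, mask)
```

For Gaussian data, matching the two `(mean, variance)` pairs in `stats` is the target.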
Geostatisticians are confronted with the same problem. They can produce
smooth, low-frequency models through kriging, but must add a little
twist to get a model with the same statistical properties as the data.
To understand how, a brief review of kriging is necessary.
Kriging estimates each model point by a linear combination of nearby data
points. For simplicity, let's assume that the data have a standard
normal distribution.
The geostatistician finds all of the points $m_1, \ldots, m_n$ around the point
being estimated, $m_0$. The vector distances between all pairs of data points,
and between each data point and the estimation point,
are then computed.
Using the predefined covariance function estimate $C$, a covariance
value is then extracted
between all known point pairs, $C_{ij}$, and
between the known points and the
estimation point, $C_{i0}$, at the corresponding distances
(Figure
).
They compute the weights ($w_1, \ldots, w_n$) by solving the set
of equations implied by
[equation (3)]
[equation (4)]
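The kriging steps just described can be sketched as follows. This is a minimal illustration that assumes simple kriging with a zero mean (consistent with the standard normal assumption above); the exact system in equations (3) and (4) may differ, for example by including an unbiasedness constraint. The exponential covariance `exp_cov`, its parameters, and the function name `simple_krige` are hypothetical.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_len=2.0):
    """Hypothetical isotropic covariance model C(h); an assumption, not from the text."""
    return sill * np.exp(-h / corr_len)

def simple_krige(points, values, target, cov=exp_cov):
    """Return (m0, kriging variance) at `target` from the surrounding known points."""
    # Distances, then covariances C_ij, between all pairs of known points
    d_ij = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Distances, then covariances C_i0, between known points and the estimation point
    d_i0 = np.linalg.norm(points - target, axis=-1)
    C, c0 = cov(d_ij), cov(d_i0)
    w = np.linalg.solve(C, c0)      # weights w_1 ... w_n
    m0 = w @ values                 # linear combination of nearby data
    var = cov(0.0) - w @ c0         # simple-kriging variance
    return m0, var

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([0.5, -0.2, 1.0])
m_at_data, var_at_data = simple_krige(points, values, points[0])
m_far, var_far = simple_krige(points, values, np.array([100.0, 100.0]))
```

At a data location the estimate reproduces the datum with zero variance; far from all data it reverts to the assumed mean of zero with variance equal to the sill, which is the smoothing behavior noted for the interpolated image.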
The smooth models provided by kriging often prove
to be poor representations of earth properties.
A classic example is fluid flow, where kriged models tend to give inaccurate
predictions. The geostatistical solution
is to perform Gaussian stochastic simulation, rather than kriging, to
estimate the field (Deutsch and Journel, 1992).
There are two major differences between kriging and simulation.
The primary difference
is that a random component is introduced into the estimation process.
Stochastic simulation, or sequential Gaussian simulation, begins
with a random point being selected in the model space.
Kriging is then performed at that point, giving
a kriged value $m_0$ and a kriging variance $\sigma_k^2$. Instead of using $m_0$ for the model value, we
select a random number $\beta$
from a zero-mean, unit-variance normal distribution.
We use as our model point estimate $m_i$,

$$
m_i = m_0 + \sigma_k \beta \qquad (5)
$$
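A compact one-dimensional sketch of sequential Gaussian simulation follows. It assumes simple kriging with zero mean and a hypothetical exponential covariance `cov`; it also follows the usual practice of treating each simulated value as data for the points visited later. The function name `sgs_1d` and the three conditioning values are made up for the example.

```python
import numpy as np

def cov(h, corr_len=5.0):
    """Hypothetical covariance model C(h) with C(0) = 1; an assumption."""
    return np.exp(-np.abs(h) / corr_len)

def sgs_1d(x_known, m_known, x_grid, rng):
    """Sequential Gaussian simulation on a 1-D grid (simple kriging, zero mean)."""
    xs, ms = list(x_known), list(m_known)
    sim = np.empty(len(x_grid))
    for j in rng.permutation(len(x_grid)):     # visit model points in random order
        X, M = np.array(xs), np.array(ms)
        C = cov(X[:, None] - X[None, :])       # C_ij between conditioning points
        c0 = cov(X - x_grid[j])                # C_i0 to the point being simulated
        w = np.linalg.solve(C, c0)
        m0 = w @ M                             # kriged value
        var_k = max(1.0 - w @ c0, 0.0)         # kriging variance, clipped at zero
        beta = rng.standard_normal()           # random number from N(0, 1)
        sim[j] = m0 + np.sqrt(var_k) * beta    # m_i = m_0 + sigma_k * beta
        xs.append(x_grid[j])                   # simulated value conditions later points
        ms.append(sim[j])
    return sim

# Conditioning data sit off the grid so no location is duplicated.
x_known = np.array([2.5, 10.5, 17.5])
m_known = np.array([1.0, -0.5, 0.3])
x_grid = np.arange(20.0)
sim_a = sgs_1d(x_known, m_known, x_grid, np.random.default_rng(1))
sim_b = sgs_1d(x_known, m_known, x_grid, np.random.default_rng(2))
```

Different seeds give different realizations that are all conditioned on the same known values.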
The difference between kriging and simulation has a corollary in our least squares estimation problem. To see how, let's write our fitting goals in a slightly different format,
[equation (6)]
[equation (7)]
We can get an estimate of , or in the case of the missing
data problem
, by applying fitting goals
(6). If we look at the variance of the model residual
and
we can get a good estimate of
,
[equation (8)]
Figure 6. Histogram of the known data (solid line) and of the four different realizations.
Figure shows eight different realizations with a
random noise level calculated through equation (8).
Note how we have done a good job emulating the
distribution of the known data. Each image shows some similar
features but also significant differences (note especially the area
within the 'V' portion of the known data).
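One way to generate such realizations in a least squares setting is sketched below in one dimension. It assumes that the random component enters as a nonzero target for the regularization (PEF) goal, so that the PEF output is asked to match scaled random noise instead of zero. The noise level `sigma` is taken as a given input here (the value 0.2 is arbitrary), and the names `pef_matrix` and `realization` are hypothetical.

```python
import numpy as np

def pef_matrix(pef, n):
    """Dense convolution matrix applying the PEF to a length-n model."""
    nlag = len(pef) - 1
    A = np.zeros((n - nlag, n))
    for r in range(n - nlag):
        A[r, r:r + nlag + 1] = pef[::-1]
    return A

def realization(data, known, pef, sigma, rng):
    """Fill unknown samples so that the PEF output matches sigma * random noise."""
    A = pef_matrix(pef, len(data))
    noise = sigma * rng.standard_normal(A.shape[0])   # random target for A m
    rhs = noise - A[:, known] @ data[known]
    mu, *_ = np.linalg.lstsq(A[:, ~known], rhs, rcond=None)
    m = data.copy()
    m[~known] = mu                                    # known samples are untouched
    return m

rng = np.random.default_rng(0)
t = np.arange(60)
truth = np.cos(0.3 * t)
known = np.ones(60, dtype=bool)
known[25:31] = False
data = np.where(known, truth, 0.0)
pef = np.array([1.0, -2.0 * np.cos(0.3), 1.0])        # PEF that annihilates this sinusoid
smooth = realization(data, known, pef, sigma=0.0, rng=rng)
draws = [realization(data, known, pef, sigma=0.2, rng=rng) for _ in range(4)]
```

With `sigma=0.0` the result is the single smooth interpolation; with a nonzero `sigma` each call gives a different model that still honors the known data.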
A potentially attractive feature of setting up the problem in this
manner is that it is easy to have both a space-varying covariance function
(a steering filter or non-stationary PEF) and a non-stationary
variance. Figure 8 shows the SeaBeam example
again with the variance increasing from left to right.
Figure 8. Realization where the variance added to the image increases from left to right.
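A non-stationary variance of this kind only requires scaling the random numbers by a spatially varying standard deviation before they are used. The sketch below builds such a noise field; the linear ramp from 0.1 to 2.0 and the mesh size are arbitrary assumptions, not the values used for Figure 8.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx = 200, 400

# Hypothetical standard deviation profile, increasing from left to right.
sigma = np.linspace(0.1, 2.0, nx)[None, :]

# Unit-variance random numbers scaled column by column.
noise = sigma * rng.standard_normal((ny, nx))

left_std = noise[:, :50].std()      # spread near the left edge
right_std = noise[:, -50:].std()    # spread near the right edge
```

Replacing a constant noise level with the array `sigma` is the only change needed; the covariance (the PEF or steering filter) can be made space-varying independently.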