The basic idea of least-squares fitting is that the residual is orthogonal to the fitting functions. Applied to the PE filter, this idea means that the output of a PE filter is orthogonal to lagged inputs. The orthogonality applies only for lags in the past, because prediction knows only the past while it aims to the future. What we want to show here is different, namely, that the output is uncorrelated with itself (as opposed to the input) for lags in both directions; hence the output spectrum is white.
In (21) are two separate and independent autoregressions, $\mathbf{0} \approx \mathbf{Y}_a \mathbf{a}$ for finding the filter $\mathbf{a}$, and $\mathbf{0} \approx \mathbf{Y}_b \mathbf{b}$ for finding the filter $\mathbf{b}$. By noticing that the two matrices are really the same (except that a row of zeros on the bottom of $\mathbf{Y}_a$ is a row on the top of $\mathbf{Y}_b$) we realize that the two regressions must result in the same filters, $\mathbf{a} = \mathbf{b}$, and the residual $\mathbf{Y}_b \mathbf{b}$ is a shifted version of $\mathbf{Y}_a \mathbf{a}$. In practice, I visualize the matrix being a thousand components tall (or a million) and a hundred components wide.
$$
\mathbf{0} \;\approx\; \mathbf{Y}_a\,\mathbf{a} \;=\;
\left[\begin{array}{ccc}
y_1 & 0   & 0   \\
y_2 & y_1 & 0   \\
y_3 & y_2 & y_1 \\
y_4 & y_3 & y_2 \\
y_5 & y_4 & y_3 \\
y_6 & y_5 & y_4 \\
0   & y_6 & y_5 \\
0   & 0   & y_6 \\
0   & 0   & 0
\end{array}\right]
\left[\begin{array}{c} 1 \\ a_1 \\ a_2 \end{array}\right]
\qquad\qquad
\mathbf{0} \;\approx\; \mathbf{Y}_b\,\mathbf{b} \;=\;
\left[\begin{array}{ccc}
0   & 0   & 0   \\
y_1 & 0   & 0   \\
y_2 & y_1 & 0   \\
y_3 & y_2 & y_1 \\
y_4 & y_3 & y_2 \\
y_5 & y_4 & y_3 \\
y_6 & y_5 & y_4 \\
0   & y_6 & y_5 \\
0   & 0   & y_6
\end{array}\right]
\left[\begin{array}{c} 1 \\ b_1 \\ b_2 \end{array}\right]
\tag{21}
$$
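To see the claim in action, here is a minimal numpy sketch; it is not part of the original text, and the helper names (`conv_matrix`, `pef`), the synthetic signal, and the filter length are my illustrative choices. It builds the two zero-padded matrices of (21), fits both regressions, and checks that they give the same filter and that one residual is a delayed copy of the other.

```python
import numpy as np

def conv_matrix(y, ncol, zeros_on_top):
    """Tall convolution matrix as in (21): successive columns hold successively
    delayed copies of y.  zeros_on_top=False leaves the zero padding row at the
    bottom (Y_a); zeros_on_top=True moves it to the top (Y_b), i.e. every column
    is delayed by one more sample."""
    n = len(y)
    Y = np.zeros((n + ncol, ncol))
    shift = 1 if zeros_on_top else 0
    for k in range(ncol):
        Y[k + shift : k + shift + n, k] = y
    return Y

def pef(Y):
    """Least-squares prediction-error filter (1, a_1, ..., a_m): minimize ||Y a|| with a_0 = 1."""
    rest, *_ = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)
    return np.concatenate(([1.0], rest))

rng = np.random.default_rng(0)
y = np.convolve(rng.standard_normal(10000), [1.0, 1.0, 0.5])   # a colored test signal

m = 30                                    # filter length ("a hundred components wide" in spirit)
Ya = conv_matrix(y, m, zeros_on_top=False)
Yb = conv_matrix(y, m, zeros_on_top=True)
a, b = pef(Ya), pef(Yb)

print("max |a - b|               :", np.max(np.abs(a - b)))    # same filter, to rounding error
print("Yb b is Ya a delayed 1 row:", np.allclose((Yb @ b)[1:], (Ya @ a)[:-1]))
```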
Our goal is a different theorem that is imprecise when applied to the three-coefficient filters displayed in (21), but becomes valid as the filter length tends to infinity and the matrices become infinitely wide. Actually, all we require is that the last component of $\mathbf{b}$, namely $b_n$, tend to zero. This generally happens because as $n$ increases, $y_{t-n}$ becomes a weaker and weaker predictor of $y_t$.
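This fading is easy to see numerically. The sketch below, again an illustration rather than anything from the text, estimates a PEF by regressing $y_t$ on its own past (the compact form of the regression in (21), without the zero padding) and prints the size of a few coefficients; the distant ones come out small compared with the early ones.

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.convolve(rng.standard_normal(20000), [1.0, 1.0, 0.5])   # colored input

m = 30
# column k-1 holds y delayed by k samples; each row predicts y_t from y_{t-1}..y_{t-m}
cols = np.column_stack([y[m - k : len(y) - k] for k in range(1, m + 1)])
a_rest, *_ = np.linalg.lstsq(cols, -y[m:], rcond=None)

# magnitudes |a_1|, |a_5|, |a_10|, |a_20|, |a_30| shrink with lag
print(np.round(np.abs(a_rest[[0, 4, 9, 19, 29]]), 3))
```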
The matrix $\mathbf{Y}_a$ contains all of the columns that are found in $\mathbf{Y}_b$ except the last (and the last one is not important). This means that the residual $\mathbf{Y}_a\mathbf{a}$ is not only orthogonal to all of $\mathbf{Y}_a$'s columns (except the first), but is also orthogonal to all of $\mathbf{Y}_b$'s columns except the last. Although $\mathbf{Y}_a\mathbf{a}$ isn't really perpendicular to the last column of $\mathbf{Y}_b$, it doesn't matter, because that column has hardly any contribution to $\mathbf{Y}_b\mathbf{b}$ since $|b_n| \ll 1$. Because $\mathbf{Y}_a\mathbf{a}$ is (effectively) orthogonal to all the components of $\mathbf{Y}_b\mathbf{b}$, it is also orthogonal to $\mathbf{Y}_b\mathbf{b}$ itself. (For any vectors $\mathbf{u}$ and $\mathbf{v}$, if $\mathbf{r}\cdot\mathbf{u}=0$ and $\mathbf{r}\cdot\mathbf{v}=0$, then $\mathbf{r}\cdot(\mathbf{u}+\mathbf{v})=0$ and also $\mathbf{r}\cdot(c_1\mathbf{u}+c_2\mathbf{v})=0$.)
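The same toy setup gives a numerical check of this argument (the signal, filter length, and names are illustrative choices of mine): the residual comes out orthogonal to every lagged input it was fit against, and therefore nearly orthogonal to a shifted copy of itself.

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.convolve(rng.standard_normal(20000), [1.0, 1.0, 0.5])   # colored input

m = 30
cols = np.column_stack([y[m - k : len(y) - k] for k in range(1, m + 1)])
a_rest, *_ = np.linalg.lstsq(cols, -y[m:], rcond=None)
r = y[m:] + cols @ a_rest              # prediction-error residual (a_0 = 1)

# least squares makes r orthogonal to every lagged input column
worst = max(abs(r @ cols[:, k]) / (np.linalg.norm(r) * np.linalg.norm(cols[:, k]))
            for k in range(m))
print("worst normalized r . (lagged y) :", worst)

# so r is (effectively) orthogonal to a one-sample-shifted copy of itself
print("normalized       r . (shifted r):", abs(r[1:] @ r[:-1]) / (r @ r))
```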
Here is a detail: in choosing the example of equation (21), I have shifted the two fitting problems by only one lag. We would like to shift by more lags and get the same result. For this we need more filter coefficients. By adding many more filter coefficients we are adding many more columns to the right side of $\mathbf{Y}_b$. That's good, because we'll be needing to neglect more columns as we shift $\mathbf{Y}_b$ further from $\mathbf{Y}_a$. Neglecting these columns is commonly justified by the experience that ``after short-range regressors have had their effect, long-range regressors generally find little remaining to predict.'' (Recall that the damped harmonic oscillator of physics, the finite-difference equation that predicts the future from the past, uses only two lags.)
Here is the main point: since $\mathbf{Y}_a\mathbf{a}$ and $\mathbf{Y}_b\mathbf{b}$ both contain the same signal, but time-shifted, the orthogonality at all shifts means that the autocorrelation of the residual vanishes at all lags. An exception, of course, is at zero lag. The autocorrelation does not vanish there because the residual is not orthogonal to the first column of $\mathbf{Y}_a$ (because we did not minimize with respect to $a_0$).
As we redraw $\mathbf{Y}_b\mathbf{b}$ for various lags, we may shift the columns only downward, because shifting them upward would bring in the first column of $\mathbf{Y}_a$, and the residual is not orthogonal to that. Thus we have only proven that one side of the autocorrelation of the residual vanishes. That is enough, however, because autocorrelation functions are symmetric, so if one side vanishes, the other must also.
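A short sketch of the main point, with the same illustrative setup: the autocorrelation of the PEF output is, up to sampling noise, an impulse.

```python
import numpy as np

rng = np.random.default_rng(3)
y = np.convolve(rng.standard_normal(20000), [1.0, 1.0, 0.5])   # colored input

m = 30
cols = np.column_stack([y[m - k : len(y) - k] for k in range(1, m + 1)])
a_rest, *_ = np.linalg.lstsq(cols, -y[m:], rcond=None)
a = np.concatenate(([1.0], a_rest))    # the PEF (1, a_1, ..., a_m)

r = np.convolve(y, a)                  # PEF output
acf = np.array([r[k:] @ r[: len(r) - k] for k in range(6)])
print("autocorrelation at lags 0..5:", np.round(acf / acf[0], 3))
# roughly [1, 0, 0, 0, 0, 0]: an impulse, hence a white spectrum
```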
If $\mathbf{a}$ and $\mathbf{b}$ were two-sided filters like $(\ldots, a_{-2}, a_{-1}, 1, a_1, a_2, \ldots)$, the proof would break. If $\mathbf{b}$ were two-sided, $\mathbf{Y}_b\mathbf{b}$ would catch the nonorthogonal column of $\mathbf{Y}_a$. Not only is the residual not proven to be perpendicular to the first column of $\mathbf{Y}_a$, it cannot be orthogonal to it, because a signal cannot be orthogonal to itself.
The implications of this theorem are far reaching. The residual $\mathbf{r} = \mathbf{Y}\mathbf{a}$, a convolution of the input $\mathbf{y}$ with the PEF $\mathbf{a}$, has an autocorrelation that is an impulse function. The Fourier transform of an impulse is a constant. Thus the spectrum of the residual is ``white''. Thus $\mathbf{y}$ and $\mathbf{a}$ have mutually inverse spectra.

Since the output of a PEF is white, the PEF itself has a spectrum inverse to its input.
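Finally, a sketch of that last statement under the same assumptions: since I colored the test input with a known filter, multiplying the PEF amplitude spectrum by that coloring's amplitude spectrum should give a nearly flat product.

```python
import numpy as np

rng = np.random.default_rng(4)
coloring = np.array([1.0, 1.0, 0.5])   # known filter that colors the input
y = np.convolve(rng.standard_normal(20000), coloring)

m = 30
cols = np.column_stack([y[m - k : len(y) - k] for k in range(1, m + 1)])
a_rest, *_ = np.linalg.lstsq(cols, -y[m:], rcond=None)
a = np.concatenate(([1.0], a_rest))    # the PEF

A = np.abs(np.fft.rfft(a, 512))        # PEF amplitude spectrum
C = np.abs(np.fft.rfft(coloring, 512)) # amplitude spectrum of the input's coloring
prod = A * C                           # mutually inverse spectra => nearly constant
print("A*C over frequency: min %.3f  max %.3f" % (prod.min(), prod.max()))
```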
An important application of the PEF
is in missing data interpolation.
We'll see examples later in this chapter.
My third book, PVI, has many examples in one dimension with both synthetic data and field data, including the gap parameter.
Here we next extend these ideas to two (or more) dimensions.