
MORE EQUATIONS THAN UNKNOWNS

When there are more linear equations than unknowns, it is usually impossible to find a solution which satisfies all the equations, so one often looks for a solution which satisfies them approximately. Let a and c be known and x be unknown in the following set of n equations in m unknowns.
 \begin{displaymath}
\left[
\begin{array}{cccc}
 a_{11} & a_{12} & \cdots & a_{1m} \\
 a_{21} & a_{22} & \cdots & a_{2m} \\
 \vdots & \vdots &        & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nm} \end{array} \right]
\; \left[
\begin{array}{c}
 x_1 \\
 x_2 \\
 \vdots \\ x_m \end{array} \right]
\quad\approx\quad
\left[
\begin{array}{c}
 c_1 \\
 c_2 \\
 \vdots \\ c_n \end{array} \right]\end{displaymath} (1)

Usually there will be no set of $x_i$ which exactly satisfies (1). Let us define an error vector $e_j$ by
 \begin{displaymath}
\left[
\begin{array}{cccc}
 a_{11} & a_{12} & \cdots & a_{1m} \\
 a_{21} & a_{22} & \cdots & a_{2m} \\
 \vdots & \vdots &        & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nm} \end{array} \right]
\; \left[
\begin{array}{c}
 x_1 \\
 x_2 \\
 \vdots \\ x_m \end{array} \right]
\;-\;
\left[
\begin{array}{c}
 c_1 \\
 c_2 \\
 \vdots \\ c_n \end{array} \right]
\eq
\left[
\begin{array}{c}
 e_1 \\
 e_2 \\
 \vdots \\ e_n \end{array} \right]\end{displaymath} (2)
It simplifies the development to rewrite this equation as follows (a trick I learned from John P. Burg).
 \begin{displaymath}
\left[
\begin{array}{cccc}
 -c_1 & a_{11} & \cdots & a_{1m} \\
 -c_2 & a_{21} & \cdots & a_{2m} \\
 \vdots & \vdots &      & \vdots \\
 -c_n & a_{n1} & \cdots & a_{nm} \end{array} \right]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 \vdots \\ x_m \end{array} \right]
\eq
\left[
\begin{array}{c}
 e_1 \\
 e_2 \\
 \vdots \\ e_n \end{array} \right]\end{displaymath} (3)
We may abbreviate this equation as  
 \begin{displaymath}
{\bf Bx} \eq {\bf e}\end{displaymath} (4)
where B is the matrix containing c and a. The ith error may be written as a dot product in which either vector may be taken as the column:
\begin{eqnarraystar}
e_i
\eq [b_{i1} \quad b_{i2} \quad \cdots]
\;
\left[
\begin{array}{c}
 1 \\
 x_1 \\
 \vdots \end{array} \right]
\eq
[1 \quad x_1 \quad \cdots]
\;
\left[
\begin{array}{c}
 b_{i1} \\
 b_{i2} \\
 \vdots \end{array} \right] \end{eqnarraystar}
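As a concrete illustration of (2) through (4), here is a minimal numerical sketch (the numbers, and the use of Python/numpy, are my own and not from the text): B is simply the matrix A with the column -c adjoined on its left, and the whole error vector comes from a single matrix-vector product.

\begin{verbatim}
import numpy as np

# Hypothetical overdetermined system: n = 4 equations, m = 2 unknowns.
A = np.array([[1., 2.],
              [3., 1.],
              [2., 2.],
              [1., 4.]])
c = np.array([3., 5., 4., 6.])
x = np.array([0.5, 1.5])              # any trial solution

B = np.column_stack([-c, A])          # equation (3): B = [-c | A]
e = B @ np.concatenate([[1.], x])     # equation (4): e = B (1, x1, x2)^T

assert np.allclose(e, A @ x - c)      # agrees with equation (2)
\end{verbatim}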
Now we will minimize the sum squared error E defined as $\sum {e_i}^2$ 
 \begin{displaymath}
E \eq \sum_i [1 \quad x_1 \quad \cdots]
\; \left[
\begin{array}{c}
 b_{i1} \\
 b_{i2} \\
 \vdots \end{array} \right]
\; [b_{i1} \quad b_{i2} \quad \cdots]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 \vdots \end{array} \right]\end{displaymath} (5)
Since the $x_k$ do not depend on i, the summation may be brought inside, past the constants
 \begin{displaymath}
E \eq [1 \quad x_1 \quad x_2 \quad \cdots]
\; \left\{
 \sum_i \left[
\begin{array}{c}
 b_{i1} \\
 b_{i2} \\
 \vdots \end{array} \right]
 [b_{i1} \quad b_{i2} \quad \cdots]
\right\}
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \\
 \vdots \end{array} \right]\end{displaymath} (6)
The matrix in the center, call it $r_{ij}$, is symmetric. It is a positive-definite (more strictly, nonnegative-definite) matrix: you will never be able to find an x for which E is negative, since E is a sum of squared $e_i$ (a small numerical check of this claim follows equation (7)). We find the x with minimum E by requiring $\partial E/\partial x_1 = 0, \;
\partial E/\partial x_2 = 0, \, \ldots , \;
\partial E/\partial x_m = 0$. Notice that this gives exactly one equation for each unknown. To clarify the presentation, we will specialize (6) to two unknowns.
 \begin{displaymath}
E
\eq [1 \quad x_1 \quad x_2]
\; \left[
\begin{array}{ccc}
 r_{00} & r_{01} & r_{02} \\
 r_{10} & r_{11} & r_{12} \\
 r_{20} & r_{21} & r_{22} \end{array} \right]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \end{array} \right]\end{displaymath} (7)
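As a quick numerical check of the claim above that $r_{ij}$ is symmetric and nonnegative definite, here is a small sketch of my own (any made-up matrix of rows $b_i$ will do; this is an illustration, not part of the text's development).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 3))      # any real matrix of rows b_i
R = B.T @ B                          # r_ij = sum_i b_i b_i^T, as in (6)

assert np.allclose(R, R.T)                      # symmetric
assert np.all(np.linalg.eigvalsh(R) >= -1e-12)  # nonnegative definite
\end{verbatim}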
Setting to zero the derivative with respect to $x_1$, we get
 \begin{displaymath}
0 \eq {\partial E \over \partial x_1}
\eq [0 \quad 1 \quad 0]
\; \left[
\begin{array}{ccc}
 r_{00} & r_{01} & r_{02} \\
 r_{10} & r_{11} & r_{12} \\
 r_{20} & r_{21} & r_{22} \end{array} \right]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \end{array} \right]
\;+\;
[1 \quad x_1 \quad x_2]
\; \left[
\begin{array}{ccc}
 r_{00} & r_{01} & r_{02} \\
 r_{10} & r_{11} & r_{12} \\
 r_{20} & r_{21} & r_{22} \end{array} \right]
\; \left[
\begin{array}{c}
 0 \\
 1 \\
 0 \end{array} \right]\end{displaymath} (8)
Since $r_{ij} = r_{ji}$, both terms on the right are equal. Thus (8) may be written
 \begin{displaymath}
0 \eq {\partial E \over \partial x_1}
\eq 2 [r_{10} \quad r_{11} \quad r_{12}]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \end{array} \right]\end{displaymath} (9)
Likewise, differentiating with respect to $x_2$ gives
 \begin{displaymath}
0 \eq {\partial E \over \partial x_2}
\eq 2 [r_{20} \quad r_{21} \quad r_{22}]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \end{array} \right]\end{displaymath} (10)
Equations (9) and (10) may be combined as
 \begin{displaymath}
\left[
\begin{array}{c}
 0 \\
 0 \end{array} \right]
\eq
\left[
\begin{array}{ccc}
 r_{10} & r_{11} & r_{12} \\
 r_{20} & r_{21} & r_{22} \end{array} \right]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \end{array} \right]\end{displaymath} (11)
This form is two equations in two unknowns. One might write it in the more conventional form
 \begin{displaymath}
\left[
\begin{array}{cc}
 r_{11} & r_{12} \\
 r_{21} & r_{22} \end{array} \right]
\; \left[
\begin{array}{c}
 x_1 \\
 x_2 \end{array} \right]
\eq
- \left[
\begin{array}{c}
 r_{10} \\
 r_{20} \end{array} \right]\end{displaymath} (12)
The matrix of (11) lacks only a top row to be equal to the matrix of (7). To give it that row, we may augment (11) by  
 \begin{displaymath}
v \eq r_{00} + r_{01} x_1 + r_{02} x_2\end{displaymath} (13)
where (13) may be regarded as a definition of a new variable v. Putting (13) on top of (11) we get  
 \begin{displaymath}
\left[
\begin{array}{c}
 v \\
 0 \\
 0 \end{array} \right]
\eq
\left[
\begin{array}{ccc}
 r_{00} & r_{01} & r_{02} \\
 r_{10} & r_{11} & r_{12} \\
 r_{20} & r_{21} & r_{22} \end{array} \right]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \end{array} \right]\end{displaymath} (14)
The solution x of (12) or (14) is the set of $x_k$ for which E is a minimum. To get an interpretation of v, we may multiply both sides of (14) by $[1 \quad x_1 \quad x_2]$, getting
 \begin{displaymath}
v \eq [1 \quad x_1 \quad x_2]
\; \left[
\begin{array}{c}
 v \\
 0 \\
 0 \end{array} \right]
\eq
[1 \quad x_1 \quad x_2]
\; \left[
\begin{array}{ccc}
 r_{00} & r_{01} & r_{02} \\
 r_{10} & r_{11} & r_{12} \\
 r_{20} & r_{21} & r_{22} \end{array} \right]
\; \left[
\begin{array}{c}
 1 \\
 x_1 \\
 x_2 \end{array} \right]\end{displaymath} (15)

Comparing (15) with (7), we see that v is the minimum value of E.
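The chain from (6) to (15) is short enough to verify numerically. The following sketch (my own numpy illustration with invented data; none of the variable names come from the text) forms $r_{ij}$, solves (12) for x, forms v as in (13), and confirms (15): v equals the minimum sum-squared error.

\begin{verbatim}
import numpy as np

A = np.array([[1., 2.],
              [3., 1.],
              [2., 2.],
              [1., 4.]])
c = np.array([3., 5., 4., 6.])

B = np.column_stack([-c, A])                # rows b_i of equation (3)
R = B.T @ B                                 # r_ij of equation (6)

x = np.linalg.solve(R[1:, 1:], -R[1:, 0])   # equation (12)
v = R[0, 0] + R[0, 1:] @ x                  # equation (13)

E_min = np.sum((A @ x - c) ** 2)            # minimum of (7)
assert np.isclose(v, E_min)                 # equation (15)
\end{verbatim}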

Occasionally it is more convenient to have the essential equations in partitioned-matrix form. In partitioned form, the error (6) is
 \begin{displaymath}
E \eq [1 \ \vdots \ {\bf x}]^T
\left[
\begin{array}{c}
 -{\bf c}^T \\
 \hbox to 0.25in{\dotfill} \\
 {\bf A}^T \end{array} \right]
\; [-{\bf c} \ \vdots \ {\bf A}] \;
\left[
\begin{array}{c}
 1 \\
 \hbox to 0.25in{\dotfill} \\
 {\bf x} \end{array} \right]\end{displaymath} (16)
The final equation (14) splits into
      \begin{eqnarray}
v &= & {\bf c}^T{\bf c} - {\bf c}^T {\bf Ax}
\\ {\bf 0} &= & -{\bf A}^T {\bf c} + {\bf A}^T {\bf Ax}\end{eqnarray} (17)
(18)
where (18) represents the simultaneous equations to be solved for x. Equation (18) is what you have to set up in a computer. It is easily remembered by a quick and dirty (very dirty) derivation: begin with the overdetermined equations ${\bf Ax} \approx {\bf c}$; premultiplying by ${\bf A}^T$ gives $({\bf A}^T {\bf A}) {\bf x} = {\bf A}^T {\bf c}$, which is (18).
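In a program, (18) is exactly what gets formed and solved. A minimal numpy sketch (my own, with invented data), cross-checked against a library least-squares routine:

\begin{verbatim}
import numpy as np

A = np.array([[1., 2.],
              [3., 1.],
              [2., 2.],
              [1., 4.]])
c = np.array([3., 5., 4., 6.])

x = np.linalg.solve(A.T @ A, A.T @ c)   # equation (18): (A^T A) x = A^T c
v = c @ c - c @ (A @ x)                 # equation (17): minimum sum-squared error

x_ref, *_ = np.linalg.lstsq(A, c, rcond=None)
assert np.allclose(x, x_ref)
\end{verbatim}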

In physical science applications, the variable $z_j$ is frequently complex, say $z_j = x_j + iy_j$. It is always possible to go through the foregoing analysis treating $x_j$ and $y_j$ as real independent variables, but there is a considerable gain in simplicity and a saving in computational effort in treating $z_j$ as a single complex variable. The error E may be regarded as a function either of $x_j$ and $y_j$ or of $z_j$ and $\overline{z}_j$. In general $j = 1,\, 2, \, \ldots , \, N$, but we will treat the case N = 1 here and leave the general case for the Exercises. The minimum is found where
      \begin{eqnarray}
0 &= & {dE \over dx}
\eq {\partial E \over \partial z} {dz \over dx}
 + {\partial E \over \partial \overline{z}} {d\overline{z} \over dx}
\eq {\partial E \over \partial z} + {\partial E \over \partial \overline{z}}
\\ 0 &= & {dE \over dy}
\eq {\partial E \over \partial z} {dz \over dy}
 + {\partial E \over \partial \overline{z}} {d\overline{z} \over dy}
\eq i \left( {\partial E \over \partial z}
 - {\partial E \over \partial \overline{z}} \right)\end{eqnarray} (19)
(20)
Multiplying (20) by i and then adding it to and subtracting it from (19), we may express the minimum condition more simply as
      \begin{eqnarray}
0 &= & {\partial E \over \partial z}
\\ 0 &= & {\partial E \over \partial \overline{z}}\end{eqnarray} (21)
(22)

However, the usual case is that E is a positive real quadratic function of z and $\overline{z}$ and that $\partial E/\partial z$ is merely the complex conjugate of $\partial E/\partial \overline{z}$. Then the two conditions (21) and (22) may be replaced by either one of them. Usually, when working with complex variables we are minimizing a positive quadratic form like  
 \begin{displaymath}
E(z^*, z) \eq \vert{\bf Az} - {\bf c}\vert^2 \eq ({\bf z}^* {\bf A}^* - {\bf c}^*)
({\bf Az} - {\bf c})\end{displaymath} (23)
where * denotes complex-conjugate transpose. Now (22) gives  
 \begin{displaymath}
0 \eq {\partial E \over \partial z^*} \eq
{\bf A}^* ({\bf Az} - {\bf c})\end{displaymath} (24)
which is just the complex form of (18).
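The complex case is set up the same way, provided * is taken as the conjugate transpose. A sketch of my own (invented complex data) of equation (24), i.e. solving $({\bf A}^* {\bf A}){\bf z} = {\bf A}^* {\bf c}$:

\begin{verbatim}
import numpy as np

# Invented complex overdetermined system.
A = np.array([[1 + 1j, 2 + 0j],
              [3 + 0j, 1 - 2j],
              [2 - 1j, 2 + 1j],
              [1 + 0j, 0 + 4j]])
c = np.array([3 + 1j, 5 + 0j, 4 - 2j, 0 + 6j])

AH = A.conj().T                        # A*, the conjugate transpose
z = np.linalg.solve(AH @ A, AH @ c)    # equation (24): A*(Az - c) = 0

assert np.allclose(z, np.linalg.lstsq(A, c, rcond=None)[0])
\end{verbatim}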

Let us consider an example. Suppose a set of wave arrival times $t_i$ is measured at sensors located on the x axis at points $x_i$. Suppose the wavefront is to be fitted to a parabola $t_i \approx a + bx_i + c{x_i}^2$. Here the $x_i$ are known and a, b, and c are unknowns. For each sensor i we have an equation
 \begin{displaymath}[-t_i\quad 1 \quad x_i \quad {x_i}^2]
\; \left[
\begin{array}{c}
 1 \\
 a \\
 b \\
 c \end{array} \right]
\quad\approx\quad 0\end{displaymath} (25)
When i ranges over more than three values, we have more equations than unknowns. In this example, (14) takes the form
 \begin{displaymath}
\left\{ \sum^n_{i = 1}
 \left[ \begin{array}{c}
 -t_i \\
 1 \\
 x_i \\
 {x_i}^2 \end{array} \right]
 [-t_i \quad 1 \quad x_i \quad {x_i}^2]
\right\}
\; \left[
\begin{array}{c}
 1 \\
 a \\
 b \\
 c \end{array} \right]
\eq
\left[
\begin{array}{c}
 v \\
 0 \\
 0 \\
 0 \end{array} \right]\end{displaymath} (26)
This may be solved by standard methods for a, b, and c.
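As a sketch of how (26) might be set up in practice (my own numpy illustration; the sensor positions and arrival times are invented):

\begin{verbatim}
import numpy as np

# Invented sensor positions x_i and arrival times t_i.
x = np.array([-2., -1., 0., 1., 2., 3.])
t = np.array([4.1, 1.9, 1.0, 2.1, 5.0, 9.8])

# Rows b_i = [-t_i, 1, x_i, x_i^2] of equation (25).
B = np.column_stack([-t, np.ones_like(x), x, x**2])
R = B.T @ B                                       # summed matrix of (26)

a, b, c = np.linalg.solve(R[1:, 1:], -R[1:, 0])   # last three rows of (26)
print(a, b, c)                                    # wavefront t ~ a + b x + c x^2
\end{verbatim}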

The last three rows of (26) may be written  
 \begin{displaymath}
\sum^n_{i = 1}
 \left[ \begin{array}{c}
 1 \\
 x_i \\
 {x_i}^2 \end{array} \right] e_i
\eq
\left[
\begin{array}{c}
 0 \\
 0 \\
 0 \end{array} \right]\end{displaymath} (27)
where $e_i = -t_i + a + b x_i + c\,{x_i}^2$ is the ith error. This says the error vector $e_i$ is perpendicular (or normal) to the functions 1, x, and $x^2$, which we are fitting to the data. For that reason these equations are often called normal equations.
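Continuing the sketch above (same invented x and t), the normal property is easy to check numerically: the residual has essentially zero dot product with each of the fitted functions.

\begin{verbatim}
import numpy as np

x = np.array([-2., -1., 0., 1., 2., 3.])
t = np.array([4.1, 1.9, 1.0, 2.1, 5.0, 9.8])

F = np.column_stack([np.ones_like(x), x, x**2])   # functions 1, x, x^2
coeff, *_ = np.linalg.lstsq(F, t, rcond=None)     # fit t ~ a + b x + c x^2

e = F @ coeff - t            # error vector e_i
print(F.T @ e)               # equation (27): all three entries ~ 0
\end{verbatim}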

EXERCISES:

  1. Extend (24) by fitting waves observed in the x, y plane to a two-dimensional quadratic.
  2. Let y(t) be a complex-valued function given at successive integer values of t. Fit y(t) to a least-squares straight line $y(t) \approx \alpha + \beta t$ where $\alpha = \alpha_r + i\alpha_i$ and $\beta = \beta_r + i\beta_i$. Do it two ways: (a) assume $\alpha_r$, $\alpha_i$, $\beta_r$, and $\beta_i$ are four independent real variables, and (b) assume $\alpha$, $\bar {\alpha}$, $\beta$, and $\bar {\beta}$ are independent variables. (Leave the answer in terms of $s_n = \sum_t t^n$.)
  3. Equation (14) has assumed all quantities are real. Generalize equation (14) to all complex quantities. Verify that the matrix is Hermitian.
  4. At the jth seismic observatory (latitude $x_j$, longitude $y_j$) earthquake waves are observed to arrive at time $t_j$. It has been conjectured that the earthquake has an origin time t, latitude x, and longitude y. The theoretical travel time may be looked up in a travel-time table $T(\Delta)$, where T is the travel time and $\Delta$ is the great-circle angle. One has

    \begin{displaymath}
\cos \Delta \eq \sin y \sin y_j + \cos y \cos y_j \cos (x - x_j) \end{displaymath}

    The time residual at the jth station, supposing that the earthquake occurred at (x, y, t), is

    \begin{displaymath}
e_j \eq t + T(\Delta_j) - t_j\end{displaymath}

    The time residual, supposing that the earthquake occurred at (x + dx, y + dy, t + dt), is

    \begin{displaymath}
e_j
\eq t + dt + T(\Delta_j)
+ \left( {\partial T \over \partial \Delta}
 {\partial \Delta \over \partial x} \right)_j \, dx
+ \left( {\partial T \over \partial \Delta}
 {\partial \Delta \over \partial y} \right)_j \, dy - t_j \end{displaymath}

    Find equations to solve for dx, dy, and dt which minimize the sum-squared time residuals.
  5. Gravity $g_j$ has been measured at N irregularly spaced points on the surface of the earth (colatitude $x_j$, longitude $y_j$, $j = 1, \ldots, N$). Show that the matrix of the normal equations which fit the data to spherical harmonics may be written as a sum of a column times its transpose, as in the preceding problem. How would the matrix simplify if there were infinitely many uniformly spaced data points? (NOTE: Spherical harmonics S are the class of functions

    \begin{displaymath}
S^m_n (x, y) \eq P^m_n (\cos x) \exp (imy)\end{displaymath}

    for $(m = -n, \ldots , -1, 0, 1, \ldots , n)$ and $(n = 0, 1, \ldots , \infty )$ where $P^m_n$ is an associated Legendre polynomial of degree n and order m.)
  6. Ocean tides fit sinusoidal functions of known frequencies quite accurately. Associated with the tide is an earth tilt. A complex time series may be made from the north-south tilt plus $\sqrt{-1}$ times the east-west tilt. The observed complex time series may be fitted to an analytical form $\sum^N_{j = 1} A_j e^{i\omega_j t}$. Find a set of equations which may be solved for the $A_j$ which gives the best fit of the formula to the data. Show that some elements of the normal equation matrix are sums which may be summed analytically.
  7. The general solution to Laplace's equation in cylindrical coordinates $(r, \theta)$ for a potential field P which vanishes at $r = \infty$ is given by

    \begin{displaymath}
P\, (r, \theta) \eq {\rm Re} \sum^{\infty}_{m = 0} A_m {e^{im\theta} 
\over
 r^{m+1} } \end{displaymath}

    Find the potential field surrounding a square object at the origin which is at unit potential. Do this by finding N of the coefficients $A_m$ by minimizing the squared difference between $P(r, \theta)$ and unity, integrated around the square. Give the answer in terms of an inverse matrix of integrals. Which coefficients $A_m$ vanish exactly by symmetry?

