es
The Gaussian elimination algorithm, although unpleasant to work through by hand for any but very small systems, is easily understood, easily programmed, and generally quite effective and efficient. It is one of the most frequently executed algorithms on computers. The Gaussian elimination algorithm has been known in Western mathematics since Gauss published it in 1809 and in Chinese mathematics since about A.D. 200 or earlier. However, in rare cases, it gets into serious trouble.

For solving linear equations, the natural class of numbers would be the rational numbers; given any problem posed in terms of rational numbers, the exact answer and all the intermediate values calculated are also rational numbers. However, for algorithms such as Gaussian elimination, exact arithmetic on rational numbers is entirely impractical because the number of bits required to maintain exactness increases exponentially with the size of the matrix. Like- wise, then, time and space requirements increase exponentially. More sophis- ticated algorithms are known that are polynomial in the number of bits, but they are very sophisticated to be implemented.

Viewpoints on numerical computation.
Pure
math
Exact
computation
Floating
point
Realizable ×
Exact answer ×
Reasonable
computation time
×

Sensitivity of Solutions

Previously, we discussed how to solve linear vector equation Ax = b using elimination procedure. One might expect that if the the entries of matrix A are slightly modified, then its solution will be closed to the original solution. Unfortunately, there exist linear systems Ax = b whose solutions are not stable under small perturbations of either b or A. Now we pay more attention to this issue.
Example 1: Consider the following two systems. The first one
\[ \begin{split} 577\,x + 880\,y &= 3, \\ 28\,x + 427\,y &= 9, \end{split} \]
has the unique solution, as Mathematica confirms
Solve[{577 x + 880 y ==3, 28 x + 427 y == 9}, {x,y}]
{{x -> -(2213/73913), y -> 1703/73913}}
{{x -> -0.0299406, y -> 0.0230406}}
We will call the second system as the approximate system, and it is given as follows:
\[ \begin{split} 577\,x + 880\,y &= 3, \\ 28\,x + 427.1\,y &= 9, \end{split} \]
Solve[{577 x + 880 y ==3, 28 x + 427.1 y == 9}, {x,y}]
{{x -> -0.0299315, y -> 0.0230346}}
In matrix form, the augmented matrices for these two systems are given by
\[ \begin{bmatrix} 577&880&3 \\ 280&427&9 \end{bmatrix} \qquad\mbox{and} \qquad \begin{bmatrix} 577&880&3 \\ 280&427.1&9 \end{bmatrix} \]
It is apparent that these two systems have very close solution sets.    ■
Example 2: We consider two systems of linear equations with almost identical entries:
\[ \begin{split} 577\,x + 880\,y &= 3, \\ 280\,x + 427\,y &= 9, \end{split} \]
and the approximate system
\[ \begin{split} 577\,x + 880\,y &= 3, \\ 280\,x + 427.1\,y &= 9, \end{split} \]
We use Mathematica to solve these systems:
LinearSolve[{{577, 880}, {280, 427}}, {3, 9}]
\( \left\{ \frac{2213}{7} , \frac{3451}{7} \right\} \)
N[%]
{316.143, -207.286}
Now for approximate system
N[LinearSolve[{{577, 880}, {280, 427.1}}, {3, 9}]]
{-180.891, 118.61}
So we see that solutions two these two systems are completely different. This is clearly interesting, and we need to know why the behavior of solutions are so unstable to small changes. In real life, there is almost always error in a linear system because its values are derived from measurements with inherent error and humans are sometimes are not accurate. We must know whether it is possible to avoid this sensitivity to error and also what causes it.

Since we used a two-dimensional system, it can be visualized easily in the plane. So let us solve for y in each equation.

Expand[NSolve[577 x + 880 y == 3, y]]
{{y -> 0.00340909 - 0.655682 x}}
Expand[NSolve[280 x + 427 y == 9, y]]
{{y -> 0.0210773 - 0.655738 x}}
Note that these two lines are very close to being parallel because they have nearly identical slopes. This may be the cause of our sensitivity to slight amounts of error since a small change in the slope of one line can move their intersection point a great distance from the original intersection point.    ■
Example 3: We reconsider the previous example, which we rewrite using floating points to isolate one variable:
\[ \begin{split} 0.655682 \,x + y &= 0.00340909 \\ \left( 0.655738 + E \right) x + y &= 0.0210773 , \end{split} \]
where the error in the slope is E. The exact system is the one for E = 0, and they become parallel lines when E = 0.0292836 with no solution the system. When E = 0, the solution to the system is as given below:
Ae = {{0.655682, 1, 0.00340909}, {0.655738, 1, 0.0210773}}
RAe = RowReduce[Ae]
exactSol = RAe[[All, -1]]
{315.516, -206.875}
Note that if we use more decimal places and solve the equations
\[ \begin{split} 0.6556818181818181 \,x + y &= 0.003409090909090909 , \\ 0.6557377049180327\, x + y &= 0.02107728337236534 , \end{split} \]
we get more accurate answer:
\[ x = 316.143, \qquad y = -207.286 . \]
The exact solution's x coordinate is given by exactSol[[1]] and its y coordinate is given by exactSol[[2]]. For E ≠ 0, if we denote X and Y as solutions to the approximate system, then its distance from the exact solution is given by
\[ d = \sqrt{\left( X - exactSol[[1]] \right)^2 + \left( Y - exactSol[[2]] \right)^2} . \]
We ask Mathematica for help with these calculations:
As = {{0.655682, 1, 0.00340909}, {0.655738 +EE, 1, 0.0210773}}
approxSol = RowReduce[As] [[All, -1]]
{(1. (0.0176682 + 8.67362*10^-19 EE))/(0.000056 + 1. EE), ( 0.00340909 (-3.39813 + 1. EE))/(0.000056 + 1. EE)}
Next, we plot error function:
errorFun = Sqrt[(exactSol[[1]] - approxSol[[1]])^2 + (exactSol[[2]] - approxSol[[2]])^2]; Plot[errorFun, {EE, -1,1}, PlotRange -> {{-0.3, 0.3}, {340, 400}}, PlotStyle -> {{Red, Thickness[0.01]}}, AxesOrigin ->{-0.3,340}]
Plot of error within interval (-0.3, 0.3)
The plot of the distance between the exact solution and the approximate system solutions has a vertical asymptote at E = = 0.0292836, which is the value of E that makes the lines parallel, and so the system is without a solution. The closer E is to this value, the larger the error. As E moves away from 0.0292836, the error stabilizes at a value close to 278.

We are going to plot these solutions as a parametric plot in the xy-plane to see their behavior more clearly.    ■

Example 4: Consider the system
\[ \begin{bmatrix} 10&7&8&7 \\ 7&5&6&5 \\ 8&6&10&9 \\ 7&5&9&10 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{bmatrix} 32 \\ 23 \\ 33 \\ 31 \end{bmatrix} . \]
It is easy to verify that the given system has the solution x = [1, 1, 1, 1]. If we perturb slightly the right-hand side,obtaining the new system
\[ \begin{bmatrix} 10&7&8&7 \\ 7&5&6&5 \\ 8&6&10&9 \\ 7&5&9&10 \end{bmatrix} \begin{pmatrix} x_1 + \Delta x_1 \\ x_2 + \Delta x_2 \\ x_3 + \Delta x_3 \\ x_4 + \Delta x_4 \end{pmatrix} = \begin{bmatrix} 32.1 \\ 22.9 \\ 33.1 \\ 30.9 \end{bmatrix} , \]
the new solutions turns out to be
\[ {\bf x} = \left[ 9.2, -12.6, 4.3, -1.1 \right] . \]
In other words, a relative error of the order 1/200 in th edata (here, b) produces a relative error of the order 10/1 in the solution, which represents an amplification of the relative error of the order 2000.

Now, let us perturb the matrix slightly, obtaining the new system

\[ \begin{bmatrix} 10&7&8.1&7.2 \\ 7.08&5.04&6&5 \\ 8&5.98&9.98&9 \\ 6.99&4.99&9&9.98 \end{bmatrix} \begin{pmatrix} x_1 + \Delta x_1 \\ x_2 + \Delta x_2 \\ x_3 + \Delta x_3 \\ x_4 + \Delta x_4 \end{pmatrix} = \begin{bmatrix} 32 \\ 23 \\ 33 \\ 31 \end{bmatrix} . \]
This time, the solution is
\[ {\bf x} = \left[ -81, 137, -34, 22 \right] . \]
Again, a small change in the data alters the result ratherdrastically.    ■
The problem is that the matrix of the system is badly conditioned, a concept that we will now explain.

 

Determinant value and ill-conditioning


As some previous examples show, there exist some pathological problems of finding solutions to a linear equation A x = b that can flank the robust numerical solvers dedicated for such problems. To analyze the sensitivity of solutions to linear equations A x = b we have to employ a matrix norm ‖ · ‖ (see section), but not the determinant. It turns out that the determinant does not reflect whether matrix elements are small or large.

Let us consider a linear equation with the matrix

\[ {\bf A} = \begin{bmatrix} 3\times 10^5 & 2\times 10^5 \\ 4\times 10^5 & 5\times 10^5 \end{bmatrix} . \]
Its determinant is det(A) = 7×1010 and its Euclidean norm ≈ 728538. So det(A) ≫ ‖ A ‖.

Now suppose we divide the 2×2 system for our example by 1010 without changing anything in the statement of the problem. In other words, multiplying or dividing the matrix A should not lead to fundamental change. This yields the matrix

\[ {\bf B} = \begin{bmatrix} 3\times 10^{-5} & 2\times 10^{-5} \\ 4\times 10^{-5} & 5\times 10^{-5} \end{bmatrix} . \]
We find det(B) = 7×10-10, so det(A) ≪ ‖ B ‖ ≈ 0.0000728538.

Using matrix perturbation theory and carefully study the effect a small change of the given matrix on the solution vector x + Δx looks like when when you start with A + ΔA. Following Trefethen & Bau, when we change A into A + δA, its inverse A−1 changes to A−1 + δ(A−1):

\[ \delta \left({\bf A}^{-1} \right) = - {\bf A}^{-1} \cdot \delta{\bf A} \cdot {\bf A}^{-1}. \]
Then we get
\begin{equation} \label{EqSent.1} \frac{\| \Delta {\bf x} \|}{\| {\bf x} \|} \le \| {\bf A} \| \, \| {\bf A}^{-1} \| \, \frac{\| \Delta {\bf A} \|}{\| {\bf A} \|} . \end{equation}
This relates an error bound on ‖Δx‖/ ‖x‖ with an error bound on ‖ΔA‖/ ‖A‖. The coefficient in front of ‖ΔA‖/ ‖A‖ tells us whether or not a small perturbation will tend to get magnified. Equation \eqref{EqSent.1} naturally leads us to introducing the condition number
\begin{equation} \label{EqSent.2} \kappa ({\bf A}) = \| {\bf A} \| \, \| {\bf A}^{-1} \| . \end{equation}
When κ(A) is of order unity, we are dealing with a well-conditioned problem, i.e., a small perturbation is not amplified. Conversely, when κ(A) is large, a perturbation is typically amplified, so our problem is ill-conditioned. Crucially, this condition number involves both the matrix A and its inverse A−1. Hence, the condition number has nothing to do with the determinant. As a result, an overall scaling has no effect. For example, the condition's numbers of the previously considered 2×2 matrices are
\[ \kappa ({\bf A}) = 7.5824, \qquad \kappa ({\bf B}) =7.5824 \]
A = {{3*10^5 , 2*10^5}, {4*10^5, 5*10^5}}
N[Norm[A] *Norm[Inverse[A]]]
7.582401374401513
These numbers are not vwry large, so the corresponding system of linear equations are well-conditions.
Example 5: We reconsider the system from Example 4:
\[ \begin{bmatrix} 10&7&8&7 \\ 7&5&6&5 \\ 8&6&10&9 \\ 7&5&9&10 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{bmatrix} 32 \\ 23 \\ 33 \\ 31 \end{bmatrix} . \]
Using Mathematica, we find the corresponding condition number:
A = {{10,7,8,7}, {7,5,6,5},{8,6,10,9},{7,5,9,10}};
kappa = N[Norm[A]*Norm[Inverse[A]]]
2984.09
Since this number is around 3,000, the corresponding problem is ill-conditioned. Now we check with another norms:
N[Norm[A, 1]*Norm[Inverse[A], 1]]
4488
N[Norm[A, Infinity]*Norm[Inverse[A], Infinity]]
4488
N[Norm[A, "Frobenius"]*Norm[Inverse[A], "Frobenius"]]
3009.58
All of them indicate that the problem is ill-conditioned.    ■
\( \varnothing \)

 


  1. Higham, Nicholas, Gaussian Elimination, Manchester Institute for Mathematical Sciences, School of Mathematics, The University of Manchester, 2011.
  2. Trefethen, L.N., Bau, D. III, Numerical Linear Algebra, Society for Inductrial and Applied Mathematics, Pennsylvania, 1997.