While solving a problem numerically, there are two sources of possible errors. First, there may be errors due to inaccurate input data. Second, there are errors caused by the algorithm itself because of used approximations made within the calculations. In order to estimate the errors in the computed answers from both these sources, we need to understand how much the solutions are changed (in this case we say that they are perturbed) if the input data are slightly perturbed.
Condition Numbers
Let us consider a linear system of equations that we write in concise form A x = b, where A is a square invertible matrix (this condition guarantees a unique solution), x is a column vector of unknowns, and b is a given column vector. Now suppose we add a small vector δb to b and consider the perturbed system A z = b
+ δb. This system also has a unique solution z, which we hope is not far away from x. Let δx denote the difference between z and x, so z = x + δx.
In order to quantify the size of vectors, we introduce vector norm, ∥⋅∥. It does not matter which norm you use---all norms are equivalent in finite dimensional vector spaces. We mostly will use the Euclidean norm, denoted ∥⋅∥2. It make sense to speak about small quantity δx by using relative terms: when we say that δx is small, we mean that it is small in comparison with x. Then its size δx relative to x is given by ∥δx∥ / ∥x∥, and the size of δb is given by
∥δb∥ / ∥b∥
The equations A x = b and A(x + δx) = b + δb imply that A(δx) = δb. Since matrix A is invertible, we get δx = A−1(δb). Whatever vector norm has been chosen, we can use the induced matrix norm to measure matrices. The latter equation leads to
The latter provides a bound for ∥δx∥ / ∥x∥ in terms of ∥δb∥ / ∥b∥. The factor ∥A∥ ∥A−1∥ is called the condition number of A and denoted as
Till some extend, the condition number is (very roughly) the rate at which the solution x will change with respect to a change in b.
From this inequality \eqref{EqCond.2}, it follows that if κ(A) is not too large, then small (relative) perturbation in the coefficient b imply small values of ∥δx∥ / ∥x∥. That is, the linear system A x = b is not overly sensitive to perturbation in input data b. As a rule of thumb, if the condition number κ(A) = 10k, then you may lose up to k digits of accuracy on top of what would be lost to the numerical method due to loss of precision from arithmetic method.
If condition number of matrix A is not too large, then the matrix is called well-conditioned. Otherwise, a small value of ∥δb∥ / ∥b∥ does not guarantee that ∥δx∥ / ∥x∥ will be small. This means that the system A x = b is potentially very sensitive to perturbations of b. Thus, if κ(A) is large, matrix A is said to be ill-conditioned.
Theorem 1:
For any induced matrix norm and for any square matrix A, κ(A) ≥ 1.
Since the identity matrix is equal to I = A−1A, we get 1 = ∥I∥ ≤ ∥A−1∥ ∥A∥ = κ(A).
Example 1:
Let us consider the following matrix
\[
\mathbf{A} = \begin{bmatrix} 4.1 & 1.4 \\ 9.7 & 3.3 \end{bmatrix} .
\]
Let us take input data b to be the second column of A, so the solution to A x = b is simply x = [0, 1]T.
A = {{4.1, 1.4}, {9.7, 3.3}};
b = {1.4, 3.3};
LinearSolve[A, b]
{0., 1.}
Now add 0.01 to the first component of b.
b2 = {1.41, 3.3};
LinearSolve[A, b2]
{-0.66, 2.94}
The solution changes dramatically. To calculate the condition number, we enter in Mathematica notebook:
SingularValueList[A]
{11.1243, 0.00449467}
Then ration of largest and smallest singular values gives us the required numerical value
11.124296822630027/0.0044946661166292015
2475
Since the condition number κ(A) = 2475 is large, we claim that the given matrix is ill-conditioned.
With Mathematica, we verify formula (1).
A = {{4.1, 1.4}, {9.7, 3.3}};
AI = Inverse[A]
{{-66., 28.}, {194., -82.}}
The condition number with 2-norm is
kappa2 = Norm[A, 2] * Norm[AI, 2]
2475
It is the same with Frobenius norm:
kappaF = Norm[A, Frobenius] * Norm[AI, Frobenius]
2475
With infinity norm, we have
kappaI = Norm[A, Infinity] * Norm[AI, Infinity]
3588
It is the same as 1-norm
kappa1 = Norm[A, 1] * Norm[AI, 1]
3588
■
End of Example 1
Of course, the condition number depends on the choice of vector norm---they all are equivalent, so these norms differ by a nonzero constant multiple. The most appropriate for our needs is the Euclidean norm, or 2-norm inherited from Hilbert space ℓ², which leads to
where λmax and λmin
are maximal and minimal (by module) eigenvalues of A, respectively.
For vector 1-norm, the corresponding matrix norm (A = [𝑎i,j]) is maximum absolute column sum
There are many different condition numbers. In general, a condition
number applies not only to a particular matrix, but also to the problem being solved.
For instance, let f(x)
be a real-valued differentiate function of a real variable x. Suppose we are given f(x + δx) for some perturbed value of x instead of required value f(x). The best that we can
do (without more information) is to try to bound
the absolute error |f(x + δx) − f(x)|. We may use a simple linear approximation to f to get the estimate
f(x + δx) ≈ f(x) + δxf'(x) and so the error becomes
|f(x + δx) − f(x)| ≈ |δx| ⋅ |f′(x)|. We call |f′(x)| the absolute condition number of f at x. If |f′(x)| is large enough, then the error may be large even if δx is small; in this case, we call fill-conditioned at x.
We can similarly estimate the relative error and define the relative condition number (or often just condition number for short).
\[
\frac{\left\vert f(x + \delta x) - f(x) \right\vert}{\left\vert f(x) \right\vert} \approx \frac{\left\vert \delta x \right\vert}{\left\vert x \right\vert} \cdot \frac{\left\vert f' (x) \right\vert \cdot |x|}{\left\vert f(x) \right\vert} .
\]
The condition number κ(A) also appears in the bound for how much a change E = δA in a matrix A can affect its inverse.