This chapter is devoted to several iterative methods for solving large sparse linear systems of the form

\begin{equation} \label{EqPart4.1} {\bf A}\,{\bf x} = {\bf b} , \qquad {\bf A} \in \mathbb{C}^{n\times n} , \quad {\bf x}, \ {\bf b} \in \mathbb{C}^{n\times 1} . \end{equation}
As usual, we denote by \( \mathbb{C}^{m\times n} \) the set of m-by-n matrices with complex entries; a similar notation, \( \mathbb{R}^{m\times n} \), is used for real matrices. Linear equations of the form \eqref{EqPart4.1} frequently arise from the solution of boundary value problems for partial differential equations. In particular, we come across such linear systems when solving elliptic equations with either finite difference or finite element approximations. Large sparse linear systems arise in many other practical problems as well, of course, and the methods discussed here are important more generally.

Iterative methods are suited for use with sparse matrices. If A is dense, your best course of action is probably to factor A and solve the equation by back substitution. The time spent factoring a dense A is roughly equivalent to the time spent solving the system iteratively; and once A is factored, the system can be back solved quickly for multiple values of b. Compare this dense matrix with a sparse matrix of larger size that fills the same amount of memory. The triangular factors of a sparse A usually have many more nonzero elements than A itself. Factoring may be impossible due to limited memory, and will be time-consuming as well; even the backsolving step may be slower than iterative solution. On the other hand, most iterative methods are memory-efficient and run quickly with sparse matrices.

Except when the matrix A in Eq.\eqref{EqPart4.1} has very special structure and fast direct methods apply, iterative methods are usually the method of choice for large sparse linear systems. Iterative methods fall into two categories: stationary and nonstationary methods. Since the matrix/vector equation \eqref{EqPart4.1} is not directly suitable for a stationary iterative scheme, it must first be transformed into an equivalent fixed point problem

\[ {\bf x} = f({\bf x}) . \]
There are many ways to transform a linear system of equations written in the succinct form \eqref{EqPart4.1} into fixed point form. For instance, Lewis Fry Richardson proposed in 1910 the simple form
\[ {\bf x} = \left( \mathbf{I} - \tau\,\mathbf{A} \right) {\bf x} + \tau\,{\bf b} , \qquad \tau \ne 0, \]
which is obtained by rewriting \eqref{EqPart4.1} as \( {\bf x} = {\bf x} + \tau \left( {\bf b} - {\bf A}\,{\bf x} \right) \), that is, by adding and subtracting x. This system is equivalent to the original system \eqref{EqPart4.1} for any value of the parameter τ ≠ 0. In general, it is possible to replace the given equation \eqref{EqPart4.1} by an equivalent fixed point equation
\begin{equation} \label{EqPart4.3} {\bf x} = \mathbf{B}\,{\bf x} + \phi , \qquad \mathbf{B} \in \mathbb{C}^{n\times n} , \quad \phi \in \mathbb{C}^{n \times 1} . \end{equation}
This fixed point equation is used to generate a stationary iterative scheme
\begin{equation} \label{EqPart4.2} {\bf x}^{(k+1)} = \mathbf{B}\, {\bf x}^{(k)} + \phi , \qquad k= 0,1,2,\ldots , \end{equation}
which is a one-step recursive method because each new approximation \( {\bf x}^{(k+1)} \) depends solely on the previous iterate \( {\bf x}^{(k)} \). In a nonstationary iterative algorithm, on the other hand, each new iterate is computed by a function that changes from step to step:
\begin{equation*} %\label{EqPart4.3} {\bf x}^{(k+1)} = g_k \left( {\bf x}^{(k)} \right) , \qquad k= 0,1,2,\ldots . \end{equation*}
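Returning to stationary schemes, Richardson's form above is easy to try numerically. Below is a minimal Mathematica sketch (not taken from the text) on a small made-up symmetric system; the step size τ = 0.2 is an arbitrary choice, small enough (below 2 divided by the largest eigenvalue of A) for the iteration to converge.

a = {{4., 1., 0.}, {1., 4., 1.}, {0., 1., 4.}}; b = {1., 2., 3.};  (* made-up test system *)
tau = 0.2;                                                (* assumed step size, below 2/λmax ≈ 0.37 *)
richardson[x_] := (IdentityMatrix[3] - tau a).x + tau b;  (* one Richardson sweep x ← (I - τA)x + τb *)
Norm[a.Nest[richardson, {0., 0., 0.}, 60] - b]            (* residual after 60 sweeps is essentially zero *)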
To transform the given linear system A x = b into fixed point form, it is convenient to split the coefficient matrix A into a sum of two matrices, the first of which must be invertible,
\[ {\bf A} = {\bf S} + \left( {\bf A} - {\bf S} \right) , \qquad \det \mathbf{S} \ne 0 . \]
This allows us to rewrite the given equation \eqref{EqPart4.1} as
\[ {\bf A}\,{\bf x} = {\bf b} \qquad \Longrightarrow \qquad {\bf S}\,{\bf x} = {\bf b} + \left( {\bf S} - {\bf A} \right) {\bf x} . \]
Applying the inverse of S, we arrive at the equivalent equation
\[ {\bf x} = {\bf S}^{-1} {\bf b} + {\bf S}^{-1} \left( {\bf S} - {\bf A} \right) {\bf x} = {\bf S}^{-1} {\bf b} + \left( {\bf I} - {\bf S}^{-1} {\bf A} \right) {\bf x} , \]
where I is the identity matrix. This equation yields a classical stationary iterative scheme:
\begin{equation} \label{EqPart4.4} {\bf x}^{(k+1)} = {\bf S}^{-1} {\bf b} + \left( {\bf I} - {\bf S}^{-1} {\bf A} \right) {\bf x}^{(k)}, \qquad k=0,1,2,\ldots . \end{equation}
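As an illustration, here is a minimal Mathematica sketch of scheme \eqref{EqPart4.4} with the common choice of S as the diagonal part of A (the Jacobi splitting); the 3-by-3 system below is made-up test data rather than an example from the text.

a = {{4., 1., 0.}, {1., 4., 1.}, {0., 1., 4.}}; b = {1., 2., 3.};   (* made-up test system *)
s = DiagonalMatrix[Diagonal[a]];          (* invertible part of the splitting *)
step[x_] := Inverse[s].(b + (s - a).x);   (* one sweep of the stationary scheme *)
xk = Nest[step, {0., 0., 0.}, 30];        (* thirty iterations starting from the zero vector *)
Norm[a.xk - b]                            (* residual of order 10^-13 *)

Richardson's scheme corresponds to the particular choice \( {\bf S} = \tau^{-1} {\bf I} \).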

Stationary Iterative Methods

 

Theorem 1: Given a vector b and a splitting A = S + (A − S) of the matrix A, with both A and S nonsingular, the iteration scheme \eqref{EqPart4.4} converges to the solution of \eqref{EqPart4.1} for every initial approximation if and only if the spectral radius of the iteration matrix satisfies \[ \rho \left( {\bf I} - {\bf S}^{-1} {\bf A} \right) < 1 . \] Here ρ(X) is the largest modulus of the eigenvalues of the matrix X.
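For the Jacobi-type splitting used in the sketch above, the condition of Theorem 1 is easy to check numerically (same made-up matrix as before):

a = {{4., 1., 0.}, {1., 4., 1.}, {0., 1., 4.}}; s = DiagonalMatrix[Diagonal[a]];
Max[Abs[Eigenvalues[IdentityMatrix[3] - Inverse[s].a]]]   (* ≈ 0.354 < 1, so the iteration converges *)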

A proof of Theorem 1 can be found in the book by Darve and Wootters. It rests on an important theorem of matrix theory:

Theorem 2: If H is a square matrix, then \( {\bf H}^k {\bf x} \to {\bf 0} \) as k → ∞ for every vector \( {\bf x} \in \mathbb{C}^{n\times 1} \) if and only if each eigenvalue λ of H is less than one in modulus.
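A quick numerical illustration of Theorem 2 with a made-up 2-by-2 matrix whose eigenvalues both have modulus below one:

h = {{0.5, 0.3}, {0.1, 0.4}};
Max[Abs[Eigenvalues[h]]]              (* ≈ 0.63 < 1 *)
Norm[MatrixPower[h, 100].{1., 1.}]    (* of order 10^-20: the powers drive every vector to zero *)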

Let \( \lambda_1 , \lambda_2 , \ldots , \lambda_n \) be the eigenvalues of the square matrix H. Every square matrix H may be transformed to a blockwise diagonal matrix \( {\bf S}^{-1} {\bf H}\,{\bf S} \) (the Jordan canonical form of H), each of whose diagonal blocks has the form \[ {\bf J}_i = \begin{bmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ 0&\lambda_i & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & \lambda_i\end{bmatrix} . \] Note that the same eigenvalue may appear in several blocks. In other words, in a certain oblique coordinate system (given by the columns of S) the matrix H takes the relatively simple form of a direct sum of the transformations \( {\bf J}_i \). We can decompose an arbitrary vector v in the same coordinate system, writing \( {\bf v} = {\bf v}_1 + {\bf v}_2 + \cdots + {\bf v}_p \), where p is the number of Jordan blocks and the term \( {\bf v}_i \) of the sum corresponds to the Jordan block \( {\bf J}_i \). We may assume that \( {\bf v}_i \ne {\bf 0} \), since zero components contribute nothing.

Since \( {\bf H}^k {\bf v} \to {\bf 0} \) precisely when \( {\bf J}_i^k {\bf v}_i \to {\bf 0} \) for every block, we have only to prove the theorem for an arbitrary block of the form above.

Every Jordan block can be decomposed into a sum of the diagonal matrix \( \lambda_i {\bf I} \) and a nilpotent matrix C: \[ {\bf J}_i = \lambda_i {\bf I} + {\bf C} \in \mathbb{C}^{m \times m} , \] where \( m = m_i \) depends on the index i, which we drop for simplicity. The matrix C is independent of the eigenvalue and depends only on the dimension of the Jordan block; it consists of ones on the superdiagonal and zeroes elsewhere: \[ {\bf C} = \begin{bmatrix} 0& 1 & 0 & \cdots & 0 \\ 0&0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & 0 \end{bmatrix} \in \mathbb{R}^{m \times m} . \] The powers \( {\bf C}^k \) (k = 1, 2, …) vanish as soon as the exponent reaches the dimension of the block: \( {\bf C}^m = {\bf 0} \). You can verify with Mathematica that every time you multiply by C, the superdiagonal of ones is shifted up by one position.

c = {{0, 1, 0}, {0, 0, 1}, {0, 0, 0}}; c.c   (* the 3-by-3 nilpotent matrix C; squaring shifts the ones up to the second superdiagonal *)
\( \displaystyle \quad \begin{pmatrix} 0&0&1 \\ 0&0&0 \\ 0&0&0 \end{pmatrix} \)
c = {{0, 1, 0}, {0, 0, 1}, {0, 0, 0}}; c.c.c   (* the cube vanishes: the power has reached the block dimension *)
\( \displaystyle \quad \begin{pmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \end{pmatrix} \)
Hence, since \( \lambda_i {\bf I} \) and \( {\bf C} \) commute, the binomial theorem gives \begin{align*} {\bf J}_i^k &= \left( \lambda_i {\bf I} + {\bf C} \right)^k = \lambda_i^k {\bf I} + \binom{k}{1} \lambda_i^{k-1} {\bf C} + \binom{k}{2} \lambda_i^{k-2} {\bf C}^2 \\ & \quad + \cdots + \binom{k}{m-1} \lambda_i^{k-m+1} {\bf C}^{m-1} , \tag{E2.1} \end{align*} where \[ \binom{k}{n} = \frac{k^{\underline{n}}}{n!}, \qquad k^{\underline{n}} = k \left( k-1 \right) \left( k-2 \right) \cdots \left( k-n+1 \right) , \] is the binomial coefficient. Here are some values of binomial coefficients: \begin{align*} \binom{k}{1} &= k , \\ \binom{k}{2} &= \frac{k(k-1)}{2} = \frac{1}{2} \left( k^2 - k \right) , \\ \binom{k}{3} &= \frac{k(k-1)(k-2)}{3!} = \frac{1}{6} \left( k^3 - 3k^2 + 2k \right) , \\ \binom{k}{4} &= \frac{k(k-1)(k-2)(k-3)}{4!} = \frac{1}{24} \left( k^4 - 6k^3 + 11k^2 - 6k \right) . \end{align*} So the binomial coefficient \( \binom{k}{n} \) is a polynomial in k, the upper index, of degree n, the lower index. From Eq.(E2.1), it follows that \begin{align*} {\bf J}_i^k {\bf v} &= \lambda_i^k {\bf v} + \binom{k}{1} \lambda_i^{k-1} {\bf C} {\bf v} + \binom{k}{2} \lambda_i^{k-2} {\bf C}^2 {\bf v} \\ & \quad + \cdots + \binom{k}{m-1} \lambda_i^{k-m+1} {\bf C}^{m-1} {\bf v} . \tag{E2.2} \end{align*} Any eigenvector \( {\bf u}_i \) of the block \( {\bf J}_i \) must satisfy the equation \( {\bf J}_i {\bf u}_i = \lambda_i {\bf u}_i \), so \( {\bf C}\,{\bf u}_i = {\bf 0} \).
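Expansion (E2.1) can be checked symbolically in Mathematica for a small Jordan block; the eigenvalue 1/2, block size 3, and power 7 below are arbitrary choices.

lam = 1/2; m = 3; k = 7;                       (* arbitrary eigenvalue, block size, and power *)
c = DiagonalMatrix[{1, 1}, 1];                 (* nilpotent superdiagonal matrix C *)
j = lam IdentityMatrix[m] + c;                 (* Jordan block J = λ I + C *)
MatrixPower[j, k] == Sum[Binomial[k, n] lam^(k - n) MatrixPower[c, n], {n, 0, m - 1}]
(* True *)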

The power \( {\bf C}^n \) of the nilpotent matrix is a matrix whose only nonzero entries are ones on the n-th superdiagonal, shifted toward the upper right corner. In particular, the power \( {\bf C}^{m-1} \) (the last term in Eq.(E2.2)) has a single nonzero entry, a "1" in the upper right corner.

If \( |\lambda_i| < 1 \), then every term \( \binom{k}{n} \lambda^{k-n}_i {\bf C}^n {\bf v} \) in Eq.(E2.2) tends to zero as k → ∞ because the exponentially decaying factor \( \lambda_i^{k-n} \) overwhelms the polynomial growth of the binomial coefficient.

If \( |\lambda_i| > 1 \), then the powers \( \lambda_i^{k-n} \) in Eq.(E2.2) blow up, so \( {\bf J}_i^k {\bf v} \) diverges for every nonzero vector v.

If \( |\lambda_i| = 1 \) and \( \lambda_i \ne 1 \), then the powers \( \lambda_i^k \) oscillate around the unit circle, so \( {\bf J}_i^k {\bf v} \) does not converge for any nonzero v. Indeed, if u is an eigenvector corresponding to such \( \lambda_i \), then \( {\bf J}_i {\bf u} = \lambda_i {\bf u} \), so \( {\bf C}\,{\bf u} = {\bf 0} \). Expanding an arbitrary vector v over the eigenvectors and generalized eigenvectors, we see that at least the part of it along u oscillates. Therefore, the corresponding Jordan block does not converge.

If \( \lambda_i = 1 \) and m = 1, then the Jordan block is one-dimensional and consists of the single diagonal entry 1; in this case \( {\bf J}_i {\bf v} = {\bf v} \), so \( {\bf J}_i^k {\bf v} = {\bf v} \) for all k. If \( \lambda_i = 1 \) but m > 1, the binomial terms in Eq.(E2.2) grow polynomially in k, so the block again diverges.
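These cases are easy to observe numerically with a 2-by-2 Jordan block; the vector {1, 1} and the eigenvalues below are arbitrary choices.

block[lam_] := {{lam, 1}, {0, lam}};            (* 2-by-2 Jordan block with eigenvalue lam *)
Norm[MatrixPower[block[0.9], 200].{1., 1.}]     (* |λ| < 1 : about 10^-7, heading to zero *)
Norm[MatrixPower[block[1.0], 200].{1., 1.}]     (* λ = 1 with m = 2 : about 201, grows like k *)
Norm[MatrixPower[block[-1.0], 200].{1., 1.}]    (* |λ| = 1, λ ≠ 1 : about 199, oscillates in sign and grows *)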

Corollary 1: The powers \( {\bf H}^k {\bf v} \) converge to a limit vector as k → ∞ for every vector v if and only if each eigenvalue of H either has modulus less than one or equals 1 and appears only in 1-by-1 Jordan blocks.
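A made-up example of the convergent borderline case: the matrix below has the eigenvalue 1 in a 1-by-1 Jordan block together with an eigenvalue of modulus less than one, so its powers converge to a limit (a projection).

h = {{1., 0.}, {0., 0.5}};
MatrixPower[h, 100]      (* ≈ {{1., 0.}, {0., 0.}} *)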

Following Householder (1958), the number \( \rho ({\bf H}) = \max_i |\lambda_i| \) is called the spectral radius of the matrix H.

 

  1. Briggs, W.L., A Multigrid Tutorial, SIAM, Philadelphia.
  2. Burden, R.L. and Faires, J.D., Numerical Analysis, ninth edition, Brooks/Cole, Cengage Learning, Boston, MA, 2010.
  3. Darve, E. and Wootters, M., Numerical Linear Algebra with Julia, SIAM, Philadelphia, 2021.
  4. Forsythe, G.E. and Wasow, W.R., Finite Difference Methods for Partial Differential Equations, John Wiley & Sons, New York, 1960.
  5. Loehr, N., Advanced Linear Algebra, CRC Press, 2014.
  6. Watkins, D.S., Fundamentals of Matrix Computations, third edition, Wiley, 2010.
  7. White, R.E., Computational Linear Algebra with Applications and MATLAB Computations, CRC Press, Boca Raton, FL, 2023. doi: 10.1201/9781003304128