The conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations A x = b with a symmetric (or self-adjoint, when A ∈ ℂn×n) positive definite matrix A. The biconjugate gradient method provides a generalization to non-symmetric matrices (but it may suffer from numerical instability). The conjugate gradient method always converges to the exact solution of A x = b in a finite number of iterations (in exact arithmetic). In this sense it is not really an iterative method from a mathematical perspective. It can be viewed as a “direct method” like Gaussian elimination, in which a finite set of operations produces the exact solution. However, there is a good reason to interpret the conjugate gradient method as an iterative method rather than a direct one---it generates a finite sequence of approximations

\begin{equation} \label{EqCG.1} {\bf x}^{(k)} = g_k \left( {\bf x}^{(0)} , {\bf x}^{(1)} , \ldots , {\bf x}^{(k-1)} \right) , \qquad k = 1, 2, \ldots , m \leqslant n , \end{equation}
to the true solution x* of equation A x = b. The corresponding residual and error at step k are
\[ {\bf r}^{(k)} = {\bf b} - {\bf A}\,{\bf x}^{(k)} , \qquad \mbox{and} \qquad e_k = {\bf x}^{\ast} - {\bf x}^{(k)} , \]
respectively. Eq.\eqref{EqCG.1} is called the full-history recurrence; however, it is known as nonstationary iterative method in computational linear algebra.

The conjugate gradient method (henceforth, CG) was developed independently by the American mathematician Magnus Hestenes (1906–1991) and Eduard Stiefel (1907–1998) from Switzerland in the early fifties. They met during the "Symposium on Simultaneous Linear Equations and the Determination of Eigenvalues," organized in Los Angeles in 1951 by the Institute for Numerical Analysis of the U.S. National Bureau of Standards. There they realized that their algorithms were the same and wrote a famous joint paper that was published in 1952. About the same time, Cornelius Lanczos (1893--1974) invented a similar algorithm, but he applied it to eigenvalue problems. Actually, the CG algorithm is a particular case of a more general approach known as Krylov subspace methods, built on the subspaces first published in 1931 by the Russian mathematician and naval engineer Alexey Krylov (1863--1945).

Both a direct method such as Gaussian elimination and the CG algorithm require at most n steps to determine the exact solution of an n × n positive definite linear system, and the steps of the conjugate gradient method are more computationally expensive than those of Gaussian elimination. However, the conjugate gradient method is useful for solving large sparse systems whose nonzero entries occur in predictable patterns. When the matrix has been preconditioned to make the calculations more effective, good approximations are often obtained in only about √n iterations.
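Before developing the method step by step, here is a preview: a minimal Mathematica sketch of the standard (Hestenes–Stiefel) CG iteration for a symmetric positive definite matrix. The function name conjugateGradient and the default tolerance and iteration limit are illustrative choices, not built-in names.

conjugateGradient[A_, b_, x0_, tol_ : 10^-10, maxIter_ : 1000] :=
 Module[{x = x0, r, p, rsOld, rsNew, Ap, alpha, k},
  r = b - A.x;   (* initial residual *)
  p = r;         (* first search direction *)
  rsOld = r.r;
  For[k = 1, k <= maxIter, k++,
   Ap = A.p;
   alpha = rsOld/(p.Ap);      (* step size along p *)
   x = x + alpha p;
   r = r - alpha Ap;          (* updated residual *)
   rsNew = r.r;
   If[Sqrt[rsNew] < tol, Break[]];
   p = r + (rsNew/rsOld) p;   (* next search direction *)
   rsOld = rsNew];
  x]

For a small symmetric positive definite test matrix, the output agrees with LinearSolve[A, b] up to the chosen tolerance.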


Conjugate Gradient Method

We consider a system of n equations with n unknowns

\[ \mathbf{A}\,{\bf x} = {\bf b} \in \mathbb{R}^{n\times 1} , \]
where x is an unknown column vector, b is a known column vector, and A is a known, square, symmetric, positive-definite matrix. These systems arise in many important settings, such as finite difference and finite element methods for solving partial differential equations, structural analysis, circuit analysis, and math homework.    

Example 1: We start with a one-dimensional version of Poisson's equation for Laplace's operator Δ, \[ -\frac{{\text d}^2 u}{{\text d}x^2} = f(x) , \qquad 0 < x < \ell , \tag{1.1} \] where f(x) is a given function and u(x) is the unknown function that we want to compute. Function u(x) must also satisfy the Dirichlet boundary conditions u(0) = u(ℓ) = 0. Of course, other boundary conditions (such as Neumann conditions or conditions of the third kind) can be imposed instead; however, we choose one of the simplest possibilities for our illustrative example.

Recall that a differential operator is called positive (or negative) if all its eigenvalues are positive (or negative, respectively). Since the second order derivative operator \( \displaystyle \quad \texttt{D}^2 = {\text d}^2 / {\text d}x^2 \quad \) is negative, it is customary to negate the Laplacian.

We discretize the boundary value problem by trying to compute an approximate solution at N + 2 evenly spaced points xi between 0 and ℓ: xi = i h, where h = ℓ/(N+1) and 0 ≤ i ≤ N + 1. We abbreviate ui = u(xi) and fi = f(xi). To convert differential equation (1.1) into a linear system for the unknowns u₁, u₂, … , uN, we use finite differences to approximate \[ \left. \frac{{\text d}u}{{\text d}x} \right\vert_{x = (i-1/2) h} \approx \frac{u_{i} - u_{i-1}}{h} , \] \[ \left. \frac{{\text d}u}{{\text d}x} \right\vert_{x = (i+1/2) h} \approx \frac{u_{i+1} - u_{i}}{h} . \] Subtracting the first approximation from the second and dividing by h yield the centered difference approximation \[ \frac{2 u_i - u_{i-1} - u_{i+1}}{h^2} = - \left. \frac{{\text d}^2 u}{{\text d}x^2} \right\vert_{x = i h} + \tau_i , \] where τi, the so-called truncation error, can be shown to be \( \displaystyle \quad O\left( h^2 \cdot \left\| \frac{{\text d}^4 u}{{\text d}x^4} \right\|_{\infty} \right) . \quad \) We may now rewrite equation (1.1) at x = xi as \[ - u_{i-1} + 2\,u_i - u_{i+1} = h^2 f_i + h^2 \tau_i , \tag{1.2} \] where 0 < i < N + 1. From Dirichlet's boundary conditions, we derive that u₀ = uN+1 = 0. Hence, we get N equations with N unknowns u₁, u₂, … , uN: \[ {\bf T}_N {\bf u} = h^2 \begin{pmatrix} f_1 \\ \vdots \\ f_N \end{pmatrix} + h^2 \begin{pmatrix} \tau_1 \\ \vdots \\ \tau_N \end{pmatrix} , \] where \[ {\bf T}_N = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1& 2 & -1 & \cdots &0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{bmatrix} , \qquad {\bf u} = \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix} . \tag{1.3} \] To solve this system of equations, we will ignore the terms with τ because they are small compared to f. This yields \[ {\bf T}_N \hat{\bf u} = h^2 {\bf f} , \] whose solution \( \hat{\bf u} \) approximates the vector u of true values. The coefficient matrix TN plays a central role in all that follows, so we will examine it in some detail. First, we will compute its eigenvalues and eigenvectors. One can easily use trigonometric identities to confirm the following lemma.

Lemma 1: The eigenvalues of TN are \( \displaystyle \quad \lambda_j = 2 \left( 1 - \cos \frac{j\pi}{N+1} \right) , \quad j = 1, 2, \ldots , N. \quad \) The corresponding eigenvectors are \[ z_j (k) = \sqrt{\frac{2}{N+1}}\,\sin \left( \frac{jk\pi}{N+1} \right) , \qquad \| z_j \|_2 = 1. \] Let Z = [ z₁, z₂, … , zN ] be the orthogonal matrix whose columns are the eigenvectors, and Λ = diag(λ₁,λ₂, … , λN), so we can write \( \displaystyle \quad {\bf T}_N = {\bf Z}\,\Lambda \,{\bf Z}^{\mathrm T} . \)
Hence, the largest eigenvalue is \( \displaystyle \quad \lambda_N = 2 \left( 1 - \cos \frac{N\pi}{N+1} \right) \quad \) and the smallest eigenvalue is λ₁, where for small i, \[ \lambda_i = 2 \left( 1 - \cos \frac{i\pi}{N+1} \right) \approx 2 \left( 1 - \left( 1 - \frac{i^2 \pi^2}{2 (N+1)^2} \right) \right) = \left( \frac{i\pi}{N+1} \right)^2 . \] Therefore, matrix TN is symmetric and positive definite with condition number \( \displaystyle \quad \lambda_N / \lambda_1 \approx \frac{4}{\pi^2} \left( N+1 \right)^2 \quad \) for large N. The eigenvectors are sinusoids with lowest frequency at j = 1 and highest at j = N.
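Lemma 1 is easy to verify numerically; the following Mathematica sketch compares the computed spectrum of T₂₂ with the formula of the lemma and plots the eigenvalues (the variable names are illustrative).

n = 22;
T = SparseArray[{Band[{1, 1}] -> 2, Band[{1, 2}] -> -1, Band[{2, 1}] -> -1}, {n, n}];
numerical = Sort[Eigenvalues[N[Normal[T]]]];             (* computed spectrum of T_22 *)
exact = Table[2 (1 - Cos[j Pi/(n + 1)]), {j, 1, n}];     (* formula from Lemma 1 *)
Max[Abs[numerical - N[exact]]]                           (* should be of rounding-error size *)
ListPlot[exact, AxesLabel -> {"j", "\[Lambda]"}, PlotLabel -> "Eigenvalues of T22"]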

Now we know enough to bound the error, i.e., the difference between the solution of \( \displaystyle \quad {\bf T}_N \hat{\bf u} = h^2 {\bf f} \quad \) and the vector u of values of the true solution of the Dirichlet boundary value problem at the grid points. Taking norms, we get \[ \| u - \hat{u} \|_2 \le h^2 \| {\bf T}_N^{-1} \|_2 \| \tau \|_2 \approx h^2 \left( \frac{N+1}{\pi} \right)^2 \| \tau \|_2 , \] which is of order \[ O \left( \| \tau \|_2 \right) = O \left( h^2 \left\| \frac{{\text d}^4 u}{{\text d} x^4} \right\|_{\infty} \right) . \] Hence, the error \( \displaystyle \quad u - \hat{u} \quad \) goes to zero proportionally to h², provided that the solution is smooth enough (its fourth derivative is bounded on the interval [0, ℓ]).
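The quadratic convergence is easy to observe numerically. A minimal sketch, assuming ℓ = 1 and the manufactured solution u(x) = sin(πx), for which f(x) = π² sin(πx) (the helper name errorFor is illustrative): doubling N should shrink the maximal error roughly fourfold.

errorFor[n_] := Module[{h = 1.0/(n + 1), T, xg},
  T = SparseArray[{Band[{1, 1}] -> 2, Band[{1, 2}] -> -1, Band[{2, 1}] -> -1}, {n, n}];
  xg = Table[i h, {i, 1, n}];                             (* interior grid points *)
  Norm[Sin[Pi xg] - LinearSolve[T, h^2 Pi^2 Sin[Pi xg]], Infinity]]

errorFor /@ {10, 20, 40, 80}     (* errors decrease roughly by a factor of 4 *)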

From now on we will not distinguish between u and its approximation \( \displaystyle \quad \hat{u} , \quad \) and we will simplify notation by writing TNu = h²f. It turns out that the eigenvalues and eigenvectors of h−2TN also approximate the eigenvalues and eigenfunctions of the given boundary value problem (BVP). We say that \( \displaystyle \quad \hat{\lambda}_i \quad \) is an eigenvalue and \( \displaystyle \quad \hat{z}_i (x) \quad \) is an eigenfunction of the given BVP if \[ - \frac{{\text d}^2 \hat{\bf z}_i (x)}{{\text d} x^2} = \hat{\lambda}_i \hat{\bf z}_i (x) , \qquad \hat{\bf z}_i (0) = \hat{\bf z}_i (\ell ) = 0 . \tag{1.4} \] Let us solve the Sturm--Liouville problem above for \( \displaystyle \quad \hat{\lambda}_i \quad \) and the unknown function \( \displaystyle \quad \hat{z}_i (x) . \quad \) The general solution of Eq.(1.4) is a linear combination of trigonometric functions: \[ \hat{z}_i (x) = \alpha\,\sin \left( \sqrt{\hat{\lambda}_i}\, x \right) + \beta\,\cos \left( \sqrt{\hat{\lambda}_i}\, x \right) , \] for some real constants α and β. The boundary condition u(0) = 0 dictates that β must vanish. In order to satisfy the other boundary condition at x = ℓ, we have to choose \( \hat{\lambda}_i \) so that \( \displaystyle \quad \sqrt{\hat{\lambda}_i}\,\ell \quad \) is a multiple of π, that is, \( \displaystyle \quad \sqrt{\hat{\lambda}_i}\,\ell = i \pi . \quad \) Thus, \[ \hat{\lambda}_i = i^2 \pi^2 / \ell^2 \qquad\mbox{and} \qquad \hat{z}_i (x) = \sin \left( \frac{i \pi x}{\ell} \right) , \quad i =1,2,\ldots . \] Since an eigenfunction can be multiplied by any nonzero constant, we set α = 1 without any loss of generality. The eigenvector zi is precisely equal to the eigenfunction \( \displaystyle \quad \hat{z}_i (x) \quad \) evaluated at the sample points xk = kh and scaled by \( \quad \sqrt{\frac{2}{N+1}} . \quad \) When i is small, the scaled eigenvalue of the matrix approximates the eigenvalue of the BVP: \[ h^{-2} \lambda_i = \frac{2 (N+1)^2}{\ell^2} \left( 1 - \cos \frac{i\pi}{N+1} \right) \approx \frac{i^2 \pi^2}{\ell^2} = \hat{\lambda}_i . \] Thus, we see there is a close correspondence between the matrix operator TN or h−2TN and the second derivative operator \( \displaystyle \quad -\texttt{D}^2 = -{\text d}^2 / {\text d}x^2 . \quad \) This correspondence will be the motivation for the design and analysis of later algorithms.
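A quick check of this correspondence (a sketch, assuming ℓ = 1 and N = 100; the names are illustrative): the scaled discrete eigenvalues h⁻²λᵢ approach the eigenvalues i²π² of the continuous problem for small i.

n = 100; len = 1; h = len/(n + 1);
discrete[i_] := 2 (1 - Cos[i Pi/(n + 1)])/h^2;   (* h^-2 lambda_i for T_N *)
continuous[i_] := i^2 Pi^2/len^2;                (* eigenvalues of the BVP *)
TableForm[Table[{i, N[discrete[i]], N[continuous[i]]}, {i, 1, 5}],
 TableHeadings -> {None, {"i", "discrete", "continuous"}}]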

Note:    The boundary conditions for Poisson's equation may lead to a non-symmetric analogue of matrix TN and/or to positive semi-definite matrices.    ■

End of Example 1
   
Example 2:    ■
End of Example 2
   
Example 3: Now we turn to Poisson's equation in two dimensions: \[ - \frac{\partial^2 u(x,y)}{\partial x^2} - \frac{\partial^2 u(x,y)}{\partial y^2} = f(x,y) \] on the unit (for simplicity) square:    0 < x, y < 1, with homogeneous boundary condition u = 0 on the boundary of the square. We discretize at the grid points of the square, which are ( xi, yj ) with xi = ih, yj = jh, where h = 1/(N+1) and 1 ≤ i, j ≤ N; one way to assemble the resulting coefficient matrix in Mathematica is sketched below.    ■
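A minimal Mathematica sketch: assuming the same mesh size h in both directions, one standard way to build the N² × N² coefficient matrix is through Kronecker products of T_N with the identity matrix (the name laplacian2D is illustrative, and this construction is stated here without derivation).

laplacian2D[n_] := Module[{T, id},
  T = SparseArray[{Band[{1, 1}] -> 2, Band[{1, 2}] -> -1, Band[{2, 1}] -> -1}, {n, n}];
  id = IdentityMatrix[n, SparseArray];
  KroneckerProduct[T, id] + KroneckerProduct[id, T]]    (* n^2 x n^2, symmetric *)

Eigenvalues[N[Normal[laplacian2D[4]]]]    (* all eigenvalues are positive *)

The three-dimensional analogue in Example 4 can be assembled in the same spirit, with three Kronecker summands.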
End of Example 3
   
Example 4: Now we turn to Poisson's equation in three dimensions: \[ - \frac{\partial^2 u(x,y,z)}{\partial x^2} - \frac{\partial^2 u(x,y, z)}{\partial y^2} - \frac{\partial^2 u(x,y, z)}{\partial z^2} = f(x,y,z) \] in the unit (for simplicity) cube:    0 < x, y, z < 1, with homogeneous boundary condition u = 0 on the boundary of the cube. We discretize at the grid points of the cube, which are ( xi, yj, zk ) with xi = ih, yj = jh, zk = kh, where h = 1/(N+1) and 1 ≤ i, j, k ≤ N.    ■
End of Example 4

There are two similar notations for the inner product of two vectors x and y: one, widely used by mathematicians, separates the vectors by a comma, and Dirac's notation separates them by a vertical bar:

\[ \langle \mathbf{x} , \mathbf{y} \rangle = \langle \mathbf{x} \mid \mathbf{y} \rangle = \sum_{i=1}^n x_i y_i = {\bf x} \bullet {\bf y} , \]
where \( \displaystyle \quad \mathbf{x} = \left( x_1 , x_2 , \ldots , x_n \right) \quad \) and \( \displaystyle \quad \mathbf{y} = \left( y_1 , y_2 , \ldots , y_n \right) \quad \) are n-dimensional vectors with real entries. It does not matter whether the vectors belong to ℝn, the column space ℝn×1, or the row space ℝ1×n---what is important is that both vectors x and y come from the same vector space. When the vectors are complex, the dot product x • y must be modified by taking the complex conjugate of the first factor:
\[ \langle \mathbf{x} , \mathbf{y} \rangle = \langle \mathbf{x} \mid \mathbf{y} \rangle = \sum_{i=1}^n x_i^{\ast} y_i \qquad \left( x_i^{\ast} = \overline{x_i} \right) . \]
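In Mathematica the real inner product is the Dot operation (infix period); for complex vectors the conjugation of the first argument must be supplied explicitly, matching the convention above:

x = {1, 2, 3}; y = {4, 5, 6};
x.y                  (* real inner product: 1*4 + 2*5 + 3*6 = 32 *)
u = {1 + I, 2}; v = {3, 4 - I};
Conjugate[u].v       (* complex inner product: sum of Conjugate[u_i] v_i *)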

Plot3D[x^2 - x*y + y^2, {x, -4, 4}, {y, -4, 4}, PlotStyle -> Orange, AspectRatio -> 1]
Plot3D[x^2 + x*y - y^2, {x, -4, 4}, {y, -4, 4}, PlotStyle -> Blue, AspectRatio -> 1]

Positive definite quadratic form x² − x y + y².
       
Indefinite quadratic form x² + x y − y².

A self-adjoint matrix is positive definite if and only if all its eigenvalues are positive. Unfortunately, mathematicians are not consistent: a linear operator is called positive if all its eigenvalues are positive. This definition from functional analysis does not require the matrix A (as an operator) to be self-adjoint (A = A∗).

Throughout this section we assume that the matrix A ∈ ℝn×n is symmetric positive definite.

When a square real matrix A is positive definite, ⟨x , A x⟩ = ⟨x ∣ A x⟩ > 0 for x ≠ 0. Also, since A is symmetric, we have ⟨x , A x⟩ = ⟨A x ∣ x⟩, which justifies the Dirac notation ⟨x∣A∣x⟩ for ⟨x ∣ A x⟩. Therefore, we can introduce the A-inner product for symmetric matrices

\begin{equation} \label{EqCG.2} \langle \mathbf{x} \mid \mathbf{y} \rangle_A = \langle \mathbf{x} \mid {\bf A}\,\mathbf{y} \rangle = \langle {\bf A}^{\ast}\mathbf{x} \mid \mathbf{y} \rangle = \langle {\bf A}\,\mathbf{x} \mid \mathbf{y} \rangle \end{equation}
and corresponding A-norm
\begin{equation} \label{EqCG.3} \| \mathbf{x} \|_A = +\sqrt{ \langle \mathbf{x} \mid \mathbf{x} \rangle_A } = + \sqrt{\langle \mathbf{x} \mid {\bf A}\,\mathbf{x} \rangle} = + \sqrt{\langle {\bf A}^{\ast} \mathbf{x} \mid \mathbf{x} \rangle} . \end{equation}
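A minimal Mathematica sketch of the A-inner product \eqref{EqCG.2} and the A-norm \eqref{EqCG.3} for a small symmetric positive definite matrix (the function names aInner and aNorm are illustrative):

aInner[x_, y_, A_] := x.(A.y)              (* <x | y>_A *)
aNorm[x_, A_] := Sqrt[aInner[x, x, A]]     (* ||x||_A *)

A = {{4, 1}, {1, 3}};                      (* symmetric positive definite *)
aInner[{1, 0}, {0, 1}, A]                  (* equals the entry A[[1, 2]] = 1 *)
aNorm[{1, -1}, A]                          (* Sqrt[5] *)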
We turn now to an iterative method based on a search approach. To find the solution of a linear system A x = b, the conjugate gradient method searches for the solution of an equivalent minimization problem. The following result is a basic tool in the development of the conjugate gradient method.
Theorem 1: The vector x is a solution to the positive definite linear system A x = b if and only if x produces the minimal value of the functional
\begin{equation} \label{EqCG.4} \phi ({\bf u}) = \frac{1}{2}\,\left\langle {\bf u} \mid {\bf A}\,{\bf u} \right\rangle - \left\langle {\bf u} \mid {\bf b} \right\rangle . \end{equation}
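A short calculation makes Theorem 1 plausible (a sketch, using only the symmetry and positive definiteness of A): if A x* = b, then for any nonzero vector e,
\[ \phi \left( {\bf x}^{\ast} + {\bf e} \right) = \phi \left( {\bf x}^{\ast} \right) + \left\langle {\bf e} \mid {\bf A}\,{\bf x}^{\ast} - {\bf b} \right\rangle + \frac{1}{2} \left\langle {\bf e} \mid {\bf A}\,{\bf e} \right\rangle = \phi \left( {\bf x}^{\ast} \right) + \frac{1}{2} \left\langle {\bf e} \mid {\bf A}\,{\bf e} \right\rangle > \phi \left( {\bf x}^{\ast} \right) , \]
so x* is the unique minimizer of φ; conversely, at any minimizer the gradient ∇φ(u) = A u − b must vanish, which is exactly the equation A u = b.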
   
Example 5:    ■
End of Example 5
Recall first how stationary iterative methods work: the equation A x = b is recast as a fixed point problem
\[ \mathbf{x} = g({\bf x}) . \]
Using an initial guess x(0), we generate a sequence of iterative values
\[ \mathbf{x}^{(k+1)} = g\left( {\bf x}^{(k)} \right) , \qquad k=0,1,2,\ldots . \]
Its realization involves splitting the main matrix A into two parts, A = S + (A − S), where S must be an invertible square matrix. Multiplying both sides of the given equation by S−1, we reduce the given problem to a fixed point problem:
\[ \mathbf{x} = {\bf B}\,{\bf x} + {\bf d} , \qquad {\bf B} = {\bf I} - {\bf S}^{-1} {\bf A} , \]
which is suitable for sequential iterations. However, stationary iterative methods utilize the same function g(x) at every iteration, losing all information gathered during previous iterations. Now we are going to use a function that changes from iteration to iteration, so g(x) = gk(x) at step k.
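For comparison, here is a minimal Mathematica sketch of a stationary splitting iteration in which S is taken to be the diagonal part of A (the classical Jacobi choice; the function name and the small test system are illustrative):

stationaryIteration[A_, b_, x0_, steps_] := Module[{S = DiagonalMatrix[Diagonal[A]], x = x0},
  Do[x = x + LinearSolve[S, b - A.x], {steps}];   (* x <- x + S^-1 (b - A x) *)
  x]

A = {{4.0, 1.0}, {1.0, 3.0}}; b = {1.0, 2.0};
stationaryIteration[A, b, {0.0, 0.0}, 50]         (* compare with LinearSolve[A, b] *)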


Gradient Method

The methods considered in this section are based on updates of the form:

\[ \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha_k {\bf p}_k , \qquad k=0,1,2,\ldots , \]
where the vector pk is the search direction and the scalar αk is the step size. Note that this formula includes the basic stationary methods, which correspond to the constant step size αk ≡ 1 and search directions pk = S−1rk produced by one fixed splitting matrix S.

The simplest nonstationary scheme based on a varying search direction and step size is obtained by setting pk = rk, the residual defined below (in the notation of some texts, this corresponds to choosing the preconditioner Mk−1 = αkI, with I the identity matrix). Since rk = −∇φ(x(k)), each such step moves in the direction of steepest descent of the functional φ from Theorem 1, and the resulting family of methods is called gradient descent.

Using a splitting with a nonsingular matrix S, we rewrite the main equation A x = b as
\[ \mathbf{S}\,\mathbf{x} = \left( \mathbf{S} - \mathbf{A} \right) \mathbf{x} + \mathbf{b} \qquad \iff \qquad \mathbf{x} = \mathbf{x} - \mathbf{S}^{-1} \mathbf{A}\,\mathbf{x} + \mathbf{S}^{-1} \mathbf{b} . \]
This identity leads to the iterative scheme
\[ \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \mathbf{S}^{-1} \mathbf{A}\,\mathbf{x}^{(k)} + \mathbf{S}^{-1} \mathbf{b} , \qquad k=0,1,2,\ldots . \]
We are looking at the gradient descent iteration
\[ \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha_k \mathbf{r}_k , \]
where the residual vector is
\[ \mathbf{r}_k = {\bf b} - {\bf A}\,{\bf x}^{(k)} , \qquad k=0,1,2,\ldots . \]
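A minimal Mathematica sketch of this gradient descent iteration, using the exact line-search step αk = ⟨rk , rk⟩ / ⟨rk , A rk⟩ (a standard choice for symmetric positive definite A, stated here without derivation; the function name is illustrative):

gradientDescent[A_, b_, x0_, tol_ : 10^-8, maxIter_ : 10000] :=
 Module[{x = x0, r, alpha, k},
  r = b - A.x;
  For[k = 1, k <= maxIter && Norm[r] > tol, k++,
   alpha = (r.r)/(r.(A.r));    (* exact line search along the residual *)
   x = x + alpha r;
   r = b - A.x];
  x]

A = {{4.0, 1.0}, {1.0, 3.0}}; b = {1.0, 2.0};
gradientDescent[A, b, {0.0, 0.0}]              (* compare with LinearSolve[A, b] *)

For ill-conditioned matrices this iteration converges slowly; the conjugate gradient method sketched at the beginning of the section cures exactly this defect by choosing A-conjugate search directions.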

 

  1. Burden, R.L. and Faires, J.D., Numerical Analysis, ninth edition, Brooks/Cole, Cengage, Boston, MA.
  2. Darve, E. and Wootters, M., Numerical Linear Algebra with Julia, SIAM, Philadelphia, 2021.
  3. Watkins, D.S., Fundamentals of Matrix Computations, Wiley; 3rd edition, 2010.
  4. White, R.E., Computational Linear Algebra with Applications and MATLAB Computations, CRC Press, Boca Raton, FL, 2023. doi: 10.1201/9781003304128

 

  1. Barrett, R., Berry, M., Chan, T., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, Pennsylvania, 1993.
  2. Burden, R.L. and Faires, J.D., Numerical Analysis, ninth edition, Brooks/Cole, Cengage, Boston, MA.
  3. Darve, E. and Wootters, M., Numerical Linear Algebra with Julia, SIAM, Philadelphia, 2021.
  4. Hestenes, M.R. and Stiefel, E., Methods of Conjugate Gradients for Solving Linear Systems, Journal of Research of the National Bureau of Standards, 1952, 49(6), 409. doi: 10.6028/jres.049.044
  5. Lanczos, C., An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards, 1950, 45(4), 255–282. doi: 10.6028/jres.045.026
  6. Loehr, N., Advanced Linear Algebra, CRC Press, 2014.
  7. Shewchuk, J.R., An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994, School of Computer Science, Carnegie Mellon University.
  8. Shi, X., Wei, Y., and Zhang, W., Convergence of General Nonstationary Iterative Methods for Solving Singular Linear Equations, Taylor & Francis Online, Volume 38, Issue 11,