The conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations A x = b, namely those whose matrix A is symmetric (or self-adjoint, in the case A ∈ ℂn×n) and positive definite. The biconjugate gradient method provides a generalization to non-symmetric matrices (but it may suffer from numerical instability). In exact arithmetic, the conjugate gradient method always converges to the exact solution of A x = b in a finite number of iterations; in this sense it is not really an iterative method from a mathematical perspective. It can be viewed as a “direct method” like Gaussian elimination, in which a finite set of operations produces the exact solution. However, there is a good reason to interpret the conjugate gradient method as an iterative method rather than a direct one---it generates a sequence of successively better approximations, which can be stopped as soon as the desired accuracy is reached.
The conjugate gradient method (henceforth, CG) was developed independently by the American mathematician Magnus Hestenes (1906–1991) and Eduard Stiefel (1909–1978) from Switzerland in the early fifties. They met during the "Symposium on Simultaneous Linear Equations and the Determination of Eigenvalues," organized in Los Angeles in 1951 by the Institute for Numerical Analysis of the U.S. National Bureau of Standards. There they realized that their algorithms were the same and wrote a famous joint paper that was published in 1952. About the same time, Cornelius Lanczos (1893–1974) invented a similar algorithm, but he applied it to eigenvalue problems. Actually, the CG algorithm is a particular case of a more general approach known as Krylov subspace methods, rooted in work first published in 1931 by the Russian mathematician and naval engineer Alexey Krylov (1863–1945).
To solve an n × n positive definite linear system, both Gaussian elimination and the CG algorithm require n steps to determine a solution, and each step of the conjugate gradient method is more computationally expensive than a step of Gaussian elimination. However, the conjugate gradient method is useful for solving large sparse systems whose nonzero entries occur in predictable patterns. When the matrix has been preconditioned to make the calculations more effective, good results are obtained in only about √n iterations.
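Even though large problems are stopped early in practice, the bare algorithm is short. Below is a minimal pure-Python sketch of CG for a small dense symmetric positive definite system (illustrative only; production codes use sparse storage and preconditioning):

```python
# Minimal conjugate gradient sketch for a symmetric positive definite
# system A x = b (pure Python, dense row-major lists; illustrative only).

def matvec(A, x):
    # Matrix-vector product A x.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def conjugate_gradient(A, b, tol=1e-10):
    n = len(b)
    x = [0.0] * n          # initial guess x_0 = 0
    r = b[:]               # residual r_0 = b - A x_0 = b
    p = r[:]               # first search direction
    rs_old = dot(r, r)
    for _ in range(n):     # in exact arithmetic, at most n steps
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            break
        # beta = rs_new / rs_old makes the new direction A-conjugate
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Example with a small 3x3 tridiagonal SPD matrix.
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x = conjugate_gradient(A, b)   # -> [1.0, 1.0, 1.0]
```

For this 3×3 system CG terminates after two iterations, well before the n-step bound, because the right-hand side excites only two eigencomponents of A.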
Conjugate Gradient Method
We consider a system of n equations with n unknowns
Example 1: We start with a one-dimensional version of Poisson's equation for Laplace's operator Δ, \[ -\frac{{\text d}^2 u}{{\text d}x^2} = f(x) , \qquad 0 < x < \ell , \tag{1.1} \] where f(x) is a given function and u(x) is the unknown function that we want to compute. The function u(x) must also satisfy the Dirichlet boundary conditions u(0) = u(ℓ) = 0. Of course, other boundary conditions (such as Neumann or of the third kind) can be imposed on Laplace's operator; however, we choose one of the simplest possible conditions for our illustrative example.
Recall that a differential operator is called positive (or negative) if all its eigenvalues are positive (or negative, respectively). Since the second order derivative operator \( \displaystyle \quad \texttt{D}^2 = {\text d}^2 / {\text d}x^2 \quad \) is negative, it is customary to negate the Laplacian.
We discretize the boundary value problem by trying to compute an approximate solution at N + 2 evenly spaced points xi between 0 and ℓ: xi = i h, where h = ℓ/(N+1) and 0 ≤ i ≤ N + 1. We abbreviate ui = u(xi) and fi = f(xi). To convert differential equation (1.1) into a linear system for the unknowns u₁, u₂, … , uN, we use finite differences to approximate \[ \left. \frac{{\text d}u}{{\text d}x} \right\vert_{x = (i-1/2) h} \approx \frac{u_{i} - u_{i-1}}{h} , \] \[ \left. \frac{{\text d}u}{{\text d}x} \right\vert_{x = (i+1/2) h} \approx \frac{u_{i+1} - u_{i}}{h} . \] Subtracting the first approximation from the second and dividing by h yield the centered difference approximation \[ -\left. \frac{{\text d}^2 u}{{\text d}x^2} \right\vert_{x = i h} \approx \frac{2 u_i - u_{i-1} - u_{i+1}}{h^2} + \tau_i , \] where τi, the so-called truncation error, can be shown to be \( \displaystyle \quad O\left( h^2 \cdot \left\| \frac{{\text d}^4 u}{{\text d}x^4} \right\|_{\infty} \right) . \quad \) We may now rewrite equation (1.1) at x = xi as \[ - u_{i-1} + 2\,u_i - u_{i+1} = h^2 f_i + h^2 \tau_i , \tag{1.2} \] where 0 < i < N + 1. From Dirichlet's boundary conditions, we derive that u₀ = uN+1 = 0. Hence, we get N equations with N unknowns u₁, u₂, … , uN: \[ {\bf T}_N {\bf u} = h^2 \begin{pmatrix} f_1 \\ \vdots \\ f_N \end{pmatrix} + h^2 \begin{pmatrix} \tau_1 \\ \vdots \\ \tau_N \end{pmatrix} , \] where \[ {\bf T}_N = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{bmatrix} , \qquad {\bf u} = \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix} . \tag{1.3} \] To solve this system of equations, we ignore the terms with τ because they are small compared to f. This yields \[ {\bf T}_N \hat{\bf u} = h^2 {\bf f} . \] The coefficient matrix TN plays a central role in all that follows, so we will examine it in some detail. First, we will compute its eigenvalues and eigenvectors. 
One can easily use trigonometric identities to confirm the following lemma.
Now we know enough to bound the error, i.e., the difference between the solution \( \displaystyle \quad \hat{\bf u} \quad \) of \( \displaystyle \quad {\bf T}_N \hat{\bf u} = h^2 {\bf f} \quad \) and the true solution u of the Dirichlet boundary value problem. Taking norms, we get \[ \| u - \hat{u} \|_2 \le h^2 \| {\bf T}_N^{-1} \|_2 \| \tau \|_2 \approx h^2 \left( \frac{N+1}{\pi} \right)^2 \| \tau \|_2 , \] which has an order of \[ O \left( \| \tau \|_2 \right) = O \left( h^2 \left\| \frac{{\text d}^4 u}{{\text d} x^4} \right\|_{\infty} \right) . \] Hence, the error \( \displaystyle \quad u - \hat{u} \quad \) goes to zero proportionally to h², provided that the solution is smooth enough (its fourth derivative is bounded on the interval [0, ℓ]).
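This O(h²) decay is easy to verify numerically. The sketch below (pure Python; the choices ℓ = 1 and f(x) = π² sin(πx), with exact solution u(x) = sin(πx), are illustrative and not taken from the text) solves the tridiagonal system with the Thomas algorithm, a standard O(N) direct solver for tridiagonal matrices:

```python
import math

def solve_tridiagonal(diag, off, rhs):
    # Thomas algorithm for a symmetric tridiagonal system:
    # diag is the main diagonal (length n), off the sub/super diagonal (n-1).
    n = len(rhs)
    c = [0.0] * n                       # modified super-diagonal
    d = [0.0] * n                       # modified right-hand side
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n                       # back substitution
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# Discretize -u'' = f on (0, 1), u(0) = u(1) = 0, with the
# illustrative choice f(x) = pi^2 sin(pi x), exact solution sin(pi x).
N = 50
h = 1.0 / (N + 1)
xs = [(i + 1) * h for i in range(N)]
rhs = [h * h * math.pi ** 2 * math.sin(math.pi * x) for x in xs]
u_hat = solve_tridiagonal([2.0] * N, [-1.0] * (N - 1), rhs)
err = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u_hat, xs))
# err is O(h^2): roughly (pi^2 / 12) h^2, here about 3e-4
```

Doubling N + 1 cuts the error by about a factor of four, as the h² bound predicts.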
From now on we will not distinguish between u and its approximation \( \displaystyle \quad \hat{u} , \quad \) which simplifies notation to TNu = h²f. It turns out that the eigenvalues and eigenvectors of h−2TN also approximate the eigenvalues and eigenfunctions of the given boundary value problem (BVP). We say that \( \displaystyle \quad \hat{\lambda}_i \quad \) is an eigenvalue and \( \displaystyle \quad \hat{z}_i (x) \quad \) is an eigenfunction of the given BVP if \[ - \frac{{\text d}^2 \hat{\bf z}_i (x)}{{\text d} x^2} = \hat{\lambda}_i \hat{\bf z}_i (x) , \qquad \hat{\bf z}_i (0) = \hat{\bf z}_i (\ell ) = 0 . \tag{1.4} \] Let us solve the Sturm--Liouville problem above for \( \displaystyle \quad \hat{\lambda}_i \quad \) and the unknown function \( \displaystyle \quad \hat{z}_i (x) . \quad \) The general solution of Eq.(1.4) is a linear combination of trigonometric functions: \[ \hat{z}_i (x) = \alpha\,\sin \left( \sqrt{\hat{\lambda}_i}\, x \right) + \beta\,\cos \left( \sqrt{\hat{\lambda}_i}\, x \right) , \] for some real constants α and β. The boundary condition u(0) = 0 dictates that β must vanish. In order to satisfy the other boundary condition at x = ℓ, we have to choose the eigenvalue so that \( \displaystyle \quad \sqrt{\hat{\lambda}_i}\,\ell \quad \) is a multiple of π: \( \displaystyle \quad \sqrt{\hat{\lambda}_i}\,\ell = i \pi . \quad \) Thus, \[ \hat{\lambda}_i = i^2 \pi^2 / \ell^2 \qquad\mbox{and} \qquad \hat{z}_i (x) = \sin \left( \frac{i \pi x}{\ell} \right) , \quad i =1,2,\ldots . \] Since an eigenfunction can be multiplied by any nonzero constant, we set α = 1 without any loss of generality. The eigenvector zi of TN is precisely the eigenfunction \( \displaystyle \quad \hat{z}_i (x) \quad \) evaluated at the sample points xj = j h and scaled by \( \quad \sqrt{\frac{2}{N+1}} . \quad \) When i is small, the eigenvalue \( \displaystyle \quad h^{-2} \lambda_i \quad \) of \( \displaystyle \quad h^{-2} {\bf T}_N \quad \) is well approximated by the continuous eigenvalue: \[ h^{-2} \lambda_i = \frac{2 (N+1)^2}{\ell^2} \left( 1 - \cos \frac{i\pi}{N+1} \right) = \frac{i^2 \pi^2}{\ell^2} \left( 1 + O \left( \left( \frac{i}{N+1} \right)^2 \right) \right) . \] Thus, we see there is a close correspondence between the matrix TN or h−2TN and the second derivative operator \( \displaystyle \quad -\texttt{D}^2 = -{\text d}^2 / {\text d}x^2 . \quad \) This correspondence will be the motivation for the design and analysis of later algorithms.
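This correspondence can be checked numerically: the eigenvalues of T_N are known in closed form, λ_i = 2(1 − cos(iπ/(N+1))), so we can compare h⁻²λ_i with the continuous eigenvalues i²π²/ℓ². The parameters ℓ = 1 and N = 100 below are illustrative:

```python
import math

# Compare eigenvalues of h^{-2} T_N with those of the continuous problem.
ell = 1.0
N = 100
h = ell / (N + 1)

rel_errs = []
for i in (1, 2, 3):
    # Closed-form eigenvalue of T_N, scaled by h^{-2}.
    discrete = 2.0 * (1.0 - math.cos(i * math.pi / (N + 1))) / h ** 2
    continuous = (i * math.pi / ell) ** 2          # i^2 pi^2 / ell^2
    rel_errs.append(abs(discrete - continuous) / continuous)

# Each relative error shrinks like (i / (N + 1))^2; all are below 1e-3 here.
```

The agreement is best for the smallest i, exactly as the expansion of 1 − cos θ predicts.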
Note: The boundary conditions for Poisson's equation may lead to non-symmetric analogues of the matrix TN and/or to positive semi-definite matrices. ■
There are two similar notations for the inner product of two vectors x and y: one, widely used by mathematicians, separates the vectors by a comma; the other, Dirac's notation, separates them by a vertical bar:
Throughout this section we assume that the matrix A ∈ ℝn×n is symmetric positive definite.
When a square real matrix A is positive definite, 〈x , A x〉 = 〈x∣A x〉 > 0 for x ≠ 0. Also, since A is symmetric, we have 〈x , A x〉 = 〈A x , x〉, which justifies the Dirac notation 〈x∣A∣x〉 for 〈x∣A x〉. Therefore, we can introduce the A-inner product for symmetric positive definite matrices
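A small sketch verifying that 〈x, y〉_A = 〈x, A y〉 is symmetric (because A = Aᵀ) and positive on nonzero vectors (because A is positive definite); the 2×2 matrix below is an illustrative choice:

```python
# Check the two inner-product properties of <x, y>_A = <x, A y>.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def a_inner(A, x, y):
    return dot(x, matvec(A, y))                 # <x, A y>

A = [[2.0, -1.0], [-1.0, 2.0]]                  # symmetric positive definite
x, y = [1.0, 2.0], [3.0, -1.0]

sym_gap = a_inner(A, x, y) - a_inner(A, y, x)   # 0.0, since A = A^T
positivity = a_inner(A, x, x)                   # 6.0 > 0, since A is SPD
```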
Gradient Method
The methods considered in this section are based on seeking the solution in the form:
The simplest nonstationary scheme based on a varying search direction and step size is obtained by setting \( {\bf p}_k = {\bf r}_k , \) i.e., \( {\bf M}_k^{-1} = \alpha_k {\bf I} , \) with I the identity matrix. The resulting family of methods is called gradient descent.
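A pure-Python sketch of this gradient-descent family with \( {\bf p}_k = {\bf r}_k \); the step size \( \alpha_k = \langle {\bf r}_k , {\bf r}_k \rangle / \langle {\bf r}_k , {\bf A}\,{\bf r}_k \rangle \) used here is the standard exact-line-search choice for symmetric positive definite systems (an assumption, since the step size is not fixed in the text above):

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def gradient_descent(A, b, tol=1e-10, max_iter=10_000):
    # Search direction p_k = r_k; exact line search along the residual.
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # residual
        rr = dot(r, r)
        if rr ** 0.5 < tol:
            break
        alpha = rr / dot(r, matvec(A, r))      # minimizes the A-norm error along r
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x

x = gradient_descent([[2.0, -1.0], [-1.0, 2.0]], [1.0, 1.0])
# x is approximately [1.0, 1.0]
```

In general gradient descent converges much more slowly than CG on ill-conditioned matrices; for this tiny example the first residual happens to be an eigenvector, so the iteration lands on the solution immediately.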
Using a splitting \( {\bf A} = {\bf S} - {\bf T} \) with a nonsingular matrix S, we rewrite the main equation A x = b as \[ {\bf S}\, {\bf x} = {\bf T}\, {\bf x} + {\bf b} , \] which suggests the fixed-point iteration \( {\bf x}^{(k+1)} = {\bf S}^{-1} \left( {\bf T}\, {\bf x}^{(k)} + {\bf b} \right) . \)
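As a concrete instance (an illustration, not part of the text above): choosing S = diag(A), so that T holds the negated off-diagonal part of A, gives Jacobi's method. A pure-Python sketch on a small tridiagonal example:

```python
def jacobi(A, b, tol=1e-10, max_iter=10_000):
    # Splitting iteration with S = diag(A): Jacobi's method.
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        x_new = []
        for i in range(n):
            # Sum of off-diagonal contributions (the T x term).
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - sigma) / A[i][i])
        if max(abs(xn - xo) for xn, xo in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x = jacobi(A, b)        # converges to [1.0, 1.0, 1.0]
```

Each sweep costs one multiplication by the off-diagonal part of A plus a diagonal solve, which is exactly one application of S⁻¹.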
- Burden, R.L. and Faires, J.D., Numerical Analysis, ninth edition, Brooks/Cole, Cengage, Boston, MA.
- Darve, E. and Wootters, M., Numerical Linear Algebra with Julia, SIAM, Philadelphia, 2021.
- Watkins, D.S., Fundamentals of Matrix Computations, Wiley; 3rd edition, 2010.
- White, R.E., Computational Linear Algebra with Applications and MATLAB Computations, CRC Press Boca Raton, FL, 2023. doi: 10.1201/9781003304128
- Barrett, R., Berry, M., Chan, T., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, Pennsylvania, 1993.
- Hestenes, Magnus R.; Stiefel, Eduard (December 1952). "Methods of Conjugate Gradients for Solving Linear Systems" (PDF). Journal of Research of the National Bureau of Standards. 49 (6): 409. doi:10.6028/jres.049.044
- Lanczos, C. (1950). "An iteration method for the solution of the eigenvalue problem of linear differential and integral operators" (PDF). Journal of Research of the National Bureau of Standards. 45 (4): 255–282. doi:10.6028/jres.045.026
- Loehr, N., Advanced Linear Algebra, CRC Press, 2014.
- Shewchuk, J.R., An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994, School of Computer Science, Carnegie Mellon University.
- Shi, X., Wei, Y., and Zhang, W., Convergence of General Nonstationary Iterative Methods for Solving Singular Linear Equations, Taylor & Francis Online, Volume 38, Issue 11.