A system of linear equations written out in standard form is not well suited to manipulation by computer solvers.
At the same time, we observe that elementary reduction operations involve only the coefficients of the unknown variables and the right-hand sides of the linear equations, but not the variables themselves. The unknowns are therefore dummy variables and can be replaced by any characters---it does not matter how we denote them. Hence, the system is completely identified by the matrix \( {\bf A} = \left[ a_{i,j} \right] \) of its coefficients and the vector b of constant terms.
So a matrix that combines the coefficient matrix A with the right-hand side vector b can serve as a device for representing and solving a system of equations.
The augmented matrix of the m × n system
A x = b is obtained by taking the m × n coefficient matrix A and adjoining b as an
additional, (n + 1)th, column. We indicate the augmented matrix like
this: [A | b] or just [A b].
Lazy people like me do not write the vertical line separating the coefficient matrix A from the appended vector or matrix.
The augmented matrix contains all the information we need
to reconstruct the original system.
Example 7:
Write the augmented matrix for the given system of equations.
\begin{align*}
2\,y - 3\, z &= 5,
\\
4\,x - 3\, y + 2\, z &= 7,
\\
2\, x + 8\, y + 4\,z &= 3 .
\end{align*}
The augmented matrix displays the coefficients of the variables and an additional column for the constants:
\[
\left[ \begin{array}{ccc|c} 0&2&-3&5 \\
4&-3&2&7 \\
2&8&4&3 \end{array} \right] .
\]
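To form this augmented matrix in Mathematica, a minimal sketch (the names A, b, and M are ours) is:
A = {{0, 2, -3}, {4, -3, 2}, {2, 8, 4}};   (* coefficient matrix *)
b = {5, 7, 3};                             (* right-hand sides *)
M = Join[A, Transpose[{b}], 2]             (* adjoin b as the last column *)
The result, {{0, 2, -3, 5}, {4, -3, 2, 7}, {2, 8, 4, 3}}, matches the augmented matrix displayed above.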
Suppose you are given an augmented matrix without a separating vertical line:
\[
\begin{bmatrix} 3&0&-2&5&6 \\ 0&4&9&8&4 \\ 1&2&0&-3&7 \end{bmatrix} .
\]
In order to restore the system of linear equations, we need to name the unknown variables. Since each row has five entries and the last entry of a row is the constant term, there are four unknowns, which we denote by x₁, x₂, x₃, x₄.
Then we attach these variables to the entries of the corresponding columns of the given augmented matrix to obtain the required system of equations:
\begin{align*}
3\,x_1 - 2\,x_3 + 5\,x_4 &= 6 , \\
4\,x_2 + 9\, x_3 + 8\, x_4 &= 4 , \\
x_1 + 2\, x_2 - 3\, x_4 &= 7 .
\end{align*}
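This reconstruction can also be scripted; a minimal Mathematica sketch (the names M and vars are ours):
M = {{3, 0, -2, 5, 6}, {0, 4, 9, 8, 4}, {1, 2, 0, -3, 7}};
vars = {x1, x2, x3, x4};
Thread[M[[All, 1 ;; 4]] . vars == M[[All, 5]]]
(* {3 x1 - 2 x3 + 5 x4 == 6, 4 x2 + 9 x3 + 8 x4 == 4, x1 + 2 x2 - 3 x4 == 7} *)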
End of Example 7
The first known use of augmented matrices appeared around 200 B.C. in a Chinese manuscript entitled Nine Chapters on the Mathematical Art. In this manuscript, the augmented matrix was written in dual form; hence, elementary operations were performed on columns rather than on rows, as we do today.
The term augmented matrix was introduced by the American mathematician
Maxime Bôcher
(1867--1918) in his book
Introduction to Higher Algebra, published in 1907. Bôcher was an
outstanding expositor of mathematics whose elementary textbooks were greatly
appreciated by students. His achievements were documented by
William F. Osgood (1919). In addition to being a first-rate research mathematician and lecturer, Maxime Bôcher was an expert in Latin, chemistry, philosophy, zoology, geography, and music.
We have seen the elementary operations for solving systems of linear equations. If we choose to work with augmented matrices instead, the elementary operations for systems of equations translate into the following elementary row operations for matrices:
An elementary row operation on a matrix M is any one of
the following three ways of modifying M to produce a new equivalent matrix.
Interchanging the positions of two rows. We abbreviate it as Ri ↕ Rj.
Multiplying one row of the matrix by a nonzero scalar.
This operation is denoted as Ri ← c·Ri for row i.
Adding a constant multiple of a row to another row. The following notation is used: Ri ← Ri + c·Rj.
Note that it is also safe to swap any two columns of the coefficient matrix, but you must match this with the corresponding reordering of the entries of the unknown vector.
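In Mathematica, the three elementary row operations could be implemented as follows (a sketch; the helper names are ours):
swapRows[m_, i_, j_] := Permute[m, Cycles[{{i, j}}]]               (* interchange rows i and j *)
scaleRow[m_, i_, c_] := ReplacePart[m, i -> c m[[i]]]              (* multiply row i by the scalar c *)
addRow[m_, i_, j_, c_] := ReplacePart[m, i -> m[[i]] + c m[[j]]]   (* add c times row j to row i *)
For example, addRow[{{1, 2}, {3, 4}}, 2, 1, -3] returns {{1, 2}, {0, -2}}.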
Example 8:
We are going to demonstrate how solving the system by elementary operations corresponds to transforming the augmented matrix by elementary row operations. At the beginning, the system and the corresponding augmented matrix are:
Multiplying the first equation by ½ corresponds to multiplying the first row of the augmented matrix by ½, which is denoted by R₁ ← ½·R₁. These operations result in:
Then we add the first equation to the second equation. This corresponds to adding the first row of the augmented matrix to the second row, which is abbreviated by R₂ ← R₂ + R₁. We obtain
Next, we add 5 times the first equation to the third equation. This corresponds to adding 5 times the first row of the matrix to the third row, which is denoted by R₃ ← R₃ + (5)·R₁, or more simply R₃ ← R₃ + 5·R₁. The result is
With this operation, we converted the leading term in position (1, 2) into a pivot because all entries below it are zeroes. We mark the pivot by enclosing it in a box.
We multiply the second equation by 2 and the third equation by (−3). This yields R₂ ← 2·R₂, R₃ ← (−3)·R₃:
Since the leading term (6) at position (2, 1) has all zeroes below it, we consider it a pivot. Because it is convenient (for humans, not computers) to keep leading terms equal to one, we divide the second row by 6 and the third row by 47/2; hence
Recall that two linear systems are said to be equivalent if they have
the same solution set. The idea of equivalence for systems does not have an obvious analog for matrices, so we extend this definition to augmented matrices by adopting a different approach.
Two rectangular matrices of the same dimensions are said to be row equivalent, denoted
by the symbol \( \displaystyle \underset{R}{\sim} \) placed between the two matrices, if there exists a sequence of elementary row operations that transforms one matrix into the other.
Therefore, the elementary operations for systems of linear equations carry over naturally to arbitrary rectangular matrices from \( \mathbb{F}^{m\times n} \), and in particular to augmented matrices. The next result shows that the concept of row equivalence is a useful one.
Theorem 3:
Let S₁ and S₂ be two systems of m equations in n unknowns with augmented
coefficient matrices M₁ and M₂, respectively. If matrix M₁ is row equivalent to matrix M₂, then
system S₁ is equivalent to system S₂.
Let S₁ and S₂ be two systems of m equations in n unknowns with augmented
coefficient matrices M₁ and M₂, respectively.
Suppose that M₁ and M₂ are row equivalent, and further suppose that there
is a single elementary row operation which, when applied to M₁, produces M₂. Then, the
corresponding elementary operation, when applied to S₁, will produce the system S₂. By
Theorem 2, however, because S₂ is obtained from S₁ by a single elementary
operation, the systems S₁ and S₂ are equivalent. So, if the sequence of operations that act
on M₁ to produce M₂ is 1 operation long, then S₁ is equivalent to S₂.
Next suppose that M₁ and M₂ are row equivalent, and that the sequence of elementary row
operations that produces M₂ from M₁ is 2 operations long. Let B₁ be the matrix obtained from
M₁ by applying the first of the two elementary row operations, and let S₃ be the corresponding system of linear equations. Because the sequence of operations that act on M₁ to produce
B₁ is 1 operation long, S₁ and S₃ have the same solution set by the previous argument.
Furthermore, only one elementary row operation is required to produce M₂ from B₁, so by
the same argument S₃ and S₂ must have the same solution set. Hence, S₁ and S₂ have the same
solution set, so S₁ and S₂ are equivalent. For a sequence of k elementary row operations, the
same argument can be employed using k steps. Therefore, in general, if M₁ is row equivalent to M₂, then S₁ is equivalent to S₂.
To make the entries nicer, we multiply the third row by 3 and divide the last row by 5. This yields
\[
{\bf M} \,\underset{R}{\sim}\, \begin{bmatrix} \fbox{$\,1\,$} & 2 & 3 & 4 & 5 & 15 \\ 0&0&\fbox{$\, 6\,$} & -3& -5& -25 \\ 0& 0& 0& \fbox{$\,3\,$} & 1 & -7 \\ 0&0&0& -3 &-1&7
\end{bmatrix} \qquad \begin{split} R_3 \leftarrow \, 3\cdot R_3 \\ R_4 \leftarrow \, \left( \frac{1}{5} \right) \cdot R_4
\end{split}
\]
Finally, we add the third row to the fourth row to obtain
\[
{\bf M} \,\underset{R}{\sim}\, \begin{bmatrix} \fbox{$\,1\,$} & 2 & 3 & 4 & 5 & 15 \\ 0&0&\fbox{$\, 6\,$} & -3& -5& -25 \\ 0& 0& 0& \fbox{$\,3\,$} & 1 & -7 \\ 0&0&0& 0 &0&0
\end{bmatrix} \qquad \begin{split} R_4 \leftarrow \, R_4 + R_3
\end{split}
\]
We convert this augmented matrix into an equivalent system of equations:
\[
\begin{split}
x_1 + 2\, x_2 + 3\, x_3 + 4\, x_4 + 5\, x_5 &= 15 , \\
0\,x_1 + 0\,x_2 + 6\, x_3 - 3\,x_4 - 5\,x_5 &= -25 , \\
0\, x_1 + 0\,x_2 + 0\, x_3 +3\,x_4 + 1\,x_5 &= -7 , \\
0\,x_1 + 0\,x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 &= 0 .
\end{split}
\]
Obviously, the last equation can be dropped because it does not impose any constraint on unknown variables, so we get
\[
\begin{split}
x_1 + 2\, x_2 + 3\, x_3 + 4\, x_4 + 5\, x_5 &= 15 , \\
0\,x_1 + 0\,x_2 + 6\, x_3 - 3\,x_4 - 5\,x_5 &= -25 , \\
0\, x_1 + 0\,x_2 + 0\, x_3 +3\,x_4 + 1\,x_5 &= -7 .
\end{split}
\]
Based on the latter system, we conclude that the given system of equations has infinitely many solutions with two free variables.
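This conclusion can be double-checked with Mathematica (a sketch; for a consistent system, the number of free variables equals the number of unknowns minus the rank of the coefficient matrix):
M = {{1, 2, 3, 4, 5, 15}, {0, 0, 6, -3, -5, -25}, {0, 0, 0, 3, 1, -7}, {0, 0, 0, -3, -1, 7}};
MatrixRank[M[[All, 1 ;; 5]]]   (* 3 = rank of the coefficient matrix *)
MatrixRank[M]                  (* 3 as well, so the system is consistent and has 5 - 3 = 2 free variables *)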
End of Example 9
Please note that the converse of Theorem 3 is false. There exist systems of equations S₁ and S₂ such that S₁ is equivalent to S₂, but the corresponding augmented coefficient matrices,
M₁ and M₂, are not row equivalent. For an example of this phenomenon, let S₁ be the system
\[
\begin{split}
x_1 + 2\, x_2 &= 0 , \\
0\, x_1 + 0\, x_2 &= 3 ,
\end{split}
\]
and let S₂ be the system
\[
\begin{split}
x_1 + 3\, x_2 &= 5 , \\
0\, x_1 + 0\, x_2 &= 1 .
\end{split}
\]
The solution set of each system is the empty set, so the two systems are equivalent. The corresponding augmented coefficient matrices
\( \displaystyle \left[ \begin{array}{cc|c} 1&2&0 \\ 0&0&3 \end{array} \right] \quad \) and
\( \displaystyle \quad \left[ \begin{array}{cc|c} 1&3&5 \\ 0&0&1 \end{array} \right] \quad \)
however, are not row equivalent.
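One way to see this is to compare their reduced forms: elementary row operations do not change the reduced row echelon form, so row equivalent matrices must share it. A quick Mathematica check (a sketch):
RowReduce[{{1, 2, 0}, {0, 0, 3}}]   (* {{1, 2, 0}, {0, 0, 1}} *)
RowReduce[{{1, 3, 5}, {0, 0, 1}}]   (* {{1, 3, 0}, {0, 0, 1}} *)
The two reduced forms differ, so no sequence of elementary row operations transforms one matrix into the other.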
Now, rather than manipulating the equations, we can instead manipulate the rows of the augmented matrix. The main idea of row operations is to transform the augmented matrix into another, equivalent matrix that contains as many zeroes as possible.
Row operations can be performed with the aid of elementary matrices---this topic will be covered in Part 2.
Theorem 4:
If a finite sequence of elementary row operations reduces a square matrix A to B, then the same sequence reduces the identity matrix I to a matrix P such that P A = B. Equivalently, if a sequence of elementary row operations reduces the augmented matrix [A | I] to [B | P], then P A = B.
In the following proof, we will use the formula
\[
\mbox{Row}_i \left( {\bf A} \right) = \left( \mbox{Row}_i ({\bf I}) \right) {\bf A} .
\]
We must show that P A = B.
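A small numerical illustration of Theorem 4 may help (a sketch; the matrix A below is our own choice):
A = {{1, 2}, {3, 4}};
red = RowReduce[Join[A, IdentityMatrix[2], 2]];   (* reduce [A | I] to [B | P] *)
B = red[[All, 1 ;; 2]];                           (* the reduced form of A *)
P = red[[All, 3 ;; 4]];
P . A == B                                        (* True *)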
Similarly, we eliminate x₃ from the third equation by subtracting (6 + 3j) times the first row from 7 times the third row. In our notation, this is written as R₃ ← 7·R₃ − (6 + 3j)·R₁.
Example 12:
We consider the 2 × 3 system of linear equations with coefficients depending on a parameter:
\[
\begin{split}
2\,x + \left( 2 +t \right) y + \left( t-4 \right) z &= 3 , \\
t\,x + \left( t-5 \right) y + 2t\,z &= 2 .
\end{split}
\]
The corresponding augmented matrix also depends on the parameter t:
\[
{\bf M} = \left[ {\bf A} \mid {\bf b} \right] = \left[ \begin{array}{ccc|c} 2 & 2+t & t-4 & 3 \\ t&t-5 & 2t & 2 \end{array} \right] .
\]
Of course, Mathematica can easily solve this system:
Solve[{2 x + (2 + t) y + (t - 4) z == 3, t x + (t - 5) y + 2 t z == 2}, {x, y}]
{{x -> -((-19 + t - 20 z + 13 t z + t^2 z)/(10 + t^2)),
y -> -((4 - 3 t - 8 t z + t^2 z)/(10 + t^2))}}
\[
x = \frac{19 - t - \left( t^2 + 13\, t - 20 \right) z}{10 + t^2} , \qquad y = \frac{3\,t - 4 - t \left( t - 8 \right) z}{10 + t^2} .
\]
However, our objective is to demonstrate the elimination procedure. First, we eliminate x from the second equation using the elementary operation R₂ ← t·R₁ − 2·R₂:
u = {2, 2 + t, t - 4, 3};   (* first row of M *)
v = {t, t - 5, 2*t, 2};     (* second row of M *)
Expand[t*u - v*2]           (* the new second row: {0, 10 + t^2, -8 t + t^2, -4 + 3 t} *)
Eliminating y from the first equation in the same way, via R₁ ← (10 + t²)·R₁ − (2 + t)·R₂, leads to
\[
{\bf M} \,\underset{R}{\sim}\, \left[ \begin{array}{ccc|c} 2\left( 10 + t^2 \right) & 0 &
2 \left( t^2 + 13 t -20 \right) & 38 -2t \\
0&10 + t^2 & t\left( t - 8 \right) & 3t-4 \end{array} \right] .
\]
Upon division by 2, the first equation reads as
\[
\left( 10 + t^2 \right) x = 19 - t - z \left( t^2 + 13 t -20 \right) .
\]
From the last row, we get
\[
\left( 10 + t^2 \right) y = 3t - 4 - t \left( t - 8 \right) z .
\]
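As a consistency check (a sketch), we can compare these expressions with Mathematica's solution obtained earlier:
sol = First @ Solve[{2 x + (2 + t) y + (t - 4) z == 3, t x + (t - 5) y + 2 t z == 2}, {x, y}];
Simplify[(x /. sol) == (19 - t - (t^2 + 13 t - 20) z)/(10 + t^2)]   (* True *)
Simplify[(y /. sol) == (3 t - 4 - t (t - 8) z)/(10 + t^2)]          (* True *)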
End of Example 12
Example 13:
Suppose we need to determine whether the following three matrices are linearly independent:
\[
{\bf A}_1 = \begin{bmatrix} 1&2 \\ 3&4 \end{bmatrix} , \quad {\bf A}_2 = \begin{bmatrix} \phantom{-}1&\phantom{-}0 \\ -2&-1 \end{bmatrix} , \quad {\bf A}_3 = \begin{bmatrix} \phantom{-}0&3 \\ -2&1 \end{bmatrix} .
\]
These matrices are linearly dependent when the equation
\[
x_1 {\bf A}_1 + x_2 {\bf A}_2 + x_3 {\bf A}_3 = {\bf 0}_{2\times 2}
\tag{12.1}
\]
has at least one nontrivial solution.
In order for the numbers x₁, x₂, and x₃ to satisfy the equation (12.1), we must have
\[
\begin{split}
x_1 + x_2 +0\, x_3 &= 0 , \\
2\,x_1 + 0\, x_2 + 3\, x_3 &= 0, \\
3\, x_1 -2\, x_2 - 2\, x_3 &= 0, \\
4\, x_1 - x_2 + x_3 &= 0 .
\end{split}
\tag{12.2}
\]
Although we can ask Mathematica for help, it is instructive to carry out the forward elimination by hand.
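A possible query (a sketch; the variable names x1, x2, x3 are ours):
Solve[{x1 + x2 == 0, 2 x1 + 3 x3 == 0, 3 x1 - 2 x2 - 2 x3 == 0, 4 x1 - x2 + x3 == 0}, {x1, x2, x3}]
(* {{x1 -> 0, x2 -> 0, x3 -> 0}} *)
Hand elimination leads to the same conclusion.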
The last equation reads
\[
-\frac{23}{4}\, x_3 = 0 \qquad \Longrightarrow \qquad x_3 = 0 .
\]
Hence, the other unknowns are also zero, and we conclude that our three matrices are linearly independent.
End of Example 13
These observations form the motivation behind a method for solving systems of linear equations known as Gaussian elimination, named after the German mathematician Carl Friedrich Gauss (1777--1855), although, as the Chinese manuscript mentioned above shows, the method itself is much older.