This section is devoted to basic manipulations with matrices that have only two dimensions: rows ×
columns, that is, matrices you could print out and lay flat on a table.
Arrays with more dimensions (ones that would, so to speak, require a 3D printer) are treated elsewhere. Later we discuss matrices
in physical space, which are called tensors.
Matrices are ubiquitous in mathematics, physics, and engineering because
of their many uses. To give you an idea of their
versatility, below is a non-exhaustive list of some applications of matrices.
Representing a linear transformation or mapping;
Using as operators acting on vectors;
Representing a system of equations;
Storing data (e.g., features × observations);
Storing kernels used in filtering or convolution;
Representing finance information from different sectors of
an economy or business;
Deriving parameters for a model that predicts changes in the
spread of an infectious disease.
When referring to matrix sizes, and when referring to indices in a
matrix, it is assumed that you refer first to rows and then to columns. A matrix with m rows and n columns is called an m-by-n or m × n matrix; such a matrix has two dimensions, and we also write its size as (m, n).
An m-by-n matrix is a rectangular array, or list of lists, containing numbers, symbols, expressions, or algebraic objects, arranged in m rows and n columns.
There are two common notations for matrices, either in square brackets or parentheses. For example,
Note: Since matrices can be considered as operators acting either on column vectors from the left or on row vectors from the right, we use square-bracket notation in the former case and enclose matrices in parentheses when they act on row vectors from the right.
■
Although the entries of a matrix could be arbitrary algebraic objects, in this section we consider only entries that belong to one of the familiar number systems: the integers ℤ, the rational numbers ℚ, the real numbers ℝ, or the complex numbers ℂ. The set of all m × n matrices over a field 𝔽 is denoted by 𝔽m×n. We use capital bold letters, such as A or M, to denote matrices.
Since Mathematica does not display matrices in the usual two-dimensional form by default, it is convenient to force the software to do this job for you by entering the following code:
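One common way to do this (a sketch; any equivalent use of MatrixForm works just as well) is to set the global variable $PrePrint so that every matrix is printed in two-dimensional form without changing the stored value:
$PrePrint = If[MatrixQ[#], MatrixForm[#], #] &;
{{1, 2, 3}, {4, 5, 6}}   (* now displayed as a 2 x 3 array rather than a nested list *)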
Manipulation of Matrices
The element in the i-th
row, j-th column in matrix A is denoted A[i, j] or Ai, j or 𝑎i, j if A = [𝑎i, j].
There is no standard notation for specifying a particular row or column
of a matrix. Following MATLAB, we use the notation A[i, :] for the i-th row of A
(a row vector) and A[:, j] for the j-th column (MATLAB uses parentheses rather
than square brackets, however). Mathematica uses the notation A[[i, j]] for the (i, j) entry of matrix A; notation for extracting rows or columns will be given later.
Linear Combinations of Matrices
We show that matrices of the same size form a vector space.
Algebraically, matrices can be added and multiplied by scalars, just like
numeric and geometric vectors.
The sumC = [ci, j] of two m × n
matrices A = [𝑎i, j] and B = [bi, j] is again an m × n matrix. We get its
entries from the simple rule ci, j = 𝑎i, j + bi, j. Addition requires that both
matrices have the same size (the same numbers of rows and columns).
The difference of two m × n
matrices A = [𝑎i, j] and B = [bi, j] is the m × n matrix D = [di, j] with entries di, j = 𝑎i, j − bi, j.
Observation 1:
Matrix addition is commutative and associative:
For any matrices A, B, and C of the same size, we have
\[
{\bf A} + {\bf B} = {\bf B} + {\bf A} , \qquad {\bf A} + \left( {\bf B} + {\bf C} \right) = \left( {\bf A} + {\bf B} \right) + {\bf C} .
\]
The product of a
matrix A = [𝑎i, j] with a scalar r is denoted by rA or Ar. The entry in row
i, column j of rA is simply
r 𝑎i, j, so that rA = [r 𝑎i, j].
To multiply matrix A by a scalar (number) r ∈ 𝔽 (either from the left or from the right), just
multiply every element of the matrix by the number (scalar):
More generally, consider a linear combination c₁A₁ + c₂A₂ + ⋯ + ckAk for some scalars c₁, c₂, … , ck and some matrices A₁, A₂, … , Ak of the same size. Then this linear combination is again a matrix of the same size. This means that the set 𝔽m×n of all m × n matrices is a vector space.
Furthermore, the above operations are also distributive in two senses. On the one
hand, you can push the same matrix A into parentheses:
Observation 2:
For every m × n matrix A ∈ 𝔽m×n,
there is a unique m × n matrix M such that
A + M = A. This matrix is called the zero matrix and it is denoted by 0.
Observation 3:
For every m × n matrix A ∈ 𝔽m×n,
there is a unique m × n matrix M such that
A + M = 0. This matrix is called the additive inverse of A and it is denoted by −A.
A linear combination of these two matrices, for instance, is
\[
2\,{\bf A} -3\, {\bf B} = \begin{bmatrix} -4& 1& -3& 5 \\ -1& 0& \phantom{-}1& 4 \\ -10& 4& -7& 7 \end{bmatrix} .
\]
■
End of Example 1
Matrix times Vector---Operator
In the computer algebra system Mathematica, a matrix is considered a list of lists. In the case of our naïve definition of matrices, this list is filled with scalars from the field 𝔽: the integers ℤ (preferred by the laziest people in academia, like me), the rational numbers ℚ, the real numbers ℝ, or the complex numbers ℂ. Therefore, an m-by-n matrix can be considered as a list of m rows:
Here δi,j is the Kronecker delta.
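As a small illustration of a matrix as a list of rows in Mathematica (the matrix here is the one used in Example 2 below):
A = {{2, 3, -1, 2}, {5, 1, -3, -2}, {4, 2, -3, -1}};   (* a list of 3 rows, each with 4 entries *)
Dimensions[A]
{3, 4}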
To define the product A x of a vector x = (x₁, x₂, … , xn) by a matrix A = [𝑎i, j] from the left, we need two things. First, we need to convert the vector x ∈ 𝔽n from an n-tuple into a column vector from 𝔽n×1. This conversion is essential for multiplication by the matrix from the left; if instead the matrix acts on x from the right, you need to convert the n-tuple into a row vector. Since the three vector spaces 𝔽n (the Cartesian product of n copies of the field 𝔽), 𝔽n×1 (the space of column vectors), and 𝔽1×n (the space of row vectors) are isomorphic, 𝔽n ≌ 𝔽n×1 ≌ 𝔽1×n, there exists a one-to-one and onto linear mapping that transfers one space into another.
Second, we need
a special notation that abbreviates specific calculations.
The dot product of two n-dimensional vectors from 𝔽n is computed by multiplying corresponding components and then adding all these products. That is,
where x = (x₁, x₂, … , xn), y = (y₁, y₂, … , yn) ∈ 𝔽n.
We indicate it by a large solid dot, •, because it serves many purposes beyond matrix/vector multiplication. The dot product is not defined for vectors of different dimensions.
Note: Although the dot product defines a metric (distance) in real spaces, it has some weird properties in complex spaces; for instance, (1, j) • (1, j) = 0, where j is the imaginary unit in the complex plane ℂ, so j² = −1. Therefore, the dot product is not suitable for defining a metric in complex spaces; an inner product is used instead. So we use the dot product in complex spaces only formally, as a shortcut for writing linear combinations. ■
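A quick Mathematica illustration (note that Dot does not conjugate its arguments, which is exactly the point of the remark above):
{1, -2, 3} . {4, 0, -1}   (* 1*4 + (-2)*0 + 3*(-1) *)
1
{1, I} . {1, I}           (* 1 + I^2 *)
0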
When written this way, we can use the "two-hands" method to compute
the product: the left hand moves from left to right along each row of the matrix
while the right hand moves from top to bottom along the column vector.
This allows us to define the matrix/vector product as the column vector with m components:
where cj(A) = A eTj = A[:, j] is the j-th column of matrix A, j = 1, 2, … , n. Here eTj is the column vector obtained from the row vector ej by writing it vertically (T indicates transposition). This way, each column of A is in 𝔽m×1. Let us now consider a different column vector, not in 𝔽m×1 but rather in 𝔽n×1:
Since there are exactly n columns in this representation of matrix A, we can now scan A column by column, multiply each column by the corresponding component of vector x, and sum up:
This indeed represents A x as a linear combination of the columns of A, with coefficients taken as components of vector x. Hence, matrix multiplication defines a linear transformation:
because 𝔽n×1 is isomorphic to 𝔽n. The multiplication of a matrix and a column vector given in Eq. (3) is called a matrix/vector operation or, more precisely, matrix/column multiplication.
We can also define a dual multiplication on a vector from the right. If y = [y₁, y₂, … , ym] ∈ 𝔽1×m is a row vector of length m, we define its product with an m×n matrix A as follows:
This rule (4) is known as vector/matrix operation or more precisely, row/matrix multiplication.
Observation 4:
Every m × n matrix defines a linear transformation 𝔽n×1 ⇾ 𝔽m×1 that maps a column vector x to the column vector A x upon multiplication from the left by matrix A.
Similarly, this matrix acts on row vectors y of size m via multiplication from the right, transferring them into row vectors y A of size n.
The important feature of matrix/vector and vector/matrix
multiplications is that they provide the connection between
linear transformations and matrices: to apply a transformation to a
vector, you convert that transformation into a matrix and then
multiply the vector by that matrix, either from the left (treating the vector as a column) or from the right (treating it as a row). How an m-by-n matrix A multiplies a vector x depends on the size of the vector. If x has length n, then you convert it into a column vector and multiply by matrix A from the left. However, if x ∈ 𝔽m, then you have no choice but to consider this vector as a row and multiply by matrix A from the right.
You can always convert column vectors into rows and vice versa by using the transposition operation (see below) because
Example 2:
We consider a rectangular matrix
\[
{\bf A} = \begin{bmatrix} 2&3&-1&\phantom{-}2 \\ 5&1&-3&-2 \\ 4&2&-3&-1 \end{bmatrix} .
\]
Multiplying a row vector by this matrix from the right, we get
\[
\begin{bmatrix} 2& -3& 5 \end{bmatrix} \begin{bmatrix} 2&3&-1&\phantom{-}2 \\ 5&1&-3&-2 \\ 4&2&-3&-1 \end{bmatrix} = \begin{bmatrix} 9 & 13 & -8 & 5 \end{bmatrix} .
\]
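This row/matrix product can be checked in Mathematica (entering the matrix A as a list of rows):
A = {{2, 3, -1, 2}, {5, 1, -3, -2}, {4, 2, -3, -1}};
{2, -3, 5} . A
{9, 13, -8, 5}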
On the other hand, multiplying matrix A by a column vector from the right, we obtain
\[
\begin{bmatrix} 2&3&-1&\phantom{-}2 \\ 5&1&-3&-2 \\ 4&2&-3&-1 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ \phantom{-}2 \\ \phantom{-}3 \\ -4 \end{bmatrix} = \begin{bmatrix} -7\\ -4 \\ -5 \end{bmatrix} .
\]
x = {-1 , 2 , 3 , -4};
A.x
{-7, -4, -5}
■
End of Example 2
Example 3:
According to the USDA, 100 grams of watermelon contains
Calories: 30 Kcal
Carbs: 7.6 grams
Sugar: 6.2 grams
Protein: 0.6 grams
Potassium: 0.17 grams
Fiber: 0.6 grams
100 grams of strawberries contains
Calories: 32 Kcal
Sugar: 4.89 grams
Protein: 0.67 grams
Potassium: 0.153 grams
Magnesium: 0.013 grams
Fiber: 2 grams
100 grams of kiwi contains
Calories: 61 Kcal
Sugar: 8.99 grams
Protein: 1.14 grams.
Potassium: 0.312 grams.
Magnesium: 0.017 grams.
Fiber: 3 grams.
If one eats w kg of watermelon, s kg of strawberries, and k kg of kiwi, then the amount of protein p, the amount of fiber f, and the
amount of sugar q (we avoid the letter s, which already denotes the weight of strawberries) they will have consumed is given by the system of linear equations
\begin{align*}
0.6\,w + 0.67\, s + 1.14\,k &= 10\,p ,
\\
0.6\,w + 2\,s + 3\,k &= 10\,f ,
\\
6.2\, w + 4.89\,s + 8.99\, k &= 10\, q .
\end{align*}
We can rewrite this system in matrix/vector form (the factor 10 appears because the nutrient amounts are given per 100 grams while the weights w, s, k are in kilograms):
\[
\begin{bmatrix} 0.6 & 0.67 & 1.14 \\ 0.6 & 2 & 3 \\ 6.2 & 4.89 & 8.99 \end{bmatrix} \cdot \begin{bmatrix} w \\ s \\ k \end{bmatrix} = 10 \begin{bmatrix} p \\ f \\ q \end{bmatrix} .
\]
Observe that we can write the amount of protein being consumed as the dot product
\[
p = 0.1 \left[ 0.6 \ 0.67 \ 1.14 \right] \bullet \left[ w \ s \ k \right] .
\]
fruit = {{0.6, 0.67, 1.14}, {0.6, 2, 3}, {6.2, 4.89, 8.99}};  (* rows: protein, fiber, sugar per 100 g *)
weights = {w, s, k};
protein = .1 fruit[[1]] . weights
0.1 (1.14 k + 0.67 s + 0.6 w)
■
End of Example 3
Example 4:
Let us consider a 2 × 2 matrix, which is multiplied by a vector from the left and from the right:
\[
\begin{bmatrix} a&b \\ c&d \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a\,x + b\,y \\ c\,x + d\,y \end{bmatrix}
\]
On the other hand,
\[
\begin{bmatrix} x & y \end{bmatrix} \cdot \begin{bmatrix} a&b \\ c&d \end{bmatrix} = \begin{bmatrix} a\,x + c\,y & b\,x + d\,y \end{bmatrix} .
\]
Clear[x, y]; A = {{a, b}, {c, d}};
{x, y} . A
{a x + c y, b x + d y}
These Mathematica scripts confirm that the computer algebra system knows when to treat a vector as a column and when as a row.
So we conclude that these two matrix/vector products match only when matrix A is symmetric, A = AT, i.e., b = c.
■
End of Example 4
Matrix times Matrix
Before learning how (standard) matrix multiplication works,
the first thing you need to know about this topic is that the matrix product exists only for compatible matrices: when matrix A is multiplied by matrix B from the right, their dimensions must match. In other words, the number of columns in matrix A should equal the number of rows in matrix B in order for the product A B to exist. Therefore, the existence of the product A B does not guarantee that the reverse product B A exists; and even when both products exist, they coincide only in rare cases, so generally speaking A B ≠ B A.
The following five statements provide useful terminology for saying that product A B is valid for matrices A and B:
A times B (which is denoted by A B),
A pre-multiplies B,
A left-multiplies B,
B right-multiplies A,
B post-multiplies A.
There are four ways to implement matrix multiplication of two matrices that provide insights
into matrix computations in different contexts and for different
problems. We denote by C = [ci,j] the product of two matrices.
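In the first, "element" perspective, the product C = A B of an m × n matrix A with an n × p matrix B has entries given by the dot product of the i-th row of A with the j-th column of B:
\[
c_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j} , \qquad 1 \le i \le m , \quad 1 \le j \le p .
\]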
Example 5:
We consider the product of two matrices
\[
{\bf A}_{3 \times 2} = \begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} , \qquad {\bf B}_{2 \times 2} = \begin{bmatrix} a&b \\ c&d \end{bmatrix} .
\]
Their product is
\[
{\bf A}_{3 \times 2} {\bf B}_{2 \times 2} = \begin{bmatrix} c& d \\ 2 a + 3 c& 2 b + 3 d \\ 4 a + 5 c& 4 b + 5 d \end{bmatrix} .
\]
A = {{0, 1}, {2, 3}, {4, 5}};
B = {{a, b}, {c, d}};
A . B
{{c, d}, {2 a + 3 c, 2 b + 3 d}, {4 a + 5 c, 4 b + 5 d}}
When 𝑎 =1, b = −1, c = 3, d = −2, we get
\[
\left. {\bf A} \, {\bf B} \right\vert_{a\to 1, b \to -1, c \to 3, d \to -2} = \begin{bmatrix} 3& -2 \\ 11& -8 \\ 19& -14 \end{bmatrix} .
\]
% /. {a -> 1, b -> -1, c -> 3, d -> -2}
{{3, -2}, {11, -8}, {19, -14}}
Note that the product B A does not exist.
■
End of Example 5
(II) The "outer product" perspective
Recall that the outer product (also known as the tensor product) of two vectors x ∈ 𝔽m×1 and y ∈ 𝔽1×n is their matrix product, whose entries are all products of an element of the first vector with an element of the second vector:
For complex vectors, it is customary to use the conjugate transpose of one of the vectors. However, we do not follow this rule because we use the tensor notation only as a shortcut in defining matrix multiplication. Thus, the entries of the m × n matrix x ⊗ y can be written as
Each such outer product can be considered as a layer. Combining (summing) all layers, we obtain the product.
Example 6:
We again consider two matrices from the previous example and their product
\[
{\bf A}\,{\bf B} = \begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} \, \begin{bmatrix} a&b \\ c&d \end{bmatrix} = \begin{bmatrix} c& d \\ 2 a + 3 c& 2 b + 3 d \\ 4 a + 5 c& 4 b + 5 d \end{bmatrix} ,
\]
which becomes
\[
\left. {\bf A}\,{\bf B} \right\vert_{a\to 1,\, b\to -1,\, c\to 3,\, d\to -2} = \begin{bmatrix} 3& -2 \\ 11& -8 \\ 19& -14 \end{bmatrix} ,
\]
when we set numerical values to parameters.
Now we take the outer product of every column of matrix A with the corresponding row of matrix B:
\[
\begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} \, \begin{bmatrix} a&b \\ c&d \end{bmatrix} = \begin{bmatrix} 0\cdot a&0 \cdot b \\ 2\cdot a&2\cdot b \\ 4\cdot a&4\cdot b \end{bmatrix} + \begin{bmatrix} 1\cdot c&1\cdot d \\ 3\cdot c&3\cdot d \\ 5\cdot c&5\cdot d \end{bmatrix}
=
\begin{bmatrix} c& d \\ 2 a + 3 c& 2 b + 3 d \\ 4 a + 5 c& 4 b + 5 d \end{bmatrix} .
\]
\( \displaystyle \quad \begin{pmatrix} c&d \\ 2\,a + 3\,c & 2\,b + 3\, d \\ 4\,a + 5\, c & 4\, b + 5\,d \end{pmatrix} \)
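This layer-by-layer decomposition can be checked in Mathematica (the use of Outer and Sum here is our own illustration):
A = {{0, 1}, {2, 3}, {4, 5}};
B = {{a, b}, {c, d}};
Sum[Outer[Times, A[[All, k]], B[[k]]], {k, 2}] == A . B
True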
R has a dedicated command for evaluating the outer product; however, its output is a two-column matrix containing all entries of the outer product.
■
End of Example 6
(III) The "row" perspective
Each row in the product matrix is the weighted sum of all rows in the right matrix, where the weights
are given by the elements in each row of the left matrix.
Example 8:
We again consider two matrices from previous examples and their product
\[
{\bf A}\,{\bf B} = \begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} \, \begin{bmatrix} a&b \\ c&d \end{bmatrix} = \begin{bmatrix} c& d \\ 2 a + 3 c& 2 b + 3 d \\ 4 a + 5 c& 4 b + 5 d \end{bmatrix} .
\]
Representing matrix B as a list of columns
\[
{\bf B} = \left[ \begin{bmatrix} a \\ c \end{bmatrix} \ \begin{bmatrix} b \\ d \end{bmatrix} \right] ,
\]
we apply matrix A from the left:
\[
{\bf A}\,{\bf B} = \left[ {\bf A}\,\begin{bmatrix} a \\ c \end{bmatrix} \ {\bf A}\,\begin{bmatrix} b \\ d \end{bmatrix} \right] .
\]
Calculations show that
\[
{\bf A}\,\begin{bmatrix} a \\ c \end{bmatrix} = \begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} \, \begin{bmatrix} a \\ c \end{bmatrix} = \begin{bmatrix} c \\ 2a+3c \\ 4a + 5c \end{bmatrix}
\]
A = {{0, 1}, {2, 3}, {4, 5}};
A . {a, c}
{c, 2 a + 3 c, 4 a + 5 c}
and
\[
{\bf A}\,\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} \, \begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} d \\ 2b + 3d \\ 4b + 5d \end{bmatrix} .
\]
A . {b, d}
{d, 2 b + 3 d, 4 b + 5 d}
■
End of Example 8
From the column perspective,
all matrices (the multiplying matrices and the product matrix) are
thought of as sets of column vectors. Then the product matrix is
created one column at a time.
The first column in the product matrix is a linear combination of all columns in the left matrix, where the weights are
defined by the elements in the first column of the right matrix.
The second column in the product matrix is again a weighted
combination of all columns in the left matrix, except that the
weights now come from the second column in the right matrix.
And so on for all n columns in the right matrix.
The following example demonstrates this approach.
Example 9:
We again consider two matrices from previous examples and their product
\[
{\bf A}\,{\bf B} = \begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} \, \begin{bmatrix} a&b \\ c&d \end{bmatrix} = \begin{bmatrix} c& d \\ 2 a + 3 c& 2 b + 3 d \\ 4 a + 5 c& 4 b + 5 d \end{bmatrix} ,
\]
which becomes
\[
\left. {\bf A}\,{\bf B} \right\vert_{a\to 1,\, b\to -1,\, c\to 3,\, d\to -2} = \begin{bmatrix} 3& -2 \\ 11& -8 \\ 19& -14 \end{bmatrix} .
\]
Now we represent the same product from the column perspective: each column of the product A B is a linear combination of the columns of matrix A:
\[
\begin{bmatrix} 0&1 \\ 2&3 \\ 4&5 \end{bmatrix} \, \begin{bmatrix} a&b \\ c&d \end{bmatrix} = \left[ a \begin{bmatrix} 0 \\ 2 \\ 4 \end{bmatrix} + c \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \quad b \begin{bmatrix} 0 \\ 2 \\ 4 \end{bmatrix} + d \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}\right] ,
\]
which works out to
\[
{\bf A}\,{\bf B} = \left[ \begin{bmatrix} c \\2\,a + 3\, c \\ 4\,a+ 5\, c \end{bmatrix} \quad \begin{bmatrix} d \\ 2\, b +3\, d \\ 4\, b + 5\, d \end{bmatrix} \right] = \begin{bmatrix} c& d \\ 2 a + 3 c& 2 b + 3 d \\ 4 a + 5 c& 4 b + 5 d \end{bmatrix} .
\]
A = {{0, 1}, {2, 3}, {4, 5}};
B = {{a, b}, {c, d}};
AB = A . B;
c1 = A[[All, 1]]
{0, 2, 4}
c2 = A[[All, 2]]
{1, 3, 5}
We build a linear combination of columns:
col1 = a*c1 + c*c2
% /. {a -> 1, b -> -1, c -> 3, d -> -2}
{c, 2 a + 3 c, 4 a + 5 c}
{3, 11, 19}
col2 = b*c1 + d*c2
% /. {a -> 1, b -> -1, c -> 3, d -> -2}
{d, 2 b + 3 d, 4 b + 5 d}
{-2, -8, -14}
Compare with the product matrix:
ab = {col1,col2};
Transpose[ab] == AB /. {a -> 1, b -> -1, c -> 3, d -> -2}
True
Now we work with numerical values:
ab = Transpose[{col1, col2}] /. {a -> 1, b -> -1, c -> 3, d -> -2}
A diagonal matrix having diagonal entries λ₁, λ₂, … , λn (as in the preceding equality) may safely be abbreviated by diag(λ₁, λ₂, … , λn) because such a matrix carries exactly the same information as the vector of its diagonal entries. Mathematica has a dedicated command:
n = 5; (* size of the matrix *)
elements = Table[Subscript[\[Lambda], i], {i, n}];
matrix = DiagonalMatrix[elements]
The identity matrix In of size n (the index n is usually omitted when the size of the matrix is clear) is the diagonal square matrix
that contains on its main diagonal elements with value of one, while the rest of the matrix elements are equal to zero.
We will widely use the main property of the identity matrix: it acts as a multiplicative unit (either from the left or from the right) on any rectangular matrix. Multiplication by the identity matrix does not change the result: Im A = A In = A ∈ 𝔽m×n.
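For instance (a quick check with a matrix of our own choosing):
A = {{1, 2, 3}, {4, 5, 6}};   (* a 2-by-3 matrix *)
IdentityMatrix[2] . A == A . IdentityMatrix[3] == A
True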
A = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}};
B = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
A . B
{{12, 9, 6}, {66, 54, 42}, {120, 99, 78}}
B . A
{{66, 90, 114}, {39, 54, 69}, {12, 18, 24}}
A . B == B . A
False
■
End of Example 10
Theorem 1:
If A, B, C are matrices over the field 𝔽 such that the products B C and A(B C) are defined, then so are the products A B, (A B)C and
\[
{\bf A} \left( {\bf B\,C} \right) = \left( {\bf A\,B} \right) {\bf C} .
\]
Suppose B is an n × k matrix. Since B C is defined, C is
a matrix with k rows, and B C has n rows. Because A(B C) is defined we
may assume A is an m × n matrix. Thus the product A B exists and is an m × k
matrix, from which it follows that the product (A B)C exists. To
show that A(B C) = (A B)C means to show that
\[
\left[ {\bf A} \left( {\bf B}\,{\bf C} \right) \right]_{i,j} = \left[ \left( {\bf A}\,{\bf B} \right) {\bf C} \right]_{i,j}
\]
for each index i, j. By definition
\begin{align*}
\left[ {\bf A} \left( {\bf B}\,{\bf C} \right) \right]_{i,j} &= \sum_r {\bf A}_{i, r} \left( {\bf B}\,{\bf C} \right)_{r, j}
\\
&= \sum_r {\bf A}_{i, r} \sum_s {\bf B}_{r, s} {\bf C}_{s, j}
\\
&= \sum_r \sum_s {\bf A}_{i, r} {\bf B}_{r, s} {\bf C}_{s, j}
= \sum_s \sum_r {\bf A}_{i, r} {\bf B}_{r, s} {\bf C}_{s, j}
\\
&= \sum_s \left( \sum_r {\bf A}_{i, r} {\bf B}_{r, s} \right) {\bf C}_{s, j} =
\sum_s \left( {\bf A}\,{\bf B} \right)_{i,s} {\bf C}_{s, j}
\\
&= \left[ \left( {\bf A}\,{\bf B} \right) {\bf C} \right]_{i, j} .
\end{align*}
This theorem tells us that in a finite sequence of matrix products you may place parentheses as you wish, so the order in which individual pairwise products are evaluated does not matter:
However, although the final answer of a sequence of matrix products does not depend on the order in which pairwise products are executed, this order may substantially affect the time needed to evaluate the product (see Dobrushkin's book).
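A small illustration of this point (the sizes below are our own choice): multiplying an m × n matrix by an n × p matrix requires about m·n·p scalar multiplications, so for A ∈ 𝔽200×2, B ∈ 𝔽2×200, C ∈ 𝔽200×2 the two evaluation orders have very different costs.
{m, n, p, q} = {200, 2, 200, 2};
m n p + m p q   (* (A B) C : form the 200×200 matrix A B first *)
160000
n p q + m n q   (* A (B C) : form the 2×2 matrix B C first *)
1600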
Other properties of Matrix Multiplication are formulated in the following theorem.
Theorem 2:
If A, B, and C are matrices, of appropriate dimensions so
that the required operations are defined, and λ is a scalar, then
A(B ± C) = A B ± A C;
(A ± B)C = A C ± B C;
λ(A B) = (λA)B = A(λB).
For the first equality we require A to be of size m × n and B, C to be of size n × p, in
which case
\begin{align*}
\left[ \mathbf{A} \left( \mathbf{B} + \mathbf{C} \right) \right]_{i,j} &= \sum_{k=1}^n a_{i,k} \left( b_{k,j} + c_{k,j} \right)
\\
&= \sum_{k=1}^n a_{i,k} b_{k,j} + \sum_{k=1}^n a_{i,k} c_{k,j} = \left[ {\bf A}\,{\bf B} \right]_{i, j} + \left[ {\bf A}\,{\bf C} \right]_{i, j}
\\
&= \left[ \mathbf{A}\,\mathbf{B} + \mathbf{A}\,\mathbf{C} \right]_{i,j} .
\end{align*}
For the second equality, in which we require A, B to be of size m × n and C to be
of size n × p, a similar argument applies.
The (i,j)-th elements of the three mixed products are
\[
\lambda \left( \sum_{k=1}^n a_{i,k} b_{k,j} \right) = \sum_{k=1}^n \left( \lambda\,a_{i,k} \right) b_{k,j} = \sum_{k=1}^n a_{i,k} \left( \lambda\,b_{k,j} \right) ,
\]
from which the result follows.
Note that, generally speaking, the cancellation law is not valid for matrix multiplication: the identity A C = B C does not always imply that A = B.
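Here is a small counterexample (the matrices are our own choice; note that C is a protected symbol in Mathematica, so the third matrix is named C0):
A = {{1, 1}, {2, 5}};
B = {{1, 0}, {2, 3}};
C0 = {{1, 1}, {0, 0}};
{A . C0 == B . C0, A == B}
{True, False}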
Theorem 3:
If matrices A, B ∈ 𝔽m×n are such that
A x = B x for every column vector x ∈ 𝔽n×1, then A = B.
It is sufficient to prove that if matrix C = A − B is such that C x = 0 for every column vector x ∈ 𝔽n×1, then matrix C is a zero matrix.
Since the dimension of the vector space 𝔽m×n of all m-by-n matrices is mn, this space has a basis of matrices
\[
\mathbf{M}_{i,j} , \qquad i=1,2,\ldots , m, \quad j=1,2,\ldots , n ,
\]
where each Mi,j has only one nonzero entry, in position (i, j), which it is convenient to choose as 1. Then any matrix C can be uniquely expanded with respect to this standard basis:
\[
\mathbf{C} = \sum_{i=1}^m \sum_{j=1}^n c_{i,j} \mathbf{M}_{i,j} .
\]
Upon choosing the first standard vector e₁ = [1, 0, 0, … , 0]T, we apply matrix C to this vector from the left:
\[
\mathbf{C}\,\mathbf{e}_1 = \left( \sum_{i=1}^m \sum_{j=1}^n c_{i,j} \mathbf{M}_{i,j} \right) \mathbf{e}_1 = \sum_{i=1}^m c_{i,1} \mathbf{e}_i .
\]
Since we know that this column vector must be zero, we conclude that the entries in the first column of matrix C are all zeroes:
\[
\mathbf{C}\,\mathbf{e}_1 = \sum_{i=1}^m c_{i,1} \mathbf{e}_i = \begin{bmatrix} c_{1,1} \\ c_{2,1} \\ \vdots \\ c_{m,1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} .
\]
Similarly, applying matrix C to the other unit vectors ei, i = 2, 3, … , n, shows that every column of C vanishes; hence C is the zero matrix.
Gotthold Eisenstein
The concept of matrix multiplication is due to the German mathematician Gotthold Eisenstein (1823--1852), who was of Jewish descent. He introduced the idea of matrix multiplication around 1844 (in that year alone he published 23 papers) to simplify the process of making substitutions in linear systems. He suffered various health problems throughout his life, including meningitis as an infant, a disease that took the lives of all five of his brothers and sisters.
Gotthold's idea was further expanded and formalized by the British mathematician Arthur Cayley (1821--1895) in his Memoir on the Theory of Matrices, published in 1858. Eisenstein was a pupil of Carl Friedrich Gauss, who ranked him as the equal of Isaac Newton and Archimedes.
Example 12:
A linear fractional transformation is a function of the form
\[
f(x) = \frac{a\,x + b}{c\,x + d} , \qquad a\,d \ne b\,c .
\tag{12.1}
\]
Here 𝑎, b, c, and d are some given real numbers.
The condition 𝑎 d ≠ bc implies that c and d are not both 0, hence the denominator c x + d is not identically 0. Moreover it also shows that \( \frac{a}{c} \ne \frac{b}{d} , \quad \) so that numerator
and denominator are not proportional. The function f is not a constant, and
is well-defined except where the denominator vanishes. When c ≠ 0, the point x = −d/c
is not in the domain of f. Hence, this point is a vertical asymptote for its graph.
(There is a horizontal asymptote y = 𝑎/c; linear fractional transformations
are bijective maps if we add a point “infinity” to the line, and extend in a
natural way the definition of f.)
Since each of the first-degree polynomials in the numerator and denominator has two coefficients (𝑎, b for the numerator and c, d for the denominator), we can define a map:
\[
\varphi \, : \ \frac{a\,x +b}{c\,x + d} \ \mapsto \ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathbb{R}^{2\times 2} \setminus \left\{ \mathbf{0} \right\} ,
\tag{12.2}
\]
where it is customary to write a polynomial as an ordered sum of monomials in
decreasing powers of the independent variable. Once the "bad" points are excluded, this function φ becomes a bijection (one-to-one and onto) from its domain onto its range, which is ℝ2×2 without the zero matrix. Therefore, the 2 × 2 matrix uniquely identifies the ratio of two first-degree polynomials and vice versa.
Next, we make a change of variable by a similar rule:
\[
x = g(t) = \frac{\alpha\,t + \beta}{\gamma\,t + \delta} \, \stackrel{\varphi}{\mapsto} \, \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} , \qquad \alpha\delta \ne \beta\gamma .
\]
Their composition becomes
\begin{align*}
y &= \left( f \circ g \right) (t) = f \left( g(t) \right) = \frac{a\,\frac{\alpha\,t + \beta}{\gamma\,t + \delta} + b}{c\, \frac{\alpha\,t + \beta}{\gamma\,t + \delta} + d}
\\
&= \frac{\left( a\alpha + b\gamma \right) t + \left( a\beta + b\delta \right) }{\left( c\alpha + d\gamma \right) t + \left( c\beta + d\delta \right)} .
\end{align*}
The latter expression is again a ratio of two polynomials of the first degree. The numerator consists of two terms: a multiple of t with coefficient (𝑎α + bγ), and a free term (𝑎β + bδ).
(* Mathematica output for the denominator of f(g(t)), its t-term, and the coefficient of t: *)
c \[Beta] + t (c \[Alpha] + d \[Gamma]) + d \[Delta]
t (c \[Alpha] + d \[Gamma])
c \[Alpha] + d \[Gamma]
These four terms (two in the numerator and two in the denominator) can be placed into matrix form:
\[
\begin{bmatrix} a\,\alpha + b\,\gamma & a\,\beta + b\,\delta \\ c\,\alpha + d\,\gamma & c\,\beta + d\,\delta \end{bmatrix} .
\]
The input functions f and g can also be represented by 2 × 2 matrices that are built from coefficients of polynomials in their numerators and denominators:
\[
f(x) = \frac{a\,x + b}{c\,x + d} \,\stackrel{\varphi}{\mapsto} \, \begin{bmatrix} a&b \\ c&d \end{bmatrix} , \qquad g(t) = \frac{\alpha\,t + \beta}{\gamma\,t + \delta} \, \stackrel{\varphi}{\mapsto} \, \begin{bmatrix} \alpha & \beta \\ \gamma & \delta\end{bmatrix} .
\]
As a result, the
composition formula suggests the following rule for multiplication of 2 × 2 matrices:
\[
\begin{bmatrix} a& b \\ c&d \end{bmatrix} \cdot \begin{bmatrix} \alpha&\beta \\ \gamma &\delta \end{bmatrix} = \begin{bmatrix} a\alpha + b\gamma & a\beta + b\delta \\ c\alpha + d\gamma & c\beta + d\delta \end{bmatrix} .
\]
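This product of the two coefficient matrices can be checked directly in Mathematica (M and P are our names for the coefficient matrices of f and g):
M = {{a, b}, {c, d}}; P = {{\[Alpha], \[Beta]}, {\[Gamma], \[Delta]}};
M . P
{{a \[Alpha] + b \[Gamma], a \[Beta] + b \[Delta]}, {c \[Alpha] + d \[Gamma], c \[Beta] + d \[Delta]}}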
Let us consider two finite dimensional vector spaces, U and V, over a field 𝔽. Basically, this means that these vector spaces are essentially the same as the direct products 𝔽n and 𝔽m for some positive integers n and m. In this case, mathematicians say that these vector spaces are isomorphic and use the special notation U ≌ 𝔽n, V ≌ 𝔽m, where the symbol "≌" denotes an isomorphism.
Let T : 𝔽n ⇾ 𝔽m be a linear transformation, where m and n are some positive integers. Suppose we know ordered bases, one in each vector space:
respectively. It does not matter what particular vectors in ordered bases α and β have been chosen---they don't need to be standard bases. Every basis vector from α is mapped by T into 𝔽m, then it can be expanded with respect to basis β:
The matrix ⟦T⟧α→β is called the transformation matrix of the mapping T with respect to the ordered bases α and β. This matrix, which we denote by A for simplicity, becomes an operator upon transferring all vectors into columns (since the spaces are isomorphic: 𝔽n ≌ 𝔽n×1). Then matrix A acts on column vectors by multiplication from the left:
Theorem 4:
Let U be an n-dimensional vector space over field 𝔽
and let V be an m-dimensional vector space over 𝔽. For each pair of ordered bases α and β of U and V, respectively, the function that assigns to a linear
transformation T : U ≌ 𝔽n×1 ⇾ V ≌ 𝔽m×1 its matrix ⟦T⟧α →β relative to α and β is an isomorphism between the
space of all linear maps ℒ(U, V) and the space 𝔽m×n of all m × n matrices over field 𝔽.
We observed above that the function in question is linear;
moreover, it is one-to-one and maps the vector space ℒ(U, V) of all linear transformations
onto the set 𝔽m×n of m × n matrices.
Example 13:
We consider the differential operator
\[
L[x, \texttt{D}] = x\,\texttt{D} - 2\texttt{I} , \qquad \texttt{D} = \frac{\text d}{{\text d}x} , \quad \texttt{I} = \texttt{D}^0 \quad (\mbox{identity
operator}),
\]
acting in the vector space of polynomials of degree at most three: ℝ≤3[x]. Note that any operator raised to zero power is assumed to be an identity operator. Are you surprised that Mathematica knows this?
fctx = f[x]; D[fctx, {x, 0}]
f[x]
Let α = [1, x, x², x³] be the ordered basis in ℝ≤3[x]. The corresponding matrix of this differential operator is obtained by applying the differential operator to each element of the basis:
\begin{align*}
L[x, \texttt{D}] \, 1 &= -2 + 0 \cdot x + 0\cdot x^2 + 0\cdot x^3 ,
\\
L[x, \texttt{D}] \, x &= x - 2\,x = 0\cdot 1 -1 \cdot x + 0\cdot x^2 + 0\cdot x^3 ,
\\
L[x, \texttt{D}] \, x^2 &= 2\, x^2 - 2\,x^2 = 0 = 0\cdot 1 +0 \cdot x + 0\cdot x^2 + 0\cdot x^3 ,
\\
L[x, \texttt{D}] \, x^3 &= 3\, x^3 - 2\,x^3 = x^3 = 0\cdot 1 +0 \cdot x + 0\cdot x^2 + 1\cdot x^3 .
\end{align*}
Then the corresponding matrix of this differential operator has as its columns the coordinate vectors obtained by applying L to the basis elements:
\[
[\![ L[x, \texttt{D}] ]\!]_{\alpha \to \alpha} = \begin{bmatrix} -2&0&0&0 \\ 0&-1&0&0 \\ 0&0&0&0 \\ 0&0&0&1 \end{bmatrix} .
\]
It is no coincidence that the matrix of this differential operator is diagonal: every monomial xᵏ is an eigenfunction of x D, since L[x, D] xᵏ = (k − 2) xᵏ. We can exploit this property by entering the following commands into a Mathematica notebook:
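One possible set of commands (a sketch; the names op and basis, and the use of CoefficientList, are our own) that reproduces this diagonal matrix:
Clear[x];
op[p_] := x D[p, x] - 2 p;   (* the operator L = x D - 2 I *)
basis = {1, x, x^2, x^3};
Transpose[Table[PadRight[CoefficientList[op[b], x], 4], {b, basis}]]   (* columns = coordinates of op applied to each basis element *)
{{-2, 0, 0, 0}, {0, -1, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 1}}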
Let T : U ≌ 𝔽n×1 ⇾ V ≌ 𝔽m×1 be a linear transformation from U into V and S : V ≌ 𝔽m×1 ⇾ W ≌ 𝔽p×1 a linear transformation from V into W. Their composition is the transformation
S ◦ T : U ⇾ W
defined by
\[
\left( S \circ T \right) ({\bf x}) = \left( S(T({\bf x})) \right) , \qquad {\bf x} \in U .
\]
Suppose α, β, and γ are ordered bases for the respective spaces U, V, and W. Let A = ⟦T⟧α→β be the matrix of T relative
to the pair α, β, and let B = ⟦S⟧β→γ be the matrix of S relative to the pair β, γ.
Theorem 5:
Let T : U ≌ 𝔽n×1 ⇾ V ≌ 𝔽m×1 and S : V ≌ 𝔽m×1 ⇾ W ≌ 𝔽p×1
be linear transformations of finite dimensional vector spaces U, V, and W over the field 𝔽, and let A = ⟦T⟧α→β be the m × n matrix of the map T and B = ⟦S⟧β→γ be the p × m matrix of the map S.
Then their composition S ◦ T : U ≌ 𝔽n×1 ⇾ W ≌ 𝔽p×1
is a linear transformation, and its matrix is the product B A:
\[
[\![ S \circ T ]\!]_{\alpha \to \gamma} = [\![ S ]\!]_{\beta \to \gamma} \, [\![ T ]\!]_{\alpha \to \beta} .
\]
First, we verify that S ◦ T
is linear. Let x, y be vectors in U. Then
\begin{align*}
S \circ T \left( {\bf x} + {\bf y} \right) &= S \left( T \left( {\bf x} + {\bf y} \right) \right) = S \left( T({\bf x}) + T({\bf y}) \right)
\\
&= S \left( T({\bf x}) \right) + S \left( T({\bf y}) \right) = S \circ T \left( {\bf x} \right) + S \circ T \left( {\bf y} \right) .
\end{align*}
If k is a scalar, then
\[
S \circ T \left( k\,{\bf x} \right) = S \left( T \left( k\,{\bf x} \right) \right) = S \left( k\, T \left( {\bf x} \right) \right) = k\, S \left( T \left( {\bf x} \right) \right) .
\]
Therefore, the composition S ◦ T is a linear transformation.
Now that we know that S ◦ T is linear, it makes sense to compute its matrix. Let C = ⟦S ◦ T⟧α→γ
be the matrix of the composition
S ◦ T with respect to the ordered bases α and γ.
For an arbitrary vector x = Σᵢ xᵢ aᵢ ∈ U, linearity gives
\begin{align*}
S \circ T \left( {\bf x} \right) &= S \left( T \left( {\bf x}\right) \right)
\\
&= S \left( T \left( \sum_{i=1}^n x_i {\bf a}_i \right) \right) = \sum_{i=1}^n S \left( T \left( x_i {\bf a}_i \right) \right)
\\
&= \sum_{i=1}^n x_i\, S \left( T \left( {\bf a}_i \right) \right) .
\end{align*}
Hence, it is sufficient to prove the statement just for one vector ai from basis α. So we have
\begin{align*}
S \circ T \left( {\bf a}_i \right) &= S \left( T \left( {\bf a}_i \right) \right)
\\
&= S \left( \sum_{j=1}^m A_{j,i} {\bf b}_j \right)
\\
&= \sum_{j=1}^m A_{j,i}\, S \left( {\bf b}_j \right)
\\
&= \sum_{j=1}^m A_{j,i} \sum_{k=1}^p B_{k,j} {\bf c}_k
\\
&= \sum_{k=1}^p \left( \sum_{j=1}^m B_{k,j} A_{j,i} \right) {\bf c}_k .
\end{align*}
So we showed that
\[
{\bf C}_{k,i} = \sum_{j=1}^m B_{k,j} A_{j,i} = \left( {\bf B}\,{\bf A} \right)_{k,i} .
\]
The multiplication law for matrices can be motivated
more generally, as the following example shows.
Example 14:
Let us consider a linear transformation (which is closely related to linear systems of equations):
\[
\begin{split}
y_1 &= a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n ,
\\
y_2 &= a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n ,
\\
\cdots & \qquad \cdots
\\
y_m &= a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n .
\end{split}
\]
Suppose the unknowns x₁, x₂, … , xn are themselves given by a linear transformation:
\begin{align*}
x_1 &= b_{1,1} t_1 + b_{1,2} t_2 + \cdots + b_{1,p} t_p ,
\\
x_2 &= b_{2,1} t_1 + b_{2,2} t_2 + \cdots + b_{2,p} t_p ,
\\
\cdots & \qquad \cdots
\\
x_n &= b_{n,1} t_1 + b_{n,2} t_2 + \cdots + b_{n,p} t_p .
\end{align*}
For the ith variable yi we have
\begin{align*}
y_i &= a_{i,1} x_1 + a_{i,2} x_2 + \cdots + a_{i,n} x_n
\\
&= a_{i,1} \sum_{k=1}^p b_{1,k} t_k + a_{i,2} \sum_{k=1}^p b_{2,k} t_k + \cdots + a_{i,n} \sum_{k=1}^p b_{n,k} t_k
\\
&= \left( \sum_{k=1}^n a_{i,k} b_{k,1} \right) t_1 + \left( \sum_{k=1}^n a_{i,k} b_{k,2} \right) t_2 + \cdots + \left( \sum_{k=1}^n a_{i,k} b_{k,p} \right) t_p .
\end{align*}
Hence we have found the expressions
\begin{align*}
y_1 &= c_{1,1} t_1 + c_{1,2} t_2 + \cdots + c_{1,p} t_p ,
\\
y_2 &= c_{2,1} t_1 + c_{2,2} t_2 + \cdots + c_{2,p} t_p ,
\\
\cdots & \qquad \cdots
\\
y_m &= c_{m,1} t_1 + c_{m,2} t_2 + \cdots + c_{m,p} t_p ,
\end{align*}
with the coefficients
\[
c_{i,j} = \sum_{k=1}^n a_{i,k} b_{k,j} \qquad \left( 1 \leqslant i \leqslant m, \quad 1 \leqslant j \leqslant p \right) .
\]
In short, the result of the linear change of variables is
\[
y_i = \sum_{j=1}^p c_{i,j} t_j , \qquad c_{i,j} = \sum_{k=1}^n a_{i,k} b_{k,j} .
\]
■
End of Example 14
Polynomials of a Matrix
When A is an n × n (square) matrix, the product A A is defined.
We shall denote this matrix by A². By Theorem 1, (A A)A = A(A A) or
A²A = A A², so that the product A A A is unambiguously defined. This
product we denote by A³. In general, the product A A ⋯ A (p times) is
unambiguously defined, and we shall denote this product by Ap.
Example 15:
In graph theory and computer science, an adjacency matrix, also called the connection matrix, is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
The adjacency matrix of an undirected graph is symmetric: the value in the i-th row and j-th column is identical to the value in the j-th row and i-th column. If a simple graph has no self-loops, then its adjacency matrix has 0s on the diagonal.
If A is the adjacency matrix of the directed or undirected graph G, then the matrix An (i.e., the matrix product of n copies of A) has an interesting interpretation: the element (i, j) gives the number of (directed or undirected) walks of length n from vertex i to vertex j. If, for given i and j, n is the smallest nonnegative integer such that the element (i, j) of An is positive, then n is the distance between vertex i and vertex j.
The diagram indicates how six web pages are connected by hyperlinks. For example,
the single directional arrow from A3 to A6 indicates there is a hyperlink on page A3 that
takes the user to page A6 . The two arrows between A4 and A5 indicates
that there is a hyperlink on page A4 to page A5 and a hyperlink on page A5 to
A4.
Below is the matrix in Table format with headings. Next to it is the related graph. It is useful to think of this construction as follows. The arrows indicate the permitted direction of travel; going against an arrow, in the opposite direction, is not allowed. If the numeral 1 appears in an intersection, this means the probability is 100% that you may travel from the node represented by the vertical (row) heading label to the node represented by the horizontal (column) heading label. So, for example, the 1 in the first row, second column means there is a 100% probability that you may travel from A1 to A2. A zero in the intersection of two nodes means that travel between them in that direction is not possible. Try going from A5 to A1 on the graph to prove it to yourself.
Stated differently, the matrix has dimensions 6 x 6, corresponding to the six nodes in the graph. Each row and column corresponds to a node (in the order A1, A2, A3, A4, A5, A6). If there is a link from node i to node j, then the element in the i-th row and j-th column of the matrix is 1; otherwise, it is 0. For example, the first row of the matrix represents links from A1 to other pages. The 2nd and 3rd elements in this row are 1, indicating that there are links from A1 to A2 and A3. The rest of the elements in this row are 0, indicating no links from A1 to the other pages.
The diagonal elements of the matrix are all 0, indicating that no page links back to itself.
Mathematica can store such a matrix as a SparseArray; see the Wolfram documentation for details.
The adjacency matrix A for this graph represents the network by putting a 1 in the (i, j)-th entry of
A if webpage Ai has a hyperlink to page Aj. For our graph,
- Web page A₁ links to A₂ and A₃
- Web page A₂ links to A₄ and A₅
- Web page A₃ links back to A₁ and to A₆
- Web page A₄ has a two-way link with A₅
- Web page A₅ has no additional links
- Web page A₆ links back to A₁.
For instance, because of the connection from A₃ to
A₆, we get [A]3,6 = 1. Since A₆ does not have a hyperlink to A₃, we have that [A]6,3 = 0. And so on. We build this matrix into Mathematica:
A = {{0,1,1,0,0,0},
{1,0,0,1,1,0},
{1,0,0,0,0,1},
{0,0,0,0,1,0},
{0,0,0,1,0,0},
{1,0,0,0,0,0}}
Then we ask Mathematica to compute some powers of this matrix:
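For instance, the square of the adjacency matrix counts two-step walks; its (1, 1) entry equals 2 because A₁ → A₂ → A₁ and A₁ → A₃ → A₁ are the two walks of length 2 from A₁ back to itself:
MatrixPower[A, 2]
{{2, 0, 0, 1, 1, 1}, {0, 1, 1, 1, 1, 0}, {1, 1, 1, 0, 0, 0}, {0, 0, 0, 1, 0, 0}, {0, 0, 0, 0, 1, 0}, {0, 1, 1, 0, 0, 0}}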
If a square matrix A is invertible, we can also define its negative powers A−n for any positive integer n. Therefore, we can define a meromorphic function of such a square matrix. For instance, let
where p(λ) = 𝑎₀ + 𝑎₁λ + ⋯ + 𝑎mλm and q(μ) = b₁μ + b₂μ² + ⋯ + bnμn.
Then for an invertible square matrix A, the following matrix function is well-defined:
Hadamard multiplication is probably what
you would have answered if someone asked you to guess what it
means to multiply two matrices.
For two matrices A and B of the same dimension m × n, the Hadamard product, A ⊙ B, is a matrix of the same dimension as the operands, with elements given by
\[
\left( {\bf A} \odot {\bf B} \right)_{ij} = \left( {\bf A} \right)_{ij} \left( {\bf B} \right)_{ij} .
\]
For example, the Hadamard product of two vectors from 𝔽n is
The concept and notation is the same for matrices as for vectors. Hadamard multiplication involves multiplying each element of one
matrix by the corresponding element in the other matrix.
Mathematica uses the ordinary multiplication sign "*" for the Hadamard product:
A = {{1, 2}, {3, 4}}; B = {{-1, 2}, {-2, 3}};
A*B
{{-1, 4}, {-6, 12}}
Element-wise matrix multiplication in computer applications facilitates convenient and efficient coding (e.g., to avoid
using for-loops), as opposed to utilizing some special mathematical properties of Hadamard multiplication. That said, Hadamard
multiplication does have applications in linear algebra. For example, it is key to one of the algorithms for computing the matrix
inverse.
Example 18:
First, we randomly generate two 3 × 3 matrices:
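A sketch of such a computation (the seed and the entrywise check below are our own additions):
SeedRandom[20];   (* for reproducibility *)
A = RandomInteger[{-9, 9}, {3, 3}];
B = RandomInteger[{-9, 9}, {3, 3}];
A*B == Table[A[[i, j]] B[[i, j]], {i, 3}, {j, 3}]   (* * multiplies entry by entry *)
True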
For a given m × n matrix A, its
transpose is the n × m matrix, denoted either by \( {\bf A}^T \) or
by At or just by \( {\bf A}' , \) whose entries are formed by interchanging the rows with
the columns; that is, \( \left( {\bf A} \right)_{i,j} = \left( {\bf A}' \right)_{j,i} . \)
When matrix A is considered as an operator, its transpose is usually denoted by a prime, A′. However, in linear algebra the transposed matrix is usually identified by the letter "T."
If A is a square matrix, then its trace is \( \mbox{tr}\left( {\bf A}^{\mathrm T} \right)
= \mbox{tr}\left( {\bf A} \right) ; \)
If A is a square nonsingular matrix, then \( \left( {\bf A}^{\mathrm T} \right)^{-1}
= \left( {\bf A}^{-1} \right)^{\mathrm T} ; \)
If A is a square matrix, then \( \det\left( {\bf A}^{\mathrm T} \right)
= \det\left( {\bf A} \right) . \)
Each of the identities above follows immediately from the definition of the transpose. The product rule (A B)′ = B′A′ requires a short computation.
Suppose that A = [𝑎i,j]m×n and B = [bi,j]n×p. Then (A B)′ and B′A′ are each of size p × m. Since
\[
\left[ \mathbf{B}' \mathbf{A}' \right]_{i,j} = \sum_{k=1}^n b_{k, i} a_{j,k} = \sum_{k=1}^n a_{j,k} b_{k,i} = \left[ {\bf A}\,{\bf B} \right]_{j,i} = \left[ \left( {\bf A}\,{\bf B} \right)' \right]_{i,j} ,
\]
we deduce that (A B)′ = B′A′.
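A quick numerical confirmation in Mathematica (the matrices are chosen arbitrarily):
A = {{1, 2, 3}, {4, 5, 6}}; B = {{1, 0}, {2, -1}, {0, 3}};
Transpose[A . B] == Transpose[B] . Transpose[A]
True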
Here is a list of basic matrix manipulations with Mathematica:
Example 20: Let us consider the 3-by-4 matrix
First we generate a 3-by-4 matrix:
Clear[A, B, subA, subB, matA, matB];
A = Range@12~Partition~4
{{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}}
B = Range[13, 24]~Partition~4
{{13, 14, 15, 16}, {17, 18, 19, 20}, {21, 22, 23, 24}}
For an m × n matrix A ∈ ℂm×n, its adjoint, or conjugate transpose (also known as the Hermitian transpose), is the n × m matrix, denoted by A✶, obtained by transposing A and applying complex conjugation to each entry (the complex conjugate of 𝑎 + jb being 𝑎 − jb, for real numbers 𝑎 and b).
Theorem 7: Let A and B denote matrices whose sizes are appropriate for the following operations.
Clear[A, B, s];
A = {{8, I , 3 - I, -2 }, {-I, I + 1, 2, 1 - I}, {4*I + 1, I, -1,
2 + I}};
B = {{5, I, 4 - I, -3}, {-I, I + 2, 3, 1 - I}, {5*I + 2, I, -4,
23 + I}};
s = 1 + 2 I
part (a)
ConjugateTranspose[A]
{{8, I, 1 - 4 I}, {-I, 1 - I, -I}, {3 + I, 2, -1}, {-2, 1 + I, 2 - I}}
A == (A\[ConjugateTranspose])\[ConjugateTranspose]
As you can see, Mathematica cannot handle this pair of commands directly. The issue here is due to the nature of scalar multiplication, because the command ConjugateTranspose is applicable only to matrices. Therefore, we modify the input slightly:
Calculate the following matrix/vector products by expanding them as a linear
combination of the columns
of the matrix.
\[
{\bf (a)\ \ \ } \begin{bmatrix} \phantom{-}3&2&1 \\ -1 & e & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} , \qquad {\bf (b)\ \ \ } \begin{bmatrix} 2&-3&1 \\ 3&-1&5 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}
\]
Prove the following theorem.
Theorem:
If A and B are m × n matrices, then for any scalars λ and μ:
λ(A + B) = λA + λB;
(λ + μ)A = λA + μA;
(−1)A = −A;
λ(μA) = (λμ)A = (μλ)A;
0A = 0m×n.