Matrix Algebra
Sage is a powerful system for studying and exploring many different
areas of mathematics. This page presents some topics from Linear Algebra that are needed for constructing solutions of systems of linear ordinary differential equations, together with some applications.
How to Define and Operate on Vectors
A vector usually describes a quantity that has both magnitude and direction. Geometrically, a vector can be represented by a directed line segment---that is, by an arrow---and is denoted either by a boldface lowercase symbol or by a symbol with an arrow over it.
A vector whose initial point is A and whose terminal point is B is written \( \vec{AB} .\)
Vectors are said to be free, which means that a vector can be moved from one position to another provided that its magnitude
(denoted by \( \| \vec{AB} \| \) ) and direction are not changed.
When a basis in a vector space has been chosen, a vector can be expanded with respect to the basis vectors. For
example, when unit vectors in n-dimensional space \( \mathbb{R}^n \) have been chosen,
any vector can be uniquely expanded with respect to this basis.
In three-dimensional space, it is customary to use the Cartesian coordinate system and denote these unit vectors by
i (abscissa or horizontal axis, usually denoted x), j (ordinate or vertical axis,
usually denoted by y), and k (applicate, usually denoted by z). With respect to these unit vectors, any vector can be
written as \( {\bf v} = x\,{\bf i} + y\,{\bf j} + z\,{\bf k} . \) Then the vector
v is uniquely defined by an ordered triple of real numbers
\[
{\bf v} = \left[ x, y , z \right] .
\]
In general, a vector in
n-dimensional space is identified by
an n-tuple (an ordered array of numbers), and in an infinite dimensional space by a sequence of numbers.
We recall some basic properties of vectors. First we define a vector over the field of real numbers:
sage: v = vector(RR, 4, [1, 1/2, 1/3, 1/4])
sage: v
(1.00000000000000, 0.500000000000000, 0.333333333333333, 0.250000000000000)
However, if we define the same vector over the field of rational numbers, we get
sage: v = vector(QQ, 4, [1, 1/2, 1/3, 1/4])
sage: v
(1, 1/2, 1/3, 1/4)
It is essential that each element knows what its classification is.
sage: v.parent()
Vector space of dimension 4 over Rational Field
sage: v.degree()
4
sage: v[3]
1/4
Even though we typed v[3], the last command asks to display the fourth element in the vector v because Sage counts from zero.
The degree command provides the number of elements in the vector, which is its size.
However, if you try to define the same vector over the ring of integers, Sage will complain:
sage: v = vector(ZZ, 4, [1, 1/2, 1/3, 1/4]); v
because the entries are not integers; on the other hand, the command below works
sage: v = vector(ZZ, 4, [1, 2, 3, -4])
sage: v
(1, 2, 3, -4)
To check, you may want to type:
sage: v.parent()
Ambient free module of rank 4 over the principal ideal domain Integer Ring
In mathematics and applications, it is customary to distinguish column vectors
\[
{\bf v} = \left( \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_m \end{array} \right) \qquad \mbox{also written as } \qquad
{\bf v} = \left[ \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_m \end{array} \right] ,
\]
for which we use lowercase letters in boldface type, from row vectors (ordered
n-tuples)
\[
\vec{v} = \left[ v_1 , v_2 , \ldots , v_n \right] .
\]
Here the entries
\( v_i \) are known as the components of the vector.
Column vectors and row vectors can also be defined using the matrix command, as an example of an
\( n\times 1 \) matrix and a
\( 1\times n \) matrix, respectively:
sage: u = matrix(QQ, 3, 1, [1, 2, 3]); u
[1]
[2]
[3]
sage: v = matrix(QQ, 1, 3, [1, 2, 3]); v
[1 2 3]
Vectors in Sage are built, manipulated and interrogated similarly to matrices (see next subsection). However, as
simple lists (“one-dimensional,” not “two-dimensional” such as matrices that look more tabular), they are simpler
to construct and manipulate. Sage will print a vector across the screen, even if we wish to interpret it as a column
vector. It will be delimited by parentheses ( (,) ) which allows
us to distinguish a vector from a matrix with just one row, if we look carefully. The
number of “slots” in a vector is not referred to in Sage as rows or columns, but rather
by “degree.” Here are some examples (remember to start counting from zero):
sage: v = vector(QQ, 4, [1, 1/2, 1/3, 1/4])
sage: v
(1, 1/2, 1/3, 1/4)
sage: v.degree()
4
sage: v.parent()
Vector space of dimension 4 over Rational Field
Sage is largely object-oriented, which means many commands apply to an
object by using the “dot” notation.
Mathematical objects in Sage often come from sets of similar objects. This set is
called the “parent” of the element. We can use this to learn how Sage deduces the
type of entries in a matrix.
It is also possible to skip specifying the type of numbers used for the entries of a
matrix; however, this is fraught with peril, as Sage will make an informed guess about
your intent, which may not be what you want. Consider when you enter the single character
“2” into a computer program like Sage. Is this the integer 2, the
rational number \( \frac{2}{1}, \)
the real number 2.00000, the complex number 2 + 0i, or the polynomial
\( p(x)=2? \)
In context, we can usually figure it out, but a literal-minded computer is not
so smart. It happens that the operations we can perform, and how they behave, are
influenced by the type of the entries in a matrix. So it is important to get this right
and our advice is to be explicit and be in the habit of always
specifying the type of the entries of a matrix you create.
Sage knows a wide variety of sets of numbers. These are known as “rings” or
“fields,” but we will call them “number systems” here. Examples include:
- SR for symbolic entries,
- ZZ, the set of integers,
- QQ, the set of rational numbers,
- RR, the real numbers with reasonable precision, and
- CC, the complex numbers with reasonable precision.
We will present the theory of linear algebra over the complex numbers. However, in any computer
system, there will always be complications surrounding the inability of digital arithmetic to accurately
represent all complex numbers. In contrast, Sage can represent
rational numbers exactly as the quotient of two (perhaps very large) integers. So our
Sage examples will begin by using
QQ as our number system and we concentrate on understanding the key
concepts.
sage: z = zero_vector(QQ, 5)
sage: z
(0, 0, 0, 0, 0)
Let
S be a set of vectors
\( {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_k .\) A vector
v
is said to be a
linear combination of the vectors from
S if and only if there are scalars
\( c_1 , \ c_2 , \ \ldots , \ c_k \) such that
\( {\bf v} = c_1 {\bf v}_1 + c_2 {\bf v}_2 + \cdots + c_k {\bf v}_k .\)
That is, a linear combination of vectors from
S is a sum of scalar multiples of those vectors.
Let
S be a nonempty subset of a vector space
V. Then the
span of
S in
V
is the set of all possible (finite) linear combinations of the vectors in
S (including zero vector). It is usually denoted by span(
S).
In other words, a span of a set of vectors in a vector space is the intersection of all subspaces containing that set.
Example. The vector [-2, 8, 5, 0] is a linear combination of the vectors [3, 1, -2, 2], [1, 0, 3, -1], and [4, -2, 1, 0], because
\[
2\,[3,\, 1,\, -2,\,2] + 4\,[1,\,0,\,3,\,-1] -3\,[4,\,-2,\, 1,\, 0] = [-2,\,8,\, 5,\, 0] . \qquad ■
\]
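This can be verified directly in Sage (the vector names below are our own):
sage: v1 = vector(QQ, [3, 1, -2, 2]); v2 = vector(QQ, [1, 0, 3, -1]); v3 = vector(QQ, [4, -2, 1, 0])
sage: 2*v1 + 4*v2 - 3*v3
(-2, 8, 5, 0)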
Let
S be a subset of a vector space
V.
(1)
S is a
linearly independent subset of
V if and only if no vector in
S can be expressed as a linear combination of the other vectors in
S.
(2)
S is a
linearly dependent subset of
V if and only if some vector
v in
S
can be expressed as a linear combination of the other vectors in
S.
Theorem: A nonempty set \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_r \} \)
in a vector space V is linearly independent if and only if the only coefficients satisfying the vector equation
\[
k_1 {\bf v}_1 + k_2 {\bf v}_2 + \cdots + k_r {\bf v}_r = {\bf 0}
\]
are
\( k_1 =0, \ k_2 =0, \ \ldots , \ k_r =0 . \)
Theorem: A nonempty set \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_r \} \)
in a vector space V is linearly independent if and only if the matrix of the column-vectors from S has rank r. ■
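For instance, the three vectors from the example above can be tested with this criterion by computing the rank of the matrix whose columns are these vectors (a small sketch; the variable name is our own):
sage: A = matrix(QQ, [[3, 1, 4], [1, 0, -2], [-2, 3, 1], [2, -1, 0]])   # columns are the three vectors
sage: A.rank()
3
Since the rank equals the number of vectors, they are linearly independent.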
If \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_n \} \) is a set of vectors in a finite-dimensional vector space
V, then S is called basis for V if:
- S spans V;
- S is linearly independent. ■
Sage has three multiplication commands for vectors: the dot and outer products (for arbitrary vectors), and
the cross product (for three dimensional vectors).
The dot product of two vectors of the same size
\( {\bf x} = \left[ x_1 , x_2 , \ldots , x_n \right] \) and
\( {\bf y} = \left[ y_1 , y_2 , \ldots , y_n \right] \) (independently whether they are columns or rows) is the number,
denoted either by \( {\bf x} \cdot {\bf y} \) or \( \left\langle {\bf x} , {\bf y} \right\rangle ,\)
\[
\left\langle {\bf x} , {\bf y} \right\rangle = {\bf x} \cdot {\bf y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n ,
\]
when entries are real, or
\[
\left\langle {\bf x} , {\bf y} \right\rangle = {\bf x} \cdot {\bf y} = \overline{x_1} y_1 + \overline{x_2} y_2 + \cdots + \overline{x_n} y_n ,
\]
when entries are complex. The dot product was first introduced by the American physicist and mathematician Josiah Willard Gibbs (1839--1903) in the 1880s.
The outer product of two coordinate vectors \( {\bf u} = \left[ u_1 , u_2 , \ldots , u_m \right] \) and
\( {\bf v} = \left[ v_1 , v_2 , \ldots , v_n \right] , \) denoted \( {\bf u} \otimes {\bf v} , \) is their tensor product:
the m-by-n matrix W whose entries satisfy \( w_{i,j} = u_i v_j . \)
The outer product \( {\bf u} \otimes {\bf v} \) is equivalent to the matrix product
\( {\bf u} \, {\bf v}^{\ast} \) (or \( {\bf u} \, {\bf v}^{\mathrm T} \) if the vectors are real), provided that u is represented as a
column \( m \times 1 \) vector and v as a column \( n \times 1 \) vector. Here \( {\bf v}^{\ast} = \overline{{\bf v}^{\mathrm T}} . \)
For three dimensional vectors
\( {\bf a} = a_1 \,{\bf i} + a_2 \,{\bf j} + a_3 \,{\bf k} =
\left[ a_1 , a_2 , a_3 \right] \) and
\( {\bf b} = b_1 \,{\bf i} + b_2 \,{\bf j} + b_3 \,{\bf k} = \left[ b_1 , b_2 , b_3 \right] \) , it is possible to define special multiplication, called
cross-product:
\[
{\bf a} \times {\bf b} = \det \left[ \begin{array}{ccc} {\bf i} & {\bf j} & {\bf k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3
\end{array} \right] = {\bf i} \left( a_2 b_3 - b_2 a_3 \right) - {\bf j} \left( a_1 b_3 - b_1 a_3 \right) + {\bf k} \left( a_1 b_2 - a_2 b_1 \right) .
\]
Example.
For instance, if m = 4 and n = 3, then
\[
{\bf u} \otimes {\bf v} = {\bf u} \, {\bf v}^{\mathrm T} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} =
\begin{bmatrix} u_1 v_1 & u_1 v_2 & u_1 v_3 \\ u_2 v_1 & u_2 v_2 & u_2 v_3 \\ u_3 v_1 & u_3 v_2 & u_3 v_3 \\ u_4 v_1 & u_4 v_2 & u_4 v_3 \end{bmatrix} .
\]
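In Sage the outer product is available through the outer_product method of a vector; for example (a small sketch with vectors of our own choosing):
sage: u = vector(QQ, [1, 2, 3]); v = vector(QQ, [4, 5])
sage: u.outer_product(v)
[ 4  5]
[ 8 10]
[12 15]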
An inner product of two vectors of the same size, usually denoted by \( \left\langle {\bf x} , {\bf y} \right\rangle ,\) is a generalization of the dot product if it satisfies the following properties:
- \( \left\langle {\bf v}+{\bf u} , {\bf w} \right\rangle = \left\langle {\bf v} , {\bf w} \right\rangle + \left\langle {\bf u} , {\bf w} \right\rangle . \)
-
\( \left\langle {\bf v} , \alpha {\bf u} \right\rangle = \alpha \left\langle {\bf v} , {\bf u} \right\rangle \) for any scalar α.
- \( \left\langle {\bf v} , {\bf u} \right\rangle = \overline{\left\langle {\bf u} , {\bf v} \right\rangle} , \) where overline means complex conjugate.
-
\( \left\langle {\bf v} , {\bf v} \right\rangle \ge 0 , \) with equality if and only if
\( {\bf v} = {\bf 0} . \)
The fourth condition in the list above is known as the positive-definite condition. A vector space together with the inner product is called an inner product space. Every inner product space is a metric space. The metric or norm is given by
\[
\| {\bf u} \| = \sqrt{\left\langle {\bf u} , {\bf u} \right\rangle} .
\]
The nonzero vectors
u and
v of the same size are
orthogonal (or
perpendicular) when their inner product is zero:
\( \left\langle {\bf u} , {\bf v} \right\rangle = 0 . \) We abbreviate it as
\( {\bf u} \perp {\bf v} . \)
If
A is an
\( n \times n \) matrix and
u and
v are
n-vectors, then
\[
{\bf A} {\bf u} \cdot {\bf v} = {\bf u} \cdot {\bf A}^{\ast} {\bf v} \qquad\mbox{and} \qquad {\bf u} \cdot {\bf A} {\bf v} = {\bf A}^{\ast} {\bf u} \cdot {\bf v} .
\]
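For real matrices \( {\bf A}^{\ast} = {\bf A}^{\mathrm T} , \) and the first identity is easy to check numerically (an illustrative sketch with a matrix and vectors of our own choosing):
sage: A = matrix(QQ, [[1, 2], [3, 4]])
sage: u = vector(QQ, [1, -1]); v = vector(QQ, [2, 5])
sage: (A*u)*v == u*(A.transpose()*v)
True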
Example.
The Euclidean inner product and the weighted Euclidean inner product (when \( \left\langle {\bf u} , {\bf v} \right\rangle = \sum_{k=1}^n a_k u_k v_k , \)
for some positive numbers \( a_k , \ k=1,2,\ldots , n \) ) are special cases of a general class
of inner products on \( \mathbb{R}^n \) called matrix inner product. Let A be an
invertible n-by-n matrix. Then the formula
\[
\left\langle {\bf u} , {\bf v} \right\rangle = {\bf A} {\bf u} \cdot {\bf A} {\bf v}
\]
defines an inner product generated by
A.
Example.
In the set of integrable functions on an interval [a,b], we can define the inner product of two functions f and g as
\[
\left\langle f , g \right\rangle = \int_a^b \overline{f} (x)\, g(x) \, {\text d}x \qquad\mbox{or} \qquad
\left\langle f , g \right\rangle = \int_a^b f(x)\,\overline{g} (x) \, {\text d}x .
\]
Then the norm
\( \| f \| \) (also called the 2-norm) becomes the square root of
\[
\| f \|^2 = \left\langle f , f \right\rangle = \int_a^b \left\vert f(x) \right\vert^2 \, {\text d}x .
\]
In particular, the 2-norm of the function
\( f(x) = 5x^2 +2x -1 \) on the interval [0,1] is
\[
\| 5 x^2 +2x -1 \| = \sqrt{\int_0^1 \left( 5x^2 +2x -1 \right)^2 {\text d}x } = \sqrt{7} .
\]
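This norm can be checked in Sage with symbolic integration (a brief verification):
sage: x = var('x')
sage: sqrt(integrate((5*x^2 + 2*x - 1)^2, x, 0, 1))
sqrt(7)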
Example.
Consider a set of polynomials of degree n. If
\[
{\bf p} = p(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n \quad\mbox{and} \quad {\bf q} = q(x) = q_0 + q_1 x + q_2 x^2 + \cdots + q_n x^n
\]
are two polynomials, and if
\( x_0 , x_1 , \ldots , x_n \) are distinct real numbers (called sample points), then the formula
\[
\left\langle {\bf p} , {\bf q} \right\rangle = p(x_0 )\, q(x_0 ) + p (x_1 )\, q(x_1 ) + \cdots + p(x_n )\, q(x_n )
\]
defines an inner product, which is called the evaluation inner product at
\( x_0 , x_1 , \ldots , x_n . \) ■
sage: v = vector([1,2,3]); w = vector([0,5,-9])
sage: v.cross_product(v)
(0, 0, 0)
sage: u = v.cross_product(w); u
(-33, 9, 5)
sage: v.dot_product(u)
0
With dot product, we can assign a length of a vector, which is also called the Euclidean norm or 2-norm:
\[
\| {\bf x} \| = \sqrt{ {\bf x}\cdot {\bf x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} .
\]
In linear algebra, functional analysis, and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space—save for the zero vector, which is assigned a length of zero.
On an n-dimensional complex space
\( \mathbb{C}^n ,\) the most common norm is
\[
\| {\bf z} \| = \sqrt{ {\bf z}\cdot {\bf z}} = \sqrt{\overline{z_1} \,z_1 + \overline{z_2}\,z_2 + \cdots + \overline{z_n}\,z_n} = \sqrt{|z_1|^2 + |z_2 |^2 + \cdots + |z_n |^2} .
\]
A unit vector
u is a vector whose length equals one:
\( {\bf u} \cdot {\bf u} =1 . \) We say that two vectors
x and
y are perpendicular if their dot product is zero.
Many other norms are known; among them we mention the taxicab or Manhattan norm, which is also called the 1-norm:
\[
\| {\bf x} \| = \sum_{k=1}^n | x_k | = |x_1 | + |x_2 | + \cdots + |x_n | .
\]
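In Sage the norm method of a vector computes the Euclidean norm by default and accepts the order of the norm as an optional argument; for instance (a vector of our own choosing):
sage: u = vector(QQ, [3, -4, 0, 12])
sage: u.norm()       # Euclidean 2-norm
13
sage: u.norm(1)      # taxicab 1-norm
19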
For the Euclidean norm (and, more generally, for any norm generated by an inner product), the
Cauchy--Bunyakovsky--Schwarz (or simply CBS)
inequality holds:
\[
| {\bf x} \cdot {\bf y} | \le \| {\bf x} \| \, \| {\bf y} \| .
\]
The inequality for sums was published by Augustin-Louis Cauchy (1789--1857) in 1821, while the corresponding
inequality for integrals was first proved by Viktor Yakovlevich Bunyakovsky (1804--1889) in 1859. The modern proof
(which is essentially a repetition of Bunyakovsky's proof) of the integral inequality was given by
Hermann Amandus Schwarz (1843--1921) in 1888.
With the Euclidean norm, the dot product can also be expressed as
\[
{\bf x} \cdot {\bf y} = \| {\bf x} \| \, \| {\bf y} \| \, \cos \theta ,
\]
where
\( \theta \) is the angle between two vectors. ■
How to Define Matrices
Matrices are both a very ancient and a very current mathematical concept.
References to matrices and systems of equations can be found in
Chinese manuscripts dating back to around 200 B.C. The term matrix was first used by the English mathematician James Sylvester (1814--1897), who defined the term in 1850. Over the years,
mathematicians and scientists have found many applications of matrices.
More recently, the advent of personal and large-scale computers has
increased the use of matrices in a wide variety of applications.
A matrix (plural matrices) is a rectangular array of numbers, functions, or any symbols. It can be written as
\[
{\bf A} = \left[ \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right] \qquad \mbox{or} \qquad {\bf A} = \left( \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right) .
\]
We denote this array by a single letter A (usually a capital boldfaced letter) or by \( \left( a_{i,j} \right) \)
or \( \left[ a_{i,j} \right] , \) depending on which notation (parentheses or brackets) is in use.
The symbol \( a_{i,j} ,\) or sometimes \( a_{ij} ,\) in the ith
row and jth column is called the \( \left( i, \, j \right) \) entry. We say that A has
m rows and n columns, and that it is an \( m \times n \) matrix. We also refer to A as a matrix of size
\( m \times n . \)
Any \( m \times n \) matrix can be considered as an array of \( n \) columns
\[
{\bf A} = \left[ \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right] = \left[
\left( \begin{array}{c} a_{1,1} \\
a_{2,1} \\
\vdots \\
a_{m,1} \end{array} \right) , \ \left( \begin{array}{c} a_{1,2} \\
a_{2,2} \\
\vdots \\
a_{m,2} \end{array} \right) , \ \cdots \left( \begin{array}{c} a_{1,n} \\
a_{2,n} \\
\vdots \\
a_{m,n} \end{array} \right) \right] = \left[ {\bf c}_1 , {\bf c}_2 , \ldots {\bf c}_n \right] ,
\]
or as a collection of
\( m \) rows
\[
{\bf A} = \left[ \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right] = \left(
\begin{array}{cccc} \langle a_{1,1} , a_{1,2} , \cdots , a_{1,n} \rangle \\
\langle a_{2,1} , a_{2,2} , \cdots , a_{2,n} \rangle \\
\vdots \\
\langle a_{m,1} , a_{m,2} , \cdots , a_{m,n} \rangle \end{array} \right) = \left[
\begin{array}{c} {\bf r}_1 \\ {\bf r}_2 \\ \vdots \\ {\bf r}_m \end{array} \right] .
\]
Here the column vector \( {\bf c}_i = \langle a_{1,i} , a_{2,i} , \ldots , a_{m,i} \rangle^T \) contains the entries of the
ith column of the matrix A, and the row vector
\( {\bf r}_j = \langle a_{j,1} , a_{j,2} , \ldots , a_{j,n} \rangle \) contains the entries of the
jth row of A.
Before we can discuss arithmetic operations for matrices, we have to define equality
for matrices. Two matrices are equal if they have the same size and their corresponding
elements are equal. A matrix whose elements are all 0's is called a zero or null matrix; a null matrix is usually denoted by 0.
Another very important class of matrices consists of square matrices, which have the same number of rows as columns. In particular, a square matrix having all elements equal to zero except those on the principal diagonal is called a diagonal matrix.
Matrices are fundamental objects in linear algebra and in Sage, so
there are a variety of ways to construct a matrix in Sage.
Generally, you need to specify what types of entries the matrix
contains (more on that to come), the number of rows and columns,
and the entries themselves. First, let’s dissect an example:
sage: B = matrix(QQ, [[1, 2, 3], [4, 5, 6]]); B
[1 2 3]
[4 5 6]
Here QQ
is the set of all rational numbers (fractions with an integer numerator and
denominator),
2
is the number of rows,
3
is the number of columns. Sage understands
a list of items as delimited by brackets (
[,]) and the items in the list can again be lists themselves. So
[[1, 2, 3], [4, 5, 6]]
is a list of lists, and in this context
the inner lists are rows of the matrix.
There are various shortcuts you can employ when creating a matrix. For example,
Sage is able to infer the size of the matrix from the lists of entries.
Now, let's enter a matrix.
sage: M=matrix(QQ,[[2,4,0,8],[-1,3,3,-2],[0,1,1,0]]); M
[ 2 4 0  8]
[-1 3 3 -2]
[ 0 1 1  0]
Or you can specify how many rows the matrix will have and provide one big grand
list of entries, which will get chopped up, row by row, if you prefer.
sage: C = matrix(QQ, 2, [1, 2, 3, 4, 5, 6]); C
[1 2 3]
[4 5 6]
Matrices with symbolic entries:
sage: matrix(SR, 2, 2, range(4))
[0 1]
[2 3]
sage: matrix(SR, 2, 2, var('t'))
[t 0]
[0 t]
The function matrix() or Matrix() is used to do this. For now, the
first argument to the function should be QQ; this means that the matrix will
contain either integers or rational numbers (fractions).
sage: C.parent()
Full MatrixSpace of 2 by 3 dense matrices over Rational Field
sage: A = matrix(2, 3, [[1, cos(3.14), 3], [4, 5, 6]])
sage: A.parent()
Full MatrixSpace of 2 by 3 dense matrices over
Real Field with 53 bits of precision
The matrix A is defined with two rows and three columns. Notice how the matrix
was specified row-by-row, with each row inside a pair of square brackets and all
three rows enclosed in another set of square brackets; commas are used to separate
matrix entries and rows.
Computer scientists and computer languages prefer to begin counting from zero,
while mathematicians and written mathematics prefer to begin counting at one.
Perhaps the most confusing thing about using Sage for matrix work is that rows and
columns are numbered starting at 0 rather than 1 as is usually done for matrices
in mathematical work. This means that the rows in M are numbered 0, 1, and 2
while the columns are numbered 0, 1, 2, and 3. For example, we can access the
first and second rows with:
sage: M.row(0)
(2, 4, 0, 8)
sage: M.row(1)
(-1, 3, 3, -2)
sage: B.nrows(), B.ncols()
(2, 3)
sage: B.base_ring()
Rational Field
sage: B[1,1]
5
Notice how the function row() is used; it is "attached" to the matrix variable with
a dot. This means that the row function operates on the matrix M.
A matrix can also be defined by a formula, using the Python keyword lambda:
sage: m = matrix(QQ, 3, 3, lambda i, j: i+j); m
[0 1 2]
[1 2 3]
[2 3 4]
sage: m = matrix(3, lambda i,j: i-j); m
[ 0 -1 -2]
[ 1 0 -1]
[ 2 1 0]
sage: matrix(QQ, 2, 3, lambda x, y: x+y)
[0 1 2]
[1 2 3]
sage: matrix(QQ, 5, 5, lambda x, y: (x+1) / (y+1))
[ 1 1/2 1/3 1/4 1/5]
[ 2 1 2/3 1/2 2/5]
[ 3 3/2 1 3/4 3/5]
[ 4 2 4/3 1 4/5]
[ 5 5/2 5/3 5/4 1]
Execute the following three compute cells in the Sage
notebook, and notice how the three matrices are constructed to have entries from the
integers, the rationals and the reals.
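Since those cells are not reproduced above, here is a minimal version of what they might look like:
sage: matrix(ZZ, 2, 2, [1, 2, 3, 4]).parent()
Full MatrixSpace of 2 by 2 dense matrices over Integer Ring
sage: matrix(QQ, 2, 2, [1, 2, 3, 4]).parent()
Full MatrixSpace of 2 by 2 dense matrices over Rational Field
sage: matrix(RR, 2, 2, [1, 2, 3, 4]).parent()
Full MatrixSpace of 2 by 2 dense matrices over Real Field with 53 bits of precision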
The norm of a matrix may be thought of as its size because it is a nonnegative number. Matrix norms are directly related to vector norms.
The definitions are summarized below for an \( n \times n \) matrix A.
\[
{\bf A} = \left[ \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{array} \right] .
\]
The operator norm corresponding to the p-norm for vectors, p ≥ 1, is:
\[
\| {\bf A} \|_p = \sup_{{\bf x} \ne 0} \, \frac{\| {\bf A}\,{\bf x} \|_p}{\| {\bf x} \|_p} ,
\]
where
\( \| {\bf x} \|_p = \left( |x_1|^p + |x_2|^p + \cdots + |x_n|^p \right)^{1/p} .\)
The most important norms are the following.
The 1-norm (commonly known as the maximum absolute column sum norm) of a matrix A may be computed as
\[
\| {\bf A} \|_1 = \max_{1 \le j \le n} \,\sum_{i=1}^n | a_{i,j} | .
\]
The infinity norm,
\( \infty - \) norm of matrix
A may be computed as
\[
\| {\bf A} \|_{\infty} = \max_{1 \le i \le n} \,\sum_{j=1}^n | a_{i,j} | ,
\]
which is simply the maximum absolute row sum of the matrix.
In the special case of
p = 2 we get the Euclidean norm (which is equal to the largest singular value of a matrix)
\[
\| {\bf A} \|_2 = \sup_{\bf x} \left\{ \| {\bf A}\, {\bf x} \|_2 \, : \quad \mbox{with} \quad \| {\bf x} \|_2 =1 \right\} .
\]
The Frobenius norm:
\[
\| {\bf A} \|_F = \left( \sum_{i=1}^m \sum_{j=1}^n |a_{i,j} |^2 \right)^{1/2} = \left( \mbox{tr}\, {\bf A} \,{\bf A}^{\ast} \right)^{1/2} .
\]
Some properties of the matrix norms are presented in the following
Theorem: Let A and B be \( n \times n \) matrices
and let \( k \) be a scalar.
- \( \| {\bf A} \| \ge 0 \) for any square matrix A.
- \( \| {\bf A} \| =0 \) if and only if the matrix A is zero: \( {\bf A} = {\bf 0}. \)
- \( \| k\,{\bf A} \| = |k| \, \| {\bf A} \| \) for any scalar \( k. \)
- \( \| {\bf A} + {\bf B}\| \le \| {\bf A} \| + \| {\bf B} \| .\)
- \( \| {\bf A} \, {\bf B}\| \le \| {\bf A} \| \, \| {\bf B} \| . \)
Example. Evaluate the norms of the matrix
\( {\bf A} = \left[ \begin{array}{cc} 1 & -7 \\ -2 & -3 \end{array} \right] . \)
The absolute column sums of A are \( 1 + | -2 | =3 \) and \( |-7| + | -3 | =10 . \)
The larger of these is 10 and therefore \( \| {\bf A} \|_1 = 10 . \)
The absolute row sums of A are \( 1 + | -7 | =8 \) and
\( | -2 | + |-3| = 5 , \) therefore, \( \| {\bf A} \|_{\infty} = 8 . \)
The Euclidean norm of A
is
\[
\| {\bf A} \|_2 = \sup_{\bf x} \left\{ \, \sqrt{(x_1 - 7\,x_2 )^2 + (2\,x_1 + 3\,x_2 )^2} \, : \quad \mbox{with} \quad x_1^2 + x_2^2 =1 \right\} .
\]
To find its exact value, we evaluate the product
\[
{\bf A}\,{\bf A}^{\ast} = \left[ \begin{array}{cc} 1 & -7 \\ -2 & -3 \end{array} \right] \,
\left[ \begin{array}{cc} 1 & -2 \\ -7 & -3 \end{array} \right] =
\left[ \begin{array}{cc} 50 & 19 \\ 19 & 13 \end{array} \right] .
\]
This matrix \( {\bf A}\,{\bf A}^{\ast} \) has two eigenvalues
\( \frac{1}{2} \left( 63 \pm \sqrt{2813} \right) . \) Hence, the Euclidean norm of the matrix A is
\( \sqrt{\frac{1}{2} \left( 63 + \sqrt{2813} \right)} \approx 7.61701, \) and its Frobenius norm is
\( \| {\bf A} \|_F = \sqrt{1 + 49 + 4 + 9} = \sqrt{63} \approx 7.93725 . \)
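These values can also be obtained with the norm method of a Sage matrix (a quick numerical check; the rounding with n() is our own choice):
sage: A = matrix(QQ, [[1, -7], [-2, -3]])
sage: A.norm(1), A.norm(Infinity)
(10.0, 8.0)
sage: A.norm(2).n(digits=6), A.norm('frob').n(digits=6)
(7.61701, 7.93725)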
Basic Operations with Matrices
Our first example deals with economics. Let us consider two families, Anderson (A) and Boichuck (B), that have expenses every month
such as utilities, health, entertainment, food, and so on. Let us restrict ourselves to food, utilities, and entertainment.
How would one represent the data collected? Many ways are available but one of them has an advantage of combining the data so that it is easy to manipulate them.
Indeed, we will write the data as follows:
\[
\mbox{Month} = \begin{bmatrix} \mbox{Family} & \mbox{Food} & \mbox{Utilities} & \mbox{Entertainment} \\
\mbox{A} & f_1 & u_1 & e_1 \\
\mbox{B} & f_2 & u_2 & e_2 \end{bmatrix} .
\]
If there is no danger of confusing which row belongs to which family and which column to which expense, then we may record the data in matrix form:
\[
\mbox{Month} = \begin{bmatrix} f_1 & u_1 & e_1 \\
f_2 & u_2 & e_2 \end{bmatrix} .
\]
The size of a matrix is determined by the number of rows and the number of columns.
The above matrix has 2 rows and 3 columns; in general, a matrix with m rows and n columns is called an
\( m \times n \) matrix (pronounced m-by-n matrix).
Keep in mind that the first entry (m) is the number of rows while the second entry (n) is the number of columns. Our above matrix is a \( 2\times 3 \) matrix.
Let us assume, for example, that the matrices for the months of July, August, and September are
\[
{\bf J} = \begin{bmatrix} 650 & 125 & 50 \\
600 & 150 & 60 \end{bmatrix} , \qquad {\bf A} = \begin{bmatrix} 700 & 250 & 150 \\
650 & 200 & 80 \end{bmatrix} , \qquad \mbox{and} \qquad {\bf S} = \begin{bmatrix} 750 & 300 & 200 \\
650 & 275 & 120 \end{bmatrix} ,
\]
respectively. The next question may sound easy to answer, but requires a new concept in the matrix context.
What is the matrix-expense for the two families for the summer? The idea is to add the three matrices above by adding the corresponding entries:
\begin{align*}
\mbox{Summer} &= {\bf J} + {\bf A} + {\bf S}
= \begin{bmatrix} 2100 & 675 & 400 \\
1900 & 625 & 260 \end{bmatrix} .
\end{align*}
We can summarize addition or subtraction of matrices of the same size by using arithmetic operations on the corresponding elements:
\[
{\bf A} \pm {\bf B} = \left[ a_{i,j} \right] \pm \left[ b_{i,j} \right] = \left[ a_{i,j} \pm b_{i,j} \right] .
\]
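In Sage, matrices of the same size are added with the ordinary + operator; for the three monthly matrices above (the variable names are our own):
sage: J = matrix(QQ, [[650, 125, 50], [600, 150, 60]])
sage: A = matrix(QQ, [[700, 250, 150], [650, 200, 80]])
sage: S = matrix(QQ, [[750, 300, 200], [650, 275, 120]])
sage: J + A + S
[2100 675 400]
[1900 625 260]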
Clearly, if you want to double a matrix, it is enough to add the matrix to itself. So we have
\[
\mbox{double of } \, \begin{bmatrix} f_1 & u_1 & e_1 \\
f_2 & u_2 & e_2 \end{bmatrix} = \begin{bmatrix} f_1 & u_1 & e_1 \\
f_2 & u_2 & e_2 \end{bmatrix} + \begin{bmatrix} f_1 & u_1 & e_1 \\
f_2 & u_2 & e_2 \end{bmatrix} = 2\, \begin{bmatrix} f_1 & u_1 & e_1 \\
f_2 & u_2 & e_2 \end{bmatrix} = \begin{bmatrix} 2\,f_1 & 2\,u_1 & 2\,e_1 \\
2\,f_2 & 2\,u_2 & 2\,e_2 \end{bmatrix} .
\]
Therefore, to multiply by a scalar, one needs to multiply by this scalar every entry. What about subtracting two matrices?
It is easy, since subtraction is a combination of the two above rules. Indeed, if A and B are two matrices of the same size,
then we will write
\[
{\bf A} - {\bf B} = {\bf A} + (-1)\, {\bf B} .
\]
The negative of a matrix M, denoted by \( -{\bf M} ,\) is a matrix with elements that are
the negatives of the elements in M. ■
Now we are going to introduce matrix multiplication, which may at first seem rather
strange. It is not known exactly who invented the multiplication of matrices, or when. At least we know that an 1812 work by Jacques Philippe Marie Binet
(1786--1856) contains the definition of the product of matrices.
If A is an \( m \times r \) matrix and B is \( r \times n \) matrix, then the product
\( {\bf A}\,{\bf B} \) is the \( m \times n \) matrix whose entries are determined as follows.
To find the entry in row i and column j of \( {\bf A}\,{\bf B} \) , single out row i
from the matrix A and column j from the matrix B. Take the dot product of the corresponding vectors of size r
and put it into \( (i,j) \) spot of product \( {\bf A}\,{\bf B} .\)
Example.
Consider the matrices
\[
{\bf A} = \begin{bmatrix} -1 & 0 \\ 2 & 3 \end{bmatrix} \qquad \mbox{and} \qquad {\bf B} = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix} .
\]
Multiplying gives
\[
{\bf A}\, {\bf B} = \begin{bmatrix} -1 & -2 \\ 11 & 4 \end{bmatrix} \qquad \mbox{and} \qquad {\bf B}\,{\bf A} = \begin{bmatrix} 3 & 6 \\ -3 & 0 \end{bmatrix} .
\]
Thus, \( {\bf A}\,{\bf B} \ne {\bf B}\,{\bf A} . \) Note that one of the products (\( {\bf A}\,{\bf B} \) or \( {\bf B}\,{\bf A} \) ) may exist while the other does not. ■
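The same computation can be repeated in Sage as a quick check:
sage: A = matrix(QQ, [[-1, 0], [2, 3]]); B = matrix(QQ, [[1, 2], [3, 0]])
sage: A*B
[-1 -2]
[11  4]
sage: B*A
[ 3 6]
[-3 0]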
The next important operation is transposition, which turns rows into columns.
sage: A = matrix(QQ,3,3,[2,-1,1,1,-3,3,5,-2,-3])
sage: A
[ 2 -1 1]
[ 1 -3 3]
[ 5 -2 -3]
sage: A.transpose()
[ 2  1  5]
[-1 -3 -2]
[ 1  3 -3]
.T is a convenient shortcut for the transpose:
sage: A.T
[ 2  1  5]
[-1 -3 -2]
[ 1  3 -3]
Theorem: If the sizes of the matrices are such that the stated operations can be performed, then:
- \( \left({\bf A}^{\mathrm T} \right)^{\mathrm T} = {\bf A} \) for any matrix A;
- \( \left( {\bf A} + {\bf B} \right)^{\mathrm T} = {\bf A}^{\mathrm T} + {\bf B}^{\mathrm T} .\)
- \( \left( {\bf A} - {\bf B} \right)^{\mathrm T} = {\bf A}^{\mathrm T} - {\bf B}^{\mathrm T} .\)
- \( \left( k\,{\bf A} \right)^{\mathrm T} = k\, {\bf A}^{\mathrm T} .\)
- \( \left( {\bf A} \, {\bf B}\right)^{\mathrm T} = {\bf B}^{\mathrm T} \, {\bf A}^{\mathrm T} . \)
sage: m = identity_matrix(3)
sage: m
[1 0 0]
[0 1 0]
[0 0 1]
sage: m2= m.insert_row(3, [1,2,3])
sage: m2
[1 0 0]
[0 1 0]
[0 0 1]
[1 2 3]
sage: m.insert_row(2, [1,2,3])
[1 0 0]
[0 1 0]
[1 2 3]
[0 0 1]
sage: m.swap_rows(1,2); m
[1 0 0]
[0 0 1]
[0 1 0]
sage: m.swap_columns(0,1); m
[0 1 0]
[0 0 1]
[1 0 0]
Often you want to give matrices names:
sage: M = matrix([[1,2,3],[4,5,6]])
sage: I = identity_matrix(3)
If a matrix A has complex entries, its complex conjugate is denoted by \( \overline{\bf A} .\)
The conjugate transpose of A, denoted by \( {\bf A}^{\ast} , \) is defined by
\[
{\bf A}^{\ast} = \overline{\bf A}^{\mathrm T} = \overline{{\bf A}^{\mathrm T}} .
\]
The matrix \( {\bf A}^{\ast} , \) is called adjoint to the matrix A. If \( {\bf A}^{\ast} = {\bf A}, \) then matrix A is called
self-adjoint or Hermitian. ■
Example.
The matrix
\[
{\bf A} = \begin{bmatrix} 1 & {\bf j} \\ {\bf j} & 2 \end{bmatrix} ,
\]
where j denotes the imaginary unit (so \( {\bf j}^2 =- 1 \) ), is symmetric but not self-adjoint.
Its adjoint is
\[
{\bf A}^{\ast} = \begin{bmatrix} 1 & -{\bf j} \\ -{\bf j} & 2 \end{bmatrix} ,
\]
and the products
\[
{\bf A}^{\ast} {\bf A} = \begin{bmatrix} 2 & -{\bf j} \\ {\bf j} & 5 \end{bmatrix} \qquad\mbox{and}\qquad {\bf A}\, {\bf A}^{\ast} = \begin{bmatrix} 2 & {\bf j} \\ -{\bf j} & 5 \end{bmatrix}
\]
are self-adjoint matrices. ■
Sage has a matrix method,
.augment()
, that will join two matrices side-by-side
provided they both have the same number of rows. The same method will allow you
to augment a matrix with a column vector.
Some methods allow optional input, typically using keywords. Matrices can track
subdivisions, making breaks between rows and/or columns. When augmenting, you
can ask for the subdivision to be included. Since the matrix A and the vector b have not been defined yet, we define them first and then evaluate:
sage: A = matrix(QQ, [[1, -1, 2], [2, 1, 1], [1, 1, 0]])
sage: b = vector(QQ, [1, 8, 5])
sage: M = A.augment(b, subdivide=True)
sage: M
[1 -1 2|1]
[2  1 1|8]
[1  1 0|5]
sage: m = identity_matrix(3)
sage: v = vector(QQ, 3, [1, 2, 3])
sage: m3 = m.augment(v)
sage: m3
[1 0 0 1]
[0 1 0 2]
[0 0 1 3]
sage: m3[1,:]
[0 1 0 2]
sage: m3[:,3]
[1]
[2]
[3]
sage: m3[:,2]
[0]
[0]
[1]
To insert a column, take the transpose, insert a row, and take the
transpose again.
sage: m = identity_matrix(3)
sage: m2 = m.insert_row(3, [1,2,3])
sage: m2 = m2.transpose(); m2
[1 0 0 1]
[0 1 0 2]
[0 0 1 3]
To extract selected columns of a matrix, use the matrix_from_columns() method:
sage: m3 = m2.matrix_from_columns([0,1,3])
sage: m3
[1 0 1]
[0 1 2]
[0 0 3]
To substitute the first column in a matrix with a given column-vector,
use the following command:
sage: m[:,0] = vector([5,4,3])
sage: m
[5 0 0]
[4 1 0]
[3 0 1]
Similarly,
sage: m = identity_matrix(3)
sage: m[:,1] = vector([5,4,3]); m
[1 5 0]
[0 4 0]
[0 3 1]
It is frequently necessary to deal separately with various groups of elements or blocks/submatrices within a larger matrix.
This situation can arise when the size of a matrix becomes too large for convenient handling, and it becomes imperative to work with only a portion of the matrix.
Also, there will be cases in which one part of a matrix will have a physical significance that is different from the remainder,
and it is instructive to isolate that portion and identify it by a special symbol.
Any matrix can be interpreted as having been broken into sections called blocks or submatrices.
A matrix interpreted as a block matrix can be visualized as the original matrix with inserting horizontal and vertical lines between selected rows and columns, which break it up, or partition it, into a collection of smaller matrices.
For example, the following are three possible partitions of a general \( 3 \times 4 \) matrix---the first
is a partition of A into four submatrices \( {\bf A}_{11}, \ {\bf A}_{12}, \ {\bf A}_{21} , \ \mbox{and} \ {\bf A}_{22} ; \)
the second is a partition of A into its row vectors \( {\bf r}_1, \ {\bf r}_2 , \ \mbox{and}\ {\bf r}_3 ; \) and
the third is a partition of A into its column vectors \( {\bf c}_1 , \ {\bf c}_2 , \ {\bf c}_3 , \ \mbox{and}\ {\bf c}_4 : \)
\begin{align*}
{\bf A} &= \left[ \begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} \end{array} \right] = \left[ \begin{array}{cc} {\bf A}_{11} & {\bf A}_{12} \\
{\bf A}_{21} & {\bf A}_{22} \end{array} \right] ,
\\
{\bf A} &= \left[ \begin{array}{cccc} a_{11} & a_{12} & a_{13} & a_{14} \\ \hline
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} \end{array} \right] = \left[ \begin{array}{c} {\bf r}_1 \\ {\bf r}_2 \\ {\bf r}_3 \end{array} \right] ,
\\
{\bf A} &= \left[ \begin{array}{c|c|c|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \end{array} \right] = \left[ \begin{array}{cccc} {\bf c}_1 & {\bf c}_2 & {\bf c}_3 & {\bf c}_4 \end{array} \right] .
\end{align*}
Example. Multiply two matrices using partition procedure.
\[
{\bf A} = \left[ \begin{array}{cc|c} 3&0&-1 \\ -5&2&4 \\ \hline -2&-6&3 \end{array} \right] =
\begin{bmatrix} {\bf A}_{11} & {\bf A}_{12} \\ {\bf A}_{21} & {\bf A}_{22} \end{bmatrix} \qquad \mbox{and} \qquad {\bf B} =
\left[ \begin{array}{cc} 1&2 \\ -3&4 \\ \hline 2&1 \end{array} \right] = \begin{bmatrix} {\bf B}_1 \\ {\bf B}_2 \end{bmatrix} .
\]
Using partition, we have
\[
{\bf A} \, {\bf B} = \begin{bmatrix} {\bf A}_{11} & {\bf A}_{12} \\ {\bf A}_{21} & {\bf A}_{22} \end{bmatrix} \times
\begin{bmatrix} {\bf B}_1 \\ {\bf B}_2 \end{bmatrix} = \begin{bmatrix} {\bf A}_{11} \,{\bf B}_1 + {\bf A}_{12}\, {\bf B}_2 \\
{\bf A}_{21} \, {\bf B}_1 + {\bf A}_{22} \, {\bf B}_2 \end{bmatrix} ,
\]
where
\begin{align*}
{\bf A}_{11} \, {\bf B}_1 &= \begin{bmatrix} 3&0 \\ -5& 2 \end{bmatrix} \times \begin{bmatrix} 1&2 \\ -3&4 \end{bmatrix} = \begin{bmatrix} 3\times 1 + 0 \times (-3) & 3\times 2 + 0 \times 4 \\ -5\times 1 + 2\times (-3) & -5\times 2 + 2\times 4 \end{bmatrix} = \begin{bmatrix} 3&6 \\ -11& -2 \end{bmatrix} , \\
\\
{\bf A}_{12} \, {\bf B}_2 &= \begin{bmatrix} -1 \\ 4 \end{bmatrix} \times \begin{bmatrix} 2 & 1 \end{bmatrix} = \begin{bmatrix} -1\times 2 & -1\times 1 \\ 4\times 2 & 4\times 1 \end{bmatrix} = \begin{bmatrix} -2& -1 \\ 8 & 4 \end{bmatrix} , \\
\\
{\bf A}_{21} \, {\bf B}_1 &= \begin{bmatrix} -2 & -6 \end{bmatrix} \times \begin{bmatrix} 1& 2 \\ -3& 4 \end{bmatrix} = \begin{bmatrix} -2\times 1 - 6 \times (-3) & -2 \times 2 -6 \times 4 \end{bmatrix} = \begin{bmatrix} 16 & -28 \end{bmatrix} , \\
\\
{\bf A}_{22} \, {\bf B}_2 &= \begin{bmatrix} 3\end{bmatrix} \times \begin{bmatrix} 2& 1\end{bmatrix} = \begin{bmatrix} 3\times 2 & 3\times 1\end{bmatrix} = \begin{bmatrix} 6& 3 \end{bmatrix} , \\
\\
{\bf A}_{11} \, {\bf B}_1 + {\bf A}_{12} \, {\bf B}_2 &= \begin{bmatrix} 3& 6 \\ -11& -2 \end{bmatrix} + \begin{bmatrix} -2& -1 \\ 8&4 \end{bmatrix} = \begin{bmatrix} 1& 5 \\ -3 & 2 \end{bmatrix} ,
\\
{\bf A}_{21} \, {\bf B}_1 + {\bf A}_{22} \, {\bf B}_2 &= \begin{bmatrix} 16 & -28 \end{bmatrix} + \begin{bmatrix} 6&3\end{bmatrix} = \begin{bmatrix} 22& -25\end{bmatrix} .
\end{align*}
Finally, we get
\[
{\bf A} \, {\bf B} = \left[ \begin{array}{cc} 1&5 \\ -3&2 \\ \hline 22&-25 \end{array} \right] . \qquad ■
\]
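The full product can be verified in Sage (a quick check of the result above):
sage: A = matrix(QQ, [[3, 0, -1], [-5, 2, 4], [-2, -6, 3]])
sage: B = matrix(QQ, [[1, 2], [-3, 4], [2, 1]])
sage: A*B
[ 1   5]
[-3   2]
[22 -25]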
Partitioning has many uses; among them are finding particular rows or columns of a matrix product \( {\bf A}\,{\bf B} \)
without computing the entire product, and determining the inverse matrix. It is possible to use a block partitioned matrix product that involves only algebra on submatrices of the factors.
The partitioning of the factors is not arbitrary, however: the partitions of
A and B must match up so that all submatrix products that will be used are defined. Given an
\( m \times n \) matrix A with q row partitions and s column partitions
\[
{\bf A} = \left[ \begin{array}{cccc} {\bf A}_{11} & {\bf A}_{12} & \cdots & {\bf A}_{1s} \\
{\bf A}_{21} & {\bf A}_{22} & \cdots & {\bf A}_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
{\bf A}_{q1} & {\bf A}_{q2} & \cdots & {\bf A}_{qs} \end{array} \right]
\]
and an \( n \times k \) matrix B with s row partitions and r
column partitions
\[
{\bf B} = \left[ \begin{array}{cccc} {\bf B}_{11} & {\bf B}_{12} & \cdots & {\bf B}_{1r} \\
{\bf B}_{21} & {\bf B}_{22} & \cdots & {\bf B}_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
{\bf B}_{s1} & {\bf B}_{s2} & \cdots & {\bf B}_{sr} \end{array} \right]
\]
that are compatible with the partitions of A, the matrix product
\[
{\bf C} = {\bf A}\,{\bf B}
\]
can be formed blockwise, yielding C as an \( m \times k \) matrix with
q row partitions and r column partitions. The matrices in the resulting matrix C are calculated
by multiplying:
\[
{\bf C}_{i,j} = \sum_{\gamma =1}^s {\bf A}_{i,\gamma} {\bf B}_{\gamma , j} .
\]
Let us consider two matrices A and B, where A has m rows and B has n columns.
We assume that dimensions of these two matrices allow us to multiply them. Then we partition matrix A into row vectors and B into column vectors:
\[
{\bf A} = \left[ \begin{array}{c} {\bf a}_1 \\ {\bf a}_2 \\ \vdots \\ {\bf a}_m \end{array} \right] \qquad\mbox{and} \qquad {\bf B} =
\left[ \begin{array}{cccc} {\bf b}_1 & {\bf b}_2 & \cdots & {\bf b}_n \end{array} \right] .
\]
Now we compute their product as
\[
{\bf A} \,{\bf B} = {\bf A} \left[ {\bf b}_1 , \ {\bf b}_2 , \ \cdots , \ {\bf b}_n \right] =
\left[ {\bf A} \,{\bf b}_1 , \ {\bf A} \,{\bf b}_2 , \ \cdots , \ {\bf A} \,{\bf b}_n \right] ,
\]
or as
\[
{\bf A} \,{\bf B} = \left[ \begin{array}{c} {\bf a}_1 \\ {\bf a}_2 \\ \vdots \\ {\bf a}_m \end{array} \right] {\bf B} =
\left[ \begin{array}{c} {\bf a}_1 \,{\bf B} \\ {\bf a}_2 \,{\bf B} \\ \vdots \\ {\bf a}_m \,{\bf B} \end{array} \right] .
\]
Theorem: If A is an \( m \times n \) matrix, and if \( {\bf x} =
\left[ x_1 , x_2 , \ldots , x_n \right]^{\mathrm T} \) is an \( n \times 1 \) column
vector, then the product \( {\bf A}\,{\bf x} \) can be expressed as a linear combination of the column vectors of A
in which the coefficients are the entries of x:
\[
{\bf A} \,{\bf x} = x_1 \left( \mbox{column 1} \right) + x_2 \left( \mbox{column 2} \right) + \cdots + x_n \left( \mbox{column}\ n \right) .
\]
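A small Sage illustration of this theorem (the matrix and vector are our own choice):
sage: A = matrix(QQ, [[1, 2], [3, 4], [5, 6]])
sage: x = vector(QQ, [2, -1])
sage: A*x
(0, 2, 4)
sage: 2*A.column(0) - A.column(1)
(0, 2, 4)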
As a partial demonstration of manipulating subdivisions of matrices, we can reset
the subdivisions of M with the .subdivide()
method. We provide a list of rows to
subdivide before, then a list of columns to subdivide
before, where we remember that
counting begins at zero.
sage: M.subdivide([1,2],[1])
sage: M
[1|-1 2 1]
[-+------]
[2| 1 1 8]
[-+------]
[1| 1 0 5]
Sage will perform individual row operations on a matrix. This can get a bit tedious,
but it is better than doing the computations by hand, and it can
be useful when building up more complicated procedures for a matrix.
For each row operation, there are two similar methods. One changes the matrix
“in-place” while the other creates a new matrix that is a modified version of the
original. This is an important distinction that you should understand for every new
Sage command you learn that might change a matrix or vector.
Consider the first row operation, which swaps two rows. There are two matrix
methods to do this, a “with” version that will create a new, changed matrix, which
you will likely want to save, and a plain version that will change the
matrix it operates on “in-place.” The
copy()
function, which is a general-purpose command, is a way
to make a copy of a matrix before you make changes to it. Study the example below
carefully, and then read the explanation following. (Remember that
counting begins with zero.)
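Here is a minimal illustration of the two styles described above (the matrix is our own choice):
sage: A = matrix(QQ, [[1, 2], [3, 4]])
sage: B = A.with_swapped_rows(0, 1); B    # a new matrix; A itself is unchanged
[3 4]
[1 2]
sage: C = copy(A)
sage: C.swap_rows(0, 1); C                # modifies C in place
[3 4]
[1 2]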
Sage has built-in commands that will solve a linear system of equations, given a
coefficient matrix and a vector of constants. We need to learn some more theory
before we can entirely understand this command, but we can begin to explore its use.
For now, consider these methods experimental and do not let them replace row-reducing
augmented matrices. Suppose the linear system is written in matrix form as
\[
{\bf A}\, {\bf x} = {\bf b} .
\]
The command A.solve_right(b)
will provide information about solutions
to the linear system of equations with coefficient matrix A and vector
of constants
b.
The reason for the “right” (and the corresponding command named with “left”) will
have to wait for a while. For now, it is generally correct in this course to use the
“right” variant of any Sage linear algebra command that has both “left” and “right”
variants. Let's apply the
.solve_right()
command to a system with no solutions:
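For instance (an inconsistent system of our own choosing; the exact wording of the error may depend on the Sage version):
sage: A = matrix(QQ, [[1, 1], [2, 2]])
sage: b = vector(QQ, [1, 3])
sage: A.solve_right(b)
Traceback (most recent call last):
...
ValueError: matrix equation has no solutions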
An important use of Schur complements is the partitioned solution of linear systems. In fact, this
is the most important application in the Finite Element Method (FEM for short), in connection with superelement analysis. Suppose that
we have the linear system \( {\bf M}\,{\bf z} = {\bf r} , \) in which the given coefficient matrix M is partitioned
into 2-by-2 blocks: \( {\bf M} = \begin{bmatrix} {\bf A} & {\bf B} \\ {\bf C} & {\bf D} \end{bmatrix} . \)
Suppose that the given right-hand side vector is also broken up as \( {\bf r} = [{\bf p}\ {\bf q}]^T , \)
while \( {\bf z} = [{\bf x}\ {\bf y}]^T \) is unknown. The system is equivalent to
the pair of vector equations:
\[
\begin{split}
{\bf A}\, {\bf x} + {\bf B}\, {\bf y} = {\bf p} , \\
{\bf C}\, {\bf x} + {\bf D}\, {\bf y} = {\bf q} . \end{split}
\]
If D is nonsingular, eliminating y and solving for x yields
\[
{\bf x} = \left( {\bf A} - {\bf B}\,{\bf D}^{-1} {\bf C} \right)^{-1} \left( {\bf p} - {\bf B} \,{\bf D}^{-1} {\bf q} \right) =
\left( {\bf M} / {\bf D} \right)^{-1} \left( {\bf p} - {\bf B} \,{\bf D}^{-1} {\bf q} \right) ,
\]
where \( {\bf M} / {\bf D} = {\bf A} - {\bf B}\,{\bf D}^{-1} {\bf C} \) is the Schur complement.
Similarly, if A is nonsingular, eliminating x and solving for y yields
\[
{\bf y} = \left( {\bf D} - {\bf C}\,{\bf A}^{-1} {\bf B} \right)^{-1} \left( {\bf q} - {\bf C} \,{\bf A}^{-1} {\bf p} \right) =
\left( {\bf M} / {\bf A} \right)^{-1} \left( {\bf q} - {\bf C} \,{\bf A}^{-1} {\bf p} \right) ,
\]
where \( {\bf M} / {\bf A} = {\bf D} - {\bf C}\,{\bf A}^{-1} {\bf B} . \)
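This block elimination is easy to verify in Sage on a small partitioned system (the blocks below are our own example):
sage: A = matrix(QQ, [[4, 1], [1, 3]]); B = matrix(QQ, [[1, 0], [0, 1]])
sage: C = matrix(QQ, [[1, 0], [0, 1]]); D = matrix(QQ, [[2, 1], [1, 3]])
sage: p = vector(QQ, [1, 2]); q = vector(QQ, [3, 4])
sage: x = (A - B*D.inverse()*C).inverse()*(p - B*D.inverse()*q)
sage: y = D.inverse()*(q - C*x)
sage: A*x + B*y == p and C*x + D*y == q
True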
Eigenvalues and Eigenvectors
If A is a square \( n \times n \) matrix and v is an \( n \times 1 \)
column vector, then the product \( {\bf A}\,{\bf v} \) is defined and is another \( n \times 1 \)
column vector. It is important in many applications to determine whether there exist nonzero column vectors v such
that the product vector \( {\bf A}\,{\bf v} \) is a constant multiple \( \lambda\,{\bf v} \) of v.
Let A be a square \( n \times n \) matrix. A number λ is said to be an eigenvalue of A if there exists a nonzero solution vector
v of the linear system
\[
{\bf A}\,{\bf v} = \lambda {\bf v} .
\]
The solution vector v is said to be an eigenvector corresponding to the eigenvalue λ. The set of all eigenvectors
corresponding to the eigenvalue together with the zero vector is called the eigenspace of the eigenvalue λ.
The set of all eigenvalues of a square matrix A is called spectrum of A and
usually denoted by \( \sigma ({\bf A}) . \) In other words, the eigenspace of the
eigenvalue λ is the null space (or kernel) of the square matrix \( \lambda {\bf I} - {\bf A} . \)
If A is an \( n \times n \) matrix, then λ is an eigenvalue of A if and only if it satisfies the equation
\[
\det \left( \lambda {\bf I} - {\bf A}\right) = 0 .
\]
This is called the characteristic equation and the polynomial \( \chi (\lambda ) = \det \left( \lambda {\bf I} - {\bf A}\right) \)
of degree n is called the characteristic polynomial of square matrix A.
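In Sage the characteristic polynomial and the eigenvalues are available directly; for example (a small matrix of our own choosing, which reappears in the section on Sylvester's formula below):
sage: A = matrix(QQ, [[4, 5], [-1, -2]])
sage: A.charpoly()
x^2 - 2*x - 3
sage: A.eigenvalues()
[3, -1]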
An eigenvalue \( \lambda_0 \)
of a square matrix A has algebraic multiplicity k if k is the largest integer such that \( (\lambda - \lambda_0 )^k \) is
a factor of the characteristic polynomial of A. Equivalently, k is the order of the lowest derivative of the characteristic polynomial (with respect to λ) that does not vanish at \( \lambda = \lambda_0 . \)
The geometric multiplicity of an eigenvalue \( \lambda_0 \) of a matrix A
is the dimension of the eigenspace of \( \lambda_0 . \) That is, the dimension of the null space of
\( \lambda_0 {\bf I} - {\bf A} . \) In other words, the geometric multiplicity counts the number of linearly independent eigenvectors corresponding to
\( \lambda_0 . \)
Theorem: A square matrix A is invertible if and only if \( \lambda =0 \) is not an eigenvalue.
Theorem: Let A be a nonsingular square matrix.
If λ is an eigenvalue of A with corresponding eigenvector v, then \( \lambda^{-1} = 1/\lambda \)
is an eigenvalue of \( {\bf A}^{-1} \) with the same corresponding eigenvector v.
Theorem: Let A be a square matrix and \( p(\lambda ) = p_m \,\lambda^m + p_{m-1} \,\lambda^{m-1} + \cdots + p_0 \)
be a polynomial of degree m.
If λ is an eigenvalue of A with corresponding eigenvector v, then \( p(\lambda ) \)
is an eigenvalue of \( p\left( {\bf A} \right) \) with the same corresponding eigenvector v.
Theorem: The eigenvalues of an upper triangular, lower triangular, and diagonal matrix are the main diagonal entries.
Theorem: The determinant of a square matrix A is equal to the product of its eigenvalues. The sum of all eigenvalues is equal to the trace of the matrix.
Theorem: If A is a self-adjoint or real symmetric matrix, then the eigenvalues of A are real.
Theorem: Let A be an \( n \times n \) self-adjoint matrix.
Then eigenvectors corresponding to distinct (different) eigenvalues are orthogonal.
Theorem: An \( n \times n \) matrix A is orthogonal
(that is, \( {\bf A}^{-1} = {\bf A}^T \) ) if and only if its columns
\( {\bf c}_1 , \ {\bf c}_2 , \ \ldots , \ {\bf c}_n \) form an orthonormal set.
■
Diagonalization
We show how to define a function of a square matrix using the diagonalization procedure. This method is applicable only to diagonalizable matrices, and
is not suitable for defective matrices. Recall that a matrix A is called diagonalizable if there exists a nonsingular matrix S such that
\( {\bf S}^{-1} {\bf A} {\bf S} = {\bf \Lambda} , \) a diagonal matrix. In other words, the matrix A is similar to a diagonal matrix.
An \( n \times n \) square matrix is diagonalizable if and only if there exist n linearly independent eigenvectors, so the geometric
multiplicity of each eigenvalue is the same as its algebraic multiplicity. Then the matrix S can be built from eigenvectors of A,
column by column.
Let A be a square \( n \times n \) diagonalizable matrix, and let Λ be the corresponding diagonal matrix of its eigenvalues:
\[
{\bf \Lambda} = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0&\lambda_2 & 0& \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0&0&0& \cdots & \lambda_n \end{bmatrix} ,
\]
where \( \lambda_1 , \lambda_2 , \ldots , \lambda_n \) are eigenvalues (that may be equal) of the matrix A.
Let \( {\bf x}_1 , {\bf x}_2 , \ldots , {\bf x}_n \) be linearly independent eigenvectors, corresponding to the eigenvalues
\( \lambda_1 , \lambda_2 , \ldots , \lambda_n .\) We build the nonsingular matrix S from these eigenvectors (every column is an eigenvector):
\[
{\bf S} = \begin{bmatrix} {\bf x}_1 & {\bf x}_2 & {\bf x}_3 & \cdots & {\bf x}_n \end{bmatrix} .
\]
Using this matrix S of eigenvectors, we can diagonalize the matrix
\[
{\bf A} = {\bf S} \, {\bf \Lambda} \, {\bf S}^{-1} \qquad \Longrightarrow \qquad {\bf \Lambda} = {\bf S}^{-1} {\bf A} \, {\bf S}
. \]
For any reasonable (we do not specify this word, it is sufficient to be smooth) function defined on the spectrum (set of all eigenvalues) of the diagonalizable matrix
A, we define the function of this matrix by the formula:
\[
f \left( {\bf A} \right) = {\bf S} f\left( {\bf \Lambda} \right) {\bf S}^{-1} , \qquad \mbox{where } \quad f\left( {\bf \Lambda} \right) =
\begin{bmatrix} f(\lambda_1 ) & 0 & 0 & \cdots & 0 \\ 0 & f(\lambda_2 ) & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0&0&0& \cdots & f(\lambda_n ) \end{bmatrix} .
\]
Example. Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1&4&16 \\ 18&20&4 \\ -12&-14&-7 \end{bmatrix} \) that has three distinct eigenvalues
sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: A.eigenvalues()
[9, 4, 1]
The corresponding eigenvectors may be taken to be \( (1, -2, 1)^{\mathrm T} , \ (4, -5, 2)^{\mathrm T} , \) and \( (4, -4, 1)^{\mathrm T} , \) respectively.
Using these eigenvectors, we build the transition matrix:
\[
{\bf S} = \begin{bmatrix} 1&4&4 \\ -2&-5&-4 \\ 1&2&1 \end{bmatrix} , \quad\mbox{with} \quad {\bf S}^{-1} = \begin{bmatrix} -3&-4&-4 \\ 2&3&4 \\ -1&-2&-3 \end{bmatrix} .
\]
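Sage can confirm the diagonalization (a quick check):
sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: S = matrix(QQ, [[1, 4, 4], [-2, -5, -4], [1, 2, 1]])
sage: S.inverse() * A * S
[9 0 0]
[0 4 0]
[0 0 1]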
Then we are ready to construct eight square roots of this matrix (note that all of its eigenvalues are positive):
\[
\sqrt{\bf A} = {\bf S} \sqrt{\Lambda} {\bf S}^{-1} = \begin{bmatrix} 1&4&4 \\ -2&-5&-4 \\ 1&2&1 \end{bmatrix} \begin{bmatrix} \pm 3&0&0 \\ 0&\pm 2&0 \\ 0&0&\pm 1 \end{bmatrix} \begin{bmatrix} -3&-4&-4 \\ 2&3&4 \\ -1&-2&-3 \end{bmatrix} ,
\]
with appropriate choice of roots on the diagonal. In particular,
\[
\sqrt{\bf A} = \begin{bmatrix} 3&4&8 \\ 2&2&-4 \\ -2&-2&1 \end{bmatrix} , \quad
\begin{bmatrix} 21&28&32 \\ -34&-46&-52 \\ 16&22&25 \end{bmatrix} , \quad
\begin{bmatrix} -11&-20&-32 \\ 6&14&28 \\ 0&-2&-7 \end{bmatrix} , \quad
\begin{bmatrix} 29&44&56 \\ -42&-62&-76 \\ 18&26&31 \end{bmatrix} .
\]
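The first of these square roots is easy to verify in Sage:
sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: R = matrix(QQ, [[3, 4, 8], [2, 2, -4], [-2, -2, 1]])
sage: R*R == A
True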
Example. Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} -20&-42&-21 \\ 6&13&6 \\ 12&24&13 \end{bmatrix} \) that has two distinct eigenvalues
sage: A = matrix(QQ, [[-20, -42, -21], [6, 13, 6], [12, 24, 13]])
sage: A.eigenvalues()
[4, 1, 1]
The corresponding eigenvectors may be taken to be \( (-7, 2, 4)^{\mathrm T} \) for \( \lambda = 4 , \) and \( (-1, 0, 1)^{\mathrm T} , \ (-2, 1, 0)^{\mathrm T} \) for the double eigenvalue \( \lambda = 1 . \)
Since the double eigenvalue \( \lambda_2 =1 \) has two linearly independent eigenvectors, the given matrix is diagonalizable, and we are able to build the transition matrix:
\[
{\bf S} = \begin{bmatrix} -7&-1&-2 \\ 2&0&1 \\ 4&1&0 \end{bmatrix} , \quad\mbox{with} \quad {\bf S}^{-1} = \begin{bmatrix} 1&2&1 \\ -4&-8&-3 \\ -2&-3&-2 \end{bmatrix} .
\]
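Again, Sage confirms the diagonalization (a quick check):
sage: A = matrix(QQ, [[-20, -42, -21], [6, 13, 6], [12, 24, 13]])
sage: S = matrix(QQ, [[-7, -1, -2], [2, 0, 1], [4, 1, 0]])
sage: S.inverse() * A * S
[4 0 0]
[0 1 0]
[0 0 1]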
For three functions, \( f(\lambda ) = e^{\lambda \,t} , \quad \Phi (\lambda ) = \frac{\sin \left( \sqrt{\lambda} \,t \right)}{\sqrt{\lambda}} ,
\quad \Psi (\lambda ) = \cos \left( \sqrt{\lambda} \,t \right) \) we construct the corresponding matrix-functions:
\begin{align*}
f({\bf A}) &= {\bf S} e^{{\bf \Lambda}\,t} {\bf S}^{-1} = e^{4t} \begin{bmatrix} -7 & -14 & -7 \\ 2&4&2 \\ 4&8&4 \end{bmatrix} + e^t
\begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} ,
\\
\Phi ({\bf A}) &= {\bf S} \frac{\sin \left( \sqrt{\bf \Lambda} \,t \right)}{\sqrt{\bf \Lambda}} {\bf S}^{-1} =
\sin 2t \begin{bmatrix} -7/2 & -7 & -7/2 \\ 1&2&1 \\ 2&4&2 \end{bmatrix} + \sin t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} ,
\\
\Psi ({\bf A}) &= {\bf S} \cos \left( {\bf \Lambda}\,t \right) {\bf S}^{-1} = \cos 2t \begin{bmatrix} -7 & -14 & -7 \\ 2&4&2 \\ 4&8&4 \end{bmatrix} +
\cos t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} .
\end{align*}
Example. Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1 &2&3 \\ 2 &3&4 \\ 2&-6&-4 \end{bmatrix} \)
that has two complex conjugate eigenvalues \( \lambda = 1 \pm 2{\bf j} \) and one real eigenvalue
\( \lambda = -2 .\) Sage confirms:
sage: A = matrix(QQ, [[1, 2, 3], [2, 3, 4], [2, -6, -4]])
sage: A.charpoly()
x^3 + x + 10
sage: A.charpoly().factor()
(x + 2) * (x^2 - 2*x + 5)
We build the transition matrix of its eigenvectors:
\[
{\bf S} = \begin{bmatrix} -1-{\bf j} & -1+{\bf j} &-7 \\ -2-{\bf j} & -2+{\bf j} &-6 \\ 2&2&1 \end{bmatrix} , \quad \mbox{with} \quad
{\bf S}^{-1} = \frac{1}{676} \begin{bmatrix} 13\left( 14 + 5{\bf j} \right) & 13\left( 9 + 7{\bf j} \right) & 13\left( 4 + 7{\bf j} \right) \\
82 -175 {\bf j} & -257 -245{\bf j} & -88 -245{\bf j} \\
4 \left( -12 + 5{\bf j} \right) & 4 \left( 17 + 7{\bf j} \right) & 4 \left( 17 + 7{\bf j} \right) \end{bmatrix} .
\]
Sylvester's Formula
Suppose that A is a diagonalizable \( n \times n \) matrix; this means that none of its eigenvalues is defective and there exists a basis of n
linearly independent eigenvectors. Now assume that its minimal polynomial (the polynomial of least possible degree that annihilates the matrix) is a product of simple terms:
\[
\psi (\lambda ) = \left( \lambda - \lambda_1 \right) \left( \lambda - \lambda_2 \right) \cdots \left( \lambda - \lambda_k \right) ,
\]
where \( \lambda_1 , \lambda_2 , \ldots , \lambda_k \) are distinct eigenvalues of the matrix A ( \( k \le n \) ).
Let \( f(\lambda ) \) be a function defined on the spectrum of the matrix A. The last condition means that
every eigenvalue \( \lambda_i \) is in the domain of f, and that every eigenvalue \( \lambda_i \) with multiplicity \( m_i > 1 \) is in the interior of the domain, with f being \( m_i - 1 \) times differentiable at \( \lambda_i . \) We build a function
\( f\left( {\bf A} \right) \) of a diagonalizable square matrix A according to James Joseph Sylvester, who worked as a lawyer and private tutor before devoting himself to mathematics as a professor. To define a function of a square matrix, we need to construct k
Sylvester auxiliary matrices for each distinct eigenvalue \( \lambda_i , \quad i= 1,2,\ldots k: \)
\[
{\bf Z}_{\lambda_i} = {\bf Z}_{i} = \dfrac{\left( {\bf A} - \lambda_1 {\bf I} \right) \cdots \left( {\bf A} - \lambda_{i-1}
{\bf I} \right) \left( {\bf A} - \lambda_{i+1} {\bf I} \right) \cdots \left( {\bf A} - \lambda_k {\bf I} \right)}{(\lambda_i - \lambda_1 ) \cdots (\lambda_i - \lambda_{i-1} )
(\lambda_i - \lambda_{i+1} ) \cdots (\lambda_i - \lambda_k )} .
\]
Now we define the function \( f\left( {\bf A} \right) \) according to the formula:
\[
f\left( {\bf A} \right) = \sum_{i=1}^k f(\lambda_i )\,{\bf Z}_i .
\]
Each Sylvester auxiliary matrix \( {\bf Z}_i \) is a projection matrix onto the eigenspace of the corresponding eigenvalue \( \lambda_i . \)
Example.
Consider a 2-by-2 matrix
\[
{\bf A} = \begin{bmatrix} 4&5 \\ -1& -2 \end{bmatrix}
\]
that has two distinct eigenvalues \( \lambda_1 =3 \) and \( \lambda_2 =-1 . \)
The eigenvectors corresponding to these eigenvalues are
\[
{\bf v}_1 = \left\langle -5 , \ 1 \right\rangle^T \qquad {\bf v}_2 = \left\langle 1 , \ -1 \right\rangle^T ,
\]
respectively. Since the minimal polynomial \( \psi (\lambda ) = (\lambda -3)(\lambda +1) \) is a product of simple factors, we build Sylvester's auxiliary matrices:
\[
{\bf Z}_3 = \frac{{\bf A} + {\bf I}}{3+1} = \frac{1}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} \quad\mbox{and} \quad
{\bf Z}_{-1} = \frac{{\bf A} - 3{\bf I}}{-1-3} = \frac{1}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} .
\]
These matrices are projectors on eigenspaces:
\[
{\bf Z}_3^2 = {\bf Z}_3 \qquad \mbox{and} \qquad {\bf Z}_{-1}^2 = {\bf Z}_{-1} .
\]
Let us take an arbitrary vector, say \( {\bf x} = \langle 2, \ 2 \rangle^T , \) and expand it with respect to
eigenvectors:
\[
{\bf x} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = a\, {\bf v}_1 + b\,{\bf v}_2 = a \begin{bmatrix} -5 \\ 1 \end{bmatrix} + b \begin{bmatrix} 1 \\ -1 \end{bmatrix} ,
\]
where coefficients a and b need to be determined. Of course, we can find them by solving the system of linear equations:
\[
\begin{split}
2 &= a(-5) + b , \\
2 &= a - b .
\end{split}
\]
However, the easiest way is to apply Sylvester's auxiliary matrices:
\begin{align*}
{\bf Z}_3 \, {\bf x} &= \frac{1}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} \, \begin{bmatrix} 2\\ 2 \end{bmatrix} =
\frac{1}{2} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} \, \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 10 \\ -2 \end{bmatrix}
= \begin{bmatrix} 5\\ -1 \end{bmatrix} = -{\bf v}_1 ,
\\
{\bf Z}_{-1}\, {\bf x} &= \frac{1}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} \, \begin{bmatrix} 2\\ 2 \end{bmatrix} =
\begin{bmatrix} -3 \\ 3 \end{bmatrix} = -3\, {\bf v}_2 .
\end{align*}
Therefore, the coefficients are \( a =-1 \) and \( b =-3 , \) and we get the expansion:
\[
{\bf x} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = - {\bf v}_1 - 3\,{\bf v}_2 = - \begin{bmatrix} -5 \\ 1 \end{bmatrix} -3 \begin{bmatrix} 1 \\ -1 \end{bmatrix} .
\]
We can now evaluate functions of the matrix A, for instance, \( f(\lambda ) = e^{\lambda \,t} \) and \( g(\lambda ) = \cos \left( \lambda \,t\right) . \)
Indeed,
\begin{align*}
f({\bf A}) &= e^{3t}\, {\bf Z}_3 + e^{-t}\, {\bf Z}_{-1} = e^{3t}\, \frac{1}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} + e^{-t} \,
\frac{1}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} ,
\\
g({\bf A}) &= \cos (3t) \, {\bf Z}_3 + \cos\, t\, {\bf Z}_{-1} = \frac{\cos \,3t}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} +
\frac{\cos\, t}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} .
\end{align*}
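A short Sage check of these formulas (a minimal sketch; the names Z3 and Zm1, standing for \( {\bf Z}_3 \) and \( {\bf Z}_{-1} , \) are ours):
sage: t = var('t')
sage: A = matrix(QQ, [[4, 5], [-1, -2]])
sage: Z3 = (A + identity_matrix(2))/4
sage: Zm1 = (A - 3*identity_matrix(2))/(-4)
sage: Z3^2 == Z3, Zm1^2 == Zm1, (Z3*Zm1).is_zero()
(True, True, True)
sage: fA = exp(3*t)*Z3 + exp(-t)*Zm1
sage: (fA.derivative(t) - A*fA).simplify_full().is_zero()    # e^{At} solves X' = A X
True
sage: fA.subs(t=0) == identity_matrix(2)
True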
Example 1.7.1: Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1&4&16 \\ 18&20&4 \\ -12&-14&-7 \end{bmatrix} \) that has three distinct eigenvalues \( \lambda = 1, \ 4, \ 9 . \) Sage confirms:
sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: sorted(A.eigenvalues())
[1, 4, 9]
To determine a function of this matrix, \( f ({\bf A} ) , \) we first find Sylvester's auxiliary matrices:
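Below is a minimal Sage sketch of this computation, using the formula above with the distinct eigenvalues 9, 4, 1 (the names Z9, Z4, Z1 are ours):
sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: I3 = identity_matrix(QQ, 3)
sage: Z9 = (A - 4*I3)*(A - I3)/((9 - 4)*(9 - 1))
sage: Z4 = (A - 9*I3)*(A - I3)/((4 - 9)*(4 - 1))
sage: Z1 = (A - 9*I3)*(A - 4*I3)/((1 - 9)*(1 - 4))
sage: Z9 + Z4 + Z1 == I3, 9*Z9 + 4*Z4 + Z1 == A    # the projectors resolve the identity and rebuild A
(True, True)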
Example 1.7.2: Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} -20&-42&-21 \\ 6&13&6 \\ 12&24&13 \end{bmatrix} \) that has two distinct eigenvalues, \( \lambda = 4 \) and the double eigenvalue \( \lambda = 1 . \) Sage confirms:
sage: A = matrix(QQ, [[-20, -42, -21], [6, 13, 6], [12, 24, 13]])
sage: sorted(A.eigenvalues())
[1, 1, 4]
Then we define the resolvent:
sage: lam = var('lam')
sage: R = (lam*identity_matrix(SR, 3) - A).inverse().simplify_full()
which, after simplification, becomes
\[
{\bf R}_{\lambda} \left( {\bf A} \right) = \left( \lambda {\bf I} - {\bf A} \right)^{-1} = \frac{1}{(\lambda -1)(\lambda -4)} \begin{bmatrix} \lambda - 25 & -42 & -21 \\ 6 & \lambda + 8 & 6 \\ 12 & 24 & \lambda + 8 \end{bmatrix} .
\]
Using the resolvent, we define Sylvester's auxiliary matrices (which are actually projection matrices onto the eigenspaces):
sage: Z1 = ((lam - 1)*R).simplify_full().subs(lam=1)
sage: Z4 = ((lam - 4)*R).simplify_full().subs(lam=4)
\[
{\bf Z}_1 = \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} , \qquad
{\bf Z}_4 = \begin{bmatrix} -7&-14&-7 \\ 2&4&2 \\ 4&8&4 \end{bmatrix} .
\]
The Resolvent Method
The resolvent method and its applications to partial differential equations were developed by Vladimir Dobrushkin
in the 1980s. We show how it can be used to define a function of a square matrix, which will be called a matrix function.
A general matrix function is a correspondence f that relates to each admissible matrix A of
order n a matrix, denoted by \( f({\bf A}) , \) of order n with elements in the complex field.
Such correspondence is assumed to satisfy the following conditions:
- If \( f(\lambda ) = \lambda , \) then \( f({\bf A}) = {\bf A} ; \)
- If \( f(\lambda ) = k, \) a constant, then \( f({\bf A}) = k\,{\bf I} , \)
where I is the identity matrix;
- If \( f(\lambda ) = g(\lambda ) + h(\lambda ) , \) then \( f({\bf A}) = g({\bf A}) + h({\bf A}) ; \)
- If \( f(\lambda ) = g(\lambda ) \cdot h(\lambda ) , \) then \( f({\bf A}) = g({\bf A}) \, h({\bf A}) . \)
These requirements ensure that the definition, when applied to a polynomial \( p(\lambda ) , \)
yields the usual matrix polynomial p(A) and that any rational identity in scalar functions
of a complex variable is fulfilled by the corresponding matrix functions. The above conditions are not sufficient
for most applications, and a fifth requirement is highly desirable:
- If \( f(\lambda ) = h \left( g(\lambda ) \right) , \) then \( f({\bf A}) = h\left( g({\bf A}) \right) \)
for holomorphic functions and all admissible matrices. The extension of the concept of a function of a
complex variable to matrix functions has occupied the attention of a number of mathematicians since 1883. While many
approaches to defining a function of a square matrix are known and can be found in the following references
- R. A. Frazer, W. J. Duncan, and A. R. Collar. Elementary Matrices and Some Applications
to Dynamics and Differential Equations. Cambridge University Press, 1938.
- R. F. Rinehart. The equivalence of definitions of a matric function. American Mathematical
Monthly, 62, No 6, 395--414, 1955.
- Cleve B. Moler and Charles F. Van Loan. Nineteen dubious ways to compute the
exponential of a matrix. SIAM Review, 20, No 4, 801--836, 1978.
- Nicholas J. Higham, Functions of Matrices. Theory and Computation. SIAM, 2008,
we present here another method, which is accessible at the undergraduate level.
Recall that the resolvent of a square matrix A is
\[
{\bf R}_{\lambda} \left( {\bf A} \right) = \left( \lambda {\bf I} - {\bf A} \right)^{-1} ,
\]
which is a matrix-function depending on the parameter λ. In general, after cancelling all common factors, the resolvent is the ratio of a polynomial matrix
\( {\bf Q}(\lambda ) \) of degree \( k-1 \) to the minimal polynomial
\( \psi (\lambda ) \) of degree k:
\[
{\bf R}_{\lambda} \left( {\bf A} \right) = \left( \lambda {\bf I} - {\bf A} \right)^{-1} = \frac{1}{\psi (\lambda )} \, {\bf Q} (\lambda ) .
\]
It is assumed that polynomials in \( {\bf Q}(\lambda ) \) and \( \psi (\lambda ) \) are relatively prime.
The residue of the ratio \( f(\lambda ) = P(\lambda )/Q(\lambda ) \) of two polynomials (or entire functions) at the pole
\( \lambda_0 \) of multiplicity m is defined by
\[
\mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \left. \frac{1}{(m-1)!} \, \frac{{\text d}^{m-1}}{{\text d} \lambda^{m-1}} \, \frac{(\lambda - \lambda_0 )^{m} P(\lambda )}{Q(\lambda )} \right\vert_{\lambda = \lambda_0}
= \lim_{\lambda \mapsto \lambda_0} \left( \frac{1}{(m-1)!} \, \frac{{\text d}^{m-1}}{{\text d} \lambda^{m-1}} \, \frac{(\lambda - \lambda_0 )^{m} P(\lambda )}{Q(\lambda )} \right) .
\]
In particular, for a simple pole \( ( m=1 ) , \) we have
\[
\mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \frac{P(\lambda_0 )}{Q'(\lambda_0 )} .
\]
For a double pole, \( m=2 , \) we have
\[
\mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \left. \frac{{\text d}}{{\text d} \lambda} \, \frac{(\lambda - \lambda_0 )^2 \, P(\lambda )}{Q(\lambda )} \right\vert_{\lambda = \lambda_0} , \]
and for a triple pole, \( m=3 , \) we get
\[
\mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \left. \frac{1}{2!} \, \frac{{\text d}^{2}}{{\text d} \lambda^{2}} \, \frac{(\lambda - \lambda_0 )^{3} P(\lambda )}{Q(\lambda )} \right\vert_{\lambda = \lambda_0} .
\]
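For instance, the residue at a simple pole can be computed in Sage either from the limit or from the formula \( P(\lambda_0 ) / Q' (\lambda_0 ) \) (a small sketch with a rational function of our choosing):
sage: lam = var('lam')
sage: P = lam + 2; Q = (lam - 1)*(lam - 4)
sage: ((lam - 1)*P/Q).simplify_rational().subs(lam=1), (P/Q.diff(lam)).subs(lam=1)
(-1, -1)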
If a real-valued function \( f(\lambda ) \) has a pair of complex conjugate poles
\( a \pm b {\bf j} \) (here \( {\bf j}^2 =-1 \) ), then
\[
\mbox{Res}_{a+b{\bf j}} f(\lambda ) + \mbox{Res}_{a-b{\bf j}} f(\lambda ) = 2\, \Re \, \mbox{Res}_{a+b{\bf j}} f(\lambda ) ,
\]
where Re = \( \Re \) stands for the real part of a complex number.
Let \( f(\lambda ) \) be a function defined on the spectrum of a square matrix A. Then
\[
f({\bf A}) = \sum_{\mbox{all eigenvalues of {\bf A}}} \, \mbox{Res} \, f(\lambda ) \,{\bf R}_{\lambda} ({\bf A}) .
\]
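As a quick illustration, here is a minimal Sage sketch that reuses the 2-by-2 matrix \( {\bf A} = \begin{bmatrix} 4&5 \\ -1&-2 \end{bmatrix} \) from the Sylvester example; its eigenvalues 3 and -1 are simple poles of the resolvent, and the construction reproduces \( e^{3t} {\bf Z}_3 + e^{-t} {\bf Z}_{-1} \) obtained earlier (the names Z3 and Zm1 are ours):
sage: lam, t = var('lam t')
sage: A = matrix(QQ, [[4, 5], [-1, -2]])
sage: R = (lam*identity_matrix(SR, 2) - A).inverse()            # the resolvent
sage: Z3  = ((lam - 3)*R).simplify_full().subs(lam=3)           # residue at the simple pole 3
sage: Zm1 = ((lam + 1)*R).simplify_full().subs(lam=-1)          # residue at the simple pole -1
sage: expAt = exp(3*t)*Z3 + exp(-t)*Zm1                         # f(A) for f(lambda) = e^{lambda*t}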
Spectral Decomposition
Originally, spectral decomposition was developed for symmetric or self-adjoint matrices. Following tradition, we present this method for symmetric/self-adjoint matrices, and later expand it for arbitrary matrices.
A matrix A is said to be unitarily diagonalizable if there is a unitary matrix U such that
\( {\bf U}^{\ast} {\bf A}\,{\bf U} = {\bf \Lambda} , \) where Λ is a diagonal matrix
and \( {\bf U}^{\ast} = {\bf U}^{-1} . \)
A matrix A is said to be orthogonally diagonalizable if there is an orthogonal matrix P such that
\( {\bf P}^{\mathrm T} {\bf A}\,{\bf P} = {\bf \Lambda} , \) where Λ is a diagonal matrix and
\( {\bf P}^{\mathrm T} = {\bf P}^{-1} . \)
Theorem: A real matrix A is orthogonally diagonalizable if and only if A is symmetric
(\( {\bf A} = {\bf A}^{\mathrm T} \) ). ■
Theorem: The matrix A is unitarily diagonalizable if and only if A is normal
(\( {\bf A}\, {\bf A}^{\ast} = {\bf A}^{\ast} {\bf A} \) ). ■
Example. The matrix
\[
{\bf A} = \begin{bmatrix} 1&{\bf j}&0 \\ {\bf j}&1&0 \\ 0&0&1 \end{bmatrix}
\]
is symmetric, normal, but not self-adjoint. Another matrix
\[
{\bf B} = \begin{bmatrix} 2&1 \\ -1&2 \end{bmatrix}
\]
is normal, but not self-adjoint. Therefore, both matrices are unitarily diagonalizable but not orthogonally diagonalizable.
Example. Consider a symmetric matrix
\[
{\bf A} = \begin{bmatrix} 2&1&1 \\ 1&2&1 \\ 1&1&2 \end{bmatrix}
\]
that has the characteristic polynomial \( \chi_{A} (\lambda ) = \det \left( \lambda {\bf I} - {\bf A} \right) = \left( \lambda -1 \right)^2 \left( \lambda -4 \right) .\)
Thus, the distinct eigenvalues of A are \( \lambda_1 =1, \) which has geometrical multiplicity 2, and \( \lambda_3 =4. \)
The corresponding eigenvectors are
\[
{\bf u}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} , \quad {\bf u}_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \qquad \mbox{and} \qquad
{\bf u}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} .
\]
The vectors \( {\bf u}_1 , \ {\bf u}_2 \) form a basis for the two-dimensional eigenspace corresponding to
\( \lambda_1 =1 , \) while \( {\bf u}_3 \) is the eigenvector corresponding to
\( \lambda_3 =4 . \) Applying the Gram--Schmidt process to \( {\bf u}_1 , \ {\bf u}_2 \) yields
the following orthogonal basis:
\[
{\bf v}_1 = {\bf u}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \mbox{and} \quad {\bf v}_2 = {\bf u}_2 - \frac{\langle {\bf u}_2 , {\bf v}_1 \rangle}{\| {\bf v}_1 \|^2} \, {\bf v}_1 =
\frac{1}{2} \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}
\]
because \( \langle {\bf u}_2 , {\bf u}_1 \rangle = -1 \) and \( \| {\bf v}_1 \|^2 =2 . \)
Normalizing these vectors, we obtain orthonormal basis:
\[
{\bf q}_1 = \frac{{\bf v}_1}{\| {\bf v}_1 \|} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} , \quad
{\bf q}_2 = \frac{{\bf v}_2}{\| {\bf v}_2 \|} = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} , \quad
{\bf q}_3 = \frac{{\bf u}_3}{\| {\bf u}_3 \|} = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} .
\]
Finally, using \( {\bf q}_1 ,\ {\bf q}_2 , \ {\bf q}_3 \) as column vectors, we obtain the orthogonal matrix
\[
{\bf U} = \begin{bmatrix} \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\
0& \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} ,
\]
which orthogonally diagonalizes A. As a check, we confirm
\[
{\bf U}^{\mathrm T} {\bf A} \,{\bf U} = \begin{bmatrix} \frac{-1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{6}} & \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix}
\, \begin{bmatrix} 2&1&1 \\ 1&2&1 \\ 1&1&2 \end{bmatrix} \,
\begin{bmatrix} \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\
0& \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}
= \begin{bmatrix} 1 &0&0 \\ 0 &1&0 \\ 0 &0&4 \end{bmatrix} .
\]
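The same computation is easy to reproduce in Sage (a minimal sketch; the vector and matrix names are ours):
sage: A = matrix(QQ, [[2, 1, 1], [1, 2, 1], [1, 1, 2]])
sage: u1 = vector([-1, 0, 1]); u2 = vector([1, -1, 0]); u3 = vector([1, 1, 1])
sage: v2 = u2 - (u2*u1)/(u1*u1)*u1                      # one Gram--Schmidt step
sage: q1 = u1/u1.norm(); q2 = v2/v2.norm(); q3 = u3/u3.norm()
sage: U = matrix(SR, [q1, q2, q3]).transpose()          # orthonormal eigenvectors as columns
sage: (U.transpose()*A*U).simplify_full()               # returns the diagonal matrix diag(1, 1, 4)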
Theorem: Let A be a symmetric or, more generally, normal \( n \times n \)
matrix with eigenvalues \( \lambda_1 , \ \lambda_2 , \ \ldots , \ \lambda_n \) and
corresponding orthonormal eigenvectors \( {\bf u}_1 , \ {\bf u}_2 , \ \ldots , \ {\bf u}_n . \) Then
\[
{\bf A} = \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ {\bf u}_1 & {\bf u}_2 & \cdots & {\bf u}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix} \,
\begin{bmatrix} \lambda_1 &&&0 \\ &\lambda_2 && \\ &&\ddots & \\ 0&&& \lambda_n \end{bmatrix} \,
\begin{bmatrix} \longleftarrow & {\bf u}_1^{\ast} & \longrightarrow \\ \longleftarrow & {\bf u}_2^{\ast} & \longrightarrow \\
\vdots & & \\ \longleftarrow & {\bf u}_n^{\ast} & \longrightarrow \end{bmatrix}
= \sum_{i=1}^n \lambda_i {\bf u}_i {\bf u}_i^{\ast} ,
\]
which is called the spectral decomposition of the symmetric/normal matrix A. ■
The term was coined around 1905
by the German mathematician David Hilbert (1862--1942). Denoting the rank-one projection matrix \( {\bf u}_i {\bf u}_i^{\ast} \)
by \( {\bf E}_i = {\bf u}_i {\bf u}_i^{\ast} , \) we obtain the spectral decomposition of A:
\[
{\bf A} = \lambda_1 {\bf E}_1 + \lambda_2 {\bf E}_2 + \cdots + \lambda_n {\bf E}_n .
\]
This formula allows one to define a function of a square symmetric/self-adjoint matrix:
\[
f\left( {\bf A} \right) = f(\lambda_1 )\, {\bf E}_1 + f(\lambda_2 )\, {\bf E}_2 + \cdots + f(\lambda_n )\,{\bf E}_n
\]
because the matrices \( {\bf E}_k , \quad k=1,2,\ldots n, \) are projection matrices:
\[
{\bf E}_i {\bf E}_j = \delta_{i,j} {\bf E}_i = \begin{cases} {\bf E}_i , & \mbox{ if } i=j, \\
{\bf 0} , & \mbox{ if } i \ne j , \end{cases} \qquad i,j =1,2,\ldots n.
\]
Example. Consider a self-adjoint 2-by-2 matrix
\[
{\bf A} = \begin{bmatrix} 1 &2+{\bf j} \\ 2- {\bf j} & 5\end{bmatrix} ,
\]
where \( {\bf j}^2 =-1 . \) We have \( {\bf S}^{\ast} {\bf A} {\bf S} = {\bf S}^{-1} {\bf A} {\bf S} = {\bf \Lambda} \)
for the matrices
\[
{\bf S} = \begin{bmatrix} \left( 2+{\bf j} \right) / \sqrt{6} & \left( 2+{\bf j} \right) / \sqrt{30} \\ - 1/\sqrt{6} & 5/\sqrt{30} \end{bmatrix}
\quad \mbox{and} \quad {\bf \Lambda} = \begin{bmatrix} 0&0 \\ 0& 6 \end{bmatrix} .
\]
Here the column vectors of the matrix S are the normalized eigenvectors corresponding to the eigenvalues \( \lambda_1 =0 \quad \mbox{and} \quad \lambda_2 =6 . \)
Therefore, the spectral decomposition of A becomes \( {\bf A} = 0\,{\bf E}_1 + 6\,{\bf E}_2 , \) which is clearly the matrix A itself.
In our case, projection matrices are
\[
{\bf E}_1 = \frac{1}{6} \begin{bmatrix} 5 & -2 - {\bf j} \\ -2+{\bf j} & 1 \end{bmatrix} , \qquad {\bf E}_2 = \frac{1}{6} \begin{bmatrix} 1 &2+{\bf j} \\ 2- {\bf j} & 5\end{bmatrix} = \frac{1}{6}\, {\bf A} .
\]
It is easy to check that
\[
{\bf E}_1^2 = {\bf E}_1 , \qquad {\bf E}_2^2 = {\bf E}_2 , \qquad \mbox{and} \qquad {\bf E}_1 {\bf E}_2 = {\bf 0} .
\]
The exponential matrix-function is
\[
e^{{\bf A}\,t} = {\bf E}_1 + e^{6t} \,{\bf E}_2 .
\]
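These identities are easy to confirm in Sage (a minimal sketch over the Gaussian rationals, with j denoting the imaginary unit; the name k is ours):
sage: k.<j> = QuadraticField(-1)
sage: A = matrix(k, [[1, 2 + j], [2 - j, 5]])
sage: E1 = matrix(k, [[5, -2 - j], [-2 + j, 1]])/6
sage: E2 = A/6
sage: E1^2 == E1, E2^2 == E2, (E1*E2).is_zero()
(True, True, True)
sage: E1 + E2 == identity_matrix(k, 2), 0*E1 + 6*E2 == A    # the spectral decomposition of A
(True, True)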
Example. Consider again the symmetric 3-by-3 matrix from the earlier example
\[
{\bf A} = \begin{bmatrix} 2&1&1 \\ 1&2&1 \\ 1&1&2 \end{bmatrix} .
\]
Its spectral decomposition is \( {\bf A} = 1\,{\bf E}_1 + 1\,{\bf E}_2 + 4\,{\bf E}_3 , \)
where the projection matrices \( {\bf E}_i = {\bf q}_i {\bf q}_i^{\ast} \) are obtained from
orthonormal eigenvectors (for the repeated eigenvalue \( \lambda =1 \) we use a different orthonormal basis of its eigenspace than in the earlier example):
\begin{align*}
{\bf E}_1 &= \frac{1}{6} \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix} \left[ -1 \ -1 \ 2 \right] = \frac{1}{6} \begin{bmatrix} 1&1& -2 \\ 1&1& -2 \\ -2&-2& 4 \end{bmatrix} ,
\\
{\bf E}_2 &= \frac{1}{2} \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \left[ -1 \ 1 \ 0 \right] = \frac{1}{2} \begin{bmatrix} 1&-1&0 \\ -1&1&0 \\ 0&0&0 \end{bmatrix} ,
\\
{\bf E}_3 &= \frac{1}{3} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \left[ 1 \ 1 \ 1 \right] = \frac{1}{3} \begin{bmatrix} 1&1& 1 \\ 1&1& 1 \\ 1&1& 1 \end{bmatrix} .
\end{align*}
Indeed, these projection matrices satisfy the following relations:
\[
{\bf E}_1^2 = {\bf E}_1 , \quad {\bf E}_2^2 = {\bf E}_2 , \quad {\bf E}_3^2 = {\bf E}_3 , \quad {\bf E}_1 {\bf E}_2 = {\bf 0} , \quad {\bf E}_1 {\bf E}_3 = {\bf 0} , \quad {\bf E}_3 {\bf E}_2 = {\bf 0} ,
\]
and they all have eigenvalues \( \lambda = 1, 0, 0 . \)
Using this spectral decomposition, we define two matrix-functions corresponding to \( {\Phi}(\lambda ) = \cos \left( \sqrt{\lambda} \,t \right) \) and
\( {\Psi}(\lambda ) = \frac{1}{\sqrt{\lambda}} \,\sin \left( \sqrt{\lambda} \,t \right) , \) which do not depend on the branch of the chosen square root:
\begin{align*}
{\bf \Phi} (t) &= \cos \left( \sqrt{\bf A} \,t \right) = \cos t\, {\bf E}_1 + \cos t\, {\bf E}_2 + \cos (2t) \,{\bf E}_3 = \frac{\cos t}{3} \, \begin{bmatrix} 2&-1&-1 \\ -1&2&-1 \\ -1&-1& 2 \end{bmatrix} + \frac{\cos 2t}{3} \begin{bmatrix} 1&1& 1 \\ 1&1& 1 \\ 1&1& 1 \end{bmatrix} ,
\\
{\bf \Psi} (t) &= \frac{1}{\sqrt{\bf A}} \,\sin \left( \sqrt{\bf A} \,t \right) = \sin t\, {\bf E}_1 + \sin t\, {\bf E}_2 + \frac{\sin (2t)}{2} \,{\bf E}_3 = \frac{\sin t}{3} \, \begin{bmatrix} 2&-1&-1 \\ -1&2&-1 \\ -1&-1& 2 \end{bmatrix} + \frac{\sin 2t}{6} \begin{bmatrix} 1&1& 1 \\ 1&1& 1 \\ 1&1& 1 \end{bmatrix} .
\end{align*}
These matrix-functions are solutions of the following initial value problems for the second order matrix differential equations:
\begin{align*}
& \ddot{\bf \Phi}(t) + {\bf A}\,{\bf \Phi} (t) ={\bf 0} , \qquad {\bf \Phi}(0) = {\bf I}, \quad \dot{\bf \Phi}(0) = {\bf 0},
\\
&\ddot{\bf \Psi}(t) + {\bf A}\,{\bf \Psi} (t) ={\bf 0} , \qquad {\bf \Psi}(0) = {\bf 0}, \quad \dot{\bf \Psi}(0) = {\bf I} .
\end{align*}
Since the given matrix A is positive definite, we can define four square roots:
\begin{align*}
{\bf R}_1 &= {\bf E}_1 + {\bf E}_2 + 2\,{\bf E}_3 = \frac{1}{3} \begin{bmatrix} 4&1&1 \\ 1&4&1 \\ 1&1&4 \end{bmatrix} ,
\\
{\bf R}_2 &= {\bf E}_1 + {\bf E}_2 - 2\,{\bf E}_3 = \begin{bmatrix} 0&-1&-1 \\ -1&0&-1 \\ -1&-1&0 \end{bmatrix} ,
\\
{\bf R}_3 &= {\bf E}_1 - {\bf E}_2 + 2\,{\bf E}_3 = \frac{1}{3} \begin{bmatrix} 1&4&1 \\ 4&1&1 \\ 1&1&4 \end{bmatrix} ,
\\
{\bf R}_4 &= -{\bf E}_1 + {\bf E}_2 + 2\,{\bf E}_3 = \begin{bmatrix} 1&0&1 \\ 0&1&1 \\ 1&1&0 \end{bmatrix} ,
\end{align*}
and four more are the negatives of these four, so this construction produces eight square roots (since the eigenvalue \( \lambda =1 \) is repeated, the matrix A actually possesses infinitely many square roots). Note that we cannot obtain
\( {\bf R}_3 \) and \( {\bf R}_4 \) by either Sylvester's method or the resolvent method, because those methods are based on the minimal polynomial
\( \psi (\lambda ) = (\lambda -1)(\lambda -4) . \) ■
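These square roots are easily verified in Sage (a minimal sketch; the names R1, ..., R4 follow the formulas above):
sage: A = matrix(QQ, [[2, 1, 1], [1, 2, 1], [1, 1, 2]])
sage: R1 = matrix(QQ, [[4, 1, 1], [1, 4, 1], [1, 1, 4]])/3
sage: R2 = matrix(QQ, [[0, -1, -1], [-1, 0, -1], [-1, -1, 0]])
sage: R3 = matrix(QQ, [[1, 4, 1], [4, 1, 1], [1, 1, 4]])/3
sage: R4 = matrix(QQ, [[1, 0, 1], [0, 1, 1], [1, 1, 0]])
sage: all(R^2 == A for R in [R1, R2, R3, R4])
True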
We now extend the spectral decomposition to arbitrary square matrices. Let \( f (\lambda ) \) be an analytic function in a neighborhood of the origin and A be a square \( n \times n \) matrix.
We choose the origin only as an example; application of the spectral decomposition requires the function to be expressible as a convergent power series in a neighborhood of every eigenvalue.
Using a Maclaurin series
\[
f(\lambda ) = f_0 + f_1 \lambda + f_2 \lambda^2 + \cdots + f_k \lambda^k + \cdots = \sum_{k\ge 0} f_k \lambda^k ,
\]
we can define the matrix-valued function \( f ({\bf A} ) \) as
\[
f({\bf A} ) = \sum_{k\ge 0} f_k {\bf A}^k .
\]
Let \( \psi (\lambda ) \) be a minimal polynomial of degree m for the matrix A.
Then every power \( {\bf A}^p \) of matrix A can be expressed as a polynomial of degree not higher than \( m-1. \) Therefore,
\[
f({\bf A} ) = \sum_{j= 0}^{m-1} b_j {\bf A}^j ,
\]
where the coefficients \( b_j , \quad j=0,1,\ldots , m-1 , \) satisfy the following equations
\[
f(\lambda_k ) = \sum_{j= 0}^{m-1} b_j \,\lambda_k^j ,
\]
for each eigenvalue \( \lambda_k , \quad k=1,2,\ldots , s , \) where s is the number of distinct eigenvalues.
If the eigenvalue \( \lambda_k \) is of multiplicity \( m_k \) in the minimal polynomial
\( \psi (\lambda ) , \) then we need to add \( m_k -1 \) auxiliary algebraic equations
\[
\left. \frac{{\text d}^p f(\lambda )}{{\text d} \lambda^p} \right\vert_{\lambda = \lambda_k} =
\left. \frac{{\text d}^p}{{\text d} \lambda^p} \, \sum_{j= 0}^{m-1} b_j \,\lambda^j \right\vert_{\lambda = \lambda_k} ,
\quad p=1,2,\ldots , m_k -1 .
\]
Example.
Consider a diagonalizable 4-by-4 matrix
\[
{\bf A} = \begin{bmatrix} -4&7&1&4 \\ 6&-16&-3&-9 \\ 12&-27&-4&-15 \\ -18&43&7&24 \end{bmatrix} .
\]
Its characteristic polynomial is
\[
\chi (\lambda ) = \det \left( \lambda {\bf I} - {\bf A} \right) = \lambda \left( \lambda -2 \right) \left( \lambda +1 \right)^2 ,
\]
but its minimal polynomial is a product of simple terms:
\[
\psi (\lambda ) = \lambda \left( \lambda -2 \right) \left( \lambda +1 \right) .
\]
Therefore, the matrix A is diagonalizable because its minimal polynomial is a product of simple factors, and we do not need to find the eigenspaces.
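As an illustration of the coefficient-matching procedure, here is a minimal Sage sketch that assembles \( e^{{\bf A}t} \) from the three equations \( e^{\lambda_k t} = b_0 + b_1 \lambda_k + b_2 \lambda_k^2 \) at \( \lambda_k = 0, 2, -1 \) (the variable names b0, b1, b2 are ours):
sage: A = matrix(QQ, [[-4, 7, 1, 4], [6, -16, -3, -9], [12, -27, -4, -15], [-18, 43, 7, 24]])
sage: A.minpoly().factor()                      # the minimal polynomial stated above
sage: t = var('t'); b0, b1, b2 = var('b0 b1 b2')
sage: eqs = [exp(p*t) == b0 + b1*p + b2*p^2 for p in [0, 2, -1]]
sage: sol = solve(eqs, b0, b1, b2, solution_dict=True)[0]
sage: expAt = sol[b0]*identity_matrix(4) + sol[b1]*A + sol[b2]*A^2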
Applications