Minimal Polynomial

Recall that a monic polynomial \( p(\lambda ) = \lambda^s + a_{s-1} \lambda^{s-1} + \cdots + a_1 \lambda + a_0 \) is the polynomial with leading term to be 1. The Cayley--Hamilton theorem tells us that for any square n × n matrix A, there exists a polynomial p(λ) in one variable λ that annihilates A, namely, \( p({\bf A}) = {\bf 0} \) is zero matrix. This theorem specifies that the characteristic polynomial \( \chi (\lambda ) = \det \left(\lambda {\bf I} - {\bf A} \right) \) is an annihilator for A. Hence, there is a polynomial of least degree with this property of degree at most n. If g(λ) is such a polynomial, we can divide g(λ) by its leading coefficient to obtain another polynomial ψ(λ) of the same degree with leading coefficient 1, that is, ψ(λ) is a monic polynomial.

Let A be a square n × n matrix. The minimal polynomial ψ(λ) for A is the monic polynomial of least positive degree that annihilates the matrix: ψ(A) is zero matrix.

Theorem: Let V be a vector space of dimension n over the field of either real numbers \( \mathbb{R} \) or complex numbers \( \mathbb{C} . \) Let \( T\,:\,V \to V \) be a linear transformation. Then the minimal polynomial ψ(λ), that is the monic polynomial of minimum degree for which \( \psi (T) =0, \) is of degree less than or equal to n. ■

We prove theorem using mathematical induction on the dimension of V.

Suppose that dimension of V is 1. Then any nonzero vector from V is a constant multiple of a generating vector x. Since T is a linear transformation, \( T{\bf x} = k{\bf x} , \) for some scalar k. Hence, \( \left( k\,I - T \right) {\bf x} = {\bf 0} , \) where I is the identity map on V. This shows that ψ(λ) = λ - k is the minimal polynomial of T. Since the degree of ψ(λ) is 1, we see that the theorem is true for the case n=1.

Now, to make an induction on the dimension, we assume that the theorem is true for all spaces U of dimension less than n. Let dimV = n, and suppose that x is a non-zero vector in V. Then the n+1 vectors x, Tx, T²x, ... , Tⁿx are linearly dependent to that there is a set { a₀, a₁, ... , a_n } of scalars, not all of them zero, such that

\[ a_0 {\bf x} + a_1 T{\bf x} + \cdots + a_n T^n {\bf x} = {\bf 0} . \]

Writing \( g(\lambda ) = a_0 + a_1 \lambda + a_2 \lambda^2 + \cdots + a_n \lambda^n , \) we see that degree \( \mbox{deg}\,g(\lambda ) \le n \) and g(T)x = 0. If g(T)V = 0, then because ψ(λ) is the minimal polynomial, we have \( \mbox{deg}\,g(\lambda ) \le n \) and so the theorem holds.

Suppose now that \( g(T)\,V \ne 0 . \) Let \( U = \left\{ {\bf v} \,:\, g(T){\bf v} = {\bf v} \right\} \) be the set of vectors annihilated by g(T). Then U is a proper subspace of V and its dimension is r < n. By the inductional assumption, there is a polynomial m(λ) of degree less than or equal to r such that m(T)U = 0. Moreover, \( T\,U \subseteq U \) since g(T) commutes with T, so that U is an invariant subspace of V. Now consider the quotient space V/U. We have dimV/U = n-r and so, by the inductional hypothesis again, there is a polynomial m₂ of degree \( \le n-r \) such that \( m_2 (T) V/U =0. \) This means that \( m_2 (T) V \subseteq U , \) so that \( m(T)\,m_2 (T)\,V =0 . \)

Writing \( h(\lambda ) = m(\lambda )\, m_2 (\lambda ) , \) we have h(T)V = 0 so that is ψ(λ) is the minimal polynomial, so

\[ \mbox{deg} \,\psi (\lambda ) \le \mbox{deg}\, h(\lambda ) = \mbox{deg}\, m(\lambda ) + \mbox{deg}\, m_2 (\lambda ) \le r + n-r =n . \chi (\lambda ) = q(\lambda )\,\psi (\lambda ) = q(\lambda ) \cdot 0 = 0 . \]

Thus the theorem holds in all cases and the proof is complete.

Theorem: Let A be a square n × n matrix. If p(λ) is a nonconstant polynomial that annihilates matrix A, then the minimal polynomial ψ(λ) of matrix A divides the annihilating polynomial p(λ). ■

Let p(λ) be a polynomial for which \( p({\bf A}) = {\bf 0} . \) By division algorithm for polynomials (see section Polynomials), there exist polynomials q(x) and r(x) such that

\[ p (x ) = q(x) \,\psi (x) + r(x) , \]

where r(x) has degree less than the degree of ψ(x). Substituting A into the above equation and using the identity \( \psi ({\bf A}) = p({\bf A}) = {\bf 0} , \) we have \( r ({\bf A}) = {\bf 0} . \) Since r(x) has degree less than ψ(x) and ψ(x) is the minimal polynomial of A, r(x) must be the zero polynomial. Then we can simplify \( p (x ) = q(x) \,\psi (x) , \) proving the theorem.

Corollary: The minimal polynomial for any square matrix is unique. ■

Theorem: Let A be a square n × n matrix and ψ(λ) be its minimal polynomial. A scalar λ (real or complex) is an eigenvalue of A if and only if \( \psi (\lambda ) =0 . \) Hence, the characteristic polynomial and the minimal polynomial have the same zeroes. ■

Let χ(λ) be the characteristic polynomial of matrix A. Since the minimal polynomial divides χ(λ), we have χ(λ) = q(λ) ψ(λ) for some polynomial q(λ). Let λ be a zero of the minimal polynomial ψ(λ). Then

\[ \chi (\lambda ) = q(\lambda )\,\psi (\lambda ) = q(\lambda ) \cdot 0 = 0 . \]

So λ is a zero for the characteristic polynomial, that is, λ is an eigenvalue of matrix A.

Conversely, suppose that λ is an eigenvalue of A, and let \( {\bf x} \in V \) be an eigenvector corresponding to λ, so (x,λ) is an eigenpair. Then

\[ \psi ({\bf A} )\,{\bf x} = \psi (\lambda )\,{\bf x} = 0 . \]

Since \( {\bf x} \ne {\bf 0} , \) ψ(λ) = 0, and so λ is a zero of the minimal polynomial.

Example: Consider the matrix

\[ {\bf A} = \begin{bmatrix} -14&3&-36 \\ -20&5&-48 \\ 5&-1&13 \end{bmatrix} . \]

Since the characteristic polynomial for this matrix is

\[ \chi (\lambda ) = \det \left( \lambda {\bf I} -{\bf A} \right) = \left( \lambda -1 \right)^2 \left( \lambda -2 \right) , \]

the minimal polynomial for A must be either \( \left( \lambda -1 \right) \left( \lambda -2 \right) \) or \( \left( \lambda -1 \right)^2 \left( \lambda -2 \right) . \) Substituting A into \( \psi (\lambda ) = \left( \lambda -1 \right) \left( \lambda -2 \right) \) show that \( \psi ({\bf A} ) = {\bf 0} , \) and hence ψ is the minimal polynomial for A.

On another hand, matrix

\[ {\bf B} = \begin{bmatrix} 37&-8&86 \\ 24&-3&58 \\ -13&3&-30 \end{bmatrix} \]

has the same characteristic polynomial \( \chi (\lambda ) = \left( \lambda -1 \right)^2 \left( \lambda -2 \right) . \) It could be verified that

\[ \left( {\bf B} - {\bf I} \right) \left( {\bf B} - 2{\bf I} \right) = \begin{bmatrix} -50&10&-120 \\ -10&2&-24 \\ 20&-4&48 \end{bmatrix} , \quad\mbox{but} \quad \left( {\bf B} - {\bf I} \right)^2 \left( {\bf B} - 2{\bf I} \right) = \begin{bmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \end{bmatrix} . \]

Therefore, the minimal polynomial for matrix B is its characteristic polynomial: \( \psi (\lambda ) = \chi (\lambda ) = \left( \lambda -1 \right)^2 \left( \lambda -2 \right) . \)

Example: Let \( \texttt{D} \) be the linear derivative operator on the space P_n of polynomials of degree up to n, with basis of monomials \( 1, x, x^2 , \ldots , x^n . \) Then in this basis for P_n, the derivative operator is expressed via matrix

\[ \texttt{D} = \begin{bmatrix} 0&1&0&0& \cdots &0 \\ 0&0&2&0& \cdots &0 \\ 0&0&0&3&\cdots &0 \\ \vdots&\vdots&&&\ddots&\vdots \\ 0&0&0&0&\cdots &0 \end{bmatrix} . \]

Its characteristic polynomial is \( \chi (\lambda ) = \lambda^n \) because \( \lambda{\bf I} - \texttt{D} \) is an upper triangular matrix. Since \( \texttt{D}^n t^n \ne 0 , \) the minimal polynomial for \( \texttt{D} \) must be λⁿ.

If we consider the derivative operator on space of all polynomials (of arbitrary degree), then there exist no polynomial for which \( p \left(\texttt{D}\right) = 0. \) Hence \( \texttt{D} \) has no minimal polynomial in this space.

A projection is a linear transformation P from a vector space to itself such that P² = P. That is, whenever P is applied twice to any value, it gives the same result as if it were applied once (idempotent). It leaves its image unchanged.

A projection (idempotent) matrix always has two eigenvalues of 1 and 0 because its minimum polynomial is \( \psi (\lambda ) = \lambda \left( \lambda -1 \right) . \)

Example: Let us consider the 3 × 3 matrix from the first example, which has the minimal polynomial \( \psi (\lambda ) = \left( \lambda -1 \right) \left( \lambda -2 \right) . \) For this matrix correspond two projection matrices on the corresponding eigenspaces:

\[ {\bf P}_1 = \begin{bmatrix} 16&-3&36 \\ 20&-3&48 \\ -5&1&-11 \end{bmatrix} \qquad\mbox{and}\qquad {\bf P}_2 = \begin{bmatrix} -15&3&-36 \\ -20&4&-48 \\ 5&-1&12 \end{bmatrix} . \]

It is not had to verify that \( {\bf P}_1^2 = {\bf P}_1 \) and \( {\bf P}_2^2 = {\bf P}_2 . \) The characteristic polynomial of P₁ is \( \chi_1 (\lambda ) = \det \left( \lambda {\bf I} - {\bf P}_1 \right) = \lambda \left( \lambda -1 \right)^2 , \) while the characteristic polynomial of P₂ is \( \chi_2 (\lambda ) = \det \left( \lambda {\bf I} - {\bf P}_2 \right) = \lambda^2 \left( \lambda -1 \right) . \)

Recall that a permutation matrix is a square binary matrix that has exactly one entry of 1 in each row and each column and zeroes elsewhere. Each such matrix of size n, say P, represents a permutation of n elements and, when used to multiply another matrix, say A, results in permuting the rows (when pre-multiplying, i.e., PA) or columns (when post-multiplying, AP) of the matrix A.

Example: Consider the 3 × 3 permutation matrix

\[ {\bf P} = \begin{bmatrix} 0&1&0 \\ 1&0&0 \\ 0&0&1 \end{bmatrix} . \]

Its characteristic polynomial is \( \chi (\lambda ) = \det\left( \lambda {\bf I} - {\bf P} \right) = \left( \lambda -1 \right)^2 \left( \lambda +1 \right) . \) However, its minimal polynomial is a product of two terms: \( \psi (\lambda ) = \left( \lambda -1 \right) \left( \lambda +1 \right) = \lambda^2 -1 \) because \( {\bf P}^2 = {\bf I} , \) the identity matrix.

We consider also another permutation matrix \( {\bf P} = \begin{bmatrix} 0&0&1 \\ 1&0&0 \\ 0&1&0 \end{bmatrix} . \) Since this matrix satisfies the equation P³ = I, the permutation matrix has one real eigenvalue λ = 1 (which is always the case because some power of a permutation matrix is the identity matrix), and two complex conjugate eigenvalues \( \lambda_{2,3} = - \frac{1}{2} \pm {\bf j}\, \frac{\sqrt{3}}{2} \) that are the roots of the quadratic equation \( \lambda^2 + \lambda + 1 =0 . \) Hence the minimal polynomial for the given permutation matrix is the same as its characteristic polynomial: \( \psi (\lambda ) = \left( \lambda -1 \right) \left(\lambda^2 + \lambda +1 \right) = \lambda^3 -1 . \)

The 4 × 4 permutation matrix

\[ {\bf P} = \begin{bmatrix} 0&0&1&0 \\ 1&0&0&0 \\ 0&0&0&1 \\ 0&1&0&0 \end{bmatrix} \]

has four distinct eigenvalues 1, -1, j, -j because its fourth power is the identity matrix: P⁴ = I. Hence, the minimal polynomial for this permutation matrix is \( \psi (\lambda ) = \lambda^4 -1 . \) ■

The following theorem gives a practical way to determine the minimal polynomial of a square matrix A without actual checking which possible choice obtained from the characteristic polynomial

\[ \chi_A (\lambda ) = \det \left( \lambda {\bf I} - {\bf A} \right) = \left( \lambda - \lambda_1 \right)^{m_1} \left( \lambda - \lambda_2 \right)^{m_2} \cdots \left( \lambda - \lambda_s \right)^{m_s} \]

leads to the annihilating operator.

Theorem (V. Dobrushkin): Let A be a square n × n matrix and \( {\bf R}_A (\lambda ) = \left( \lambda {\bf I} - {\bf A} \right)^{-1} = \frac{1}{\chi (\lambda )} \, \left[ \mbox{Cof}_{j,i} \right] \) be its resolvent, where [Cof]_i,j is the cofactor matrix for \( \lambda {\bf I} - {\bf A} \) and \( \chi (\lambda ) = \det \left( \lambda {\bf I} - {\bf A} \right) . \) Upon reducing common factors (if any), the resolvent can be written as

\[ {\bf R}_A (\lambda ) = \left( \lambda {\bf I} - {\bf A} \right)^{-1} = \frac{{\bf Q} (\lambda )}{\psi (\lambda )} , \]

where ψ(λ) is the minimal polynomial for A. It is assumed that ψ(λ) and every entry of matrix Q are relatively prime. ■

Theorem: Let T be a linear operator on an n-dimensional vector space V such that V is a T-cyclic subspace of itself. Then the characteristic polynomial χ(λ) and the minimal polynomial ψ(λ) have the same degree, so χ(λ) = ψ(λ). ■

Theorem: Let p(λ) be an annihilating polynomial of an n × n matrix A. Every eigenvalue λ is a root of p. That is, \( p(\lambda ) =0 . \) ■

Let \( (\lambda , {\bf x}) \) be an eigenpair of A. Then

\[ {\bf 0} = 0\,{\bf x} = p \left( {\bf A} \right) {\bf x} = p(\lambda ) \,{\bf x} . \]

So \( p(\lambda ) =0 . \)

Theorem: If square matrices A and B are similar (that is, \( {\bf A} = {\bf T}^{-1} {\bf B} {\bf T} \) for some nonsingular matrix T), then their minimal polynomials are the same: \( \psi_A (\lambda ) = \psi_B (\lambda ). \) ■

Corollary: If matrix C is the direct sum of two matrices, \( {\bf C} = {\bf A} \oplus {\bf B} , \) then its minimal polynomial is the product of corresponding minimal polynomials: \( \psi_C (\lambda ) = \psi_A (\lambda )\,\psi_B (\lambda ) . \) ■

Theorem: Let T be a linear operator on a finite-dimensional vector space, and let ψ(λ) be its minimal polynomial. Then T is invertible if and only if \( \psi (0 ) \ne 0 . \) ■

Let T be invertible and \( \psi (\lambda ) = \lambda^n + a_{n-1} \lambda^{n-1} + \cdots + a_0 \) be its minimal polynomial. Then multiplying the equation \( \psi (T ) = T^n + a_{n-1} T^{n-1} + \cdots + a_0 I =0\) by T^-1, we get

\[ T^{n-1} + a_{n-1} T^{n-2} + \cdots + a_1 I + a_0 T^{-1} =0 \qquad\Longrightarrow \qquad T^{-1} = -\frac{1}{a_0} \left( T^{n-1} + a_{n-1} T^{n-2} + \cdots + a_1 I \right) . \]

It is obvious that a₀ cannot be zero because we will get an annihilating polynomial of lesser degree than n. Hence \( \psi (0 ) = a_0 \ne 0 . \)

Now suppose that \( \psi (0 ) = a_0 \ne 0 . \) Then transformation T has no zero eigenvalue. So its determinant, \( \det T = \lambda_1 \lambda_2 \cdots \lambda_n \ne 0 \) and T is invertible.

Example: Consider the matrix

\[ {\bf A} = \begin{bmatrix} 90&51&-13&-17 \\ -89&-50&13&17 \\ 44&24&-3&-8 \\ 145&84&-23&-27 \end{bmatrix} \]

that has two distinct eigenvalues \( \lambda_{1,2} = 4 \quad\mbox{and} \quad \lambda_{3,4} =1 . \)


A = {{90, 51, -13, -17}, {-89, -50, 13, 17}, {44, 24, -3, -8}, {145, 84, -23, -27}}  

Eigenvalues[A]

Out[2]= {4, 4, 1, 1}

Since the eigenvalue λ = 4 is defective, matrix A is not diagonalizable.


Eigenvectors[A]

Out[3]= {{-1, 1, -4, 1}, {0, 0, 0, 0}, {0, 1, 0, 3}, {-1, 2, 1, 0}}

According to Mathematica,


Simplify[Inverse[s*IdentityMatrix[4] - A]]

Out[]= {{(-262 + 81 s + s^2)/((-4 + s)^2 (-1 + s)), ( 3 (-53 + 17 s))/((-4 + s)^2 (-1 + s)), ( 40 - 13 s)/((-4 + s)^2 (-1 + s)), ( 53 - 17 s)/((-4 + s)^2 (-1 + s))},
{( 278 - 89 s)/((-4 + s)^2 (-1 + s)), ( 175 - 59 s + s^2)/((-4 + s)^2 (-1 + s)), (-40 + 13 s)/((-4 + s)^2 (-1 + s)), (-53 + 17 s)/((-4 + s)^2 (-1 + s))},
{( 4 (34 + 11 s))/((-4 + s)^2 (-1 + s)), ( 12 (7 + 2 s))/((-4 + s)^2 (-1 + s)), (-16 - 12 s + s^2)/((-4 + s)^2 (-1 + s)), -(( 4 (7 + 2 s))/((-4 + s)^2 (-1 + s)))}, {(-658 + 145 s)/((-4 + s)^2 (-1 + s)), (-381 + 84 s)/((-4 + s)^2 (-1 + s)), (104 - 23 s)/((-4 + s)^2 (-1 + s)), ( 143 - 36 s + s^2)/((-4 + s)^2 (-1 + s))}}

\[ {\bf R}_{\lambda} \left( {\bf A} \right) = \left( \lambda{\bf I} - {\bf A} \right)^{-1} = \frac{1}{\left( \lambda -4 \right)^2 \left( \lambda -1 \right)}\, \begin{bmatrix} \lambda^2 + 81\lambda -262 & 3 \left( 17\lambda -53 \right) & 40 - 13 \lambda & 53 - 17 \lambda \\ 278 -89\lambda & 175 -59 \lambda + \lambda^2 & 13\lambda -40 & 17\lambda -53 \\ 4 \left( 34 + 11 \lambda \right) & 12 \left( 2\lambda + 7 \right) & \lambda^2 -12\lambda -16 & -4 \left( 2 \lambda + 7 \right) \\ 145\lambda -658 & 84\lambda -381 & 104-23\lambda & \lambda^2 - 36\lambda + 143 \end{bmatrix} . \]

Therefore, its minimal polynomial is

\[ \psi \left( \lambda \right) = \left( \lambda -4 \right)^2 \left( \lambda -1 \right) . \qquad ■ \]

Example: Consider the matrix:

\[ {\bf A} = \begin{bmatrix} {\bf A}_{11} & {\bf A}_{12} \\ {\bf 0}&{\bf A}_{22} \end{bmatrix} = \begin{bmatrix} 1&1&3&4 \\ -1&1&5&6 \\ 0&0&1&2 \\ 0&0&3&2 \end{bmatrix} , \]

where

\[ {\bf A}_{11} = \begin{bmatrix} 1&1 \\ -1&1 \end{bmatrix} , \quad {\bf A}_{12} = \begin{bmatrix} 3&4 \\ 5&6 \end{bmatrix} , \quad {\bf A}_{22} = \begin{bmatrix} 1&2 \\ 3&2 \end{bmatrix} , \]

Since the characteristic polynomial of A has four distinct eigenvalues, its minimal polynomial is the product of simple terms:

\[ \psi_A (\lambda ) = \left( \lambda -4 \right) \left( \lambda +1 \right) \left( \lambda^2 -2\lambda +2 \right) . \]

 
A = {{1, 1, 3, 4}, {-1, 1, 5, 6}, {0, 0, 1, 2}, {0, 0, 3, 2}} 

Eigenvalues[{{1, 1, 3, 4}, {-1, 1, 5, 6}, {0, 0, 1, 2}, {0, 0, 3, 2}}]

Out[2]= {4, 1 + I, 1 - I, -1}

We check that the characteristic polynomial annihilates the matrix:


(4*IdentityMatrix[4] - A).(-1*IdentityMatrix[4] - A).(A.A - 2*A +
   2*IdentityMatrix[4])

Out[3]= {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}

This minimal polynomial is the product of minimal polynomials of matrices A₁₁ and A₂₂:

\[ \psi_{11} (\lambda ) = \left( \lambda -4 \right) \left( \lambda +1 \right) \qquad\mbox{and} \qquad \psi_{22} (\lambda ) = \lambda^2 -2\lambda +2 = \left( \lambda -1- {\bf j} \right) \left( \lambda -1+{\bf j} \right) , \]

respectively. The block matrix A₁₂ plays no role in determination of eigenvalues. ■

Theorem: If n ≥ 2, the monic polynomial \( p(\lambda ) = \lambda^n + a_{n-1} \lambda^{n-1} + \cdots + a_1 \lambda + a_0 \) is both the minimal polynomial and the characteristic polynomial of its companion matrix

\[ {\bf C}_{p} = \begin{bmatrix} 0&1&0& \cdots &0&0 \\ 0&0&1& \cdots &0&0 \\ \vdots&\vdots&\vdots&\ddots &\vdots&\vdots \\ 0&0&0& \cdots &0&1 \\ -a_{0}&-a_1&-a_2& \vdots& -a_{n-2}&-a_{n-1} \end{bmatrix} . \qquad\blacksquare \]

First, we prove that the characteristic polynomial for the companion matrix is \( p_n (\lambda ) = \lambda^n + a_{n-1} \lambda^{n-1} + \cdots + a_1 \lambda + a_0 , \) by induction. For n = 2, we have

\[ \lambda \,{\bf I} - {\bf C}( p_2 ) = \begin{bmatrix} \lambda & -1&0 \\ 0&\lambda&-1 \\ a_0 & a_1& a_2 \end{bmatrix} . \]

Its determinant is

\[ \left\vert \lambda \,{\bf I} - {\bf C}( p_2 ) \right\vert = \lambda^2 a_2 + a_0 + a_1 \lambda = p_2 (\lambda ) . \]

Suppose that the statement is true for any dimension less than n, we consider the characteristic polynomial for the companion matrix of size n. Expanding along with the first column, we get

\[ \left\vert \lambda \,{\bf I} - {\bf C}( p_n ) \right\vert = \lambda \, p_{n-1} (\lambda ) + (-1)^{n-1} a_0 \det \begin{bmatrix} -1&0&0& \cdots & 0 \\ \lambda&-1&0&\cdots & 0 \\ \vdots&\vdots&\vdots&\ddots&\vdots \\ 0&0&0&\cdots&-1 \end{bmatrix} , \]

where \( p_{n-1} (\lambda ) = \lambda^{n-1} + a_{n-1} \lambda^{n-2} + \cdots + a_2 \lambda + a_1 . \) Matrix in the right-hand side is lower triangular and its determinant is the product of its diagonal terms---in our case (-1). Since it has size n-1, the determinant becomes (-1)^n-1, and we obtain the characteristic polynomial to be p_n.

To prove that the minimal polynomial is the characteristic polynomial, we trancate the companion matrix as

\[ {\bf C}_{p} = \left[ {\bf e}_2 \ {\bf e}_3 \ \ldots \ {\bf e}_n \ -{\bf a} \right] , \]

in which e₂, e₃, ... , e_n is the standard basis of ℂⁿ and \( {\bf a} = \left[ a_0 , a_1 , \ldots , a_{n-1} \right]^{\mathrm T} . \) Then we have C_p e_n = - a and C_p e_j-1 = e_j for each j = 2, 3, ..., n. So \( {\bf e}_j = {\bf C}_p {\bf e}_{j-1} = {\bf C}_p^2 {\bf e}_{j-2} = \cdots = {\bf C}_p^{j-1} {\bf e}_{1} \) for each j = 1,2, ..., n. Then

\[ {\bf C}_{p}^n {\bf e}_1 = {\bf C}_{p} {\bf C}_{p}^{n-1} {\bf e}_1 = {\bf C}_{p}{\bf e}_n = -{\bf a} , \]

which shows that

\[ p \left( {\bf C}_{p} \right) {\bf e}_1 = {\bf C}_{p}^n {\bf e}_1 + a_{n-1} {\bf C}_{p}^{n-1} {\bf e}_1 + \cdots +a_1 {\bf C}_{p} {\bf e}_1 + a_0 {\bf e}_1 = {\bf 0} . \]

Thus, \( p \left( {\bf C}_{p} \right) = {\bf 0} . \) If g(z) = z^m + b_m-1 z^m-1 + ... + b₁ z + b₀ is any polynomial of degree less than n, then

\[ g \left( {\bf C}_{p} \right) {\bf e}_1 = {\bf C}_{p}^m {\bf e}_1 + b_{m-1} {\bf C}_{p}^{m-1} {\bf e}_1 + \cdots + b_1 {\bf C}_{p} {\bf e}_1 + b_0 {\bf e}_1 = {\bf e}_{m+1} + b_{m-1} {\bf e}_m + \cdots + b_1 {\bf e}_2 + b_0 {\bf e}_1 . \]

The linear independence of e₁, e₂, ..., e_m+1 ensures that g(C_p) e₁ ≠ 0, so g cannot annihilate C_p. This concludes the proof because the only monic polynomial that annihilates the companion matrix is the characteristic one.

M. D. Burrow, The Minimal Polynomial of a Linear Transformation, American Mathematical Monthly, 80, (1973), pp. 1129--1131.
Conrad, Keith, The minimal polynomial and some applications

Introduction to Linear Algebra

Systems of Linear Equations

Matrix Algebra

Vector Spaces

Eigenvalues, Eigenvectors

Euclidean Spaces

Matrix Decompositions

Applications

Functions of Matrices

Miscellany

Preliminaries

Glossary

Reference

Minimal Polynomial