
The objective of this section is to present one of the main tools of applied probability---Markov chains, also known as Markov processes. They are named in honor of the Russian mathematician Andrey (Andrei) Andreyevich Markov (1856--1922), a student of P. Chebyshev. We consider a very important version of a discrete stochastic process that can be in one of a finite number of states, much like a finite state automaton. Markov chains give a universal approach to the problems considered in the previous section. Because of their generality, computational tractability, and adequate representation of problems arising in applications, Markov chains have become an important part of almost every course in probability or operations research.

Markov Chains


Andrey Andreyevich Markov.

  In a probabilistic description of an experiment or process, the outcome or result is uncertain and may take any value from a known set of states. For example, if a coin is tossed successively, the outcome of the n-th toss could be a head or a tail; or if an ordinary die is rolled, the outcome may be 1, 2, 3, 4, 5, or 6. Associated with each outcome is a number p (\( 0 \le p \le 1 \)), called the probability of the outcome, which is a measure of the likelihood of its occurrence. If we denote the outcome of a roll of a die by X, then we write \( \Pr [X=i] = p_i \) to mean that the probability that X takes the value i is equal to p_i, where \( p_1 + p_2 + \cdots + p_6 = 1 . \) The symbol X is called a random variable and, in general, X may take a real value.

Yakov Tamarkin.

  A Markov process is named after the Russian mathematician Andrey Andreyevich Markov (1856--1922), a student of Pafnuty Chebyshev (Saint Petersburg, Russia). Markov studied Markov processes in the early 20th century, publishing his first paper on the topic in 1906. Markov and his younger brother Vladimir Andreevich Markov (1871--1897) proved the Markov brothers' inequality. His son, another Andrei Andreevich Markov (1903--1979), was also a notable mathematician, making contributions to constructive mathematics and recursive function theory. One of the elder Markov's famous Ph.D. students was Jacob/Yakov Davidovich Tamarkin (1888--1945), who immigrated to the USA in 1927 and became a professor at Brown University.

 Often we are given the information that a particular event B has occurred, in which case the probability of occurrence of another event A will be affected. This gives rise to the concept of the conditional probability of the event A given that B has occurred, which is denoted by \( \Pr [A\,|\,B] \) and defined as

\begin{equation} \label{EqMarkov.1} \Pr [A\,|\,B] = \frac{\Pr [A \ \mbox{ and } B]}{\Pr [B]} , \end{equation}
provided that \( \Pr [B] \ne 0. \)

A stochastic process is a process in which the change of the system from one state to another is not predetermined but can be specified only in terms of certain probabilities, which are obtained from the past behavior or history of the system.

We now define a first-order Markov process, or Markov chain, as a stochastic process in which the transition probability of a state depends exclusively on its immediately preceding state. A more explicit definition can be given as follows.
A discrete-time Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event (a property sometimes characterized as "memorylessness"). Roughly speaking, a process satisfies the Markov property when:
  • The outcome of the n-th trial depends only on the outcome of the (n−1)-th trial and not on the outcomes of earlier trials; and
  • In two successive trials of the experiment, the probability of going from state Si to state Sj does not change.
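The memoryless rule above is easy to experiment with. The following sketch uses a hypothetical two-state weather chain (the states and numbers are invented purely for illustration) together with Mathematica's built-in DiscreteMarkovProcess and RandomFunction to simulate successive trials:

P = {{0.9, 0.1},
     {0.5, 0.5}};     (* row i lists the probabilities of the next state, given current state i *)
weather = DiscreteMarkovProcess[1, P];        (* start in state 1 *)
path = RandomFunction[weather, {0, 20}];      (* one simulated run of 21 trials *)
ListLinePlot[path, InterpolationOrder -> 0]   (* step plot of the visited states *)

At every step the next state is drawn using only the row of P that corresponds to the current state, exactly as the two properties above require.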

 Markov chains have many applications as statistical models of real-world processes, such as studying cruise control systems in motor vehicles, queues or lines of customers arriving at an airport, exchange rates of currencies, storage systems such as dams, and the population growth of certain animal species. The algorithm known as PageRank, which was originally proposed for the internet search engine Google, is based on a Markov process. Furthermore, Markov processes are the basis for general stochastic simulation methods known as Gibbs sampling and Markov chain Monte Carlo, which are used for simulating random objects with specific probability distributions and have found extensive application in Bayesian statistics.

To understand a system described by a Markov process or a Markov chain, the following concepts are necessary.

A probability vector is a row vector, p = [p1, p2, … , pm], whose entries are nonnegative and sum to one, that is, \[ \sum_{i=1}^m p_i = 1 . \]
Note that in probability theory it is customary to work with row vectors, in contrast to engineering, where vectors are usually column vectors.
A stochastic matrix (also called a transition matrix or Markov matrix) is a matrix P = [ pi,j ] of order m × m each of whose rows is a probability vector, so \( \sum_{j=1}^m p_{i,j} = 1. \) This is a right stochastic matrix; when the sum of the entries in every column is 1, the matrix is called a left stochastic matrix.
A transition probability is the probability of the system moving from one state to another. The transition matrix of a Markov chain is a probability matrix T = [ ti,j ] of order m × m, where ti,j is the transition probability of the system going from state Si to state Sj on successive trials of the experiment.
A transition matrix T is said to be regular if all the entries of at least one of its powers Tn are strictly positive. A Markov chain is regular if its transition matrix is regular.
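Both properties are easy to check numerically. The sketch below uses an arbitrary illustrative 2 × 2 matrix (not taken from the examples in this section): the row sums confirm that it is right stochastic, and a strictly positive power confirms that it is regular even though the matrix itself contains a zero entry.

T = {{0, 1}, {0.4, 0.6}};                       (* illustrative transition matrix with a zero entry *)
Total[T, {2}]                                   (* row sums: {1, 1.}, so T is right stochastic *)
AllTrue[Flatten[MatrixPower[T, 2]], Positive]   (* True: all entries of T^2 are positive, so T is regular *)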

If a discrete dynamical system vt + 1 = T vt is such that T has an eigenvalue of 1, then something interesting happens. If vt happens to be an eigenvector for the eigenvalue 1, then

\[ \mathbf{v}_{t+1} = \mathbf{T}\,\mathbf{v}_t = \mathbf{v}_t \qquad \Longrightarrow \qquad \mathbf{v}_t = \mathbf{v}_{t+1} = \mathbf{v}_{t+2} = \cdots . \]
If T is a regular probability matrix and the state vt of the system is ever an eigenvector for the eigenvalue 1, then the system will stay in that state forever:
\[ {\bf T}\,{\bf t} = {\bf t} . \]
This unique probability vector is also called a steady-state vector (also known as a stationary distribution). A common occurrence is when the matrix T is diagonalizable, has the eigenvalue 1, and every other eigenvalue of T satisfies | λ | < 1. In this case, the long-term behaviour of the system is to converge to a steady state.
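A minimal Mathematica sketch of this computation (using the voter matrix that appears in Example 3 below, which acts on probability column vectors) extracts an eigenvector for the eigenvalue 1 and normalizes it into a steady-state probability vector:

T = {{0.7, 0.35, 0.4}, {0.2, 0.6, 0.3}, {0.1, 0.05, 0.3}};   (* column-stochastic matrix from Example 3 *)
{vals, vecs} = Eigensystem[T];
v = First @ Pick[vecs, Chop[vals - 1], 0];   (* eigenvector belonging to the eigenvalue 1 *)
t = v/Total[v]                               (* steady-state vector: about {0.546, 0.351, 0.103} *)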

   
Example 1: Suppose that m molecules are distributed between urns (I) and (II), and at each step or trial, a molecule is chosen at random and transferred from its urn to the other urn. Let Xn denote the number of molecules in urn (I) just after the n-th step. Then \( X_0 , X_1 , \ldots \) is a Markov chain with state space \( \left\{ 0, 1, \ldots , m \right\} \) and transition probabilities
\[ \begin{split} P_{i,i-1} &= \frac{i}{m} , \quad\mbox{if } 0 \le i \le m , \\ P_{i,i+1} &= \frac{m-i}{m} , \quad\mbox{if } 0 \le i \le m , \\ P_{i,j} &=0 , \quad\mbox{if } |i-j| >1. \end{split} \]
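For a concrete check (a sketch with m = 4 molecules), the transition matrix can be built directly from these formulas; every row sums to 1, as it must for a stochastic matrix.

m = 4;
P = Table[Which[j == i - 1, i/m, j == i + 1, (m - i)/m, True, 0], {i, 0, m}, {j, 0, m}];
MatrixForm[P]
Total[P, {2}]   (* row sums: {1, 1, 1, 1, 1} *)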
   ■
End of Example 1
   
Example 2:    ■
End of Example 2
   
Example 3: In state elections, it has been determined that voters will switch votes according to the following probabilities (each column gives the current vote, each row the vote in the next election):

                    from Socialist   from Capitalist   from Independent
to Socialist            0.7              0.35              0.4
to Capitalist           0.2              0.6               0.3
to Independent          0.1              0.05              0.3

The transition matrix for the Markov chain is

\[ {\bf T} = \begin{bmatrix} 0.7 & 0.35 & 0.4 \\ 0.2 & 0.6 & 0.3 \\ 0.1 & 0.05 & 0.3 \end{bmatrix} \]
Since T is a regular matrix, there exists a unique probability vector t = [x, y, z]T such that T t = t. Since t is a probability vector, x + y + z = 1, and further
\[ \begin{bmatrix} 0.7 & 0.35 & 0.4 \\ 0.2 & 0.6 & 0.3 \\ 0.1 & 0.05 & 0.3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} . \]
This leads to the system of algebraic equations
\[ \begin{split} 1 &= x+y+z , \\ x &= 0.7\,x + 0.35\,y + 0.4\,z , \\ y &= 0.2\,x + 0.6\,y + 0.3\,z , \\ z &= 0.1\,x + 0.05\,y + 0.3\,z . \end{split} \]
Solving this system with Mathematica, we get
Solve[{1 == x + y + z, x == 0.7*x + 0.35*y + 0.4*z, y == 0.2*x + 0.6*y + 0.3*z, z == 0.1*x + 0.05*y + 0.3*z}, {x, y, z}]
{{x -> 0.546392, y -> 0.350515, z -> 0.103093}}
\[ x \approx 0.546392, \qquad y \approx 0.350515, \qquad z \approx 0.103093 . \]
Hence, in the long run, the percentage of voters that will vote Socialist is about 54.64%, Capitalist about 35.05%, and Independent about 10.31%. Note that if the given information is close to reality, then the conclusion will also be close to reality.    ■
End of Example 3
   
Example 4:    ■
End of Example 4
   
Example 6:
MatrixRank[m = {{1, 2, 0, -3, 1, 0}, {1, 2, 2, -3, 1, 2}, {1, 2, 1, -3, 1, 1}, {3, 6, 1, -9, 4, 3}}]

Out[1]= 3

Then each of the following scripts determines a subset of linearly independent vectors:

m[[ Flatten[ Position[#, Except[0, _?NumericQ], 1, 1]& /@
Last @ QRDecomposition @ Transpose @ m ] ]]

Out[2]= {{1, 2, 0, -3, 1, 0}, {1, 2, 2, -3, 1, 2}, {3, 6, 1, -9, 4, 3}}

or, using the subroutine

MinimalSublist[x_List] :=
Module[{tm, ntm, ytm, mm = x}, {tm = RowReduce[mm] // Transpose,
ntm = MapIndexed[{#1, #2, Total[#1]} &, tm, {1}],
ytm = Cases[ntm, {___, ___, d_ /; d == 1}]};
Cases[ytm, {b_, {a_}, c_} :> mm[[All, a]]] // Transpose]

we apply it to our set of vectors.

m1 = {{1, 2, 0, -3, 1, 0}, {1, 2, 1, -3, 1, 2}, {1, 2, 0, -3, 2, 1}, {3, 6, 1, -9, 4, 3}};
MinimalSublist[m1]

Out[3]= {{1, 0, 1}, {1, 1, 1}, {1, 0, 2}, {3, 1, 4}}

The output lists, for each row of m1, its entries in the pivot columns; transposing it displays the linearly independent columns themselves: {{1, 1, 1, 3}, {0, 1, 0, 1}, {1, 1, 2, 4}}.

One can also verify linear independence with the standard Mathematica command MatrixRank: a set of vectors is linearly independent exactly when the rank equals the number of vectors.    ■
End of Example 6
Consider a system that can be in any one of n+1 mutually exclusive categories, called states, labeled v₀, v₁, … , vn. The system is observed at discrete points of time, labeled t = 0, 1, 2, …. Let the random variable Xt represent the state of the system at time t. A discrete stochastic process X₀, X₁, X₂, … is a Markov chain (or is said to have the Markovian property) if the conditional probability \begin{align*} \Pr [\,X_{t+1} &= v_{t+1} \mid X_t = v_t ,\ X_{t-1} = v_{t-1} , \ldots , X_0 = v_0 \,] \\ &= \Pr [\,X_{t+1} = v_{t+1} \mid X_t = v_t \,] \end{align*} holds for t = 0, 1, … and every sequence { vj }j ≥ 0.

This definition is naturally extended to a countable number of states. The Markovian property just assures us that the chain is memoryless: the transition from a state vt to vt+1 does not depend on how the system arrived at the state vt. It is convenient, and does not lose any generality, to assume that the finite state space of the Markov chain is { 0, 1, 2, … , n }, which is customarily denoted [0..n] (a notation borrowed from Maple).

To make our exposition clear and succinct, we use matrix notation. First, we distinguish between vectors and matrices; matrices are denoted by upper case letters in bold font. We use lower case letters to denote vectors: column vectors are written in bold font, whereas row vectors are identified by putting arrows above the letters. For instance, \( \displaystyle \quad \vec{a} = \langle a_1 , a_2 , \ldots , a_n \rangle \quad \) is a row vector, but its transposition, \( \displaystyle \quad \vec{a}^{\mathrm T}, \ \) which is a column vector, will be denoted by a letter in bold font. However, it will be clear from the exposition when we use row vectors or column vectors, which allows us to use the same letter for both.

Let pij be the probability of transition from state i to state j in one time step. The square matrix P = [ pij ] is called the transition probability matrix or simply the transition matrix. It has the stochastic property that the sum of the entries in every row is 1. For m > 0, let \( \displaystyle \quad p_{ij}^{(m)} \stackrel{\tiny def}{=} \Pr [\,X_{t+m} = j \mid X_t = i \,] \quad \) be the conditional probability that the system will be in state j after exactly m time steps, given that it starts in state i at time t. When m = 1, p(1)ij = pij. The corresponding matrix P(m) = [ p(m)ij ] satisfies P(m) = PP(m-1), so by induction we get P(m) = Pm. Therefore, the probability of passing from state i to j in m steps (without specifying the intermediate states) is obtained by raising P to the power m and picking the corresponding element p(m)ij. The following Chapman-Kolmogorov equations provide a method for computing these m-step transition probabilities:

\begin{equation} \label{E.markov1} p^{(x+y)}_{i,j} = \sum_{k\ge 0} p^{(x)}_{i,k} \, p^{(y)}_{k,j} , \qquad \mbox{for any integers }\ x,y\ge 0 \quad \mbox{and all } i,j\in [0..n]. \end{equation}
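A brief numerical sketch (reusing the voter matrix of Example 3, transposed so that its rows sum to 1, as required by the right-stochastic convention used here) confirms that the m-step probabilities come from matrix powers and that the Chapman-Kolmogorov identity holds for, say, x = 2 and y = 3:

P = Transpose[{{0.7, 0.35, 0.4}, {0.2, 0.6, 0.3}, {0.1, 0.05, 0.3}}];   (* right-stochastic version *)
MatrixPower[P, 5]                                               (* 5-step transition probabilities *)
Chop[MatrixPower[P, 5] - MatrixPower[P, 2].MatrixPower[P, 3]]   (* zero matrix: Chapman-Kolmogorov *)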

   

Example 5:    ■
End of Example 5
Let pi(t) denote the probability that the process is in state i at time t; then the (n+1)-tuple, \( \displaystyle \quad \vec{p} (t) =\langle p_0 (t), p_1 (t), p_2 (t), \ldots, p_n (t) \rangle ,\ \) is the row vector giving the distribution of the chain at time t. Summing over all possible intermediate states at time t−1, we get
\begin{equation}\label{E7M.1} p_i (t) = \sum_{k\ge 0} p_k (t-1)\,p_{k,i} \qquad\mbox{or} \qquad \vec{p}(t) = \vec{p}(t-1) {\bf P} . %\quad \text{ where }~~ p_j' = \sum_{i=1}^n p_i\,P_{i,j}. \end{equation}
This relation allows us to express the probability distribution at (integer) time t via the initial distribution \( \displaystyle \quad \vec{\pi}_0 : \)
\[ \vec{p} (t) = \vec{\pi}_0 \,{\bf P}^t . \]
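As a sketch (again with the illustrative right-stochastic matrix used above, and assuming the chain starts in the first state with probability 1), the distribution at time t is obtained with a single matrix power:

P = Transpose[{{0.7, 0.35, 0.4}, {0.2, 0.6, 0.3}, {0.1, 0.05, 0.3}}];
p0 = {1, 0, 0};             (* initial distribution concentrated in state 1 *)
p0.MatrixPower[P, 10]       (* row vector of state probabilities at time t = 10 *)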

   

Example 6:    ■
End of Example 6
We now collect some terminology and basic results from the general theory of Markov chains. A good compact treatment of Markov chains can be found in the books by Feller [Chapter XV], Ross [Chapter IV], and Mitzenmacher & Upfal, where the reader can find many examples of problems solved with Markov chains.
A state i is said to be accessible from state j if, for some integer n ≥ 0, p(n)j,i > 0. If two states i and j are accessible from each other, we say that they communicate or that they belong to the same communicating class. A Markov chain is irreducible if all its states belong to one communicating class.

Since the communication relation is an equivalence relation (that is, it is reflexive, symmetric, and transitive), all states may be partitioned into one or more disjoint classes such that the states that communicate with each other are in the same class. A transition from one class to another may exist, but it is then impossible to return to the previous class. This is entirely analogous to viewing the graphical description of the chain as a directed graph and finding its strongly connected components.    
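The following sketch (with a small, hypothetical reducible transition matrix chosen only for illustration) finds the communicating classes by computing the strongly connected components of the directed graph whose edges correspond to positive transition probabilities:

P = {{0.5, 0.5, 0}, {0.3, 0.7, 0}, {0.2, 0.3, 0.5}};   (* state 3 can move to {1, 2} but cannot be re-entered *)
g = AdjacencyGraph[Unitize[P]];   (* directed graph with an edge i -> j whenever p_{i,j} > 0 *)
ConnectedComponents[g]            (* strongly connected components: the communicating classes {1, 2} and {3} *)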

Example 7:    ■
End of Example 7
A state i is said to be a transient state if there exists a state j, j ≠ i, that is accessible from state i but not vice versa; that is, state i is not accessible from state j. In other words, a state is transient if, upon entering this state, the process may never return to this state again. A communication class consisting of transient states is called a transient class: with probability 1, the process will leave this class and never return.
A state i is said to be a recurrent (or persistent) state if, upon entering this state, the process will definitely return to this state again. A Markov chain is recurrent if every state in the chain is recurrent. A recurrent class is a communication class consisting of recurrent states: a Markov chain never leaves a recurrent class once it reaches it.

A state is recurrent if the chain, with probability 1, returns to this state. This means that the chain visits a recurrent state infinitely many times. On the other hand, if state i is transient, then the total number of visits to state i is a geometric random variable.

A state i is said to be absorbing if, upon entering this state, the process never leaves it. Hence, state i is an absorbing state if and only if pi,i = 1.

Observation 1: If s₁, s₂, … , sn are the states of a Markov chain, then si is an absorbing state if
\[ \Pr [X_t = s_i \mid X_{t-1} = s_i ] = 1 . \]
   
Example 8:    ■
End of Example 8
A recurrent state k is said to be periodic if there exists an integer d > 1 such that p(n)k,k = 0 for all values of n other than d, 2d, 3d, … , and d is the largest integer with this property. So d = d(k) is the greatest common divisor of the set { n ≥ 1 : p(n)k,k > 0 }. A discrete Markov chain is periodic if any state in the chain is periodic. A state that is not periodic is called aperiodic. We call an irreducible chain (or its transition matrix) aperiodic if it has period 1.
Periodicity is a class property; that is, if state i in a communication class has period d, then all states in that class have period d.
In a finite-state Markov chain, recurrent states that are aperiodic are called ergodic states. A Markov chain is said to be ergodic if all its states are ergodic states.
   
Example 9:    ■
End of Example 9
    If a Markov chain has only one communication class, then for every pair of indices i,j ∈ [0..n] there exists m = m(i,j) such that p(m)i,j > 0. If a chain has no transient states and is aperiodic, then no matter where the process starts, after a sufficient lapse of time it can be in any state. Therefore, for sufficiently large m, all entries of Pm are strictly positive.

When a Markov chain is reducible, having r recurrent classes R₁, R₂, … , Rr, its transition matrix can be written in the following canonical form (after, perhaps, reordering the states): \begin{equation} \label{E.markov5} {\bf P} = \left[ \begin{array}{c|c} {\bf Q} & {\bf S} \\ \hline {\bf 0} & \tilde{\bf P} \end{array} \right] , \end{equation} where \( \displaystyle \quad \tilde{\bf P} \ \) is a block-diagonal matrix with one block for each Rk, k = 1, 2, … , r (the submatrix obtained by considering only the rows and columns for the states in Rk).

   
Example 10:    ■
End of Example 10

If the chain has s transient states, then the s × s submatrix Q describes the process as long as it stays in transient states. The matrix Q is a substochastic matrix, that is, a matrix with nonnegative entries whose row sums are less than or equal to 1. This implies that all the eigenvalues of Q have absolute values strictly less than 1. Therefore, Qk0 as k → ∞, and IQ is an invertible matrix, where I denotes the identity s × s matrix.

The resolvent of the matrix Q at 1 is called the fundamental matrix: \begin{equation} \label{E.markov6} {\bf N} \stackrel{\tiny def}{=} ({\bf I} - {\bf Q})^{-1} = {\bf I} + {\bf Q} + {\bf Q}^2 + \cdots = \sum_{k\ge 0} {\bf Q}^k . \end{equation}

Formula \eqref{E.markov6} follows from the Maclaurin series for the function \( \displaystyle \quad \frac{1}{1-x} = 1 + x + x^2 + \cdots = \sum_{k\ge 0} x^k . \quad \) Similarly, if a Markov chain has at least one absorbing state, then IQ is invertible with inverse \( \displaystyle \quad \mathbf{N} = \sum_{k\ge 0} \mathbf{Q}^k . \quad \) The (i,j)-th entry of the fundamental matrix N represents the expected number of time steps that the Markov chain spends in the transient state j, given that it starts in the transient state i.
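A short sketch (for a hypothetical symmetric random walk on the states {1, 2, 3, 4} with absorbing barriers at 1 and 4, so that the transient states are 2 and 3) computes the fundamental matrix and confirms that it agrees with the truncated Neumann series:

Q = {{0, 0.5}, {0.5, 0}};                          (* transitions among the transient states 2 and 3 *)
fund = Inverse[IdentityMatrix[2] - Q]              (* fundamental matrix N = (I - Q)^(-1) *)
Chop[fund - Sum[MatrixPower[Q, k], {k, 0, 100}]]   (* zero matrix: the series sum matches *)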

The rate of convergence of the series \eqref{E.markov6} is determined by the largest (in absolute value) eigenvalue of the matrix Q, which is guaranteed to be less than 1 but could be close to it. In \eqref{E.markov5}, the matrix \( \displaystyle \quad \tilde{\bf P} \ \) deals with the process after it has reached an ergodic set. The matrix S concerns the transition from transient to ergodic states.

If there are q transient states and r absorbing states, the states of a Markov chain can be renumbered, thus permuting the rows and columns of the transition matrix in order to put the transition matrix into the form

\begin{equation} \label{E.markov7} {\bf P} = \left( \begin{array}{c|c} {\bf Q} & {\bf S}^{\mathrm T} \\ \hline {\bf 0} & {\bf I} \end{array} \right) \qquad\mbox{or} \qquad \left[ \begin{array}{c|c} {\bf Q} & {\bf 0} \\ \hline {\bf S} & {\bf I} \end{array} \right] , \end{equation}
where the q × q matrix Q represents the transition between transient states, the r × r identity matrix I represents absorption, the q × r matrix S represents the transition from a transient state to an absorbing state, and the r × q matrix 0 represents the lack of ability to move from an absorbing state to a transient state.

Recall that we use two forms of the transition matrix P in order to please everyone (or no one?): some people multiply a column vector by the matrix from the left (then the matrix is written in brackets), while others multiply a row vector by the matrix from the right (then the transition matrix is written in parentheses). As usual, ST denotes the transpose of the matrix S.

Theorem 1: In any finite Markov chain, there are numbers c > 0 and 0< b < 1 such that \( \displaystyle \quad p^{(m)}_{i,j} \le c\,b^m , \quad \) for any transient states i, j.

Moreover, Qm0, as m → ∞, exponentially.
It is sufficient to prove the statement for the case when the transient states i and j belong to the same communication class. Suppose that the process starts in a transient state. Since there is only a finite number of states, it is possible to reach an ergodic state in no more than r steps (where r is less than the number of states). Hence there is a positive number ω such that, from any transient state, the probability of entering an ergodic state in at most r steps is at least ω. Therefore, the probability of not being absorbed within r transitions is at most 1 − ω, which is less than 1. The probability of not reaching an ergodic state in mr steps is at most (1 − ω)m, which decreases geometrically as m increases.
   
Example 11:    ■
End of Example 11

Understanding the long-term behavior of a Markov chain boils down to understanding the behavior of Pm for large m values. The crucial role in this issue is played by an invariant probability distribution for the corresponding transition matrix.

A stationary distribution (also called an equilibrium distribution) of a Markov chain is a probability distribution \( \displaystyle \quad \vec{\pi} = \langle \pi_0 , \pi_1, \ldots , \pi_n \rangle \ \) such that \begin{equation} \label{E.markov3} \sum_k \pi_k =1, \qquad \vec{\pi} =\vec{\pi} \,{\bf P} \qquad\mbox{or}\qquad \vec{\pi} \, ({\bf I} - {\bf P} ) = {\bf 0}, \end{equation} where I is the identity matrix (which has 1's on the diagonal and zeros elsewhere).
Equation \eqref{E.markov3} shows that an invariant probability vector, \( \displaystyle \quad \vec{\pi} = \langle \pi_0 , \pi_1, \ldots , \pi_n \rangle ,\ \) is a left eigenvector of P corresponding to the eigenvalue 1. One way to compute the stationary distribution of a finite Markov chain is to solve the homogeneous system of n+1 linear equations \eqref{E.markov3}. Since there are n+1 unknowns, the system \( \displaystyle \quad \vec{\pi} =\vec{\pi} \,{\bf P} \ \) has infinitely many solutions, but the additional constraint \( \displaystyle \quad \sum_{k=0}^n \pi_k =1\ \) leads to a unique solution.
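As a sketch of this computation (again with the illustrative right-stochastic matrix used earlier), the stationary distribution is the left null vector of I − P, normalized so that its entries sum to 1:

P = Transpose[{{0.7, 0.35, 0.4}, {0.2, 0.6, 0.3}, {0.1, 0.05, 0.3}}];
ns = NullSpace[Transpose[IdentityMatrix[3] - P]];   (* left null space of I - P *)
pi = First[ns]/Total[First[ns]]                     (* stationary distribution, about {0.546, 0.351, 0.103} *)
Chop[pi.P - pi]                                     (* zero vector: pi is indeed invariant *)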

If a chain ever reaches an equilibrium distribution, then it maintains that distribution for all future time. In particular, a Markov chain with an absorbing state will be in its stationary distribution once it reaches the absorbing state. If k is an absorbing state, and the process starts in state i, the probability of ever going to state k is called the probability of absorption into state k, given that the system started in state i.

It is possible to give a probabilistic interpretation of the elements of the fundamental matrix. Define Ni,j to be the number of times the process passes through the transient state j, starting at the transient state i, until absorption, i,j = 1, 2, … , s. If i = j, the initial sojourn is already counted as one visit. The expected number of steps until the chain enters a recurrent class, assuming that it starts at state i, is the sum \( \displaystyle \quad \sum_j n_{i,j} \ \) over all transient states j.

Theorem 2: The entries of the fundamental matrix are expectations of Ni,j: \[ n_{i,j} \stackrel{\tiny def}{=} E[N_{i,j}] = {({\bf I} -{\bf Q})^{-1}}_{i,j}, \quad 1\le i,j\le s, \] where s is the number of transient states.
We show two proofs, which, while making similar statements, do so in usefully different manners. The first is straightforward. Since \( {\left({\bf Q}^k\right)}_{i,j} \) is the probability that a path from state i visits state j at step k (using Q⁰ = I), we view Ni,j as a sum of indicator random variables, one for each possible step, and then \begin{equation}\label{E7M.6} n_{i,j} = \sum_{k\ge 0} 1\cdot {\left({\bf Q}^k\right)}_{i,j} = {({\bf I}-{\bf Q})^{-1}}_{i,j}, \end{equation} where we add up the contributions of all steps. The second proof furnishes an example of a useful computational method for Markov chains: randomization on the first step from state i to, say, a: \begin{equation}\label{E7M.8} n_{i,j}=\delta_{i,j}+\sum_{a=1}^s {\bf Q}_{i,a}\, n_{a,j} . \end{equation}
   
Example 12:    ■
End of Example 12
Theorem 3: Suppose the transition matrix for a finite Markov chain is in the canonical form \eqref{E.markov5}. If bi,j is the probability of absorption into state j, that is, the probability that the process starting in transient state i ends up in absorbing state j, then \begin{equation} \label{E.markov8} {\bf B} \stackrel{\tiny def}{=} [ b_{i,j} ] = {\bf N} \, {\bf S} = ({\bf I} - {\bf Q})^{-1} \, {\bf S} . \end{equation}
Starting in the transient state i, the process may be absorbed in state j in one or more steps. The probability of capture in a single step is pi,j. If this does not happen, the process may move either to another absorbing state (in which case it becomes impossible to reach state j), or to a transient state k. In the latter case, there is probability bk,j of being captured in the right state. Hence, we have \[ b_{i,j} = p_{i,j} + \sum_k p_{i,k} \,b_{k,j} , \] which can be written in matrix form as \[ {\bf B} = {\bf S} + {\bf Q} \,{\bf B}. \] Solving for B, we get the required formula.
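A compact sketch (for the hypothetical random walk on {1, 2, 3, 4} with absorbing states 1 and 4 used earlier) evaluates formula \eqref{E.markov8}:

Q = {{0, 0.5}, {0.5, 0}};               (* transient states 2 and 3 *)
S = {{0.5, 0}, {0, 0.5}};               (* transitions from the transient states to the absorbing states 1 and 4 *)
B = Inverse[IdentityMatrix[2] - Q].S    (* absorption probabilities: {{2/3, 1/3}, {1/3, 2/3}} numerically *)
Total[B, {2}]                           (* each row sums to 1: absorption is certain *)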
   
Example 13:    ■
End of Example 13

The number of transitions made by the process in going from state i to state j for the first time is usually referred to as the first passage time. When i = j, this first passage time is just the number of transitions until the process returns to the initial state. Once this happens, the system recommences the process from scratch, forgetting that it was in the same state before. We denote by r(m)i,j the probability that, starting at state i, the first transition to state j occurs at time m: \begin{equation} \label{E.markov2} r^{(m)}_{i,j} = \Pr [\, X_m = j \ \mbox{ and } \ X_k \ne j \ \mbox{ for } \ 1\le k < m \mid X_0 = i \,] . \end{equation} Let \( \displaystyle \quad r_{i,j} = \sum_{m\ge 1} r^{(m)}_{i,j} \ \) be the probability that, starting from state i, the system ever reaches state j. Then we can claim that state i is recurrent if \( \displaystyle \quad \sum_{m\ge 1} r^{(m)}_{i,i} =1, \quad \) and it is transient if \( \displaystyle \quad \sum_{m\ge 1} r^{(m)}_{i,i} <1. \ \) Whenever \( \displaystyle \quad \sum_{m\ge 1} r^{(m)}_{i,j} =1, \quad \) the expected time to first reach state j from state i, denoted by \( \displaystyle \quad \mu_{i,j} = \sum_{m\ge 1} m\,r^{(m)}_{i,j}, \ \) uniquely satisfies the equation \begin{equation} \label{E.markov9} \mu_{i,j} = 1+ \sum_{k\ne j} p_{i,k} \, \mu_{k,j} . \end{equation} So μi,i is the expected time to return to state i. Since we consider only finite Markov chains (with a finite number of states), the expected values μi,i are finite for all recurrent states, but not for transient ones. A finite Markov chain has at least one recurrent state.
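A small sketch (once more with the three-state voter matrix, transposed to the right-stochastic convention) solves the linear system \eqref{E.markov9} for the expected first-passage times into a chosen target state j; the value returned for i = j is the mean return time to that state.

P = Transpose[{{0.7, 0.35, 0.4}, {0.2, 0.6, 0.3}, {0.1, 0.05, 0.3}}];
j = 1;                                   (* target state *)
others = Delete[Range[3], j];            (* all states except j *)
eqs = Table[mu[i] == 1 + Sum[P[[i, k]] mu[k], {k, others}], {i, 3}];
Table[mu[i], {i, 3}] /. First[Solve[eqs, Table[mu[i], {i, 3}]]]
(* mu[1] is the mean return time to state 1, approximately 1/0.546 *)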

We summarize the properties of stationary distributions in the following statement from the book by Mitzenmacher and Upfal.

Theorem 4: A finite, irreducible, ergodic Markov chain has the following properties:
  1. the chain has a unique stationary distribution \( \displaystyle \quad \vec{\pi} = \langle \pi_0 , \pi_1 , \ldots , \pi_n \rangle; \)
  2. for all i and j, the limit \( \displaystyle \quad \lim_{t\to\infty} p_{i,j}^{(t)} \ \) exists and is independent of i;
  3. \( \displaystyle \pi_k = \lim_{t\to\infty} p_{i,k}^{(t)} = 1/\mu_{k,k}. \)
  4. State i is recurrent if \( \displaystyle \quad \sum_{n\ge 1} p_{i,i}^{(n)} =\infty \quad \) and transient if \( \displaystyle \quad \sum_{n\ge 1} p_{i,i}^{(n)} <\infty .\)

Property 3 tells us that the expected number of steps to return to state k, assuming that the chain starts at k, is given by the reciprocal of the kth component of the stationary distribution.

There is a typical way to relate Markov chains to the analysis of algorithms. An algorithm starts at some fixed initial state. Then, according to the input, it moves till completion through a finite number of states, which we again denote by vi, i = 0, 1, 2, … , n. A state in this context reflects the values of the variables that the algorithm uses to record the processing that it has performed so far. If a sufficiently large number of states is allowed, we could probably represent any algorithm in this way, but this approach is not really meaningful and effective unless the state space is either small or has a simple structure. Such an operation is represented formally as the traversal of a directed graph: nodes represent states of the algorithm, and edges represent possible transitions. The crucial step in the model formulation is the assignment of probabilities to the edges.

Usually, there are two sources of probabilities in an algorithm. One is induced by the distribution over the input, as would be typically the case in a sorting or searching algorithm. Another scenario arises when the algorithm itself contains a random element, taking action based on sampling a built-in random event generator.

We say that this representation has the Markovian property when the probability that the algorithm goes from a certain state vi into some other state vj does not depend on the history of the algorithm execution before it reaches state vi. It is then equal to the probability assigned to the edge (i, j). There are two different mathematical descriptions of such a chain: a directed, labeled graph with weights, and a stochastic matrix of dimensions equal to the number of nodes, with entries that are nonnegative and sum, in each row, to 1.

This definition of a discrete stochastic process is equivalent to saying that the set of states supports a first-order Markov chain, or provides a Markovian description of the algorithm. If the algorithm had in fact a memory of the last two, or some fixed number of states (say k), we would say it is a k-th order Markov chain. It is relatively easy to translate any such higher-order chain into a first-order one, at the cost of a significant increase in the size of the state space.

Suppose that the chain has only one absorbing state, the last one, vn. Since ni,j is the expected number of visits of the chain to vj till absorption, starting in state vi, summing over the i-th row of the fundamental matrix provides us with di, defined as the expected time (counted in state transitions) till absorption from state vi. Let us show this by an independent calculation, invoking randomization on the first transition: \begin{equation}\label{E7M.d} d_i =1+\sum_{j=0}^n p_{i,j}d_{j} =1+\sum_{j=0}^{n-1} p_{i,j}d_j =1+\sum_{j=0}^{n-1}{\bf Q}_{i,j}d_j, \quad 0\le i <n. \end{equation} We change the sum above to terminate at n−1 without changing its value, since dn = 0. With our convention that v₀ is the initial state, the first component, d₀, is the expected run-time of the algorithm. Note that \eqref{E7M.d} is identical to \eqref{E923.1}. Using the fundamental n × n matrix N, we can express the solution of the system of algebraic equations \eqref{E7M.d} in vector form: \begin{equation}\label{E.markov10} {\bf d} = ({\bf I}-{\bf Q})^{-1}\mathbf{e} ={\bf N}\,\mathbf{e} , \end{equation} where \( \displaystyle \quad \mathbf{d}^{T} = \langle d_0 , d_1 , \ldots , d_{n-1} \rangle\quad \) and \( \displaystyle \quad \mathbf{e}^{T} = \langle 1,1,\ldots , 1 \rangle \quad \) are n-dimensional vectors.
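For the hypothetical absorbing random walk used in the earlier sketches, formula \eqref{E.markov10} reduces to a one-line computation:

Q = {{0, 0.5}, {0.5, 0}};                    (* transient part of the walk on {1, 2, 3, 4} *)
d = Inverse[IdentityMatrix[2] - Q].{1, 1}    (* expected numbers of steps to absorption: {2., 2.} *)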

=======================================

Example 1 Transition matrix

\[ \mathbf{M} = \begin{bmatrix} 0.15 & 0.6 & 0.36 \\ 0.55 & 0.3 & 0.23 \\ 0.3 & 0.1 & 0.41 \end{bmatrix} \]
It is sometimes helpful in a Markov process to visualize the system with a state diagram, where arrows between states represent the transitions and weights represent the probability of each transition. An example of a state diagram corresponding to the transition matrix can be seen in the following figure.
c1 = Graphics[{Thick, Black, Circle[{0, 0}, 0.25]}];
c2 = Graphics[{Thick, Black, Circle[{1.5, 0}, 0.25]}];
c3 = Graphics[{Thick, Black, Circle[{3, 0}, 0.25]}];
ar1 = Graphics[Arrow[BezierCurve[{{-0.2, -0.2}, {-0.78, 0}, {-0.2, 0.2}}]]];
ar2 = Graphics[Arrow[BezierCurve[{{1.4, -0.2}, {1.5, -0.78}, {1.6, -0.2}}]]];
ar3 = Graphics[Arrow[BezierCurve[{{3, -0.25}, {1.5, -1.4}, {0, -0.25}}]]];
ar4 = Graphics[Arrow[BezierCurve[{{0, 0.25}, {1.5, 1.0}, {3, 0.25}}]]];
ar5 = Graphics[{Thick, Arrow[{{0.21, 0.15}, {1.29, 0.15}}]}];
ar6 = Graphics[{Thick, Arrow[{{1.71, 0.15}, {2.79, 0.15}}]}];
ar7 = Graphics[{Thick, Arrow[{{1.29, -0.15}, {0.21, -0.15}}]}];
ar8 = Graphics[{Thick, Arrow[{{2.79, -0.15}, {1.71, -0.15}}]}];
ar9 = Graphics[Arrow[BezierCurve[{{3.2, -0.15}, {3.9, 0}, {3.2, 0.15}}]]];
txt = Graphics[{Black, Text[Style["0.15", FontSize -> 14, Bold], {-0.36, 0.3}],
   Text[Style["0.55", FontSize -> 14, Bold], {0.8, 0.26}],
   Text[Style["0.6", FontSize -> 14, Bold], {0.8, -0.26}],
   Text[Style["0.3", FontSize -> 14, Bold], {1.9, 0.6}]}];
txt2 = Graphics[{Black, Text[Style["0.41", FontSize -> 14, Bold], {3.36, 0.3}],
   Text[Style["0.1", FontSize -> 14, Bold], {2.3, 0.26}],
   Text[Style["0.23", FontSize -> 14, Bold], {2.3, -0.26}],
   Text[Style["0.36", FontSize -> 14, Bold], {0.9, -1.0}]}];
txt3 = Graphics[{Black, Text[Style["0.3", FontSize -> 14, Bold], {1.35, -0.55}],
   Text[Style["1", FontSize -> 14, Bold], {0, 0.0}],
   Text[Style["2", FontSize -> 14, Bold], {1.5, 0.0}],
   Text[Style["3", FontSize -> 14, Bold], {3.0, 0.0}]}];
Show[c1, c2, c3, ar1, ar2, ar3, ar4, ar5, ar6, ar7, ar8, ar9, txt, txt2, txt3]

Example 1, Markov chain.

The long-term behavior of the states, given the transition matrix, is described in the following statement.

Theorem 5: Given a Markov chain with transition matrix M, \[ \mathbf{v} = \lim_{n\to\infty} \mathbf{M}^n \mathbf{v} , \] where v is the unit eigenvector associated with the eigenvalue λ = 1.
The Perron-Frobenius theorem is the main tool in this proof. If the Markov chain has one or more absorbing states, the transition matrix can be written in a form such that the absorbing states are all in the last rows (and columns); in this case, the transition matrix is reducible. In general, the transition matrix may be irreducible or primitive, or may only be non-negative. Thus, we will only make the assumption that the transition matrix, M, is non-negative.

Using the general case of the Perron-Frobenius theorem, we assume that the largest eigenvalue in magnitude, λPF, has a corresponding unit eigenvector v.

If M is an n × n transition matrix with entries Mi,j, then the i-th component of M v is \[ \left( \mathbf{M}\,\mathbf{v} \right)_i = \sum_{j=1}^n M_{i,j} v_j . \] Since every column of M sums to 1, the row vector of ones satisfies \[ \langle 1, 1, \ldots , 1 \rangle \, \mathbf{M} = \langle 1, 1, \ldots , 1 \rangle , \] so 1 is an eigenvalue of M; combined with the Perron-Frobenius bound on the spectral radius, this gives λPF = 1. In addition, \[ \lim_{n\to\infty} \mathbf{M}^n \mathbf{v} = \lim_{n\to\infty} \lambda_{PF}^n \mathbf{v} = \mathbf{v} . \] The vector \( \displaystyle \quad \frac{\mathbf{v}}{\sum_j v_j} \quad \) provides the long-run distribution of the Markov chain.


 

  1. Anton, Howard, Elementary Linear Algebra (Applications Version), 9th ed., Wiley International, 2005.
  2. Feller, William, An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., John Wiley & Sons, New York, 1968.
  3. Mitzenmacher, Michael and Upfal, Eli, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, Cambridge, UK, 2005.
  4. Ross, Sheldon M., Introduction to Probability Models, Academic Press, New York, 1989.