The objective of this section is to present one of the main tools of applied
probability---Markov chains, also known as Markov processes. They are named in honor of the Russian mathematician Andrey
(Andrei) Andreyevich Markov (1856--1922), a student of P. Chebyshev.
We consider a very important version of a discrete stochastic process
that can be in one of a finite number of states---similarly to the definition of a
finite state automaton. Markov chains give a universal approach to the problems
considered in the previous section. Because of their generality, computational
tractability, and adequate representation of problems
in applications, Markov chains have become an important part of
almost every course in probability or operations research.
Markov Chains
Andrey Andreyevich Markov.
In a probabilistic description of an experiment or process, the outcome or result is uncertain and may take
any value from a known set of states. For example, if a coin is tossed successively, the outcome in the n-th
toss could be a head or a tail; or if an ordinary die is rolled, the outcome may be 1, 2, 3, 4, 5, or 6. Associated
with each outcome is a number p (\( 0 \le p \le 1 \)), called the probability of
the outcome, which is a measure of the likelihood of occurrence of the outcome. If we denote the outcome of a roll
of a die by X, then we write \( \Pr [X=i] = p_i \) to mean that the probability that
X takes the value i is equal to \( p_i \), where
\( p_1 + p_2 + \cdots + p_6 = 1 . \) The symbol X is called a random variable and,
in general, X may take a real value.
Yakov Tamarkin.
A Markov process is named after the Russian mathematician Andrey Andreyevich Markov (1856--1922), a student
of Pafnuty Chebyshev (Saint Petersburg, Russia). Markov studied such processes in the early 20th century, publishing
his first paper on the topic in 1906. Markov and his younger brother Vladimir Andreevich Markov (1871--1897) proved the Markov brothers'
inequality. His son, another Andrei Andreevich Markov (1903--1979), was also a notable mathematician, making
contributions to constructive mathematics and recursive function theory. One of the elder Markov's famous Ph.D. students was Jacob (Yakov) Davidovich Tamarkin
(1888--1945), who immigrated to the USA in 1927 and became a professor at Brown University.
Often we are given the information that a particular event A has occurred, in which case the probability
of occurrence of another event B will be affected. This gives rise to the concept of the conditional probability
of the event B given that A has occurred, which is denoted by
\( \Pr [B\,|\,A] \) and defined as
\[
\Pr [B\,|\,A] = \frac{\Pr [A \cap B]}{\Pr [A]} , \qquad \Pr [A] > 0 .
\]
A stochastic process is a process in which the change of the system from one state to another is not predetermined but can only be specified in terms of certain probabilities, which are obtained from the past behavior or history of the system.
We now define a first-order Markov process, or a Markov chain, as a stochastic process in which the transition probability to a state depends exclusively on the immediately preceding state.
A more explicit definition can be given as follows.
A discrete-time Markov chain is a stochastic model describing a sequence of possible events in which
the probability of each event depends only on the state attained in the previous event (a property also sometimes
characterized as "memorylessness"). Roughly speaking, a process satisfies the Markov property if:
The outcome of the n-th trial depends only on the outcome of the (n−1)-th trial and not on the outcomes of earlier trials; and
In two successive trials of the experiment, the probability of going from state Si to state Sj remains the same (it does not change).
Markov chains have many applications as statistical models of real-world processes, such as studying cruise
control systems in motor vehicles, queues of customers arriving at an airport, exchange rates of currencies,
storage systems such as dams, and population growth of certain animal species. The algorithm known as PageRank,
which was originally proposed for the internet search engine Google, is based on a Markov process. Furthermore,
Markov processes are the basis for general stochastic simulation methods known as Gibbs sampling and Markov chain
Monte Carlo, which are used for simulating random objects with specific probability distributions and have found extensive
application in Bayesian statistics.
To understand a system described by a Markov process or a Markov chain, the following concepts are necessary.
A probability vector is a row vector, p = [p1, p2, … , pm] whose entries are nonnegative and their sum is one, that is,
\[
\sum_{i=1}^m p_i = 1 ,
\]
Note that in probability theory it is customary to work with row vectors, in contrast to engineering, where vectors are column vectors.
A stochastic matrix (also called a transition matrix or Markov matrix) is a matrix P = [ pi,j ] of order m × m each of whose rows is a probability vector, so \( \sum_{j=1}^m p_{i,j} = 1. \) This is a right stochastic matrix; when instead the sum of the entries in every column is 1, the matrix is called a left stochastic matrix.
The transition probability is the probability of a system moving from one state to another. The transition matrix of a Markov chain is a probability matrix T = [ ti,j ] of order m × m, where ti,j is the transition probability of the system going from state Si to state Sj on successive trials of the experiment.
A transition matrix T is said to be regular if all the entries of at least one of its powers Tn are strictly positive. A Markov chain is regular if its transition matrix is regular.
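As a quick illustration, here is a minimal Mathematica sketch (the 2 × 2 matrix T below is a made-up example, not taken from the text) that verifies the row-stochastic property and tests regularity by checking whether some power of T has strictly positive entries:
T = {{0, 1}, {1/2, 1/2}};                        (* made-up transition matrix *)
Total[T, {2}]                                    (* row sums: {1, 1}, so T is stochastic *)
AllTrue[Flatten[MatrixPower[T, 2]], Positive]    (* True: T^2 is strictly positive, so T is regular *)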
If a discrete dynamical system
\( {\bf v}_{t+1} = {\bf T}\, {\bf v}_t \)
is such that T has an eigenvalue of 1, then something interesting happens: if the state vt happens to be an eigenvector for the eigenvalue 1, then the system will stay in that state forever,
\[
{\bf T}\,{\bf v} = {\bf v} .
\]
When T is a regular probability matrix, there is exactly one such probability vector; it is called a steady-state vector (also known as a stationary distribution).
A common occurrence is when the matrix T is diagonalizable, has the eigenvalue 1, and every other eigenvalue of T satisfies | λ | < 1. In this case, the long-term behaviour of the system is to converge to a steady state.
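A minimal numerical sketch (assuming Mathematica; the matrix A is made up and written in the column-stochastic convention matching v_{t+1} = T v_t) illustrates this convergence:
A = {{0.9, 0.2}, {0.1, 0.8}};       (* made-up matrix; each column sums to 1 *)
Eigenvalues[A]                      (* {1., 0.7}: eigenvalue 1, every other eigenvalue has |lambda| < 1 *)
MatrixPower[A, 50] . {1, 0}         (* ≈ {0.6667, 0.3333}, the steady-state vector *)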
Example 1:
Suppose that m molecules are distributed between urns (I) and (II), and
at each step or trial, a molecule is chosen at random and transferred from its urn to the other urn. Let
Xn denote the number of molecules in urn (I) just after the n-th step. Then
\( X_0 , X_1 , \ldots \) is a Markov chain with state space
\( \left\{ 0, 1, \ldots , m \right\} \) and transition probabilities
\[
\begin{split}
P_{i,i-1} &= \frac{i}{m} , \quad\mbox{if } 1 \le i \le m , \\
P_{i,i+1} &= \frac{m-i}{m} , \quad\mbox{if } 0 \le i \le m-1 , \\
P_{i,j} &=0 , \quad\mbox{if } |i-j| >1.
\end{split}
\]
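A short Mathematica sketch (with the hypothetical choice m = 4) builds this transition matrix and checks that every row sums to 1:
m = 4;        (* hypothetical number of molecules *)
P = Table[Which[j == i - 1, i/m, j == i + 1, (m - i)/m, True, 0], {i, 0, m}, {j, 0, m}];
P // MatrixForm
Total[P, {2}]      (* {1, 1, 1, 1, 1}: each row is a probability vector *)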
■
End of Example 1
Example 2:
■
End of Example 2
Example 3:
In state elections, it has been determined that voters will switch votes according to the following probabilities.
Solve[{1 == x + y + z, x == 0.7*x + 0.35*y + 0.4*z,
y == 0.2*x + 0.6*y + 0.3*z, z == 0.1*x + 0.05*y + 0.3*z}, {x, y, z}]
{{x -> 0.546392, y -> 0.350515, z -> 0.103093}}
\[
x \approx 0.546392, \qquad y \approx 0.350515, \qquad z \approx 0.103093 .
\]
Hence, approximately 54.64% of voters will vote for the socialist, 35.05% for the capitalist, and 10.31% for an independent.
Note that if the given information is close to reality, then the conclusion will also be close to reality.
■
Consider a system that can be in any one of n+1 mutually exclusive
categories called states and labeled v₀, v₁, … , vn. The
system is observed at discrete points of time, labeled t = 0, 1, 2, …. Let
the random variable Xt represent the state of the system at time t. A
discrete stochastic process X₀, X₁, X₂,
… is a Markov
chain (or is said to have the Markovian property) if the conditional
probability
\begin{align*}
\Pr [\,X_{t+1} &= v_{t+1} \mid X_t = v_t ,\ X_{t-1} = v_{t-1} , \ldots , X_0 =
v_0 \,]
\\
&= \Pr [\,X_{t+1} = v_{t+1} \mid X_t = v_t \,]
\end{align*}
for t = 0, 1, … and every sequence { vj }j ≥ 0.
This definition is naturally extended to a countable number of states.
The Markovian property just assures us that the chain is memoryless: the transition from
a state vt to vt+1 does not depend on how the system arrived at the
state vt. It is convenient, and does not lose any generality, to assume that the
finite state space of the Markov chain is { 0, 1, 2, … , n }, which is
customarily denoted by [0..n] (as in Maple).
To make our exposition clear and succinct, we use matrix notation. First, we
distinguish between vectors and matrices; matrices are denoted by upper case letters in
bold font. We use lower case letters to denote vectors: column vectors are
written in bold font, whereas row vectors are identified by putting arrows
above the letters. For instance, \( \displaystyle \quad \vec{a} = \langle a_1 , a_2 , \ldots , a_n
\rangle \quad \) is a row vector, but its transposition, \( \displaystyle \quad \vec{a}^{\mathrm T}, \ \) which is a
column vector, will be denoted by a letter in bold font. It will be clear from the exposition when we use row vectors or column vectors, which gives us the freedom to use the same letter for both.
Let pij be the probability of transition from state i to state j in one time step. The square matrix P = [ pij ] is
called the transition probability matrix or simply the transition
matrix. It has the stochastic
property that the sum of the entries in every row is 1. For m > 0, let
\( \displaystyle \quad p_{ij}^{(m)} \stackrel{\tiny def}{=} \Pr [\,X_{t+m} = j \mid X_t = i \,] \quad \) be the conditional
probability that
the system will be in state j after exactly m time steps, given that it
starts in state i at time t. When m = 1,
\( p^{(1)}_{ij} = p_{ij} \). The
corresponding matrix \( {\bf P}^{(m)} = [\, p^{(m)}_{ij} ] \) satisfies
\( {\bf P}^{(m)} = {\bf P}\, {\bf P}^{(m-1)} \), so by induction we get \( {\bf P}^{(m)} = {\bf P}^m \). Therefore, the probability of passing
from state i to j in m steps (without specifying
the intermediate states) is obtained by raising P to the power m and
picking the corresponding element, \( p^{(m)}_{ij} \). The following Chapman-Kolmogorov equations provide a method for computing these
m-step transition probabilities:
\[
p^{(m+k)}_{ij} = \sum_{\ell =0}^{n} p^{(m)}_{i\ell}\, p^{(k)}_{\ell j} , \qquad m, k \ge 0 .
\]
Let pi(t) denote the probability that the process is in state i at time t; then the (n+1)-tuple, \( \displaystyle \quad \vec{p} (t) =\langle p_0 (t), p_1 (t), p_2 (t),
\ldots, p_n (t) \rangle ,\ \) is the row vector giving the distribution of the
chain at time t. Summing over all possible intermediate states at time
t−1, we get
\[
\vec{p} (t) = \vec{p} (t-1)\, {\bf P} .
\]
This relation allows us to express the probability distribution at (integer)
time t via the initial distribution \( \displaystyle \quad \vec{\pi}_0 : \)
\[
\vec{p} (t) = \vec{\pi}_0 \,{\bf P}^t .
\]
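For illustration, here is a brief Mathematica sketch (it reuses the voter data of Example 3 in row-stochastic form; the state ordering socialist, capitalist, independent is an assumption) that propagates an initial distribution with MatrixPower:
P = {{0.7, 0.2, 0.1}, {0.35, 0.6, 0.05}, {0.4, 0.3, 0.3}};   (* row-stochastic transition matrix *)
p0 = {1, 0, 0};                                              (* assumed initial distribution *)
p0 . MatrixPower[P, 3]       (* distribution after 3 steps *)
p0 . MatrixPower[P, 50]      (* ≈ {0.546, 0.351, 0.103}, the stationary distribution found in Example 3 *)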
Example 6:
■
End of Example 6
We now bring in some terminology and basic results from the general theory of
Markov chains. A good compact treatment of Markov chains can be found in the books
of Feller [Chapter XV], Ross [Chapter IV], and Mitzenmacher & Upfal, where
the reader can find many examples of problems solved with Markov chains.
A state j is said to be accessible from state i if, for some integer n ≥ 0, \( p^{(n)}_{i,j} > 0 \). If two states i and j are
accessible from each other, we say that they communicate or that they
belong to the same communicating class. A Markov chain is irreducible if all its states belong to one communicating class.
Since the communication relation is an equivalence relation (that is, it is
reflexive, symmetric, and transitive), all states may be partitioned into one
or more disjoint classes such that those states that communicate with each
other are in the same class. A transition from one class to
another may exist, but then it is impossible to return to the previous class. This
is entirely analogous to viewing the
graphical description of the chain as a directed graph, and finding its
strongly-connected components.
Example 7:
■
End of Example 7
A state i is said to be a transient state if there exists a state
j, i ≠ j, that is accessible from state i but not vice versa,
namely, state i is not accessible from state j. In other words, a state is
transient if, upon entering this state, the process may never return to this
state again. A communication class consisting of transient states is called
the transient class: with probability 1, the process will leave this class and
never return.
A state i is said to be a recurrent (or persistent) state if, upon
entering this state, the process definitely will return to this state again. A
Markov chain is recurrent if every state in the chain is recurrent. A
recurrent class is a communication class consisting of recurrent states: a
Markov chain never leaves the recurrent class once it reaches it.
A state is recurrent if the chain, having started there, returns to it with probability 1. This
means that the chain visits a recurrent state infinitely many times. On the
other hand, if state i is transient, then the total number of visits to state i is a geometric random variable.
A state i is said to be absorbing if, upon entering this
state, the process never leaves it. Hence, state i is an
absorbing state if and only if pi,i = 1.
Observation 1:
If X₀, X₁, X₂, … denotes a Markov chain with states x₁, x₂, … , xn, then xi is
an absorbing state if
\[
\Pr [X_t = x_i \mid X_{t-1} = x_i ] = 1 .
\]
Example 8:
■
End of Example 8
A recurrent state i is said to be periodic if there exists an
integer d > 1 such that \( p^{(n)}_{i,i} = 0 \) for all values of n other than
d, 2d, 3d, … , and d is the largest integer with this property. So d = d(i) is the greatest common divisor of the set \( \{ n \ge 1 \, : \, p^{(n)}_{i,i} > 0 \} . \) A discrete Markov chain is periodic if some state in
the chain is periodic. A state that is not periodic is called
aperiodic. We call an irreducible chain or transition matrix aperiodic
if it has period 1.
Periodicity is a class property, that is, if state i in a communication
class has period d, then all states in that class have period d.
In a finite-state Markov chain, recurrent states that are aperiodic are called
ergodic states. A Markov chain is said to be ergodic if all its
states are ergodic states.
Example 9:
■
End of Example 9
If a Markov chain has only one communication class, then for every pair of
indices i, j ∈ [0..n] there exists m = m(i,j) such that \( p^{(m)}_{i,j} > 0 \). If a chain has no transient states, then no matter where the process
starts, after a sufficient lapse of time it can be in any state. Therefore,
for sufficiently large m, all entries of Pm are strictly positive.
When a Markov chain is reducible, having r recurrent classes R₁, R₂, … , Rr, its transition matrix can be written in the following
canonical form (after, perhaps, reordering the states):
\begin{equation} \label{E.markov5}
{\bf P} = \left[ \begin{array}{c|c}
{\bf Q} & {\bf S} \\ \hline
{\bf 0} & \tilde{\bf P}
\end{array} \right] ,
\end{equation}
where \( \displaystyle \quad \tilde{\bf P} \ \) is a block-diagonal matrix corresponding to each Rk,
k = 1, 2, … , r (the submatrix obtained from considering only the rows and
columns for states in Rk).
Example 10:
■
End of Example 10
If the chain has s transient states, then the s × s submatrix Q governs the process as long as it stays among the transient states. The matrix Q is a substochastic matrix, that is, a matrix with nonnegative entries
whose row sums are less than or equal to 1. Since the process eventually leaves the transient states, all the eigenvalues
of Q have absolute values strictly less than 1. Therefore, Qk → 0 as k → ∞, and I − Q is an invertible matrix, where I denotes the s × s identity matrix.
The resolvent of the matrix Q at 1 is called the fundamental
matrix:
\begin{equation} \label{E.markov6}
{\bf N} \stackrel{\tiny def}{=} ({\bf I} - {\bf Q})^{-1} = {\bf I} + {\bf Q} + {\bf Q}^2 +
\cdots = \sum_{k\ge 0} {\bf Q}^k .
\end{equation}
Formula \eqref{E.markov6} follows from the Maclaurin series for the function \( \displaystyle \quad \frac{1}{1-x} = 1 + x + x^2 + \cdots = \sum_{k\ge 0} x^k . \quad \) Similarly,
if a Markov chain has at least one absorbing state, then I − Q is invertible with inverse \( \displaystyle \quad \mathbf{N} = \sum_{k\ge 0} \mathbf{Q}^k . \quad \) The (i,j)-th entry of the fundamental matrix N represents the expected
time that the Markov chain spends in transient state j, given that it starts in transient state i.
The rate of convergence of the series \eqref{E.markov6} is determined by the largest (in absolute
value) eigenvalue of the matrix Q, which is guaranteed to be less than 1
but could be close to it.
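A minimal sketch in Mathematica (the 2 × 2 substochastic block Q below is made up) computes the fundamental matrix and compares it with a truncated sum of powers of Q:
Q = {{0.4, 0.3}, {0.2, 0.5}};               (* made-up substochastic block: row sums 0.7 < 1 *)
fund = Inverse[IdentityMatrix[2] - Q]       (* fundamental matrix N = (I - Q)^(-1) ≈ {{2.083, 1.25}, {0.833, 2.5}} *)
Sum[MatrixPower[Q, k], {k, 0, 200}]         (* numerically agrees with fund *)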
In \eqref{E.markov5}, the matrix \( \displaystyle \quad \tilde{\bf P} \ \) deals with the process after
it has reached an ergodic set. The matrix S concerns the transition from
transient to ergodic states.
If there are q transient states and r absorbing states, the states of a Markov
chain can be renumbered, thus permuting the rows and columns of the transition
matrix, in order to put the transition matrix into the form
\[
{\bf P} = \left[ \begin{array}{c|c}
{\bf Q} & {\bf S} \\ \hline
{\bf 0} & {\bf I}
\end{array} \right] ,
\]
where the q × q matrix Q represents the transition between transient states,
the r × r identity matrix I represents absorption, the q × r matrix S represents the
transition from a transient state to an absorbing state, and the r × q matrix
0 represents the lack of ability to move from an absorbing state to a transient state.
Recall that we use two forms of the transition matrix P in order to please everyone (or no one?): some people multiply a column vector by the matrix from the left (then the transition matrix is written in brackets), while others multiply a row vector by the matrix from the right (then the transition matrix is written in parentheses). As usual, \( {\bf S}^{\mathrm T} \) denotes the transpose of the matrix S.
Theorem 1:
In any finite Markov chain, there are numbers c > 0 and 0< b < 1 such that \( \displaystyle \quad
p^{(m)}_{i,j} \le c\,b^m , \quad \) for any transient states i, j.
Moreover, Qm → 0, as m → ∞, exponentially.
It is sufficient to prove the statement for the case when transient states i
and j belong to the same communication class. Suppose that the process
starts in a transient state. Since there is only a finite number of states, it
is possible to reach an ergodic state in no more than r steps (r is less than the number of states). Hence there is a positive number ω such
that the probability of entering an ergodic state in at most r steps, from any transient state, is at least ω. Therefore, the probability of not
being absorbed within r transitions is at most 1 − ω, which is less than 1. The probability of not reaching an ergodic state in mr steps is therefore at most \( (1 - \omega)^m , \) which tends to zero as m increases.
Example 11:
■
End of Example 11
Understanding the long-term behavior of a Markov chain boils down to
understanding the behavior of Pm for large m values. The crucial
role in this issue is played by an invariant probability distribution for the
corresponding transition matrix.
A stationary distribution (also called an equilibrium distribution) of a
Markov chain is a probability distribution \( \displaystyle \quad \vec{\pi} = \langle \pi_0 , \pi_1,
\ldots , \pi_n \rangle \ \) such that
\begin{equation} \label{E.markov3}
\sum_k \pi_k =1, \qquad \vec{\pi} =\vec{\pi} \,{\bf P} \qquad\mbox{or}\qquad
\vec{\pi} \, ({\bf I} - {\bf P} ) = {\bf 0},
\end{equation}
where I is the identity matrix (which has all entries
zeroes except 1's on the diagonal).
Equation \eqref{E.markov3} shows that an invariant probability vector, \( \displaystyle \quad \vec{\pi} =
\langle \pi_0 , \pi_1, \ldots , \pi_n \rangle ,\ \) is a left
eigenvector of P corresponding to the eigenvalue 1. One way to compute the
stationary distribution of a finite Markov chain is to solve the homogeneous
system of n+1 linear equations \eqref{E.markov3}. Since there are n+1
unknowns, the system \( \displaystyle \quad \vec{\pi} =\vec{\pi} \,{\bf P} \ \) has infinitely many
solutions, but the additional constraint \( \displaystyle \quad \sum_{k=0}^n \pi_k =1\ \) leads to the
unique solution.
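As a sketch of this computation (reusing the Ehrenfest urn chain of Example 1 with the hypothetical value m = 4), one can solve the system \( \vec{\pi} = \vec{\pi}\,{\bf P} \) together with the normalization constraint in Mathematica:
m = 4;        (* hypothetical number of molecules *)
P = Table[Which[j == i - 1, i/m, j == i + 1, (m - i)/m, True, 0], {i, 0, m}, {j, 0, m}];
pi = Array[p, m + 1];                        (* p[k] is the stationary probability of k - 1 molecules in urn (I) *)
Solve[Join[Thread[pi . P == pi], {Total[pi] == 1}], pi]
(* {{p[1] -> 1/16, p[2] -> 1/4, p[3] -> 3/8, p[4] -> 1/4, p[5] -> 1/16}}, a binomial distribution *)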
If a chain ever reaches an equilibrium distribution, then it maintains that
distribution for all future time. In particular, a Markov chain with an
absorbing state will be in its
stationary distribution once it reaches the absorbing state. If k is an
absorbing state, and the process starts in state i, the probability of ever
going to state k is called the probability of absorption into state
k, given that the system started in state i.
It is possible to give a probabilistic interpretation of the elements of the
fundamental matrix. Define Nij to be the number of times the
process passes through the transient state j, starting at transient state i, until absorption, for i, j = 1, 2, … , s.
If i = j, the initial sojourn is already counted as one visit. The expected
number of steps until the chain enters a recurrent class, assuming that it
starts at state i, is the sum \( \displaystyle \quad \sum_j n_{i,j} \ \) over all transient states
j.
Theorem 2:
The entries of the fundamental matrix are expectations of Ni,j:
\[
n_{i,j} \stackrel{\tiny def}{=} E[N_{i,j}] = {({\bf I} -{\bf Q})^{-1}}_{i,j}, \quad 1\le i,j\le s,
\]
where s is the number of transient states.
We show two proofs, which while making similar statements do it in
usefully different manners. The first is straightforward. Since
\( ({\bf Q}^k)_{i,j} \) is the probability that a path from state i
visits state j at step k (using \( {\bf Q}^0 = {\bf I} \)), we view Ni,j
as a sum of indicator random variables, one for each possible step, and then
\begin{equation}\label{E7M.6}
n_{i,j} = \sum_{k\ge 0} 1\cdot \left({\bf Q}^k\right)_{i,j} = {({\bf I}-{\bf
Q})^{-1}}_{i,j},
\end{equation}
where we add up the contributions of all steps.
The second proof furnishes an example of a useful
computational method for Markov chains: randomization on the first step
from state i to, say, 𝑎:
\begin{equation}\label{E7M.8}
n_{i,j}=\delta_{i,j}+\sum_{a=1}^s {\bf Q}_{i,a}\, n_{a,j} .
\end{equation}
Example 12:
■
End of Example 12
Theorem 3:
Suppose the transition matrix for a finite Markov chain is in the canonical
form \eqref{E.markov5}. If bi,j is the probability of absorption into
state j, that is, the probability that the process
starting in transient state i ends up in absorbing state j, then
\begin{equation} \label{E.markov8}
{\bf B} \stackrel{\tiny def}{=} [ b_{i,j} ] = {\bf N} \, {\bf S} = ({\bf I} - {\bf Q})^{-1} \,
{\bf S} .
\end{equation}
Starting in transient state i, the process may be absorbed in j in one or
more steps. The probability of capture on a single step is pi,j. If this
does not happen, the process may move either to another absorbing state (so it
will be impossible to reach state j), or to a transient state k. In the
latter case, there is probability bk,j of being captured in the right
state. Hence, we have
\[
b_{i,j} = p_{i,j} + \sum_k p_{i,k} \,b_{k,j} ,
\]
which can be written in matrix form as
\[
{\bf B} = {\bf S} + {\bf Q} \,{\bf B}.
\]
Solving for B, we get the required formula.
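To illustrate formula \eqref{E.markov8}, here is a small Mathematica sketch for a hypothetical fair gambler's-ruin chain on the states 0, 1, 2, 3, where 0 and 3 are absorbing and 1, 2 are transient:
Q = {{0, 1/2}, {1/2, 0}};          (* hypothetical transitions among the transient states 1, 2 *)
S = {{1/2, 0}, {0, 1/2}};          (* transitions from 1, 2 into the absorbing states 0, 3 *)
B = Inverse[IdentityMatrix[2] - Q] . S
(* {{2/3, 1/3}, {1/3, 2/3}}: starting from state 1, absorption at state 0 occurs with probability 2/3 *)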
Example 13:
■
End of Example 13
The number of transitions made by the process in going from state i to
state j for the first time is usually referred to as the first passage
time. When i = j, this first passage time is just the number of transitions
until the process returns to the initial state. Once it happens, the system
recommences the process from scratch, forgetting that it was in the same state
before.
We denote by r(m)i,j the probability that, starting at state i, the
first transition to state j occurs at time m:
\begin{equation} \label{E.markov2}
r^{(m)}_{i,j} = \Pr [\, X_m =j \ \mbox{ and } \ X_k \ne j \ \mbox{ for } 1\le k \le m-1 \mid X_0 = i \,] .
\end{equation}
Let \( \displaystyle \quad r_{i,j} = \sum_{m\ge 1} r^{(m)}_{i,j} \ \) be the probability that, starting
from state i, the system ever reaches state j.
Then we can claim that state i is recurrent if \( \displaystyle \quad \sum_{m\ge 1}
r^{(m)}_{i,i} =1, \quad \) and it is transient if \( \displaystyle \quad \sum_{m\ge 1}
r^{(m)}_{i,i} <1. \ \) Whenever \( \displaystyle \quad \sum_{m\ge 1} r^{(m)}_{i,j} =1, \quad \) the expected
time to first reach state j from state i, denoted by
\( \displaystyle \quad \mu_{i,j} = \sum_{m\ge 1} m\,r^{(m)}_{i,j}, \ \) uniquely satisfies the
equation
\begin{equation} \label{E.markov9}
\mu_{i,j} = 1+ \sum_{k\ne j} p_{i,k} \, \mu_{k,j} .
\end{equation}
So μi,i is the
expected time to return to state i. Since we consider only finite Markov
chains (with a finite number of states), the expected values μi,i are
finite for all recurrent states, but not for transient ones. A finite Markov
chain has at least one recurrent state.
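A short Mathematica sketch (using a made-up 3-state chain) solves the linear system \eqref{E.markov9} for the expected first-passage times into state 3:
P = {{0.5, 0.3, 0.2}, {0.2, 0.6, 0.2}, {0.1, 0.4, 0.5}};    (* made-up transition matrix *)
(* mu1, mu2 are the expected times to first reach state 3 from states 1 and 2 *)
Solve[{mu1 == 1 + P[[1, 1]] mu1 + P[[1, 2]] mu2,
       mu2 == 1 + P[[2, 1]] mu1 + P[[2, 2]] mu2}, {mu1, mu2}]
(* {{mu1 -> 5., mu2 -> 5.}} *)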
We summarize the properties of stationary
distributions in the following statement from the book by Mitzenmacher and Upfal.
Theorem 4:
A finite, irreducible, ergodic Markov chain has the following
properties:
the chain has a unique stationary distribution \( \displaystyle \quad \vec{\pi} =
\langle \pi_0 , \pi_1 , \ldots , \pi_n \rangle; \)
for all i and j, the limit \( \displaystyle \quad \lim_{t\to\infty}
p_{i,j}^{(t)} \ \) exists and is independent of i;
\( \displaystyle \quad \pi_j = \lim_{t\to\infty} p_{i,j}^{(t)} = 1/\mu_{j,j} , \ \) the reciprocal of the expected return time to state j.
Recall also that state i is recurrent if \( \displaystyle \quad \sum_{n\ge 1} p_{i,i}^{(n)} =\infty \quad \)
and transient if \( \displaystyle \quad \sum_{n\ge 1} p_{i,i}^{(n)} <\infty .\)
Property 3 tells us that the expected number of steps to return to state k,
assuming that the chain starts at k, is given by the reciprocal of the kth
component of the stationary distribution.
There is a typical way to relate Markov chains to the analysis of algorithms. An algorithm starts at some fixed initial state.
Then, according to the input, it moves till completion through a finite number
of states, which we again denote by vi, i = 0, 1, 2, … , n.
A state in this context reflects the values of the variables that the
algorithm uses to record the processing that it has performed so far.
If a sufficiently large number of states is allowed, we
could probably represent in this way any algorithm, but this approach is
not really meaningful and
effective unless the state space is either small or has a simple
structure.
Such an operation is represented formally as the traversal of
a directed graph: nodes represent states of the algorithm, and edges
represent possible transitions.
The crucial step in the model formulation is the assignment of
probabilities to the edges.
Usually, there are two sources of probabilities in an algorithm.
One is induced by the distribution over the
input, as would be typically the case in a sorting or searching algorithm.
Another scenario arises when the algorithm itself contains a random
element, taking action based on sampling a built-in random event generator.
We say that this representation has the Markovian property when the
probability that the algorithm goes from a certain state vi into some other
state vj does not depend on the history of the algorithm execution before it
reaches state vi. It is then equal to the probability assigned to the edge
(i, j). There are two different mathematical descriptions of such a chain:
a directed, labeled graph with weights, and a stochastic matrix of dimensions
equal to the number of nodes, with entries that are nonnegative and sum, in each
row, to 1.
This definition of a discrete stochastic process is equivalent to saying that the set of states
supports a
first-order Markov chain, or provides a Markovian description of the algorithm.
If the algorithm had in fact a memory of the last two, or some
fixed number of states (say k), we would say it is a k-th-order Markov
chain. It is relatively easy to translate any such higher-order chain to
a first-order one, at the cost of a significant increase in the size of
the state space.
Suppose that the chain has only one absorbing state, the last one, vn.
Since ni,j is the expected number of visits of the chain to vj
till absorption, starting in state vi, summing over the i-th row
of the fundamental matrix provides us with di, defined as
the expected time (counted in state transitions) till absorption, from
state vi. Let us show this by an independent calculation, invoking
randomization on the first transition:
\begin{equation}\label{E7M.d}
d_i =1+\sum_{j=0}^n p_{i,j}d_{j}
=1+\sum_{j=0}^{n-1} p_{i,j}d_j
=1+\sum_{j=0}^{n-1}{\bf Q}_{i,j}d_j, \quad 0\le i <n.
\end{equation}
We change the sum above to terminate at n−1 without changing its value,
since dn = 0. With our convention that
v₀ is the initial state, the first component, d₀, is the expected
run-time of the algorithm. Note that \eqref{E7M.d} is identical to
\eqref{E923.1}. Using the fundamental
n × n matrix, N, we can express the solution of the system of
algebraic equations \eqref{E7M.d} in vector form:
\begin{equation}\label{E.markov10}
{\bf d} = ({\bf I}-{\bf Q})^{-1}\mathbf{e} ={\bf N}\,\mathbf{e} ,
\end{equation}
where \( \displaystyle \quad \mathbf{d}^{T} = \langle d_0 , d_1 , \ldots , d_{n-1} \rangle\quad \) and
\( \displaystyle \quad \mathbf{e}^{T} = \langle 1,1,\ldots , 1 \rangle \quad \) are n-dimensional vectors.
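A minimal illustration (assuming a hypothetical chain with two transient states, whose substochastic block Q is made up, and one absorbing state) evaluates \eqref{E.markov10}:
Q = {{0.4, 0.3}, {0.2, 0.5}};               (* made-up transitions among the transient states *)
fund = Inverse[IdentityMatrix[2] - Q];      (* fundamental matrix N *)
d = fund . {1, 1}                           (* expected numbers of steps to absorption ≈ {3.33, 3.33} *)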
It is sometimes helpful to visualize a Markov process with a state
diagram, where arrows between states represent the transitions and weights
represent the probabilities of those transitions.
An example of a state diagram corresponding to the transition matrix can
be seen in the following figure.
The long-term behavior of the states, for a known transition matrix, is described in the following statement.
Theorem 5:
Given a Markov chain with transition matrix M,
\[
\mathbf{v} = \lim_{n\to\infty} \mathbf{M}^n \mathbf{v} ,
\]
where v is the unit eigenvector associated with the eigenvalue λ = 1.
The Perron-Frobenius theorem is the main tool in this proof. If the
Markov chain has one or more absorbing states, the transition matrix can be
written in a form such that the absorbing states are all in the last rows (and
columns); in this case, the transition matrix is reducible. In general, the transition
matrix may be irreducible or primitive, or may only be non-negative. Thus, we will only make the assumption
that the transition matrix, M, is non-negative.
Using the general case of the Perron-Frobenius theorem, we assume that
the largest eigenvalue in magnitude, \( \lambda_{PF} , \) has a corresponding unit eigenvector
v.
If M is an n × n transition matrix with entries Mi,j whose columns each sum to 1, then the eigenvalue relation \( \lambda_{PF}\, \mathbf{v} = \mathbf{M}\,\mathbf{v} \) reads, componentwise,
\[
\lambda_{PF} \, v_i = \sum_{j=1}^n M_{i,j} v_j , \qquad i = 1, 2, \ldots , n .
\]
Summing over i and using the fact that each column of M sums to 1, we get
\[
\lambda_{PF} \sum_{i=1}^n v_i = \sum_{j=1}^n v_j \sum_{i=1}^n M_{i,j} = \sum_{j=1}^n v_j ,
\]
so that λPF = 1. In addition,
\[
\lim_{n\to\infty} \mathbf{M}^n \mathbf{v} = \lim_{n\to\infty} \lambda_{PF}^n \mathbf{v} = \mathbf{v} .
\]
The vector \( \displaystyle \quad \frac{\mathbf{v}}{\sum_j v_j} \quad \)
then gives the long-run distribution (as percentages) of the Markov chain.
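For instance, a short Mathematica sketch (reusing the column-stochastic voter matrix of Example 3) recovers this long-run distribution from the eigenvector associated with the eigenvalue 1:
M = {{0.7, 0.35, 0.4}, {0.2, 0.6, 0.3}, {0.1, 0.05, 0.3}};   (* columns sum to 1 *)
{vals, vecs} = Eigensystem[M];
v = vecs[[First @ Ordering[Abs[vals - 1], 1]]];   (* eigenvector belonging to the eigenvalue closest to 1 *)
v/Total[v]      (* ≈ {0.546, 0.351, 0.103}, matching the Solve result of Example 3 *)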
Anton, Howard, Elementary Linear Algebra (Applications Version), 9th ed., Wiley International, 2005.
Feller, William, An Introduction to Probability Theory and its Applications, Vol. 1, 3rd ed., John Wiley & Sons, New York, 1968.
Mitzenmacher, Michael and Upfal, Eli, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, Cambridge, UK, 2005.
Ross, Sheldon M., Introduction to Probability Models, Academic Press, New York, 1989.