©2009 2010 2011 2012 Jennifer Harlow, Dominic Lee and Raazesh Sainudiin.
Creative Commons Attribution-Noncommercial-Share Alike 3.0
A Markov chain, named after Andrey Markov, is a mathematical model for a possibly dependent sequence of random variables. Intuitively, a Markov chain is a system which "jumps" among several states, with the next state depending (probabilistically) only on the current state. A useful heuristic is that of a frog jumping among several lily-pads, where the frog's memory is short enough that it doesn't remember which lily-pad it was last on, and so its next jump can only be influenced by where it is now.
Formally, the Markov property states that the conditional probability distribution for the system at the next step (and in fact at all future steps) given its current state depends only on the current state of the system, and not additionally on the state of the system at previous steps.
Since the system changes randomly, it is generally impossible to predict the exact state of the system in the future. However, the statistical and probabilistic properties of the system's future can be predicted. In many applications it is these statistical properties that are important.
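A Markov chain is a sequence of random variables $X_1, X_2, X_3, \ldots$ with the Markov property, namely that, given the present state, the future and past states are independent. Formally, $$P\left( X_{n+1}=x \ | \ X_n=x_n, X_{n-1}=x_{n-1},\ldots,X_1=x_1 \right) = P\left( X_{n+1}=x \ | \ X_n=x_n \right).$$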
The possible values of $X_i$ or the set of all states of the system form a countable set $\mathbb{X}$ called the state space of the chain.
The changes of state of the system are called transitions, and the probabilities associated with various state-changes are called transition probabilities.
Markov chains are often depicted by a weighted directed graph, where the edges are labeled by the probabilities of going from one state to the other states. This is called the flow diagram or transition probability diagram. The transition probability matrix $\mathbf{P}$ encodes the probabilities associated with state-changes or "jumps" from one state to another in the state-space $\mathbb{X}$. If $\mathbb{X}$ is labelled by $\{0,1,2,\ldots\}$ then the $i,j$-th entry in the matrix $\mathbf{P}$ corresponds to the probability of going from state $i$ to state $j$ in one time-step.
The state of the system at the $n$-th time step is described by a state probability vector $$\mathbf{p}^{(n)} = \left( \mathbf{p}^{(n)}_0, \mathbf{p}^{(n)}_1, \mathbf{p}^{(n)}_2,\ldots \right)$$ Thus, $\mathbf{p}^{(n)}_i$ is the probability that you will find the Markov chain at state $i \in \mathbb{X}$ at time-step $n$, and $\mathbf{p}^{(0)}$ is called the initial probability vector at the convenient initial time $0$.
The state space $\mathbb{X}$ and transition probability matrix $\mathbf{P}$, together with the initial probability vector $\mathbf{p}^{(0)}$, completely characterize a Markov chain.
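The frog heuristic can be sketched in plain Python. This is only an illustration: the three-state matrix below is hypothetical (it does not come from this worksheet), and `simulate_chain` is a name made up for the sketch. The point is that each jump is drawn using only the row of $\mathbf{P}$ that belongs to the current state.

```python
import random

# Hypothetical 3-state chain (three lily-pads); the numbers are illustrative only.
P = [
    [0.5, 0.25, 0.25],  # jump probabilities from pad 0
    [0.2, 0.6, 0.2],    # jump probabilities from pad 1
    [0.1, 0.3, 0.6],    # jump probabilities from pad 2
]

def simulate_chain(P, start, n_steps, rng):
    """Return the visited states [x_0, x_1, ..., x_{n_steps}] of a chain with matrix P."""
    states = list(range(len(P)))
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Markov property: the next state depends only on row `current` of P
        next_state = rng.choices(states, weights=P[current], k=1)[0]
        path.append(next_state)
    return path

rng = random.Random(2012)  # fixed seed so that a run can be repeated
path = simulate_chain(P, start=0, n_steps=20, rng=rng)
print(path)
```

The same function works for any finite state space labelled $0,1,2,\ldots$, as long as each row of the matrix is a list of non-negative weights.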
We can coarsely describe the weather of a given day by a toy model that states if it is "wet" or "dry". Each day the weather in our toy model is an element of our state space $$\mathbb{X} = \{\text{dry}, \text{wet}\}.$$
We can make a picture of our toy probability model with a flow diagram or transition probability diagram as follows:
The probabilities of weather conditions, given the weather on the preceding day, can be represented by a transition probability matrix: $$\mathbf{P}=\begin{bmatrix} 0.9 & 0.1\\0.5&0.5\end{bmatrix}$$
The matrix $\mathbf{P}$ represents our toy weather model in which a dry day is 90% likely to be followed by another dry day, and a wet or rainy day is 50% likely to be followed by another wet day. The columns can be labelled "dry" and "wet" respectively, and the rows can be labelled in the same manner. For convenience, we will use integer labels $0$ and $1$ for "dry" and "wet", respectively.
$(\mathbf{P})_{i j}=p_{i,j}$ is the probability that, if a given day is of type $i$, it will be followed by a day of type $j$.
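Since the transition probability matrix $\mathbf{P}$ is a stochastic matrix, its entries are non-negative and the entries of each row sum to $1$: $$p_{i,j} \geq 0 \quad \text{and} \quad \sum_{j} p_{i,j} = 1 \quad \text{for each } i .$$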
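The stochastic-matrix conditions are easy to check mechanically. The following is a minimal plain-Python sketch (not Sage), using exact fractions for the toy weather matrix; the helper name `is_stochastic` is an assumption made for this illustration.

```python
from fractions import Fraction as F

# the toy weather matrix, rows and columns ordered as (dry, wet) = (0, 1)
P = [[F(9, 10), F(1, 10)],
     [F(1, 2), F(1, 2)]]

def is_stochastic(M):
    """True if every entry is non-negative and every row sums to exactly 1."""
    return all(all(x >= 0 for x in row) and sum(row) == 1 for row in M)

print(is_stochastic(P))
```

Exact fractions are used so that the row sums can be compared with $1$ without floating-point rounding.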
How do we represent a matrix $\mathbf{P}$ in Sage?
{{{id=4|
help(matrix)
///
}}}

{{{id=11|
P = matrix(RR,[[9/10,1/10],[1/2,1/2]]) # construct and assign the matrix to P
# could have used QQ instead of RR above if the probs are rational and for exact rational arithmetic
P # display P
///
[0.900000000000000 0.100000000000000]
[0.500000000000000 0.500000000000000]
}}}

{{{id=14|
P[0,1] # accessing (i,j)-th entry of matrix P
///
}}}

{{{id=12|
# know the parent of your matrix type, especially if you are a Math/Stat major
P.parent()
///
}}}

The weather on day 0 is known to be dry. This is represented by a probability vector in which the "dry" entry is 100%, and the "wet" entry is 0%: $$\mathbf{p}^{(0)} = \begin{pmatrix} 1 & 0 \end{pmatrix}$$
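The weather on day 1 can be predicted by: $$\mathbf{p}^{(1)} = \mathbf{p}^{(0)} \mathbf{P} = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{bmatrix} 0.9 & 0.1\\ 0.5 & 0.5 \end{bmatrix} = \begin{pmatrix} 0.9 & 0.1 \end{pmatrix}$$
Thus, there is a 90% chance that day 1 will also be dry.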
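The one-step prediction is just a row vector times a matrix. As a sketch in plain Python (the Sage cells in this worksheet do the same with `p0*P`), with a hypothetical helper `step` introduced only for this illustration:

```python
from fractions import Fraction as F

def step(p, P):
    """Left-multiply the row vector p by the matrix P, i.e. return p*P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]

P = [[F(9, 10), F(1, 10)],
     [F(1, 2), F(1, 2)]]

p0_dry = [F(1), F(0)]     # day 0 is known to be dry
p1 = step(p0_dry, P)      # distribution of the weather on day 1
print(p1)

p0_wet = [F(0), F(1)]     # alternative start: day 0 is known to be wet
print(step(p0_wet, P))
```

Starting from a dry day simply picks out the first row of $\mathbf{P}$, and starting from a wet day picks out the second row.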
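{{{id=13|
p0 = vector((1,0)) # initial probability vector for a dry day zero
p0
///
}}}

{{{id=15|
p1 = p0*P # left multiply the initial probability vector by P
p1 # display the probability vector at time-step 1
///
}}}

{{{id=169|
///
}}}

The weather on day 2 can be predicted in the same way: $$\mathbf{p}^{(2)} = \mathbf{p}^{(1)} \mathbf{P} = \begin{pmatrix} 0.9 & 0.1 \end{pmatrix} \begin{bmatrix} 0.9 & 0.1\\ 0.5 & 0.5 \end{bmatrix} = \begin{pmatrix} 0.86 & 0.14 \end{pmatrix}$$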
or, equivalently, $$\mathbf{p}^{(2)} = \mathbf{p}^{(0)} \mathbf{P}^2 .$$
How do we do this in Sage?
{{{id=7|
p2 = p0*P^2 # left multiply the initial probability vector by square of P
p2 # disclose the probability vector at time-step 2
///
}}}

or, equivalently,
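{{{id=9|
p2 = p1*P # left multiply the probability vector at time-step 1 by P
# disclose the probability vector at time-step 2 (compare output of previous cell)
p2
///
}}}

The general rules for day $n$ follow by mathematical induction: $$\mathbf{p}^{(n)} = \mathbf{p}^{(n-1)} \mathbf{P} \quad \text{and} \quad \mathbf{p}^{(n)} = \mathbf{p}^{(0)} \mathbf{P}^n .$$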
How do we operate with a matrix in Sage to do this for any given $n$?
{{{id=22|
n = 3 # assign some specific time-step or day
dry_p0 = vector((1,0)) # initial probability vector for a dry day zero
pn = dry_p0 * P^n # probability vector for day n
pn # display it
///
}}}

{{{id=23|
n = 3 # assign some specific time-step or day
wet_p0 = vector((0,1)) # initial probability vector for a wet day zero
pn = wet_p0 * P^n # probability vector for day n
pn # display it
///
}}}

{{{id=111|
nmax = 3 # maximum number of days or time-steps of interest
[(n, vector((0,1)) * P^n ) for n in range(nmax+1)] # state probability vector at time n=0,1,...,nmax
///
}}}

{{{id=112|
%hide
# what's going on here... try to increase nmax and see where the state prob vector is going
nmax=2 # maximum number of time-steps of interest
# the next line is just html figure heading
html("
In this example, predictions for the weather on more distant days are increasingly inaccurate and tend towards a steady state vector. This vector represents the probabilities of dry and wet weather on all days, and is independent of the initial weather.
The steady state vector is defined as:
$$\mathbf{s} = \lim_{n \to \infty} \mathbf{p}^{(n)}$$
but only converges to a strictly positive vector if $\mathbf{P}$ is a regular transition matrix (that is, there is at least one $\mathbf{P}^n$ with all non-zero entries making the Markov chain irreducible and aperiodic).
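To see the limit numerically, one can simply keep multiplying by $\mathbf{P}$. The sketch below is plain Python with floating-point numbers; `step` and `propagate` are hypothetical helper names, and the choice of 50 steps is arbitrary. It starts the toy weather chain from a dry day and from a wet day and compares the two results.

```python
def step(p, P):
    """Return the row vector p*P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]

def propagate(p0, P, n):
    """Return p0 * P^n by applying `step` n times."""
    p = list(p0)
    for _ in range(n):
        p = step(p, P)
    return p

P = [[0.9, 0.1],
     [0.5, 0.5]]  # all entries are non-zero, so P itself is regular

from_dry = propagate([1.0, 0.0], P, 50)
from_wet = propagate([0.0, 1.0], P, 50)
print(from_dry)
print(from_wet)
```

Both starting vectors end up at (numerically) the same probability vector, which is what independence from the initial weather means.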
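Since $\mathbf{s}$ is independent of the initial conditions, it must be unchanged when transformed by $\mathbf{P}$. This makes it a left eigenvector (with eigenvalue $1$), and means it can be derived from $\mathbf{P}$. For our toy model of weather: $$\mathbf{s} \mathbf{P} = \mathbf{s} \quad \Leftrightarrow \quad \mathbf{s} \left( \mathbf{P} - \mathbf{I} \right) = \mathbf{0} \quad \Leftrightarrow \quad \begin{pmatrix} \mathbf{s}_0 & \mathbf{s}_1 \end{pmatrix} \begin{bmatrix} -0.1 & 0.1\\ 0.5 & -0.5 \end{bmatrix} = \begin{pmatrix} 0 & 0 \end{pmatrix}$$
So $- 0.1 \mathbf{s}_0 + 0.5 \mathbf{s}_1 = 0$ and, since $\mathbf{s}$ is a probability vector, we know that $\mathbf{s}_0 + \mathbf{s}_1 = 1$.
Solving this pair of simultaneous equations gives the steady state distribution: $$\begin{pmatrix} \mathbf{s}_0 & \mathbf{s}_1 \end{pmatrix} = \begin{pmatrix} 5/6 & 1/6 \end{pmatrix} \approx \begin{pmatrix} 0.833 & 0.167 \end{pmatrix}$$
In conclusion, in the long term, about 83% of days are dry.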
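For a two-state chain the pair of simultaneous equations can be solved once and for all: $-p_{0,1}\mathbf{s}_0 + p_{1,0}\mathbf{s}_1 = 0$ together with $\mathbf{s}_0+\mathbf{s}_1=1$ gives $\mathbf{s}_0 = p_{1,0}/(p_{0,1}+p_{1,0})$ and $\mathbf{s}_1 = p_{0,1}/(p_{0,1}+p_{1,0})$. The plain-Python sketch below applies this closed form with exact fractions; the function name `two_state_steady_state` is an assumption made for the illustration, and the formula only covers the two-state case.

```python
from fractions import Fraction as F

def two_state_steady_state(P):
    """Solve s*(P - I) = 0 with s0 + s1 = 1 for a 2x2 stochastic matrix P."""
    p01, p10 = P[0][1], P[1][0]
    total = p01 + p10
    if total == 0:
        # P is the identity matrix: every probability vector is left unchanged
        raise ValueError("steady state is not unique when p01 = p10 = 0")
    return (p10 / total, p01 / total)

P = [[F(9, 10), F(1, 10)],
     [F(1, 2), F(1, 2)]]

s = two_state_steady_state(P)
print(s)

# check that s is unchanged by P, i.e. s*P == s
sP = tuple(s[0] * P[0][j] + s[1] * P[1][j] for j in range(2))
print(sP == s)
```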
How do we carry out the above in Sage to solve for $\mathbf{s}$? There are two "methods". We can either solve the system of equations directly (Method 1) or use an eigen decomposition of $\mathbf{P}$ (Method 2).
You are not expected to follow Method 2 if you have not been introduced to eigenvalues and eigenvectors.
Method 1: Solving a system of equations to obtain $\mathbf{s}$.
{{{id=118|
#P = matrix(QQ,[[9/10,1/10],[1/2,1/2]]); P; P.parent()
P = matrix(RR,[[9/10,1/10],[1/2,1/2]]); P; P.parent()
///
}}}

{{{id=117|
I=matrix(2,2,1); I; I.parent() # the 2X2 identity matrix
///
}}}

{{{id=119|
P-I; (P-I).parent()
///
}}}

{{{id=121|
s0, s1 = var('s0, s1')
///
}}}

{{{id=122|
eqs = vector((s0,s1)) * (P-I); eqs[0]; eqs[1]
///
}}}

{{{id=120|
solve([eqs[0] == 0, eqs[1]==0, s0+s1==1], s0,s1)
///
}}}

{{{id=150|
solve([eqs[0] == 0, s0+s1==1], s0,s1) # just use eqs[0]==0 since eqs[1]==0 is redundant
///
}}}

{{{id=151|
solve([eqs[1]==0, s0+s1==1], s0,s1) # just use eqs[1]==0 since eqs[0]==0 is redundant
///
}}}

End of Method 1 to solve for the steady state vector $\mathbf{s}$.
Method 2: Alternatively, use eigen decomposition over the rationals in Sage to solve for $\mathbf{s}$. You may ignore this if you have not seen eigen decomposition before. To follow Method 2 you need to know a bit more about eigenvalues, eigenvectors and eigenspaces.
{{{id=124|
P = matrix(QQ,[[9/10,1/10],[1/2,1/2]]); P; P.parent()
///
}}}

{{{id=129|
P.eigenvalues()
///
}}}

{{{id=134|
D, V = P.eigenmatrix_left() # left eigen decomposition
D # diagonal matrix of eigen values
///
}}}

{{{id=135|
V # left eigen vectors
///
}}}

{{{id=154|
# checking when we left-multiply by left-eigenvector
# of eigenvalue 1 we get the output scaled by 1
V[0]; V[0]*P
///
}}}

{{{id=155|
# checking when we left-multiply by left-eigenvector
# of eigenvalue 2/5 we get the output scaled by 2/5
V[1]; V[1]*P
///
}}}

{{{id=153|
V.inverse()*D*V
///
}}}

{{{id=152|
# checking that the eigen decomposition of P is indeed P
V*D*V.inverse()
///
}}}

{{{id=136|
EigVecForEigValue1 = V[0]; EigVecForEigValue1
///
}}}

{{{id=137|
EigVecForEigValue1.norm(1) # normalization factor
///
}}}

{{{id=138|
# normalize to make it a probability vector
EigVecForEigValue1 / EigVecForEigValue1.norm(1)
///
}}}

{{{id=139|
///
}}}

End of Method 2.
Paul Brouwers has a basic CliFlo datafeed on
http://www.math.canterbury.ac.nz/php/lib/cliflo/?range=20100425:20100501
This returns the date and rainfall in mm as measured from the CHCH aeroclub station. It is assumed that days without readings would not be listed. It expects a range parameter such as: ?range=20100425:20100501 The first number is the starting search date (YYYYMMDD). A colon is the separator. The second number is the ending search date (YYYYMMDD). CliFlo limits us to 2 million rows for the subscription and 40,000 rows per query (which is equivalent to over 100 years of daily data, so we're safe - the data doesn't go back much before 1944).
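As a small illustration of the range parameter, the plain-Python sketch below builds the `?range=...` string from two dates and parses one returned line. Both function names are hypothetical, and the assumed line layout (date as YYYYMMDD, a comma, then rainfall in mm) is inferred from the way the worksheet splits each line on commas; it has not been checked against the live feed.

```python
from datetime import date

def make_range_parameter(start, end):
    """Build the '?range=YYYYMMDD:YYYYMMDD' query string described above."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return "?range={:%Y%m%d}:{:%Y%m%d}".format(start, end)

def parse_line(line):
    """Parse one line, assumed to look like '19430802,0.0', into (date_as_int, rainfall_mm)."""
    day, rain = line.strip().split(",")
    return int(day), float(rain)

query = make_range_parameter(date(2010, 4, 25), date(2010, 5, 1))
print(query)
print(parse_line("19430802,0.0\n"))
```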
{{{id=2| %auto import urllib2 as U #wetdataURL = 'http://www.math.canterbury.ac.nz/php/lib/cliflo/?range=20100101:20100510' wetdataURL = 'http://www.math.canterbury.ac.nz/php/lib/cliflo/rainfall.php' wetdata = U.urlopen(wetdataURL).readlines() datalines=[] for a_line in wetdata: #print a_line temp = a_line.replace('\n','').split(',') temp = [float(q) for q in temp if q != ','] datalines.append(temp) /// }}} {{{id=10| datalines[0] /// [19430802.0, 0.0] }}} {{{id=25| len(datalines) /// 24212 }}} {{{id=143| (2010-1943)*365 /// 24455 }}} {{{id=191| /// }}} {{{id=3| @interact def chch_precipitation(start_date = slider(0,len(datalines)-1,1,len(datalines)-100), end_date = slider(0,len(datalines)-1,1,len(datalines)-1)): htmls1 = 'Grab all the days data from start to end:
{{{id=194|
%auto
all_daysdata = [[i,int(datalines[i][1]>0)] for i in range(len(datalines))] # all the data as 0s and 1s
///
}}}

Interactive cell to allow you to select some specific data and turn it into the list of 0 or 1 tuples (this list will then be available in sel_daysdata in later cells in the worksheet).
{{{id=144| @interact def chch_wetdry(output = checkbox(False, "Print out selected Data?"),start_date = slider(0,len(datalines)-1,1,len(datalines)-100), end_date = slider(0,len(datalines)-1,1,len(datalines)-1)): htmls1 = 'In the example we have been working with earlier, the transition probabilities were given by the matrix $$\mathbf{P}=\begin{bmatrix} 0.9 & 0.1\\0.5&0.5\end{bmatrix}$$ and we simply used the given $\mathbf{P}$ to understand the properties and utilities of the probability model for a possibly dependent sequence of $\{0,1\}$-valued random variables encoding the $\{\text{Dry},\text{Wet}\}$ days, respectively.
What we want to do now is use the data from Christchurch's Aeroclub obtained from NIWA to estimate Christchurch's unknown transition probability matrix $$\mathbf{P}= \begin{bmatrix} p_{0,0} & p_{0,1}\\ p_{1,0} & p_{1,1} \end{bmatrix}.$$ Let us use the principle of maximum likelihood and derive the maximum likelihood estimator $$\widehat{\mathbf{P}}=\begin{bmatrix} \hat{p}_{0,0} & \hat{p}_{0,1}\\ \hat{p}_{1,0} & \hat{p}_{1,1}\end{bmatrix}.$$
Recall that the likelihood function $$L(\text{unknown parameters} \ ; \ \text{Data})$$ is essentially the joint density of the data $X_0,X_1,X_2,\ldots,X_n$ as a function of the parameters. The data gives $n+1$ consecutive days of Dry or Wet recordings as $0$ or $1$ at Christchurch's Aeroclub. What are the unknown parameters here? Well, they are the four entries $(p_{0,0}, p_{0,1}, p_{1,0}, p_{1,1})$ of the unknown $\mathbf{P}$. But, due to the fact that $\mathbf{P}$ is not any old matrix of real numbers, but rather a stochastic matrix, it is constrained so that the entries are non-negative and the entries of each row sum to $1$. For instance, we can write the off-diagonal entries in terms of the diagonal entries, $p_{0,1}=1-p_{0,0}$ and $p_{1,0}=1-p_{1,1}$, and merely treat the two parameters $(p_{0,0},p_{1,1})$ as the true unknowns that can take any value in the unit square $[0,1] \times [0,1]$ parameter space. $$ \begin{eqnarray*} L(p_{0,0},p_{1,1}) &=& L(p_{0,0},p_{1,1}; x_0,x_1,\ldots,x_n)\\ &=& P\left( X_0=x_0,X_1=x_1,\ldots,X_{n-1}=x_{n-1},X_n=x_n \right) \end{eqnarray*} $$ In the above equation, the probability is computed as if the transition probabilities $p_{0,0},p_{1,1}$ were given.
Now, by definition of conditional probability and the Markov property, $$ \begin{eqnarray*} L(p_{0,0},p_{1,1}) &=& P\left( X_n=x_n \ | \ X_{n-1}=x_{n-1},\ldots,X_2=x_2,X_1=x_1,X_0=x_0 \right) \ P\left( X_{n-1}=x_{n-1},\ldots,X_2=x_2,X_1=x_1,X_0=x_0 \right)\\ &=& P\left( X_n=x_n \ | \ X_{n-1}=x_{n-1}\right) \ P\left( X_{n-1}=x_{n-1},\ldots,X_2=x_2,X_1=x_1,X_0=x_0 \right) \\ &=& P\left( X_n=x_n \ | \ X_{n-1}=x_{n-1}\right) \ P\left( X_{n-1}=x_{n-1} \ | \ X_{n-2}=x_{n-2}, X_{n-3}=x_{n-3},\ldots,X_2=x_2,X_1=x_1,X_0=x_0 \right) \\ & & \qquad \qquad \qquad P\left(X_{n-2}=x_{n-2}, X_{n-3}=x_{n-3},\ldots,X_2=x_2,X_1=x_1,X_0=x_0 \right) \\ &=& P\left( X_n=x_n \ | \ X_{n-1}=x_{n-1}\right) \ P\left( X_{n-1}=x_{n-1} \ | \ X_{n-2}=x_{n-2} \right) \ P\left(X_{n-2}=x_{n-2}, X_{n-3}=x_{n-3},\ldots,X_2=x_2,X_1=x_1,X_0=x_0 \right) \\ &\vdots&\\ &=& P\left( X_n=x_n \ | \ X_{n-1}=x_{n-1}\right) \ P\left( X_{n-1}=x_{n-1} \ | \ X_{n-2}=x_{n-2} \right) \ \cdots \ P\left( X_{2}=x_{2} \ | \ X_{1}=x_{1} \right) \ P\left( X_{1}=x_{1} \ | \ X_{0}=x_{0} \right) P(X_{0}=x_{0})\\ &=& P(X_{0}=x_{0}) \ \prod_{i=1}^n P \left( X_i=x_i \ | \ X_{i-1}=x_{i-1} \right) \\ &=& \left( p_{0,0} \right)^{n_{0,0}} \ \left( 1-p_{0,0} \right)^{n_{0,1}} \ \left( p_{1,1} \right)^{n_{1,1}} \ \left( 1-p_{1,1} \right)^{n_{1,0}} , \end{eqnarray*} $$ where $n_{i,j}$ is the number of transitions from state $i$ to state $j$ in the observed data sequence $x_0,x_1,\ldots,x_n$. In the last step the initial state $x_0$ is treated as given, that is, the factor $P(X_0=x_0)$ is taken to be $1$.
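The last equality above says that the product over consecutive pairs only depends on the data through the four transition counts. The plain-Python sketch below checks this on a short made-up 0/1 sequence (not real rainfall data), conditioning on the first observation as in the final line of the derivation. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction as F

def transition_counts(xs):
    """Count the transitions (x_{i-1}, x_i) in the sequence xs."""
    return Counter(zip(xs, xs[1:]))

def likelihood_stepwise(xs, p00, p11):
    """Multiply the one-step transition probabilities along the sequence."""
    P = [[p00, 1 - p00], [1 - p11, p11]]
    L = 1
    for a, b in zip(xs, xs[1:]):
        L *= P[a][b]
    return L

def likelihood_from_counts(xs, p00, p11):
    """Evaluate p00^n00 (1-p00)^n01 p11^n11 (1-p11)^n10 from the counts."""
    n = transition_counts(xs)  # missing keys count as 0
    return (p00 ** n[(0, 0)] * (1 - p00) ** n[(0, 1)]
            * p11 ** n[(1, 1)] * (1 - p11) ** n[(1, 0)])

xs = [0, 0, 1, 1, 0, 0, 0, 1, 0]  # made-up dry/wet sequence, for illustration only
p00, p11 = F(9, 10), F(1, 2)      # the toy model's diagonal entries

print(transition_counts(xs))
print(likelihood_stepwise(xs, p00, p11) == likelihood_from_counts(xs, p00, p11))
```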
Therefore the log-likelihood function is $$ \begin{eqnarray*} l(p_{0,0},p_{1,1}) &=& \log \left( L(p_{0,0},p_{1,1}) \right) \\ &=& \log \left( \left( p_{0,0} \right)^{n_{0,0}} \ \left( 1-p_{0,0} \right)^{n_{0,1}} \ \left( p_{1,1} \right)^{n_{1,1}} \ \left( 1-p_{1,1} \right)^{n_{1,0}} \right) \\ &=& {n_{0,0}} \log \left( p_{0,0} \right) + {n_{0,1}} \log \left( 1 - p_{0,0} \right) + {n_{1,1}} \log \left( p_{1,1} \right) + {n_{1,0}} \log \left( 1-p_{1,1} \right) \end{eqnarray*} $$ Finally, we can find the maximum likelihood estimates (MLEs) $\widehat{p}_{0,0}$ and $\widehat{p}_{1,1}$ for the unknown transition probabilities $p_{0,0}$ and $p_{1,1}$ by differentiating the log-likelihood function with respect to $p_{0,0}$ and $p_{1,1}$, respectively, setting each derivative to zero, and solving the resulting equations for the variable of differentiation. This will yield the following, which we can also obtain symbolically in Sage. $$ \begin{eqnarray*} \frac{d}{d p_{0,0}} \left( l(p_{0,0},p_{1,1}) \right) &=& \frac{d}{d p_{0,0}} \left( {n_{0,0}} \log \left( p_{0,0} \right) + {n_{0,1}} \log \left( 1 - p_{0,0} \right) + {n_{1,1}} \log \left( p_{1,1} \right) + {n_{1,0}} \log \left( 1-p_{1,1} \right) \right) \\ &=& \cdots \\ &=& \frac{n_{0,0}}{p_{0,0}} - \frac{n_{0,1}}{1-p_{0,0}} \end{eqnarray*} $$ Similarly, $$ \begin{eqnarray*} \frac{d}{d p_{1,1}} \left( l(p_{0,0},p_{1,1}) \right) &=& \frac{d}{d p_{1,1}} \left( {n_{0,0}} \log \left( p_{0,0} \right) + {n_{0,1}} \log \left( 1 - p_{0,0} \right) + {n_{1,1}} \log \left( p_{1,1} \right) + {n_{1,0}} \log \left( 1-p_{1,1} \right) \right) \\ &=& \cdots \\ &=& \frac{n_{1,1}}{p_{1,1}} - \frac{n_{1,0}}{1-p_{1,1}} \end{eqnarray*} $$ Finally, solving the above equations in terms of $p_{0,0}$ and $p_{1,1}$ gives the MLEs $$\widehat{p}_{0,0} = \frac{n_{0,0}}{n_{0,0}+n_{0,1}} \quad \text{and} \quad \widehat{p}_{1,1} = \frac{n_{1,1}}{n_{1,0}+n_{1,1}}$$ as follows: $$ \begin{eqnarray*} \frac{d}{d p_{0,0}} \left( l(p_{0,0},p_{1,1}) \right) &=& 0 \\ 
&\Leftrightarrow& \frac{n_{0,0}}{p_{0,0}} - \frac{n_{0,1}}{1-p_{0,0}} = 0\\ &\Leftrightarrow& p_{0,0} = \frac{n_{0,0}}{n_{0,0}+n_{0,1}} \end{eqnarray*} $$ and $$ \begin{eqnarray*} \frac{d}{d p_{1,1}} \left( l(p_{0,0},p_{1,1}) \right) &=& 0 \\ &\Leftrightarrow& \frac{n_{1,1}}{p_{1,1}} - \frac{n_{1,0}}{1-p_{1,1}} = 0\\ &\Leftrightarrow& p_{1,1} = \frac{n_{1,1}}{n_{1,0}+n_{1,1}} \end{eqnarray*} $$
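As a numerical sanity check of the two MLE formulas, the plain-Python sketch below counts transitions in a short made-up 0/1 sequence (not real rainfall data), computes $(\widehat{p}_{0,0},\widehat{p}_{1,1})$, and compares the log-likelihood there with its value at a few nearby points. The helper names are hypothetical; `collections.Counter` is used here for the counting step, which appears to be the role `makeFreqDict` plays in the cells of this worksheet (an assumption, since that function is defined elsewhere).

```python
from collections import Counter
from math import log

def counts(xs):
    """Return (n00, n01, n10, n11) for the 0/1 sequence xs."""
    n = Counter(zip(xs, xs[1:]))
    return n[(0, 0)], n[(0, 1)], n[(1, 0)], n[(1, 1)]

def mle_from_sequence(xs):
    """Return (p00_hat, p11_hat), or None if some state is never left in the data."""
    n00, n01, n10, n11 = counts(xs)
    if n00 + n01 == 0 or n10 + n11 == 0:
        return None
    return n00 / (n00 + n01), n11 / (n10 + n11)

def log_likelihood(xs, p00, p11):
    """Log-likelihood l(p00, p11); requires 0 < p00 < 1 and 0 < p11 < 1."""
    n00, n01, n10, n11 = counts(xs)
    return n00 * log(p00) + n01 * log(1 - p00) + n11 * log(p11) + n10 * log(1 - p11)

xs = [0, 0, 1, 1, 0, 0, 0, 1, 0]  # made-up dry/wet sequence, for illustration only
p00_hat, p11_hat = mle_from_sequence(xs)
best = log_likelihood(xs, p00_hat, p11_hat)
print(p00_hat, p11_hat, best)

# the log-likelihood at the MLE should not be beaten by nearby parameter values
for d0, d1 in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    print(log_likelihood(xs, p00_hat + d0, p11_hat + d1) <= best)
```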
{{{id=221|
///
}}}

{{{id=216|
%auto
var('p00, p11, n00, n01, n10, n11') # declare variables
# assign the symbolic expression for the log likelihood function to L
L=n00*log(p00) + n01*log(1-p00) + n11*log(p11) + n10*log(1-p11)
///
}}}

{{{id=220|
L.diff(p00) # differentiate log likelihood symbolically with respect to p00
///
n01/(p00 - 1) + n00/p00
}}}

{{{id=219|
L.diff(p11) # differentiate log likelihood symbolically with respect to p11
///
n10/(p11 - 1) + n11/p11
}}}

{{{id=251|
solve(L.diff(p00)==0, p00) # solve the equation in terms of p00 to get MLE
///
[p00 == n00/(n00 + n01)]
}}}

{{{id=218|
solve(L.diff(p11)==0, p11) # solve the equation in terms of p11 to get MLE
///
[p11 == n11/(n10 + n11)]
}}}

{{{id=217|
///
}}}

{{{id=161|
x0ton = [sel_daysdata[i][1] for i in range(len(sel_daysdata))]
transitions_data =[(x0ton[i],x0ton[i+1]) for i in range(0,len(x0ton)-1,1)]
#transitions_data
///
}}}

{{{id=162|
transition_counts = makeFreqDict(transitions_data)
transition_counts
///
}}}

{{{id=158|
n_00 = transition_counts[(0,0)]
n_01 = transition_counts[(0,1)]
n_10 = transition_counts[(1,0)]
n_11 = transition_counts[(1,1)]
n_00, n_01, n_10, n_11
///
}}}

Make a function to make a transition counts matrix from any list of 0/1 tuples passed in.
{{{id=195|
%auto
def makeTransitionCounts(theData):
    '''Return a transition counts matrix from a list of tuples representing transitions between 2 states 0 and 1.

    Param theData is a list of tuples where the tuples can be (0,0) or (0,1) or (1,0) or (1,1).
    Return a 2x2 matrix [[count of (0,0), count of (0,1)],[count of (1,0), count of (1,1)]].'''
    retMatrix = matrix([[0,0],[0,0]]) # default counts
    x0ton = [theData[i][1] for i in range(len(theData))]
    transitions_data =[(x0ton[i],x0ton[i+1]) for i in range(0,len(x0ton)-1,1)]
    transition_counts = makeFreqDict(transitions_data)
    #keysToFind = [(0,0),(0,1),(1,0),(1,1)]
    # we will get an error if we try to access a value for a key that is not in the dictionary
    # so we need to check if each of the keys we might find is in the dictionary
    # and only try to access the count if the key is there
    for i in [0,1]:
        for j in [0,1]:
            if (i,j) in transition_counts:
                retMatrix[i,j] = transition_counts[(i,j)]
            # else the value in the matrix [i,j] stays as 0
    return retMatrix
///
}}}

Get the transition counts matrix for all the data (we will get the same as we had before, but as a matrix which echoes the layout of our transition probabilities matrix, and we would also be able to use our function for other lists of tuples):
{{{id=196|
allTransitionCounts = makeTransitionCounts(all_daysdata)
allTransitionCounts
///
}}}

Make a function to turn transition counts into a matrix of values for $\widehat{\mathbf{P}}=\begin{bmatrix} \hat{p}_{0,0} & \hat{p}_{0,1}\\ \hat{p}_{1,0} & \hat{p}_{1,1}\end{bmatrix}.$
{{{id=197|
%auto
def makeMLEMatrix(tcMatrix):
    '''Return an MLE Matrix from given 2-state transition count data.

    Param tcMatrix is a 2x2 matrix of transition counts.
    Returns MLE matrix as [[n_00/(n_00+n_01), n_01/(n_00+n_01)], [n_10/(n_10+n_11), n_11/(n_10+n_11)]].
    Returns None if there is not at least one count in each row of tcMatrix.'''
    retValue = None
    row0 = tcMatrix[0,0]+tcMatrix[0,1] # total number of transitions out of state 0
    row1 = tcMatrix[1,0]+tcMatrix[1,1] # total number of transitions out of state 1
    if (row0 > 0) and (row1 > 0):
        retValue = matrix(RR,[[tcMatrix[0,0]/row0, tcMatrix[0,1]/row0],[tcMatrix[1,0]/row1, tcMatrix[1,1]/row1]])
    return retValue
///
}}}

Look at this for all the data:
{{{id=205|
allDataMLEMatrix = makeMLEMatrix(allTransitionCounts)
allDataMLEMatrix
///
}}}

{{{id=224|
P # compare to made up probs in toy model
///
}}}

As we said before, we can concentrate just on two unknowns $(\hat{p}_{0,0},\hat{p}_{1,1})$, so we can make a function just to return this tuple:
{{{id=203|
%auto
def makeMLE00And11(tcMatrix):
    '''Return an MLE tuple (p00, p11) from given 2-state transition count data.

    Param tcMatrix is a 2x2 matrix of transition counts.
    Returns (n_00/(n_00+n_01), n_11/(n_10+n_11)) from tcMatrix.
    Returns None if there is not at least one count in each row of tcMatrix.
    '''
    retValue = None
    row0 = tcMatrix[0,0]+tcMatrix[0,1] # total number of transitions out of state 0
    row1 = tcMatrix[1,0]+tcMatrix[1,1] # total number of transitions out of state 1
    if (row0 > 0) and (row1 > 0):
        retValue = (RR(tcMatrix[0,0]/row0), RR(tcMatrix[1,1]/row1))
    return retValue
///
}}}

What is $(\hat{p}_{0,0},\hat{p}_{1,1})$ using all our data?
{{{id=204|
allDataMLE00And11 = makeMLE00And11(allTransitionCounts)
allDataMLE00And11
///
}}}

{{{id=164|
///
}}}

{{{id=185|
///
}}}

We can use our log-likelihood function in the form of a Sage symbolic function with symbolic variables n00, n01, n10, n11, p00, p11, and substitute in the values we have just found, using all our data, for all of these variables, to find the maximum of the log-likelihood function (i.e. the value of the function evaluated at the MLE).
Here is a plot of $(\hat{p}_{00},\hat{p}_{11})$, moving as the amount of data increases. It loops continually so if it looks like it is not moving, it is towards the end when the MLE has settled down - just wait a short while and the loop will start again:
Here is an animated contour plot of our log-likelihood function, moving as the amount of data increases, with the MLE indicated as the black dot •. Again, it loops continually:
You Think: What is the MLE of $\theta$ in a product $Bernoulli(\theta)$ experiment for this problem, i.e., when we now model $$X_0,X_1,\ldots,X_n \overset{IID}{\thicksim} Bernoulli(\theta)$$
{{{id=165|
# what is the MLE thetahat of the wet or dry (1 or 0) days under IID Bernoulli(theta) RV
makeFreqDict(x0ton)
///
}}}

Here is a nice trick to make a flow diagram fast and dirty in Sage. For our Christchurch Dry-Wet chain with MLE $\widehat{\mathbf{P}}$ we can do the following flow diagram.
{{{id=180|
P = matrix([[3/4,1/4],[1/2,1/2]]) # construct and assign the matrix to P
p = DiGraph(P,format="weighted_adjacency_matrix")
pos_dict={}
pos_dict[0] = [1,1]
pos_dict[1] = [3,1]
p.plot(edge_labels=True,pos=pos_dict,vertex_size=300).show()
///
}}}

You Try: Consider the Markov chain describing the mode of transport used by a lazy professor. He has only two modes of transport, namely Walk or Drive. Label Walk by $0$ and Drive by $1$. If he walks today then he will definitely drive tomorrow. But, if he drives today then he flips a fair coin to decide whether he will Walk or Drive tomorrow. His decision to get to work is the same on each day. In the cells below try to:
We get introduced to simple random walks in SAGE.
Problem: