diff --git a/lectures/markov_chains_I.md b/lectures/markov_chains_I.md
index 6b6ee924..acc4ff52 100644
--- a/lectures/markov_chains_I.md
+++ b/lectures/markov_chains_I.md
@@ -82,16 +82,15 @@ In other words,

 If $P$ is a stochastic matrix, then so is the $k$-th power $P^k$ for all $k \in \mathbb N$.

-Checking this in {ref}`the first exercises ` below.
+You are asked to check this in {ref}`an exercise ` below.

 ### Markov chains
+
 Now we can introduce Markov chains.

 Before defining a Markov chain rigorously, we'll give some examples.

-(Among other things, defining a Markov chain will clarify a connection between **stochastic matrices** and **Markov chains**.)
-
 (mc_eg2)=
 #### Example 1

@@ -110,7 +109,7 @@ Here there are three **states**
 * "mr" represents mild recession
 * "sr" represents severe recession

-The arrows represent **transition probabilities** over one month.
+The arrows represent transition probabilities over one month.

 For example, the arrow from mild recession to normal growth has 0.145 next to it.

@@ -120,7 +119,7 @@ The arrow from normal growth back to normal growth tells us that there is a
 97% probability of transitioning from normal growth to normal growth (staying
 in the same state).

-Note that these are *conditional* probabilities --- the probability of
+Note that these are conditional probabilities --- the probability of
 transitioning from one state to another (or staying at the same one) conditional
 on the current state.

@@ -258,14 +257,12 @@ Here is a visualization, with darker colors indicating higher probability.
 :tags: [hide-input]

 G = nx.MultiDiGraph()
-edge_ls = []
-label_dict = {}

 for start_idx, node_start in enumerate(nodes):
     for end_idx, node_end in enumerate(nodes):
         value = P[start_idx][end_idx]
         if value != 0:
-            G.add_edge(node_start,node_end, weight=value, len=100)
+            G.add_edge(node_start, node_end, weight=value)

 pos = nx.spring_layout(G, seed=10)
 fig, ax = plt.subplots()
@@ -273,7 +270,7 @@ nx.draw_networkx_nodes(G, pos, node_size=600, edgecolors='black', node_color='wh
 nx.draw_networkx_labels(G, pos)

 arc_rad = 0.2
-curved_edges = [edge for edge in G.edges()]
+
 edges = nx.draw_networkx_edges(G, pos, ax=ax, connectionstyle=f'arc3, rad = {arc_rad}',
                                edge_cmap=cm.Blues, width=2,
                                edge_color=[G[nodes[0]][nodes[1]][0]['weight'] for nodes in G.edges])

@@ -317,7 +314,7 @@ This means that, for any date $t$ and any state $y \in S$,
 = \mathbb P \{ X_{t+1} = y \,|\, X_t, X_{t-1}, \ldots \}
 ```

-This means that once we know the current state $X_t$, adding knowledge of earlier states $X_{t-1}, X_{t-2}$ provides no additional information about probabilities of **future** states.
+In other words, once we know the current state $X_t$, adding knowledge of earlier states $X_{t-1}, X_{t-2}$ provides no additional information about probabilities of *future* states.

 Thus, the dynamics of a Markov chain are fully determined by the set of **conditional probabilities**

@@ -356,7 +353,7 @@ By construction, the resulting process satisfies {eq}`mpp`.

 ```{index} single: Markov Chains; Simulation
 ```

-A good way to study a Markov chains is to simulate it.
+A good way to study Markov chains is to simulate them.

 Let's start by doing this ourselves and then look at libraries that can help us.

@@ -434,7 +431,7 @@ P = [[0.4, 0.6],

 Here's a short time series.

 ```{code-cell} ipython3
-mc_sample_path(P, ψ_0=[1.0, 0.0], ts_length=10)
+mc_sample_path(P, ψ_0=(1.0, 0.0), ts_length=10)
 ```

 It can be shown that for a long series drawn from `P`, the fraction of the
@@ -448,7 +445,7 @@ $X_0$ is drawn.
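+As a preview of where this long-run fraction comes from (stationary
+distributions are discussed later in this lecture), we can raise $P$ to a
+high power and observe that both rows converge to the same distribution.
+This is a minimal sketch, assuming `np` is imported as in the cells above;
+the name `P_arr` is introduced here purely for illustration:
+
+```{code-cell} ipython3
+# The matrix P defined above, recast as a NumPy array
+P_arr = np.array([[0.4, 0.6],
+                  [0.2, 0.8]])
+
+# Each row of P^50 is (approximately) the same distribution,
+# whose first entry gives the long-run fraction of time in state 0
+np.linalg.matrix_power(P_arr, 50)
+```
+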
 The following code illustrates this

 ```{code-cell} ipython3
-X = mc_sample_path(P, ψ_0=[0.1, 0.9], ts_length=1_000_000)
+X = mc_sample_path(P, ψ_0=(0.1, 0.9), ts_length=1_000_000)
 np.mean(X == 0)
 ```

@@ -488,11 +485,11 @@ The following code illustrates

 ```{code-cell} ipython3
 mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))
-mc.simulate(ts_length=4, init='employed')
+mc.simulate(ts_length=4, init='employed')  # Start in the employed state
 ```

 ```{code-cell} ipython3
-mc.simulate(ts_length=4, init='unemployed')
+mc.simulate(ts_length=4, init='unemployed')  # Start in the unemployed state
 ```

 ```{code-cell} ipython3
@@ -570,7 +567,7 @@ This is very important, so let's repeat it
 X_0 \sim \psi_0 \quad \implies \quad X_m \sim \psi_0 P^m
 ```

-The general rule is that post-multiplying a distribution by $P^m$ shifts it forward $m$ units of time.
+The general rule is that postmultiplying a distribution by $P^m$ shifts it forward $m$ units of time.

 Hence the following is also valid.

@@ -625,12 +622,12 @@ $$

 (mc_eg1-1)=
-### Example 2: Cross-sectional distributions
+### Example 2: cross-sectional distributions

 The distributions we have been studying can be viewed either

 1. as probabilities or
-1. as cross-sectional frequencies that the Law of Large Numbers leads us to anticipate for large samples.
+1. as cross-sectional frequencies that the law of large numbers leads us to anticipate for large samples.

 To illustrate, recall our model of employment/unemployment dynamics for a given worker {ref}`discussed above `.

@@ -641,9 +638,9 @@ workers' processes.

 Let $\psi_t$ be the current *cross-sectional* distribution over $\{ 0, 1 \}$.

-The cross-sectional distribution records fractions of workers employed and unemployed at a given moment t.
+The cross-sectional distribution records fractions of workers employed and unemployed at a given moment $t$.

-* For example, $\psi_t(0)$ is the unemployment rate.
+* For example, $\psi_t(0)$ is the unemployment rate at time $t$.

 What will the cross-sectional distribution be in 10 periods hence?

@@ -651,11 +648,11 @@ The answer is $\psi_t P^{10}$, where $P$ is the stochastic matrix in
 {eq}`p_unempemp`.

 This is because each worker's state evolves according to $P$, so
-$\psi_t P^{10}$ is a marginal distribution for a single randomly selected
+$\psi_t P^{10}$ is a [marginal distribution](https://en.wikipedia.org/wiki/Marginal_distribution) for a single randomly selected
 worker.

-But when the sample is large, outcomes and probabilities are roughly equal (by an application of the Law
-of Large Numbers).
+But when the sample is large, outcomes and probabilities are roughly equal (by an application of the law
+of large numbers).

 So for a very large (tending to infinite) population, $\psi_t P^{10}$ also represents fractions of workers in
@@ -688,11 +685,11 @@ Such distributions are called **stationary** or **invariant**.

 (mc_stat_dd)=
 Formally, a distribution $\psi^*$ on $S$ is called **stationary** for $P$ if $\psi^* P = \psi^* $.

-Notice that, post-multiplying by $P$, we have $\psi^* P^2 = \psi^* P = \psi^*$.
+Notice that, postmultiplying by $P$, we have $\psi^* P^2 = \psi^* P = \psi^*$.

-Continuing in the same way leads to $\psi^* = \psi^* P^t$ for all $t$.
+Continuing in the same way leads to $\psi^* = \psi^* P^t$ for all $t \ge 0$.
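+
+To make this concrete, here is a minimal numerical check of the stationarity
+property, reusing the $2 \times 2$ matrix $P$ from the simulation section
+above, whose stationary distribution works out to $\psi^* = (0.25, 0.75)$
+(the array names below are introduced purely for illustration):
+
+```{code-cell} ipython3
+P_arr = np.array([[0.4, 0.6],
+                  [0.2, 0.8]])
+ψ_star = np.array([0.25, 0.75])
+
+print(ψ_star @ P_arr)                             # ψ* P   = ψ*
+print(ψ_star @ np.linalg.matrix_power(P_arr, 5))  # ψ* P^t = ψ* for t = 5
+```
+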
-This tells us an important fact: If the distribution of $\psi_0$ is a stationary distribution, then $\psi_t$ will have this same distribution for all $t$.
+This tells us an important fact: If the distribution of $\psi_0$ is a stationary distribution, then $\psi_t$ will have this same distribution for all $t \ge 0$.

 The following theorem is proved in Chapter 4 of {cite}`sargent2023economic` and numerous other sources.

@@ -767,7 +764,7 @@ For example, we have the following result

 (strict_stationary)=
 ```{prf:theorem}
-Theorem: If there exists an integer $m$ such that all entries of $P^m$ are
+If there exists an integer $m$ such that all entries of $P^m$ are
 strictly positive, with unique stationary distribution $\psi^*$, then

 $$
@@ -801,11 +798,10 @@ First, we write a function to iterate the sequence of distributions for `ts_leng
 ```{code-cell} ipython3
 def iterate_ψ(ψ_0, P, ts_length):
     n = len(P)
     ψ_t = np.empty((ts_length, n))
-    ψ = ψ_0
-    for t in range(ts_length):
-        ψ_t[t] = ψ
-        ψ = ψ @ P
-    return np.array(ψ_t)
+    ψ_t[0] = ψ_0
+    for t in range(1, ts_length):
+        ψ_t[t] = ψ_t[t-1] @ P
+    return ψ_t
 ```

@@ -814,12 +810,7 @@ Now we plot the sequence
 ```{code-cell} ipython3
 ψ_0 = (0.0, 0.2, 0.8)  # Initial condition

 fig = plt.figure()
-ax = fig.add_subplot(111, projection='3d')
-
-ax.set(xlim=(0, 1), ylim=(0, 1), zlim=(0, 1),
-       xticks=(0.25, 0.5, 0.75),
-       yticks=(0.25, 0.5, 0.75),
-       zticks=(0.25, 0.5, 0.75))
+ax = fig.add_subplot(projection='3d')

 ψ_t = iterate_ψ(ψ_0, P, 20)

@@ -852,13 +843,9 @@ First, we write a function to draw initial distributions $\psi_0$ of size `num_d
 ```{code-cell} ipython3
 def generate_initial_values(num_distributions):
     n = len(P)
-    ψ_0s = np.empty((num_distributions, n))
-
-    for i in range(num_distributions):
-        draws = np.random.randint(1, 10_000_000, size=n)
-
-        # Scale them so that they add up into 1
-        ψ_0s[i,:] = np.array(draws/sum(draws))
+
+    draws = np.random.randint(1, 10_000_000, size=(num_distributions, n))
+    ψ_0s = draws/draws.sum(axis=1)[:, None]

     return ψ_0s
 ```

@@ -917,7 +904,7 @@ The convergence to $\psi^*$ holds for different initial distributions.



-#### Example: Failure of convergence
+#### Example: failure of convergence

 In the case of a periodic chain, with

@@ -1077,7 +1064,7 @@ Solution 1:

 ```

-Since the matrix is everywhere positive, there is a unique stationary distribution.
+Since the matrix is everywhere positive, there is a unique stationary distribution $\psi^*$ such that $\psi_t \to \psi^*$ as $t \to \infty$.

 Solution 2: