From 8fa5945122ba2b3428446393df5fc41af903f33d Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 23 Feb 2024 10:13:46 +1100 Subject: [PATCH 01/40] [inquality] Review of lecture and incorporate updates --- lectures/inequality.md | 269 +++++++++++++++++++++++++---------------- 1 file changed, 166 insertions(+), 103 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index ee006a15..9e5e3b29 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -4,7 +4,7 @@ jupytext: extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.14.5 + jupytext_version: 1.15.1 kernelspec: display_name: Python 3 (ipykernel) language: python @@ -13,7 +13,6 @@ kernelspec: # Income and Wealth Inequality - ## Overview In this section we @@ -46,7 +45,6 @@ Many recent political debates revolve around inequality. Many economic policies, from taxation to the welfare state, are aimed at addressing inequality. - ### Measurement One problem with these debates is that inequality is often poorly defined. @@ -63,23 +61,14 @@ In this lecture we discuss standard measures of inequality used in economic rese For each of these measures, we will look at both simulated and real data. -We will install the following libraries. - -```{code-cell} ipython3 -:tags: [hide-output] - -!pip install --upgrade quantecon interpolation -``` - -And we use the following imports. +We will use the following imports. ```{code-cell} ipython3 import pandas as pd import numpy as np import matplotlib.pyplot as plt -import quantecon as qe import random as rd -from interpolation import interp +import quantecon as qe ``` ## The Lorenz curve @@ -104,16 +93,18 @@ The curve $L$ is just a function $y = L(x)$ that we can plot and interpret. 
To create it we first generate data points $(x_i, y_i)$ according to -\begin{equation*} - x_i = \frac{i}{n}, - \qquad - y_i = \frac{\sum_{j \leq i} w_j}{\sum_{j \leq n} w_j}, - \qquad i = 1, \ldots, n -\end{equation*} +$$ +x_i = \frac{i}{n}, +\qquad +y_i = \frac{\sum_{j \leq i} w_j}{\sum_{j \leq n} w_j}, +\qquad i = 1, \ldots, n +$$ Now the Lorenz curve $L$ is formed from these data points using interpolation. -(If we use a line plot in Matplotlib, the interpolation will be done for us.) +```{tip} +If we use a line plot in `matplotlib`, the interpolation will be done for us. +``` The meaning of the statement $y = L(x)$ is that the lowest $(100 \times x)$\% of people have $(100 \times y)$\% of all wealth. @@ -124,16 +115,71 @@ The meaning of the statement $y = L(x)$ is that the lowest $(100 In the discussion above we focused on wealth but the same ideas apply to income, consumption, etc. -+++ ### Lorenz curves of simulated data Let's look at some examples and try to build understanding. +First let us construct a `lorenz_curve` function that we can +use in our simulations below. + +It is useful to construct a function that translates an array of +income or wealth data into the cumulative share +of people and the cumulative share of income (or wealth). + +```{code-cell} ipython3 +:tags: [hide-input] + +def lorenz_curve(y): + """ + Calculates the Lorenz Curve, a graphical representation of + the distribution of income or wealth. + + It returns the cumulative share of people (x-axis) and + the cumulative share of income earned. + + Parameters + ---------- + y : array_like(float or int, ndim=1) + Array of income/wealth for each individual. + Unordered or ordered is fine. + + Returns + ------- + cum_people : array_like(float, ndim=1) + Cumulative share of people for each person index (i/n) + cum_income : array_like(float, ndim=1) + Cumulative share of income for each person index + + + References + ---------- + .. 
[1] https://en.wikipedia.org/wiki/Lorenz_curve + + Examples + -------- + >>> a_val, n = 3, 10_000 + >>> y = np.random.pareto(a_val, size=n) + >>> f_vals, l_vals = lorenz_curve(y) + + """ + + n = len(y) + y = np.sort(y) + s = np.zeros(n + 1) + s[1:] = np.cumsum(y) + cum_people = np.zeros(n + 1) + cum_income = np.zeros(n + 1) + for i in range(1, n + 1): + cum_people[i] = i / n + cum_income[i] = s[i] / s[n] + return cum_people, cum_income +``` + In the next figure, we generate $n=2000$ draws from a lognormal distribution and treat these draws as our population. -The straight line ($x=L(x)$ for all $x$) corresponds to perfect equality. +The straight 45-degree line ($x=L(x)$ for all $x$) corresponds to perfect equality. The lognormal draws produce a less equal distribution. @@ -145,7 +191,7 @@ households own just over 40\% of total wealth. --- mystnb: figure: - caption: "Lorenz curve of simulated data" + caption: Lorenz curve of simulated data name: lorenz_simulated --- n = 2000 @@ -153,41 +199,41 @@ sample = np.exp(np.random.randn(n)) fig, ax = plt.subplots() -f_vals, l_vals = qe.lorenz_curve(sample) +f_vals, l_vals = lorenz_curve(sample) ax.plot(f_vals, l_vals, label=f'lognormal sample', lw=2) ax.plot(f_vals, f_vals, label='equality', lw=2) -ax.legend(fontsize=12) +ax.legend() ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--') ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--') -ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) +ax.set_xlabel("Cumulative share of households (%)") +ax.set_ylim((0, 1)) +ax.set_ylabel("Cumulative share of income (%)") plt.show() ``` ### Lorenz curves for US data -Next let's look at the real data, focusing on income and wealth in the US in -2016. +Next let's look at the real data, focusing on income and wealth in the US in 2016. 
-The following code block imports a subset of the dataset ``SCF_plus``, +The following code block imports a subset of the dataset `SCF_plus`, which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF). ```{code-cell} ipython3 url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv' df = pd.read_csv(url) -df = df.dropna() -df_income_wealth = df +df_income_wealth = df.dropna() ``` ```{code-cell} ipython3 -df_income_wealth.head() +df_income_wealth.head(n=5) ``` -The following code block uses data stored in dataframe ``df_income_wealth`` to generate the Lorenz curves. +The following code block uses data stored in dataframe `df_income_wealth` to generate the Lorenz curves. (The code is somewhat complex because we need to adjust the data according to population weights supplied by the SCF.) @@ -222,7 +268,7 @@ for var in varlist: rd.shuffle(y) # calculate and store Lorenz curve data - f_val, l_val = qe.lorenz_curve(y) + f_val, l_val = lorenz_curve(y) f_vals.append(f_val) l_vals.append(l_val) @@ -240,7 +286,7 @@ US in 2016. --- mystnb: figure: - caption: "2016 US Lorenz curves" + caption: 2016 US Lorenz curves name: lorenz_us image: alt: lorenz_us @@ -252,7 +298,7 @@ ax.plot(f_vals_ti[-1], l_vals_ti[-1], label=f'total income') ax.plot(f_vals_li[-1], l_vals_li[-1], label=f'labor income') ax.plot(f_vals_nw[-1], f_vals_nw[-1], label=f'equality') -ax.legend(fontsize=12) +ax.legend() plt.show() ``` @@ -263,12 +309,9 @@ Total income is the sum of households' all income sources, including labor incom One key finding from this figure is that wealth inequality is significantly more extreme than income inequality. -+++ - ## The Gini coefficient -The Lorenz curve is a useful visual representation of inequality in a -distribution. +The Lorenz curve is a useful visual representation of inequality in a distribution. 
Another popular measure of income and wealth inequality is the Gini coefficient. @@ -280,19 +323,17 @@ Lorenz curve. ### Definition - As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest. The Gini coefficient is defined for the sample above as -\begin{equation} - \label{eq:gini} - G := - \frac - {\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} - {2n\sum_{i=1}^n w_i}. -\end{equation} +$$ +G := +\frac + {\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} + {2n\sum_{i=1}^n w_i}. +$$ (eq:gini) The Gini coefficient is closely related to the Lorenz curve. @@ -306,18 +347,18 @@ The idea is that $G=0$ indicates complete equality, while $G=1$ indicates comple --- mystnb: figure: - caption: "Shaded Lorenz curve of simulated data" + caption: Shaded Lorenz curve of simulated data name: lorenz_gini image: alt: lorenz_gini --- fig, ax = plt.subplots() -f_vals, l_vals = qe.lorenz_curve(sample) +f_vals, l_vals = lorenz_curve(sample) ax.plot(f_vals, l_vals, label=f'lognormal sample', lw=2) ax.plot(f_vals, f_vals, label='equality', lw=2) -ax.legend(fontsize=12) +ax.legend() ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--') ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--') @@ -327,7 +368,7 @@ ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06) ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) -ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area', fontsize=12) +ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area') plt.show() ``` @@ -336,6 +377,37 @@ plt.show() Let's examine the Gini coefficient in some simulations. +```{code-cell} ipython3 +:tags: [hide-input] + +def gini_coefficient(y): + r""" + Implements the Gini inequality index + + Parameters + ---------- + y : array_like(float) + Array of income/wealth for each individual. 
+ Ordered or unordered is fine + + Returns + ------- + Gini index: float + The gini index describing the inequality of the array of income/wealth + + References + ---------- + + https://en.wikipedia.org/wiki/Gini_coefficient + """ + n = len(y) + i_sum = np.zeros(n) + for i in range(n): + for j in range(n): + i_sum[i] += abs(y[i] - y[j]) + return np.sum(i_sum) / (2 * n * np.sum(y)) +``` + The following code computes the Gini coefficients for five different populations. @@ -349,8 +421,10 @@ In each case we set $\mu = - \sigma^2 / 2$. This implies that the mean of the distribution does not change with $\sigma$. -(You can check this by looking up the expression for the mean of a lognormal -distribution.) +```{note} +You can check this by looking up the expression for the mean of a lognormal +distribution. +``` ```{code-cell} ipython3 k = 5 @@ -371,10 +445,10 @@ def plot_inequality_measures(x, y, legend, xlabel, ylabel): fig, ax = plt.subplots() ax.plot(x, y, marker='o', label=legend) - ax.set_xlabel(xlabel, fontsize=12) - ax.set_ylabel(ylabel, fontsize=12) + ax.set_xlabel(xlabel) + ax.set_ylabel(ylabel) - ax.legend(fontsize=12) + ax.legend() plt.show() ``` @@ -382,7 +456,7 @@ def plot_inequality_measures(x, y, legend, xlabel, ylabel): --- mystnb: figure: - caption: "Gini coefficients of simulated data" + caption: Gini coefficients of simulated data name: gini_simulated image: alt: gini_simulated @@ -397,13 +471,11 @@ plot_inequality_measures(σ_vals, The plots show that inequality rises with $\sigma$, according to the Gini coefficient. -+++ - ### Gini coefficient dynamics for US data Now let's look at Gini coefficients for US data derived from the SCF. -The following code creates a list called ``Ginis``. +The following code creates a list called `Ginis`. 
It stores data of Gini coefficients generated from the dataframe ``df_income_wealth`` and method [gini_coefficient](https://quanteconpy.readthedocs.io/en/latest/tools/inequality.html#quantecon.inequality.gini_coefficient), from [QuantEcon](https://quantecon.org/quantecon-py/) library. @@ -455,7 +527,7 @@ ginis_li_new[5] = (ginis_li[4] + ginis_li[6]) / 2 --- mystnb: figure: - caption: "Gini coefficients of US net wealth" + caption: Gini coefficients of US net wealth name: gini_wealth_us image: alt: gini_wealth_us @@ -467,8 +539,8 @@ fig, ax = plt.subplots() ax.plot(years, ginis_nw, marker='o') -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) +ax.set_xlabel(xlabel) +ax.set_ylabel(ylabel) plt.show() ``` @@ -477,7 +549,7 @@ plt.show() --- mystnb: figure: - caption: "Gini coefficients of US income" + caption: Gini coefficients of US income name: gini_income_us image: alt: gini_income_us @@ -490,10 +562,10 @@ fig, ax = plt.subplots() ax.plot(years, ginis_li_new, marker='o', label="labor income") ax.plot(years, ginis_ti, marker='o', label="total income") -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) +ax.set_xlabel(xlabel) +ax.set_ylabel(ylabel) -ax.legend(fontsize=12) +ax.legend() plt.show() ``` @@ -523,16 +595,14 @@ share is defined as $$ T(p) = 1 - L (1-p) \approx \frac{\sum_{j\geq i} w_j}{ \sum_{j \leq n} w_j}, \quad i = \lfloor n (1-p)\rfloor -$$(topshares) +$$ (topshares) Here $\lfloor \cdot \rfloor$ is the floor function, which rounds any number down to the integer less than or equal to that number. -+++ - -The following code uses the data from dataframe ``df_income_wealth`` to generate another dataframe ``df_topshares``. +The following code uses the data from dataframe `df_income_wealth` to generate another dataframe `df_topshares`. -``df_topshares`` stores the top 10 percent shares for the total income, the labor income and net wealth from 1950 to 2016 in US. 
+`df_topshares` stores the top 10 percent shares for the total income, the labor income and net wealth from 1950 to 2016 in US. ```{code-cell} ipython3 :tags: [hide-input] @@ -546,19 +616,16 @@ df4 = pd.merge(df3, df1, how="left", on=["year"]) df4['r_weights'] = df4['weights'] / df4['r_weights'] # create weighted nw, ti, li - df4['weighted_n_wealth'] = df4['n_wealth'] * df4['r_weights'] df4['weighted_t_income'] = df4['t_income'] * df4['r_weights'] df4['weighted_l_income'] = df4['l_income'] * df4['r_weights'] # extract two top 10% groups by net wealth and total income. - df6 = df4[df4['nw_groups'] == 'Top 10%'] df7 = df4[df4['ti_groups'] == 'Top 10%'] # calculate the sum of weighted top 10% by net wealth, # total income and labor income. - df5 = df4.groupby('year').sum(numeric_only=True).reset_index() df8 = df6.groupby('year').sum(numeric_only=True).reset_index() df9 = df7.groupby('year').sum(numeric_only=True).reset_index() @@ -568,7 +635,6 @@ df5['weighted_t_income_top10'] = df9['weighted_t_income'] df5['weighted_l_income_top10'] = df9['weighted_l_income'] # calculate the top 10% shares of the three variables. - df5['topshare_n_wealth'] = df5['weighted_n_wealth_top10'] / \ df5['weighted_n_wealth'] df5['topshare_t_income'] = df5['weighted_t_income_top10'] / \ @@ -587,7 +653,7 @@ Then let's plot the top shares. --- mystnb: figure: - caption: "US top shares" + caption: US top shares name: top_shares_us image: alt: top_shares_us @@ -604,17 +670,14 @@ ax.plot(years, df_topshares["topshare_n_wealth"], ax.plot(years, df_topshares["topshare_t_income"], marker='o', label="total income") -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) +ax.set_xlabel(xlabel) +ax.set_ylabel(ylabel) -ax.legend(fontsize=12) -plt.show() +ax.legend() ``` ## Exercises -+++ - ```{exercise} :label: inequality_ex1 @@ -635,8 +698,6 @@ Confirm that higher variance generates more dispersion in the sample, and hence greater inequality. 
``` -+++ - ```{solution-start} inequality_ex1 :class: dropdown ``` @@ -665,10 +726,10 @@ l_vals = [] for σ in σ_vals: μ = -σ ** 2 / 2 y = np.exp(μ + σ * np.random.randn(n)) - f_val, l_val = qe._inequality.lorenz_curve(y) + f_val, l_val = lorenz_curve(y) f_vals.append(f_val) l_vals.append(l_val) - ginis.append(qe._inequality.gini_coefficient(y)) + ginis.append(qe.gini_coefficient(y)) topshares.append(calculate_top_share(y)) ``` @@ -676,7 +737,7 @@ for σ in σ_vals: --- mystnb: figure: - caption: "Top shares of simulated data" + caption: Top shares of simulated data name: top_shares_simulated image: alt: top_shares_simulated @@ -692,7 +753,7 @@ plot_inequality_measures(σ_vals, --- mystnb: figure: - caption: "Gini coefficients of simulated data" + caption: Gini coefficients of simulated data name: gini_coef_simulated image: alt: gini_coef_simulated @@ -708,7 +769,7 @@ plot_inequality_measures(σ_vals, --- mystnb: figure: - caption: "Lorenz curves for simulated data" + caption: Lorenz curves for simulated data name: lorenz_curve_simulated image: alt: lorenz_curve_simulated @@ -736,12 +797,19 @@ Plot the top shares generated from Lorenz curve and the top shares approximated ``` -+++ - ```{solution-start} inequality_ex2 :class: dropdown ``` +We will use the `interpolation` package in this solution. 
+ +```{code-cell} ipython3 +:tags: [hide-output] + +!pip install --upgrade interpolation +from interpolation import interp +``` + Here is one solution: ```{code-cell} ipython3 @@ -760,25 +828,20 @@ for f_val, l_val in zip(f_vals_nw, l_vals_nw): --- mystnb: figure: - caption: "US top shares: approximation vs Lorenz" + caption: 'US top shares: approximation vs Lorenz' name: top_shares_us_al image: alt: top_shares_us_al --- -xlabel = "year" -ylabel = "top $10\%$ share" - fig, ax = plt.subplots() ax.plot(years, df_topshares["topshare_n_wealth"], marker='o',\ label="net wealth-approx") ax.plot(years, top_shares_nw, marker='o', label="net wealth-lorenz") -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) - -ax.legend(fontsize=12) -plt.show() +ax.set_xlabel("year") +ax.set_ylabel("top $10\%$ share") +ax.legend() ``` ```{solution-end} From 1444052d124d76d2e40ee1205f520bcc050cd2bf Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 23 Feb 2024 10:40:38 +1100 Subject: [PATCH 02/40] update plots --- lectures/inequality.md | 89 +++++++++++++++++------------------------- 1 file changed, 35 insertions(+), 54 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 9e5e3b29..a40b1f95 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -441,15 +441,12 @@ for σ in σ_vals: ```{code-cell} ipython3 def plot_inequality_measures(x, y, legend, xlabel, ylabel): - fig, ax = plt.subplots() ax.plot(x, y, marker='o', label=legend) - ax.set_xlabel(xlabel) ax.set_ylabel(ylabel) - ax.legend() - plt.show() + return fig, ax ``` ```{code-cell} ipython3 @@ -461,11 +458,12 @@ mystnb: image: alt: gini_simulated --- -plot_inequality_measures(σ_vals, - ginis, - 'simulated', - '$\sigma$', - 'gini coefficients') +fig, ax = plot_inequality_measures(σ_vals, + ginis, + 'simulated', + '$\sigma$', + 'gini coefficients') +plt.show() ``` The plots show that inequality rises with $\sigma$, according to the Gini @@ -475,9 +473,9 @@ coefficient. 
Now let's look at Gini coefficients for US data derived from the SCF. -The following code creates a list called `Ginis`. +The following code creates a list called `ginis`. - It stores data of Gini coefficients generated from the dataframe ``df_income_wealth`` and method [gini_coefficient](https://quanteconpy.readthedocs.io/en/latest/tools/inequality.html#quantecon.inequality.gini_coefficient), from [QuantEcon](https://quantecon.org/quantecon-py/) library. +It stores data of Gini coefficients generated from the dataframe `df_income_wealth` and method [gini_coefficient](https://quanteconpy.readthedocs.io/en/latest/tools/inequality.html#quantecon.inequality.gini_coefficient), from [QuantEcon](https://quantecon.org/quantecon-py/) library. ```{code-cell} ipython3 :tags: [hide-input] @@ -489,13 +487,11 @@ varlist = ['n_wealth', # net wealth df = df_income_wealth # create lists to store Gini for each inequality measure - -Ginis = [] +results = {} for var in varlist: # create lists to store Gini - ginis = [] - + gini_yr = [] for year in years: # repeat the observations according to their weights counts = list(round(df[df['year'] == year]['weights'] )) @@ -503,16 +499,18 @@ for var in varlist: y = np.asarray(y) rd.shuffle(y) # shuffle the sequence - + # calculate and store Gini gini = qe.gini_coefficient(y) - ginis.append(gini) + gini_yr.append(gini) - Ginis.append(ginis) + results[var] = gini_yr ``` ```{code-cell} ipython3 -ginis_nw, ginis_ti, ginis_li = Ginis +ginis_nw = results['n_wealth'] # net wealth +ginis_ti = results['t_income'] # total income +ginis_li = results['l_income'] # labour income ``` Let's plot the Gini coefficients for net wealth, labor income and total income. 
@@ -532,16 +530,10 @@ mystnb: image: alt: gini_wealth_us --- -xlabel = "year" -ylabel = "gini coefficient" - fig, ax = plt.subplots() - ax.plot(years, ginis_nw, marker='o') - -ax.set_xlabel(xlabel) -ax.set_ylabel(ylabel) - +ax.set_xlabel("year") +ax.set_ylabel("gini coefficient") plt.show() ``` @@ -554,17 +546,11 @@ mystnb: image: alt: gini_income_us --- -xlabel = "year" -ylabel = "gini coefficient" - fig, ax = plt.subplots() - ax.plot(years, ginis_li_new, marker='o', label="labor income") ax.plot(years, ginis_ti, marker='o', label="total income") - -ax.set_xlabel(xlabel) -ax.set_ylabel(ylabel) - +ax.set_xlabel("year") +ax.set_ylabel("gini coefficient") ax.legend() plt.show() ``` @@ -574,7 +560,6 @@ substantially since 1980. The wealth time series exhibits a strong U-shape. - ## Top shares Another popular measure of inequality is the top shares. @@ -658,21 +643,15 @@ mystnb: image: alt: top_shares_us --- -xlabel = "year" -ylabel = "top $10\%$ share" - fig, ax = plt.subplots() - ax.plot(years, df_topshares["topshare_l_income"], marker='o', label="labor income") ax.plot(years, df_topshares["topshare_n_wealth"], marker='o', label="net wealth") ax.plot(years, df_topshares["topshare_t_income"], marker='o', label="total income") - -ax.set_xlabel(xlabel) -ax.set_ylabel(ylabel) - +ax.set_xlabel("year") +ax.set_ylabel("top $10\%$ share") ax.legend() ``` @@ -742,11 +721,12 @@ mystnb: image: alt: top_shares_simulated --- -plot_inequality_measures(σ_vals, - topshares, - "simulated data", - "$\sigma$", - "top $10\%$ share") +fig, ax = plot_inequality_measures(σ_vals, + topshares, + "simulated data", + "$\sigma$", + "top $10\%$ share") +plt.show() ``` ```{code-cell} ipython3 @@ -758,11 +738,12 @@ mystnb: image: alt: gini_coef_simulated --- -plot_inequality_measures(σ_vals, - ginis, - "simulated data", - "$\sigma$", - "gini coefficient") +fig, ax = plot_inequality_measures(σ_vals, + ginis, + "simulated data", + "$\sigma$", + "gini coefficient") +plt.show() ``` ```{code-cell} 
ipython3 From bd4baca9c3bcff073ac50623e65f62ec6213df33 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 23 Feb 2024 11:34:12 +1100 Subject: [PATCH 03/40] more plot updates --- lectures/inequality.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index a40b1f95..31954b14 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -203,16 +203,13 @@ f_vals, l_vals = lorenz_curve(sample) ax.plot(f_vals, l_vals, label=f'lognormal sample', lw=2) ax.plot(f_vals, f_vals, label='equality', lw=2) -ax.legend() - ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--') ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--') - ax.set_xlim((0, 1)) ax.set_xlabel("Cumulative share of households (%)") ax.set_ylim((0, 1)) ax.set_ylabel("Cumulative share of income (%)") - +ax.legend() plt.show() ``` @@ -653,6 +650,7 @@ ax.plot(years, df_topshares["topshare_t_income"], ax.set_xlabel("year") ax.set_ylabel("top $10\%$ share") ax.legend() +plt.show() ``` ## Exercises @@ -823,6 +821,7 @@ ax.plot(years, top_shares_nw, marker='o', label="net wealth-lorenz") ax.set_xlabel("year") ax.set_ylabel("top $10\%$ share") ax.legend() +plt.show() ``` ```{solution-end} From c2514630993b95489a17e5eea00294fde7bdc6f4 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 23 Feb 2024 11:44:36 +1100 Subject: [PATCH 04/40] add quantecon back in for now re: gini_coefficient --- lectures/inequality.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 31954b14..50ed5a38 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -61,7 +61,13 @@ In this lecture we discuss standard measures of inequality used in economic rese For each of these measures, we will look at both simulated and real data. -We will use the following imports. +We need to install the `quantecon` package. 
+ +```{code-cell} ipython3 +!pip install quantecon +``` + +We will also use the following imports. ```{code-cell} ipython3 import pandas as pd From 2c96d342683a399c5ff58d430d5f26cb05b39912 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 1 Mar 2024 11:59:48 +1100 Subject: [PATCH 05/40] @mmcky edits --- .../usa-gini-nwealth-tincome-lincome.csv | 21 ++ lectures/inequality.md | 190 +++++++++++++++--- 2 files changed, 186 insertions(+), 25 deletions(-) create mode 100644 lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv new file mode 100644 index 00000000..b660e32b --- /dev/null +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -0,0 +1,21 @@ +year,n_wealth,t_income,l_income +1950,0.8257332034366347,0.44248654139458754,0.5342948198773423 +1953,0.8059487586599338,0.42645440609359486,0.5158978980963697 +1956,0.8121790488050631,0.4442694287339931,0.5349293526208138 +1959,0.7952068741637922,0.43749348077061523,0.5213985948309406 +1962,0.808694507657936,0.4435843103853638,0.5345127915054347 +1965,0.7904149225687935,0.43763715466663455,0.7487860020887751 +1968,0.7982885066993515,0.4208620794438898,0.5242396427381535 +1971,0.7911574835420261,0.42333442460902587,0.5576454812313479 +1977,0.7571418922185222,0.46187678800902643,0.5704448110072055 +1983,0.7494335400643035,0.43934561846447007,0.5662220844385907 +1989,0.7715705301674298,0.5115249581654171,0.6013995687471435 +1992,0.75081266140553,0.47406506720767927,0.5983592657979551 +1995,0.756949238811024,0.4896552355840093,0.5969779516716919 +1998,0.7603291991801191,0.49117441585168625,0.5774462841723321 +2001,0.7816118750507045,0.5239092994681127,0.6042739644967284 +2004,0.7700355469522353,0.48843503839032515,0.598143220179272 
+2007,0.782141377648697,0.5197156312086194,0.6263452195753192 +2010,0.825082529519343,0.5195972120145608,0.6453653328291911 +2013,0.8227698931835298,0.5314001749843348,0.649868291777264 +2016,0.8342975903562216,0.5541400068900835,0.6706846793375284 diff --git a/lectures/inequality.md b/lectures/inequality.md index 50ed5a38..31e36291 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -61,11 +61,7 @@ In this lecture we discuss standard measures of inequality used in economic rese For each of these measures, we will look at both simulated and real data. -We need to install the `quantecon` package. - -```{code-cell} ipython3 -!pip install quantecon -``` ++++ We will also use the following imports. @@ -74,7 +70,7 @@ import pandas as pd import numpy as np import matplotlib.pyplot as plt import random as rd -import quantecon as qe +import wbgapi as wb ``` ## The Lorenz curve @@ -92,7 +88,7 @@ We suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to l To aid our interpretation, suppose that we are measuring wealth -* $w_1$ is the wealth of the poorest member of the population and +* $w_1$ is the wealth of the poorest member of the population, and * $w_n$ is the wealth of the richest member of the population. The curve $L$ is just a function $y = L(x)$ that we can plot and interpret. @@ -187,7 +183,7 @@ distribution and treat these draws as our population. The straight 45-degree line ($x=L(x)$ for all $x$) corresponds to perfect equality. -The lognormal draws produce a less equal distribution. +The log-normal draws produce a less equal distribution. For example, if we imagine these draws as being observations of wealth across a sample of households, then the dashed lines show that the bottom 80\% of @@ -223,6 +219,8 @@ plt.show() Next let's look at the real data, focusing on income and wealth in the US in 2016. 
+(data:survey-consumer-finance)= + The following code block imports a subset of the dataset `SCF_plus`, which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF). @@ -333,9 +331,8 @@ The Gini coefficient is defined for the sample above as $$ G := -\frac - {\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} - {2n\sum_{i=1}^n w_i}. +\frac{\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} + {2n\sum_{i=1}^n w_i}. $$ (eq:gini) @@ -439,7 +436,7 @@ ginis = [] for σ in σ_vals: μ = -σ**2 / 2 y = np.exp(μ + σ * np.random.randn(n)) - ginis.append(qe.gini_coefficient(y)) + ginis.append(gini_coefficient(y)) ``` ```{code-cell} ipython3 @@ -474,14 +471,131 @@ coefficient. ### Gini coefficient dynamics for US data -Now let's look at Gini coefficients for US data derived from the SCF. +Now let's look at Gini coefficients for US data. -The following code creates a list called `ginis`. +In this section we will get Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data) package. -It stores data of Gini coefficients generated from the dataframe `df_income_wealth` and method [gini_coefficient](https://quanteconpy.readthedocs.io/en/latest/tools/inequality.html#quantecon.inequality.gini_coefficient), from [QuantEcon](https://quantecon.org/quantecon-py/) library. +Let's search the World Bank data for `gini` to find the series ID. ```{code-cell} ipython3 -:tags: [hide-input] +wb.search("gini") +``` + +We now know the series ID is `SI.POV.GINI`. + +```{tip} +Another, often useful, way to find a series ID is to browse the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data. +``` + +Let us fetch the data for the USA. 
+ +```{code-cell} ipython3 +data = wb.data.DataFrame("SI.POV.GINI", "USA") +``` + +```{code-cell} ipython3 +data +``` + +```{note} +This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas, so it can be useful to transpose the results before plotting. +``` + +```{code-cell} ipython3 +data = data.T +data_usa = data['USA'] +``` + +```{code-cell} ipython3 +fig, ax = plt.subplots() +ax = data_usa.plot(ax=ax) +ax.set_ylim(0,data_usa.max()+5) +plt.show() +``` + +The Gini coefficient varies over only a small part of its full range from 0 to 100. + +In fact, we can take a quick look across all countries and all years in the World Bank dataset to observe this. + +```{code-cell} ipython3 +gini_all = wb.data.DataFrame("SI.POV.GINI") +``` + +```{code-cell} ipython3 +# Create a long series with a multi-index of the data to get global min and max values +gini_all = gini_all.unstack(level='economy').dropna() +``` + +```{code-cell} ipython3 +gini_all.plot(kind="hist", title="Gini coefficient"); +``` + +We can see that, across 50 years of data and all countries, the measure only varies between 20 and 65. + +This variation would be even smaller for the subset of wealthy countries, so let us zoom in a little on the US data and add some trendlines. 
+ +```{code-cell} ipython3 +data_usa.index = data_usa.index.map(lambda x: int(x.replace('YR',''))) +``` + +```{code-cell} ipython3 +data_usa +``` + +The data suggests there is a change in trend around the year 1981. + +```{code-cell} ipython3 +pre_1981 = data_usa[data_usa.index <= 1981] +post_1981 = data_usa[data_usa.index > 1981] +``` + +```{code-cell} ipython3 +# Pre 1981 Data Trend +x1 = pre_1981.dropna().index.values +y1 = pre_1981.dropna().values +a1, b1 = np.polyfit(x1, y1, 1) + +# Post 1981 Data Trend +x2 = post_1981.dropna().index.values +y2 = post_1981.dropna().values +a2, b2 = np.polyfit(x2, y2, 1) +``` + +```{code-cell} ipython3 +x = data_usa.dropna().index.values +y = data_usa.dropna().values +plt.scatter(x,y) +plt.plot(x1, a1*x1+b1, 'r-') +plt.plot(x2, a2*x2+b2, 'y-') +plt.title("USA gini coefficient dynamics") +plt.legend(['Gini coefficient', 'Trend (before 1981)', 'Trend (after 1981)']) +plt.ylim(25,45) +plt.ylabel("Gini coefficient") +plt.xlabel("Year") +plt.show() +``` + +Looking at this graph, you can see that inequality in the USA was falling until 1981, when it appears to have changed course and begun to rise steadily. + +```{admonition} TODO +:class: warning +Why did the Gini coefficient fall in 2020? I would have expected it to accelerate in the other direction, or was there a lag in investment returns around COVID? +``` + ++++ + +## Comparing income and wealth inequality (the US case) + ++++ + +We can use the data collected above from the {ref}`Survey of Consumer Finances <data:survey-consumer-finance>` to compare the Gini coefficient for income with the Gini coefficient for wealth. + +Let's compute the Gini coefficient for net wealth, total income, and labour income. + +This section uses the following code to compute the data; however, to speed up execution, we have precomputed the results and will use them in the subsequent analysis. 
+ +```{code-cell} ipython3 +import quantecon as qe varlist = ['n_wealth', # net wealth 't_income', # total income @@ -508,20 +622,36 @@ for var in varlist: gini_yr.append(gini) results[var] = gini_yr + +# Convert to DataFrame +results = pd.DataFrame(results, index=years) +results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_label='year') +``` + +```{code-cell} ipython3 +ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year') ``` ```{code-cell} ipython3 -ginis_nw = results['n_wealth'] # net wealth -ginis_ti = results['t_income'] # total income -ginis_li = results['l_income'] # labour income +ginis ``` Let's plot the Gini coefficients for net wealth, labor income and total income. +Looking at each data series we see an outlier in gini coefficient computed for 1965. + +We will smooth our data and take an average of the data either side of it for the time being. + +```{admonition} TODO +Figure out why there is such a spike in the data for this year +``` + ```{code-cell} ipython3 -# use an average to replace an outlier in labor income gini -ginis_li_new = ginis_li -ginis_li_new[5] = (ginis_li[4] + ginis_li[6]) / 2 +ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 +``` + +```{code-cell} ipython3 +ginis["l_income"].plot() ``` ```{code-cell} ipython3 @@ -534,7 +664,7 @@ mystnb: alt: gini_wealth_us --- fig, ax = plt.subplots() -ax.plot(years, ginis_nw, marker='o') +ax.plot(years, ginis["n_wealth"], marker='o') ax.set_xlabel("year") ax.set_ylabel("gini coefficient") plt.show() @@ -550,8 +680,18 @@ mystnb: alt: gini_income_us --- fig, ax = plt.subplots() -ax.plot(years, ginis_li_new, marker='o', label="labor income") -ax.plot(years, ginis_ti, marker='o', label="total income") +ax.plot(years, ginis["l_income"], marker='o', label="labor income") +ax.plot(years, ginis["t_income"], marker='o', label="total income") +ax.set_xlabel("year") 
+ax.set_ylabel("gini coefficient") +ax.legend() +plt.show() +``` + +```{code-cell} ipython3 +fig, ax = plt.subplots() +ax.plot(years, ginis["n_wealth"], marker='o', label="net wealth") +ax.plot(years, ginis["l_income"], marker='o', label="labour income") ax.set_xlabel("year") ax.set_ylabel("gini coefficient") ax.legend() From fca759f58b771b73587edf81f8ae7e8458e8483a Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 1 Mar 2024 12:07:45 +1100 Subject: [PATCH 06/40] review and minor edits --- lectures/inequality.md | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 31e36291..db4d5728 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -61,8 +61,6 @@ In this lecture we discuss standard measures of inequality used in economic rese For each of these measures, we will look at both simulated and real data. -+++ - We will also use the following imports. ```{code-cell} ipython3 @@ -582,12 +580,8 @@ Looking at this graph you can see that inequality was falling in the USA until 1 Why did GINI fall in 2020? I would have thought it accelerate in the other direction or was there a lag in investment returns around COVID ``` -+++ - ## Comparing income and wealth inequality (the US case) -+++ - We can use the data collected above {ref}`survey of consumer finances ` to look at the gini coefficient when using income when compared to wealth data. Let's compute the gin coefficient for net wealth, total income, and labour income. @@ -595,6 +589,8 @@ Let's compute the gin coefficient for net wealth, total income, and labour incom This section makes use of the following code to compute the data, however to speed up execution we have pre-compiled the results and will use that in the subsequent analysis. 
```{code-cell} ipython3 +:tags: [skip-execution, hide-input, hide-output] + import quantecon as qe varlist = ['n_wealth', # net wealth @@ -852,7 +848,7 @@ for σ in σ_vals: f_val, l_val = lorenz_curve(y) f_vals.append(f_val) l_vals.append(l_val) - ginis.append(qe.gini_coefficient(y)) + ginis.append(gini_coefficient(y)) topshares.append(calculate_top_share(y)) ``` From 54d9de57f33d0dd2bbd8c106088a9ac1dbe2e1cd Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 1 Mar 2024 13:25:16 +1100 Subject: [PATCH 07/40] fix execution issue --- lectures/inequality.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/lectures/inequality.md b/lectures/inequality.md index db4d5728..aa4eb4d1 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -61,6 +61,12 @@ In this lecture we discuss standard measures of inequality used in economic rese For each of these measures, we will look at both simulated and real data. +We will need to install the following packages + +```{code-cell} ipython3 +!pip install wbgapi +``` + We will also use the following imports. ```{code-cell} ipython3 @@ -591,6 +597,7 @@ This section makes use of the following code to compute the data, however to spe ```{code-cell} ipython3 :tags: [skip-execution, hide-input, hide-output] +!pip install quantecon import quantecon as qe varlist = ['n_wealth', # net wealth From edb76c65d8313aa080583ed2aa43f7263977b947 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 1 Mar 2024 13:47:05 +1100 Subject: [PATCH 08/40] minor updates to code output --- lectures/inequality.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index aa4eb4d1..88f4b42b 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -64,6 +64,7 @@ For each of these measures, we will look at both simulated and real data. 
We will need to install the following packages ```{code-cell} ipython3 +:tags: [hide-output] !pip install wbgapi ``` @@ -582,7 +583,6 @@ plt.show() Looking at this graph you can see that inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time (growing inequality). ```{admonition} TODO -:class: warning Why did GINI fall in 2020? I would have thought it accelerate in the other direction or was there a lag in investment returns around COVID ``` @@ -633,10 +633,7 @@ results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lin ```{code-cell} ipython3 ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year') -``` - -```{code-cell} ipython3 -ginis +ginis.head(n=5) ``` Let's plot the Gini coefficients for net wealth, labor income and total income. From 1c587e3d1a4f25ac2f9afb33e5b69231055c5dd0 Mon Sep 17 00:00:00 2001 From: mmcky Date: Thu, 7 Mar 2024 14:12:58 +1100 Subject: [PATCH 09/40] updates --- .../usa-gini-nwealth-tincome-lincome.csv | 40 ++++++++-------- lectures/inequality.md | 48 +++++++++++-------- 2 files changed, 47 insertions(+), 41 deletions(-) diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv index b660e32b..72e27e92 100644 --- a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -1,21 +1,21 @@ year,n_wealth,t_income,l_income -1950,0.8257332034366347,0.44248654139458754,0.5342948198773423 -1953,0.8059487586599338,0.42645440609359486,0.5158978980963697 -1956,0.8121790488050631,0.4442694287339931,0.5349293526208138 -1959,0.7952068741637922,0.43749348077061523,0.5213985948309406 -1962,0.808694507657936,0.4435843103853638,0.5345127915054347 
-1965,0.7904149225687935,0.43763715466663455,0.7487860020887751 -1968,0.7982885066993515,0.4208620794438898,0.5242396427381535 -1971,0.7911574835420261,0.42333442460902587,0.5576454812313479 -1977,0.7571418922185222,0.46187678800902643,0.5704448110072055 -1983,0.7494335400643035,0.43934561846447007,0.5662220844385907 -1989,0.7715705301674298,0.5115249581654171,0.6013995687471435 -1992,0.75081266140553,0.47406506720767927,0.5983592657979551 -1995,0.756949238811024,0.4896552355840093,0.5969779516716919 -1998,0.7603291991801191,0.49117441585168625,0.5774462841723321 -2001,0.7816118750507045,0.5239092994681127,0.6042739644967284 -2004,0.7700355469522353,0.48843503839032515,0.598143220179272 -2007,0.782141377648697,0.5197156312086194,0.6263452195753192 -2010,0.825082529519343,0.5195972120145608,0.6453653328291911 -2013,0.8227698931835298,0.5314001749843348,0.649868291777264 -2016,0.8342975903562216,0.5541400068900835,0.6706846793375284 +1950,0.8257332034366358,0.4424865413945875,0.5342948198773428 +1953,0.8059487586599331,0.42645440609359453,0.5158978980963707 +1956,0.8121790488050629,0.4442694287339922,0.5349293526208139 +1959,0.7952068741637917,0.43749348077061556,0.5213985948309421 +1962,0.8086945076579354,0.4435843103853641,0.5345127915054346 +1965,0.790414922568793,0.4376371546666345,0.7487860020887755 +1968,0.7982885066993517,0.42086207944388987,0.524239642738153 +1971,0.7911574835420266,0.4233344246090252,0.5576454812313467 +1977,0.7571418922185217,0.46187678800902604,0.5704448110072053 +1983,0.7494335400643017,0.4393456184644693,0.5662220844385908 +1989,0.7715705301674308,0.5115249581654219,0.6013995687471431 +1992,0.7508126614055308,0.47406506720767694,0.5983592657979548 +1995,0.756949238811026,0.48965523558400526,0.5969779516716914 +1998,0.7603291991801175,0.4911744158516885,0.5774462841723366 +2001,0.7816118750507022,0.523909299468113,0.6042739644967348 +2004,0.770035546952236,0.48843503839032615,0.5981432201792747 
+2007,0.7821413776486992,0.5197156312086196,0.6263452195753294 +2010,0.8250825295193438,0.51959721201456,0.6453653328291932 +2013,0.8227698931835281,0.5314001749843356,0.6498682917772638 +2016,0.8342975903562243,0.5541400068900844,0.6706846793375284 diff --git a/lectures/inequality.md b/lectures/inequality.md index 88f4b42b..e206a3ca 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -65,6 +65,7 @@ We will need to install the following packages ```{code-cell} ipython3 :tags: [hide-output] + !pip install wbgapi ``` @@ -298,12 +299,10 @@ mystnb: alt: lorenz_us --- fig, ax = plt.subplots() - ax.plot(f_vals_nw[-1], l_vals_nw[-1], label=f'net wealth') ax.plot(f_vals_ti[-1], l_vals_ti[-1], label=f'total income') ax.plot(f_vals_li[-1], l_vals_li[-1], label=f'labor income') ax.plot(f_vals_nw[-1], f_vals_nw[-1], label=f'equality') - ax.legend() plt.show() ``` @@ -346,6 +345,14 @@ The Gini coefficient is closely related to the Lorenz curve. In fact, it can be shown that its value is twice the area between the line of equality and the Lorenz curve (e.g., the shaded area in the following Figure below). +```{note} +Another way to think of the gini coefficient is the area between the 45-degree line of +perfect equality and the Lorenz curve minus the area below the Lorenz curve devided by +the total area below the 45-degree line. + +In other words, it is a measure of average deviation from the line of equality. +``` + The idea is that $G=0$ indicates complete equality, while $G=1$ indicates complete inequality. ```{code-cell} ipython3 @@ -478,7 +485,7 @@ coefficient. Now let's look at Gini coefficients for US data. -In this section we will get Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). 
+In this section we will get pre-computed Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). Let's search the world bank data for gini to find the Series ID. @@ -502,7 +509,7 @@ data = wb.data.DataFrame("SI.POV.GINI", "USA") data ``` -```{note} +```{tip} This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting ``` @@ -518,9 +525,9 @@ ax.set_ylim(0,data_usa.max()+5) plt.show() ``` -The gini coefficient does not have significant variation in the full range from 0 to 100. +The gini coefficient is relatively slow moving and does not have significant variation in the full range from 0 to 100. -In fact we can take a quick look across all countries and all years in the world bank dataset to observe this. +Using `pandas` we can take a quick look across all countries and all years in the world bank dataset to understand how the Gini coefficient varies across countries and time. ```{code-cell} ipython3 gini_all = wb.data.DataFrame("SI.POV.GINI") @@ -535,7 +542,7 @@ gini_all = gini_all.unstack(level='economy').dropna() gini_all.plot(kind="hist", title="Gini coefficient"); ``` -Therefore we can see that across 50 years of data and all countries the measure only varies between 20 and 65. +Therefore we can see that across 50 years of data and all countries (including low and high income countries) the measure only varies between 20 and 65. This variation would be even smaller for the subset of wealthy countries, so let us zoom in a little on the US data and add some trendlines. @@ -582,18 +589,20 @@ plt.show() Looking at this graph you can see that inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time (growing inequality). 
-```{admonition} TODO -Why did GINI fall in 2020? I would have thought it accelerate in the other direction or was there a lag in investment returns around COVID -``` - ## Comparing income and wealth inequality (the US case) +The Gini coefficient can also be computed over different distributions such as *income* and *wealth*. + We can use the data collected above {ref}`survey of consumer finances ` to look at the gini coefficient when using income when compared to wealth data. -Let's compute the gin coefficient for net wealth, total income, and labour income. +Let's compute the gin coefficient for net wealth, total income, and labour income for the most recent year in our sample. This section makes use of the following code to compute the data, however to speed up execution we have pre-compiled the results and will use that in the subsequent analysis. +```{code-cell} ipython3 +df_income_wealth.year.describe() +``` + ```{code-cell} ipython3 :tags: [skip-execution, hide-input, hide-output] @@ -631,6 +640,10 @@ results = pd.DataFrame(results, index=years) results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_label='year') ``` +While the data can be computed using the code above, we will import a pre-computed dataset from the lecture repository. + + + ```{code-cell} ipython3 ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year') ginis.head(n=5) @@ -926,20 +939,13 @@ Plot the top shares generated from Lorenz curve and the top shares approximated :class: dropdown ``` -We will use the `interpolation` package in this solution. 
- -```{code-cell} ipython3 -:tags: [hide-output] - -!pip install --upgrade interpolation -from interpolation import interp -``` ++++ Here is one solution: ```{code-cell} ipython3 def lorenz2top(f_val, l_val, p=0.1): - t = lambda x: interp(f_val, l_val, x) + t = lambda x: np.interp(x, f_val, l_val) return 1- t(1 - p) ``` From d16e570680cee1c9f3c6b5101fe32b992095729a Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 8 Mar 2024 14:42:58 +1100 Subject: [PATCH 10/40] Compare inquality across countries --- .../usa-gini-nwealth-tincome-lincome.csv | 40 +++++----- lectures/inequality.md | 80 ++++++++++++++++++- 2 files changed, 99 insertions(+), 21 deletions(-) diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv index 72e27e92..85f233ab 100644 --- a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -1,21 +1,21 @@ year,n_wealth,t_income,l_income -1950,0.8257332034366358,0.4424865413945875,0.5342948198773428 -1953,0.8059487586599331,0.42645440609359453,0.5158978980963707 -1956,0.8121790488050629,0.4442694287339922,0.5349293526208139 -1959,0.7952068741637917,0.43749348077061556,0.5213985948309421 -1962,0.8086945076579354,0.4435843103853641,0.5345127915054346 -1965,0.790414922568793,0.4376371546666345,0.7487860020887755 -1968,0.7982885066993517,0.42086207944388987,0.524239642738153 -1971,0.7911574835420266,0.4233344246090252,0.5576454812313467 -1977,0.7571418922185217,0.46187678800902604,0.5704448110072053 -1983,0.7494335400643017,0.4393456184644693,0.5662220844385908 -1989,0.7715705301674308,0.5115249581654219,0.6013995687471431 -1992,0.7508126614055308,0.47406506720767694,0.5983592657979548 -1995,0.756949238811026,0.48965523558400526,0.5969779516716914 -1998,0.7603291991801175,0.4911744158516885,0.5774462841723366 
-2001,0.7816118750507022,0.523909299468113,0.6042739644967348 -2004,0.770035546952236,0.48843503839032615,0.5981432201792747 -2007,0.7821413776486992,0.5197156312086196,0.6263452195753294 -2010,0.8250825295193438,0.51959721201456,0.6453653328291932 -2013,0.8227698931835281,0.5314001749843356,0.6498682917772638 -2016,0.8342975903562243,0.5541400068900844,0.6706846793375284 +1950,0.8257332034366359,0.44248654139458704,0.5342948198773421 +1953,0.8059487586599332,0.42645440609359414,0.5158978980963693 +1956,0.8121790488050623,0.4442694287339929,0.5349293526208143 +1959,0.7952068741637921,0.4374934807706162,0.5213985948309414 +1962,0.8086945076579385,0.4435843103853643,0.5345127915054336 +1965,0.7904149225687938,0.4376371546666339,0.748786002088776 +1968,0.7982885066993525,0.4208620794438893,0.5242396427381537 +1971,0.7911574835420264,0.4233344246090261,0.5576454812313487 +1977,0.7571418922185211,0.46187678800902404,0.5704448110072055 +1983,0.7494335400643021,0.43934561846446935,0.5662220844385908 +1989,0.7715705301674326,0.5115249581654199,0.6013995687471441 +1992,0.75081266140553,0.47406506720767994,0.5983592657979562 +1995,0.7569492388110272,0.48965523558400526,0.596977951671689 +1998,0.7603291991801175,0.49117441585168564,0.5774462841723361 +2001,0.7816118750507013,0.5239092994681116,0.6042739644967291 +2004,0.7700355469522365,0.4884350383903243,0.5981432201792726 +2007,0.7821413776486991,0.5197156312086196,0.6263452195753233 +2010,0.8250825295193426,0.5195972120145639,0.6453653328291923 +2013,0.8227698931835287,0.5314001749843371,0.6498682917772659 +2016,0.8342975903562232,0.5541400068900836,0.6706846793375284 diff --git a/lectures/inequality.md b/lectures/inequality.md index e206a3ca..6aa3b830 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -589,7 +589,7 @@ plt.show() Looking at this graph you can see that inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time (growing 
inequality). -## Comparing income and wealth inequality (the US case) +### Comparing income and wealth inequality (the US case) The Gini coefficient can also be computed over different distributions such as *income* and *wealth*. @@ -716,6 +716,84 @@ substantially since 1980. The wealth time series exhibits a strong U-shape. ++++ + +### Cross-country comparisons of income inequality + +As we saw earlier in this lecture we used `wbgapi` to get gini data across many countries and saved it in a variable called `gini_all` + +In this section we will compare a few countries and the evolution in their respective gini coefficients + +```{code-cell} ipython3 +# Obtain data for all countries as a table +data = gini_all.unstack() +``` + +```{code-cell} ipython3 +data.columns +``` + +There are 167 countries represented in this dataset. + +Let us compare three western economies: USA, United Kingdom, and Norway + +```{code-cell} ipython3 +data[['USA','GBR', 'NOR']].plot(ylabel='gini coefficient') +``` + +From this plot we can observe that the USA has a higher gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. + +Norway has the lowest gini coefficient over the three economies from the year 2003, and it is substantially lower than the USA suggesting the Lorenz curve is much closer to the 45-degree line of equality. 
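To make the link between a lower Gini coefficient and a Lorenz curve closer to the 45-degree line concrete, here is a minimal sketch on two made-up four-person income vectors. The numbers and the helper names `gini` and `lorenz_gap` are illustrative assumptions, not World Bank data or part of the lecture code. The Gini coefficient is computed in its pairwise-difference form.

```python
import numpy as np

def gini(y):
    """Gini coefficient as the sum of all pairwise absolute differences
    divided by 2 * n * sum(y)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    pairwise = np.abs(y[:, None] - y[None, :])
    return pairwise.sum() / (2 * n * y.sum())

def lorenz_gap(y):
    """Largest vertical distance between the 45-degree line and the
    Lorenz curve (hypothetical helper, for illustration only)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    cum_people = np.arange(1, n + 1) / n      # x_i = i / n
    cum_income = np.cumsum(y) / y.sum()       # y_i = cumulative income share
    return np.max(cum_people - cum_income)

# Two hypothetical economies with the same total income (40)
fairly_equal = np.array([9, 10, 10, 11])
very_unequal = np.array([1, 2, 3, 34])

for name, incomes in [("fairly equal", fairly_equal),
                      ("very unequal", very_unequal)]:
    print(f"{name}: gini={gini(incomes):.4f}, "
          f"max gap to 45-degree line={lorenz_gap(incomes):.3f}")
```

In this toy example the economy with the smaller Gini coefficient is also the one whose Lorenz curve stays closer to the line of equality, which is the pattern the Norway and USA comparison suggests.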
+ ++++ + +### (Optional) Gini Coefficient and GDP per capita (over time) + +```{code-cell} ipython3 +countries = ['USA', 'NOR', 'GBR'] +``` + +```{code-cell} ipython3 +gdppc = wb.data.DataFrame("NY.GDP.PCAP.KD", countries).T +``` + +Let's rearrange the data so that we can plot gdp per capita and the gini coefficient across years + +```{code-cell} ipython3 +pdata = pd.DataFrame(data[countries].unstack()) +pdata.index.names = ['country', 'year'] +pdata.columns = ['gini'] +``` + +```{code-cell} ipython3 +pdata +``` + +```{code-cell} ipython3 +pgdppc = pd.DataFrame(gdppc.unstack()) +pgdppc.index.names = ['country', 'year'] +pgdppc.columns = ['gdppc'] +``` + +```{code-cell} ipython3 +plot_data = pdata.merge(pgdppc, left_index=True, right_index=True) +``` + +```{code-cell} ipython3 +plot_data.reset_index(inplace=True) +``` + +```{code-cell} ipython3 +plot_data.year = plot_data.year.map(lambda x: int(x.replace('YR',''))) +``` + +```{code-cell} ipython3 +import plotly.express as px +fig = px.line(plot_data, x="gini", y="gdppc", color="country", text="year", height=800) +fig.update_traces(textposition="bottom right") +fig.show() +``` + ## Top shares Another popular measure of inequality is the top shares. 
From 9e5c1b7b74fc87beddc8f4b2b17cf3c1f253c1fb Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 8 Mar 2024 14:49:55 +1100 Subject: [PATCH 11/40] add plotly install for preview --- lectures/inequality.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/lectures/inequality.md b/lectures/inequality.md index 6aa3b830..6d6d08df 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -787,6 +787,10 @@ plot_data.reset_index(inplace=True) plot_data.year = plot_data.year.map(lambda x: int(x.replace('YR',''))) ``` +```{code-cell} ipython3 +!pip install plotly +``` + ```{code-cell} ipython3 import plotly.express as px fig = px.line(plot_data, x="gini", y="gdppc", color="country", text="year", height=800) From 72b8bbcd341659d5f24b2ec49f306915bcc47f87 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 8 Mar 2024 15:00:07 +1100 Subject: [PATCH 12/40] supress mimetype warnings --- lectures/_config.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/lectures/_config.yml b/lectures/_config.yml index 348a1881..5da1c55b 100644 --- a/lectures/_config.yml +++ b/lectures/_config.yml @@ -43,6 +43,7 @@ sphinx: nb_render_image_options: width: 80% nb_code_prompt_show: "Show {type}" + suppress_warnings: [mystnb.unknown_mime_type] # ------------- html_favicon: _static/lectures-favicon.ico html_theme: quantecon_book_theme From 2ad29b2cdc6bc2ee0d3b06fd4eaa469617e7c949 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 8 Mar 2024 15:10:18 +1100 Subject: [PATCH 13/40] TMP: temporarily disable warnings for preview --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 5e9c5e60..306b454c 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -64,7 +64,7 @@ jobs: shell: bash -l {0} run: | rm -r _build/.doctrees - jb build lectures --path-output ./ -nW --keep-going + jb build lectures --path-output ./ - name: Upload Execution Reports (HTML) uses: actions/upload-artifact@v2 if: 
failure() From 74298f4831c49a1c1884b07ed969e9202ff8feec Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 8 Mar 2024 15:23:19 +1100 Subject: [PATCH 14/40] TMP: enable require.js --- lectures/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lectures/_config.yml b/lectures/_config.yml index 5da1c55b..9211779b 100644 --- a/lectures/_config.yml +++ b/lectures/_config.yml @@ -45,6 +45,8 @@ sphinx: nb_code_prompt_show: "Show {type}" suppress_warnings: [mystnb.unknown_mime_type] # ------------- + html_js_files: + - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js html_favicon: _static/lectures-favicon.ico html_theme: quantecon_book_theme html_static_path: ['_static'] From a150cc0a7729bdcff3b0d350c97b302c4dc8c609 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 11:49:43 +1100 Subject: [PATCH 15/40] @mmcky review and proof-read --- lectures/inequality.md | 180 ++++++++++++++++++++++------------------- 1 file changed, 98 insertions(+), 82 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 6d6d08df..1e66f39a 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -66,7 +66,7 @@ We will need to install the following packages ```{code-cell} ipython3 :tags: [hide-output] -!pip install wbgapi +!pip install wbgapi plotly ``` We will also use the following imports. @@ -133,7 +133,7 @@ use in our simulations below. It is useful to construct a function that translates an array of income or wealth data into the cumulative share -of people and the cumulative share of income (or wealth). +of individuals (or households) and the cumulative share of income (or wealth). ```{code-cell} ipython3 :tags: [hide-input] @@ -223,10 +223,9 @@ plt.show() ### Lorenz curves for US data -Next let's look at the real data, focusing on income and wealth in the US in 2016. +Next let's look at data, focusing on income and wealth in the US in 2016. 
(data:survey-consumer-finance)= - The following code block imports a subset of the dataset `SCF_plus`, which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF). @@ -312,7 +311,9 @@ Here all the income and wealth measures are pre-tax. Total income is the sum of households' all income sources, including labor income but excluding capital gains. One key finding from this figure is that wealth inequality is significantly -more extreme than income inequality. +more extreme than income inequality. + +We will take a look at this trend over time {ref}`in a later section`. ## The Gini coefficient @@ -347,7 +348,7 @@ equality and the Lorenz curve (e.g., the shaded area in the following Figure bel ```{note} Another way to think of the gini coefficient is the area between the 45-degree line of -perfect equality and the Lorenz curve minus the area below the Lorenz curve devided by +perfect equality and the Lorenz curve minus the area below the Lorenz curve divided by the total area below the 45-degree line. In other words, it is a measure of average deviation from the line of equality. @@ -389,6 +390,8 @@ plt.show() Let's examine the Gini coefficient in some simulations. +First the code below enables us to compute the Gini coefficient. + ```{code-cell} ipython3 :tags: [hide-input] @@ -420,8 +423,7 @@ def gini_coefficient(y): return np.sum(i_sum) / (2 * n * np.sum(y)) ``` -The following code computes the Gini coefficients for five different -populations. +Now we can compute the Gini coefficients for five different populations. Each of these populations is generated by drawing from a lognormal distribution with parameters $\mu$ (mean) and $\sigma$ (standard deviation). @@ -451,6 +453,8 @@ for σ in σ_vals: ginis.append(gini_coefficient(y)) ``` +Let's build a function that returns a figure (so that we can use it later in the lecture). 
+ ```{code-cell} ipython3 def plot_inequality_measures(x, y, legend, xlabel, ylabel): fig, ax = plt.subplots() @@ -467,8 +471,6 @@ mystnb: figure: caption: Gini coefficients of simulated data name: gini_simulated - image: - alt: gini_simulated --- fix, ax = plot_inequality_measures(σ_vals, ginis, @@ -481,32 +483,29 @@ plt.show() The plots show that inequality rises with $\sigma$, according to the Gini coefficient. -### Gini coefficient dynamics for US data +### Gini coefficient dynamics for US data (income) -Now let's look at Gini coefficients for US data. +Now let's look at the Gini coefficient using US data. -In this section we will get pre-computed Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). +We will get pre-computed Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). -Let's search the world bank data for gini to find the Series ID. +Let's use the `wbgapi` package we imported earlier to search the world bank data for gini to find the Series ID. ```{code-cell} ipython3 wb.search("gini") ``` -We now know the series ID is `SI.POV.GINI`. +We now know the series ID is `SI.POV.GINI`. ```{tip} Another, and often useful way to find series ID, is to use the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data. ``` -Let us fetch the data for the USA. +Let us fetch the data for the USA and request for it to be returned as a `DataFrame`. ```{code-cell} ipython3 data = wb.data.DataFrame("SI.POV.GINI", "USA") -``` - -```{code-cell} ipython3 -data +data.head(n=5) ``` ```{tip} @@ -514,53 +513,67 @@ This package often returns data with year information contained in the columns. 
``` ```{code-cell} ipython3 -data = data.T -data_usa = data['USA'] +data = data.T # transpose to get data series as columns and years as rows +data_usa = data['USA'] # obtain a simple series of USA data ``` +The `data_usa` series can now be plotted using the pandas `.plot` method. + ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients (USA) + name: gini_usa1 +--- fig, ax = plt.subplots() ax = data_usa.plot(ax=ax) ax.set_ylim(0,data_usa.max()+5) plt.show() ``` -The gini coefficient is relatively slow moving and does not have significant variation in the full range from 0 to 100. +As can be seen in {numref}`gini_usa1` the gini coefficient: -Using `pandas` we can take a quick look across all countries and all years in the world bank dataset to understand how the Gini coefficient varies across countries and time. +1. moves slowly over time, and +2. does not have significant variation in the full range from 0 to 100. + +Using `pandas` we can take a quick look across all countries and all years in the World Bank dataset. + +By leaving off the `"USA"` this function returns all Gini data that is available. ```{code-cell} ipython3 +# Fetch gini data for all countries gini_all = wb.data.DataFrame("SI.POV.GINI") -``` -```{code-cell} ipython3 # Create a long series with a multi-index of the data to get global min and max values gini_all = gini_all.unstack(level='economy').dropna() -``` -```{code-cell} ipython3 -gini_all.plot(kind="hist", title="Gini coefficient"); +# Build a histogram +gini_all.plot(kind="hist", + bins=20, + title="Gini coefficient" + ) +plt.show() ``` -Therefore we can see that across 50 years of data and all countries (including low and high income countries) the measure only varies between 20 and 65. - -This variation would be even smaller for the subset of wealthy countries, so let us zoom in a little on the US data and add some trendlines. 
+We can see that across 50 years of data and all countries (including low and high income countries) the measure varies between 20 and 65. -```{code-cell} ipython3 -data_usa.index = data_usa.index.map(lambda x: int(x.replace('YR',''))) -``` +This variation would be even smaller for the subset of wealthy countries. -```{code-cell} ipython3 -data_usa -``` +Let us zoom in a little on the US data and add some trendlines. -The data suggests there is a change in trend around the year 1981 +{numref}`gini_usa1` suggests there is a change in trend around the year 1981 ```{code-cell} ipython3 +data_usa.index = data_usa.index.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int +# Use pandas filters to find data up to and including 1981 pre_1981 = data_usa[data_usa.index <= 1981] +# Use pandas filters to find data after 1981 post_1981 = data_usa[data_usa.index > 1981] ``` +We can use `numpy` to compute a linear line of best fit. + ```{code-cell} ipython3 # Pre 1981 Data Trend x1 = pre_1981.dropna().index.values @@ -573,7 +586,17 @@ y2 = post_1981.dropna().values a2, b2 = np.polyfit(x2, y2, 1) ``` +We can now build a plot that includes trend and a range that offers a closer +look at the dynamics over time in the Gini coefficient for the USA. + ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients (USA) with trend + name: gini_usa_trend +--- + x = data_usa.dropna().index.values y = data_usa.dropna().values plt.scatter(x,y) @@ -587,22 +610,23 @@ plt.xlabel("Year") plt.show() ``` -Looking at this graph you can see that inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time (growing inequality). +{numref}`gini_usa_trend` shows inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time.
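The split-trend fit used above can be checked on data where the answer is known. The following is a minimal sketch, assuming a synthetic series that falls by 0.1 per year up to 1981 and rises by 0.2 per year afterwards. The helper name `split_trends` and all numbers are made up for illustration and are not the World Bank series.

```python
import numpy as np

def split_trends(years, values, break_year):
    """Fit separate straight lines to observations up to and including
    break_year and to observations after it."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    early = years <= break_year
    # np.polyfit with degree 1 returns (slope, intercept)
    early_fit = np.polyfit(years[early], values[early], 1)
    late_fit = np.polyfit(years[~early], values[~early], 1)
    return early_fit, late_fit

# Synthetic "Gini" series with a known kink at 1981 (illustrative only)
years = np.arange(1975, 1990)
values = np.where(years <= 1981,
                  36.0 - 0.1 * (years - 1975),
                  35.4 + 0.2 * (years - 1981))

(early_slope, _), (late_slope, _) = split_trends(years, values, 1981)
print(f"slope up to 1981: {early_slope:.3f}, slope after 1981: {late_slope:.3f}")
```

A negative first slope followed by a positive second slope is the pattern the two trend lines in the figure are meant to reveal. With real data the choice of break year is a judgment call and the fitted slopes are sensitive to it.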
+(compare-income-wealth-usa-over-time)= ### Comparing income and wealth inequality (the US case) -The Gini coefficient can also be computed over different distributions such as *income* and *wealth*. +As we have discussed the Gini coefficient can also be computed over different distributions such as *income* and *wealth*. We can use the data collected above {ref}`survey of consumer finances ` to look at the gini coefficient when using income when compared to wealth data. -Let's compute the gin coefficient for net wealth, total income, and labour income for the most recent year in our sample. - -This section makes use of the following code to compute the data, however to speed up execution we have pre-compiled the results and will use that in the subsequent analysis. +We can compute the Gini coefficient for net wealth, total income, and labour income over many years. ```{code-cell} ipython3 df_income_wealth.year.describe() ``` +This code can be used to compute this information over the full dataset. + ```{code-cell} ipython3 :tags: [skip-execution, hide-input, hide-output] @@ -640,7 +664,7 @@ results = pd.DataFrame(results, index=years) results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_label='year') ``` -While the data can be computed using the code above, we will import a pre-computed dataset from the lecture repository. +However, to speed up execution we will import a pre-computed dataset from the lecture repository. @@ -651,22 +675,17 @@ ginis.head(n=5) Let's plot the Gini coefficients for net wealth, labor income and total income. -Looking at each data series we see an outlier in gini coefficient computed for 1965. +Looking at each data series we see an outlier in gini coefficient computed for 1965 for `labour income`. We will smooth our data and take an average of the data either side of it for the time being. 
-```{admonition} TODO -Figure out why there is such a spike in the data for this year -``` - ```{code-cell} ipython3 ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 -``` - -```{code-cell} ipython3 ginis["l_income"].plot() ``` +Now we can focus on US net wealth + ```{code-cell} ipython3 --- mystnb: @@ -683,6 +702,8 @@ ax.set_ylabel("gini coefficient") plt.show() ``` +and look at US income for both labour and a total income (excl. capital gains) + ```{code-cell} ipython3 --- mystnb: @@ -701,6 +722,8 @@ ax.legend() plt.show() ``` +Now we can compare net wealth and labour income. + ```{code-cell} ipython3 fig, ax = plt.subplots() ax.plot(years, ginis["n_wealth"], marker='o', label="net wealth") @@ -711,13 +734,11 @@ ax.legend() plt.show() ``` -We see that, by this measure, inequality in wealth and income has risen +We see that, by this measure, inequality in both wealth and income has risen substantially since 1980. The wealth time series exhibits a strong U-shape. -+++ - ### Cross-country comparisons of income inequality As we saw earlier in this lecture we used `wbgapi` to get gini data across many countries and saved it in a variable called `gini_all` @@ -725,11 +746,7 @@ As we saw earlier in this lecture we used `wbgapi` to get gini data across many In this section we will compare a few countries and the evolution in their respective gini coefficients ```{code-cell} ipython3 -# Obtain data for all countries as a table -data = gini_all.unstack() -``` - -```{code-cell} ipython3 +data = gini_all.unstack() # Obtain data for all countries as a table data.columns ``` @@ -743,53 +760,50 @@ data[['USA','GBR', 'NOR']].plot(ylabel='gini coefficient') From this plot we can observe that the USA has a higher gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. 
-Norway has the lowest gini coefficient over the three economies from the year 2003, and it is substantially lower than the USA suggesting the Lorenz curve is much closer to the 45-degree line of equality. +Norway has the lowest gini coefficient over the three economies from the year 2003, and it is consistently substantially lower than the USA. -+++ +### Gini Coefficient and GDP per capita (over time) -### (Optional) Gini Coefficient and GDP per capita (over time) +We can also look at how the Gini coefficient compares with GDP per capita (over time). -```{code-cell} ipython3 -countries = ['USA', 'NOR', 'GBR'] -``` +Let's take another look at the USA, Norway, and the United Kingdom. ```{code-cell} ipython3 +countries = ['USA', 'NOR', 'GBR'] gdppc = wb.data.DataFrame("NY.GDP.PCAP.KD", countries).T ``` -Let's rearrange the data so that we can plot gdp per capita and the gini coefficient across years +We can rearrange the data so that we can plot gdp per capita and the gini coefficient across years ```{code-cell} ipython3 -pdata = pd.DataFrame(data[countries].unstack()) -pdata.index.names = ['country', 'year'] -pdata.columns = ['gini'] +plot_data = pd.DataFrame(data[countries].unstack()) +plot_data.index.names = ['country', 'year'] +plot_data.columns = ['gini'] ``` +Looking at the first 5 rows of data + ```{code-cell} ipython3 -pdata +plot_data.head(n=5) ``` +Now we can get the gdp per capita data into a shape that can be merged with `plot_data` + ```{code-cell} ipython3 pgdppc = pd.DataFrame(gdppc.unstack()) pgdppc.index.names = ['country', 'year'] pgdppc.columns = ['gdppc'] -``` - -```{code-cell} ipython3 plot_data = pdata.merge(pgdppc, left_index=True, right_index=True) -``` - -```{code-cell} ipython3 plot_data.reset_index(inplace=True) ``` +We will transform the year column to remove the 'YR' text and return an integer. 
+ ```{code-cell} ipython3 plot_data.year = plot_data.year.map(lambda x: int(x.replace('YR',''))) ``` -```{code-cell} ipython3 -!pip install plotly -``` +Now using plotly to build a plot with gdp per capita on the y-axis and the gini coefficient on the x-axis. ```{code-cell} ipython3 import plotly.express as px @@ -798,6 +812,9 @@ fig.update_traces(textposition="bottom right") fig.show() ``` +This plot shows that all three western economies gdp per capita has grown over time with some fluctuations +in the gini coefficient. However the appears to be significant structural differences between Norway and the USA. + ## Top shares Another popular measure of inequality is the top shares. @@ -809,7 +826,6 @@ In this section we show how to compute top shares. ### Definition - As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest. Given the Lorenz curve $y = L(x)$ defined above, the top $100 \times p \%$ From 5ba46f32e8cb8ffe056a91f7597cde84ce3f91a0 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 12:34:40 +1100 Subject: [PATCH 16/40] review of executable version in jupyter lab --- .../usa-gini-nwealth-tincome-lincome.csv | 40 ++++++------- lectures/inequality.md | 58 ++++++++++++++----- 2 files changed, 64 insertions(+), 34 deletions(-) diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv index 85f233ab..4bf8d779 100644 --- a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -1,21 +1,21 @@ year,n_wealth,t_income,l_income -1950,0.8257332034366359,0.44248654139458704,0.5342948198773421 -1953,0.8059487586599332,0.42645440609359414,0.5158978980963693 -1956,0.8121790488050623,0.4442694287339929,0.5349293526208143 -1959,0.7952068741637921,0.4374934807706162,0.5213985948309414 
-1962,0.8086945076579385,0.4435843103853643,0.5345127915054336 -1965,0.7904149225687938,0.4376371546666339,0.748786002088776 -1968,0.7982885066993525,0.4208620794438893,0.5242396427381537 -1971,0.7911574835420264,0.4233344246090261,0.5576454812313487 -1977,0.7571418922185211,0.46187678800902404,0.5704448110072055 -1983,0.7494335400643021,0.43934561846446935,0.5662220844385908 -1989,0.7715705301674326,0.5115249581654199,0.6013995687471441 -1992,0.75081266140553,0.47406506720767994,0.5983592657979562 -1995,0.7569492388110272,0.48965523558400526,0.596977951671689 -1998,0.7603291991801175,0.49117441585168564,0.5774462841723361 -2001,0.7816118750507013,0.5239092994681116,0.6042739644967291 -2004,0.7700355469522365,0.4884350383903243,0.5981432201792726 -2007,0.7821413776486991,0.5197156312086196,0.6263452195753233 -2010,0.8250825295193426,0.5195972120145639,0.6453653328291923 -2013,0.8227698931835287,0.5314001749843371,0.6498682917772659 -2016,0.8342975903562232,0.5541400068900836,0.6706846793375284 +1950,0.825733203436636,0.44248654139458754,0.5342948198773422 +1953,0.8059487586599333,0.42645440609359464,0.5158978980963698 +1956,0.8121790488050616,0.4442694287339925,0.5349293526208134 +1959,0.7952068741637915,0.43749348077061606,0.5213985948309418 +1962,0.8086945076579374,0.4435843103853642,0.5345127915054336 +1965,0.7904149225687952,0.43763715466663355,0.7487860020887757 +1968,0.7982885066993517,0.42086207944388965,0.5242396427381534 +1971,0.7911574835420259,0.42333442460902565,0.5576454812313468 +1977,0.7571418922185198,0.46187678800902515,0.5704448110072063 +1983,0.7494335400643009,0.43934561846446973,0.5662220844385935 +1989,0.7715705301674317,0.5115249581654214,0.6013995687471423 +1992,0.7508126614055307,0.47406506720767516,0.5983592657979556 +1995,0.7569492388110264,0.48965523558400864,0.5969779516716902 +1998,0.7603291991801189,0.49117441585169025,0.5774462841723348 +2001,0.7816118750507017,0.5239092994681133,0.604273964496734 
+2004,0.7700355469522374,0.48843503839032487,0.5981432201792718 +2007,0.7821413776486984,0.5197156312086194,0.6263452195753234 +2010,0.8250825295193427,0.5195972120145639,0.6453653328291896 +2013,0.8227698931835266,0.5314001749843371,0.6498682917772642 +2016,0.8342975903562223,0.5541400068900839,0.6706846793375303 diff --git a/lectures/inequality.md b/lectures/inequality.md index 1e66f39a..af15ced3 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -77,6 +77,7 @@ import numpy as np import matplotlib.pyplot as plt import random as rd import wbgapi as wb +import plotly.express as px ``` ## The Lorenz curve @@ -596,7 +597,6 @@ mystnb: caption: Gini coefficients (USA) with trend name: gini_usa_trend --- - x = data_usa.dropna().index.values y = data_usa.dropna().values plt.scatter(x,y) @@ -619,7 +619,7 @@ As we have discussed the Gini coefficient can also be computed over different di We can use the data collected above {ref}`survey of consumer finances ` to look at the gini coefficient when using income when compared to wealth data. -We can compute the Gini coefficient for net wealth, total income, and labour income over many years. +We can compute the Gini coefficient for net wealth, total income, and labour income over many years. ```{code-cell} ipython3 df_income_wealth.year.describe() @@ -677,7 +677,7 @@ Let's plot the Gini coefficients for net wealth, labor income and total income. Looking at each data series we see an outlier in gini coefficient computed for 1965 for `labour income`. -We will smooth our data and take an average of the data either side of it for the time being. +We will smooth our data and take an average of the data either side of it for the time being. ```{code-cell} ipython3 ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 @@ -722,7 +722,7 @@ ax.legend() plt.show() ``` -Now we can compare net wealth and labour income. +Now we can compare net wealth and labour income. 
```{code-cell} ipython3 fig, ax = plt.subplots() @@ -758,6 +758,21 @@ Let us compare three western economies: USA, United Kingdom, and Norway data[['USA','GBR', 'NOR']].plot(ylabel='gini coefficient') ``` +We see that Norway has a shorter time series so let us take a closer look at the underlying data + +```{code-cell} ipython3 +data[['NOR']].dropna().head(n=5) +``` + +The data for Norway in this dataset goes back to 1979 but there are gaps in the time series and matplotlib is not showing those data points. + +We can use `dataframe.ffill()` to copy and bring forward the last known value in a series to fill in these gaps + +```{code-cell} ipython3 +data['NOR'] = data['NOR'].ffill() +data[['USA','GBR', 'NOR']].plot(ylabel='gini coefficient') +``` + From this plot we can observe that the USA has a higher gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. Norway has the lowest gini coefficient over the three economies from the year 2003, and it is consistently substantially lower than the USA. @@ -781,19 +796,13 @@ plot_data.index.names = ['country', 'year'] plot_data.columns = ['gini'] ``` -Looking at the first 5 rows of data - -```{code-cell} ipython3 -plot_data.head(n=5) -``` - Now we can get the gdp per capita data into a shape that can be merged with `plot_data` ```{code-cell} ipython3 pgdppc = pd.DataFrame(gdppc.unstack()) pgdppc.index.names = ['country', 'year'] pgdppc.columns = ['gdppc'] -plot_data = pdata.merge(pgdppc, left_index=True, right_index=True) +plot_data = plot_data.merge(pgdppc, left_index=True, right_index=True) plot_data.reset_index(inplace=True) ``` @@ -803,11 +812,32 @@ We will transform the year column to remove the 'YR' text and return an integer. plot_data.year = plot_data.year.map(lambda x: int(x.replace('YR',''))) ``` -Now using plotly to build a plot with gdp per capita on the y-axis and the gini coefficient on the x-axis. 
+Now using plotly to build a plot with gdp per capita on the y-axis and the gini coefficient on the x-axis. ```{code-cell} ipython3 -import plotly.express as px -fig = px.line(plot_data, x="gini", y="gdppc", color="country", text="year", height=800) +min_year = plot_data.year.min() +max_year = plot_data.year.max() +``` + +```{note} +The time series for all three countries start and stop in different years. We will add a year mask to the data to +improve clarity in the chart, including the different end years associated with each country's time series. +``` + +```{code-cell} ipython3 +labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + list(range(min_year,max_year,5)) +plot_data.year = plot_data.year.map(lambda x: x if x in labels else None) +``` + +```{code-cell} ipython3 +fig = px.line(plot_data, + x = "gini", + y = "gdppc", + color = "country", + text = "year", + height = 800, + labels = {"gini" : "Gini coefficient", "gdppc" : "GDP per capita"} + ) fig.update_traces(textposition="bottom right") fig.show() ``` From de796df498bd29fb8fd7440341a4ed66f2806ca9 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 13:02:04 +1100 Subject: [PATCH 17/40] address @jstac comment 1 --- lectures/inequality.md | 53 +++++++++++++++++++++++++++++++++--------- 1 file changed, 42 insertions(+), 11 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index af15ced3..55b24ba5 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -341,20 +341,11 @@ G := {2n\sum_{i=1}^n w_i}. $$ (eq:gini) - The Gini coefficient is closely related to the Lorenz curve. In fact, it can be shown that its value is twice the area between the line of equality and the Lorenz curve (e.g., the shaded area in the following Figure below). -```{note} -Another way to think of the gini coefficient is the area between the 45-degree line of -perfect equality and the Lorenz curve minus the area below the Lorenz curve divided by -the total area below the 45-degree line.
- -In other words, it is a measure of average deviation from the line of equality. -``` - The idea is that $G=0$ indicates complete equality, while $G=1$ indicates complete inequality. ```{code-cell} ipython3 @@ -363,8 +354,6 @@ mystnb: figure: caption: Shaded Lorenz curve of simulated data name: lorenz_gini - image: - alt: lorenz_gini --- fig, ax = plt.subplots() @@ -387,6 +376,48 @@ ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area') plt.show() ``` +Another way to think of the Gini coefficient is as the ratio of the area between the 45-degree line of +perfect equality and the Lorenz curve (A) to the total area below the 45-degree line (A+B). + +```{seealso} +The Our World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient) +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Lorenz curve and Gini coefficient + name: lorenz_gini2 +--- +fig, ax = plt.subplots() + +f_vals, l_vals = lorenz_curve(sample) + +ax.plot(f_vals, l_vals, label='lognormal sample', lw=2) +ax.plot(f_vals, f_vals, label='equality', lw=2) + +ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06) +ax.fill_between(f_vals, l_vals, np.zeros_like(f_vals), alpha=0.06) + +ax.set_ylim((0, 1)) +ax.set_xlim((0, 1)) + +ax.text(0.55, 0.4, 'A') +ax.text(0.75, 0.15, 'B') + +ax.legend() +plt.show() +``` + +$$ +G = \frac{A}{A+B} +$$ + +It is a measure of the average deviation from the line of equality. + ++++ + ### Gini coefficient dynamics of simulated data Let's examine the Gini coefficient in some simulations.
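Editorial sketch (not part of the patch series): the area interpretation $G = A/(A+B)$ added in the patch above can be checked numerically against the pairwise absolute-difference definition of the Gini coefficient. The helper names `lorenz_points`, `gini_from_areas` and `gini_from_formula` below are hypothetical and are not the lecture's `lorenz_curve` or any `quantecon` function. The sketch assumes the Lorenz curve is the piecewise-linear interpolation of the points $(i/n, y_i)$ together with the origin.

```python
def lorenz_points(w):
    """Cumulative population and wealth shares, both starting at 0.

    Illustrative helper only; not the lecture's `lorenz_curve`.
    """
    w = sorted(w)
    n, total = len(w), sum(w)
    x = [i / n for i in range(n + 1)]
    y, running = [0.0], 0.0
    for w_i in w:
        running += w_i
        y.append(running / total)
    return x, y


def gini_from_areas(w):
    """Gini coefficient as A / (A + B), with B the area under the Lorenz curve."""
    x, y = lorenz_points(w)
    # Trapezoid rule is exact here because the curve is piecewise linear
    B = sum((y[i] + y[i - 1]) / 2 * (x[i] - x[i - 1]) for i in range(1, len(x)))
    A = 0.5 - B  # the area under the 45-degree line is 1/2
    return A / (A + B)


def gini_from_formula(w):
    """Gini coefficient from the sum of pairwise absolute differences."""
    n = len(w)
    return sum(abs(a - b) for a in w for b in w) / (2 * n * sum(w))


wealth = [1.0, 2.0, 3.0, 4.0]  # made-up data for illustration
print(gini_from_areas(wealth), gini_from_formula(wealth))
```

For this toy sample both functions agree at 0.25 up to floating-point error, and a perfectly equal sample gives 0 from either route. The pairwise formula costs $O(n^2)$ operations, so the area-based route is the cheaper of the two on large samples.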
From ed9f3183dd5ab863fdb655bd475b1e14af9067f2 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 13:04:50 +1100 Subject: [PATCH 18/40] drop dynamics from title of section --- lectures/inequality.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 55b24ba5..7928abcc 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -416,9 +416,7 @@ $$ It is an average measure of deviation from the line of equality. -+++ - -### Gini coefficient dynamics of simulated data +### Gini coefficient of simulated data Let's examine the Gini coefficient in some simulations. From 976f16bffbd7d77fb65a69c2b4cd48b7f6c04728 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 13:06:00 +1100 Subject: [PATCH 19/40] use matplotlib default cycler --- lectures/inequality.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 7928abcc..86cc65aa 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -629,8 +629,8 @@ mystnb: x = data_usa.dropna().index.values y = data_usa.dropna().values plt.scatter(x,y) -plt.plot(x1, a1*x1+b1, 'r-') -plt.plot(x2, a2*x2+b2, 'y-') +plt.plot(x1, a1*x1+b1) +plt.plot(x2, a2*x2+b2) plt.title("USA gini coefficient dynamics") plt.legend(['Gini coefficient', 'Trend (before 1981)', 'Trend (after 1981)']) plt.ylim(25,45) From 059eaf67b7e7321576a07d50af9c084c2b5786a2 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 13:09:42 +1100 Subject: [PATCH 20/40] update gini to Gini --- lectures/inequality.md | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 86cc65aa..0e32e44f 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -519,7 +519,7 @@ Now let's look at the Gini coefficient using US data. 
We will get pre-computed Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). -Let's use the `wbgapi` package we imported earlier to search the world bank data for gini to find the Series ID. +Let's use the `wbgapi` package we imported earlier to search the world bank data for Gini to find the Series ID. ```{code-cell} ipython3 wb.search("gini") @@ -562,7 +562,7 @@ ax.set_ylim(0,data_usa.max()+5) plt.show() ``` -As can be seen in {numref}`gini_usa1` the gini coefficient: +As can be seen in {numref}`gini_usa1` the Gini coefficient: 1. moves slowly over time, and 2. does not have significant variation in the full range from 0 to 100. @@ -631,11 +631,10 @@ y = data_usa.dropna().values plt.scatter(x,y) plt.plot(x1, a1*x1+b1) plt.plot(x2, a2*x2+b2) -plt.title("USA gini coefficient dynamics") +plt.title("US Gini coefficient dynamics") plt.legend(['Gini coefficient', 'Trend (before 1981)', 'Trend (after 1981)']) -plt.ylim(25,45) plt.ylabel("Gini coefficient") -plt.xlabel("Year") +plt.xlabel("year") plt.show() ``` @@ -646,7 +645,7 @@ plt.show() As we have discussed the Gini coefficient can also be computed over different distributions such as *income* and *wealth*. -We can use the data collected above {ref}`survey of consumer finances ` to look at the gini coefficient when using income when compared to wealth data. +We can use the data collected above {ref}`survey of consumer finances ` to look at the Gini coefficient when using income when compared to wealth data. We can compute the Gini coefficient for net wealth, total income, and labour income over many years. @@ -704,7 +703,7 @@ ginis.head(n=5) Let's plot the Gini coefficients for net wealth, labor income and total income. -Looking at each data series we see an outlier in gini coefficient computed for 1965 for `labour income`. 
+Looking at each data series we see an outlier in Gini coefficient computed for 1965 for `labour income`. We will smooth our data and take an average of the data either side of it for the time being. @@ -770,9 +769,9 @@ The wealth time series exhibits a strong U-shape. ### Cross-country comparisons of income inequality -As we saw earlier in this lecture we used `wbgapi` to get gini data across many countries and saved it in a variable called `gini_all` +As we saw earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all` -In this section we will compare a few countries and the evolution in their respective gini coefficients +In this section we will compare a few countries and the evolution in their respective Gini coefficients ```{code-cell} ipython3 data = gini_all.unstack() # Obtain data for all countries as a table @@ -802,9 +801,9 @@ data['NOR'] = data['NOR'].ffill() data[['USA','GBR', 'NOR']].plot(ylabel='gini coefficient') ``` -From this plot we can observe that the USA has a higher gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. +From this plot we can observe that the USA has a higher Gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. -Norway has the lowest gini coefficient over the three economies from the year 2003, and it is consistently substantially lower than the USA. +Norway has the lowest Gini coefficient over the three economies from the year 2003, and it is consistently substantially lower than the USA. 
### Gini Coefficient and GDP per capita (over time) @@ -817,7 +816,7 @@ countries = ['USA', 'NOR', 'GBR'] gdppc = wb.data.DataFrame("NY.GDP.PCAP.KD", countries).T ``` -We can rearrange the data so that we can plot gdp per capita and the gini coefficient across years +We can rearrange the data so that we can plot gdp per capita and the Gini coefficient across years ```{code-cell} ipython3 plot_data = pd.DataFrame(data[countries].unstack()) @@ -841,7 +840,7 @@ We will transform the year column to remove the 'YR' text and return an integer. plot_data.year = plot_data.year.map(lambda x: int(x.replace('YR',''))) ``` -Now using plotly to build a plot with gdp per capita on the y-axis and the gini coefficient on the x-axis. +Now using plotly to build a plot with gdp per capita on the y-axis and the Gini coefficient on the x-axis. ```{code-cell} ipython3 min_year = plot_data.year.min() @@ -872,7 +871,7 @@ fig.show() ``` This plot shows that all three western economies gdp per capita has grown over time with some fluctuations -in the gini coefficient. However the appears to be significant structural differences between Norway and the USA. +in the Gini coefficient. However the appears to be significant structural differences between Norway and the USA. 
## Top shares From dd87e385db86dfa514a4a3f9633067de4c73c763 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 13:14:01 +1100 Subject: [PATCH 21/40] set x and y labels --- lectures/inequality.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 0e32e44f..fde7fa22 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -558,7 +558,9 @@ mystnb: --- fig, ax = plt.subplots() ax = data_usa.plot(ax=ax) -ax.set_ylim(0,data_usa.max()+5) +ax.set_ylim(0, data_usa.max() + 5) +ax.set_ylabel("Gini coefficient") +ax.set_xlabel("year") plt.show() ``` @@ -582,7 +584,7 @@ gini_all = gini_all.unstack(level='economy').dropna() gini_all.plot(kind="hist", bins=20, title="Gini coefficient" - ) + ) plt.show() ``` From 50f75e324264a1a214665365ecbee814d6bb32c2 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 13:26:15 +1100 Subject: [PATCH 22/40] review in jupyter-lab --- .../usa-gini-nwealth-tincome-lincome.csv | 40 +++++++++---------- lectures/inequality.md | 31 ++++++-------- 2 files changed, 33 insertions(+), 38 deletions(-) diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv index 4bf8d779..a80120ae 100644 --- a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -1,21 +1,21 @@ year,n_wealth,t_income,l_income -1950,0.825733203436636,0.44248654139458754,0.5342948198773422 -1953,0.8059487586599333,0.42645440609359464,0.5158978980963698 -1956,0.8121790488050616,0.4442694287339925,0.5349293526208134 -1959,0.7952068741637915,0.43749348077061606,0.5213985948309418 -1962,0.8086945076579374,0.4435843103853642,0.5345127915054336 -1965,0.7904149225687952,0.43763715466663355,0.7487860020887757 
-1968,0.7982885066993517,0.42086207944388965,0.5242396427381534 -1971,0.7911574835420259,0.42333442460902565,0.5576454812313468 -1977,0.7571418922185198,0.46187678800902515,0.5704448110072063 -1983,0.7494335400643009,0.43934561846446973,0.5662220844385935 -1989,0.7715705301674317,0.5115249581654214,0.6013995687471423 -1992,0.7508126614055307,0.47406506720767516,0.5983592657979556 -1995,0.7569492388110264,0.48965523558400864,0.5969779516716902 -1998,0.7603291991801189,0.49117441585169025,0.5774462841723348 -2001,0.7816118750507017,0.5239092994681133,0.604273964496734 -2004,0.7700355469522374,0.48843503839032487,0.5981432201792718 -2007,0.7821413776486984,0.5197156312086194,0.6263452195753234 -2010,0.8250825295193427,0.5195972120145639,0.6453653328291896 -2013,0.8227698931835266,0.5314001749843371,0.6498682917772642 -2016,0.8342975903562223,0.5541400068900839,0.6706846793375303 +1950,0.8257332034366344,0.44248654139458693,0.5342948198773417 +1953,0.8059487586599325,0.42645440609359514,0.5158978980963705 +1956,0.8121790488050629,0.4442694287339931,0.5349293526208135 +1959,0.7952068741637919,0.4374934807706156,0.5213985948309419 +1962,0.8086945076579375,0.4435843103853644,0.5345127915054356 +1965,0.7904149225687939,0.43763715466663433,0.7487860020887759 +1968,0.7982885066993506,0.42086207944389026,0.5242396427381537 +1971,0.7911574835420256,0.42333442460902515,0.5576454812313486 +1977,0.7571418922185218,0.46187678800902543,0.5704448110072071 +1983,0.749433540064304,0.43934561846446973,0.5662220844385909 +1989,0.7715705301674298,0.51152495816542,0.6013995687471444 +1992,0.7508126614055317,0.4740650672076807,0.5983592657979544 +1995,0.7569492388110282,0.48965523558400603,0.596977951671693 +1998,0.7603291991801175,0.4911744158516888,0.5774462841723299 +2001,0.7816118750507037,0.5239092994681126,0.6042739644967319 +2004,0.7700355469522371,0.48843503839032426,0.5981432201792735 +2007,0.782141377648699,0.5197156312086187,0.6263452195753223 
+2010,0.8250825295193419,0.5195972120145633,0.6453653328291933 +2013,0.8227698931835327,0.5314001749843346,0.6498682917772663 +2016,0.8342975903562247,0.5541400068900854,0.670684679337527 diff --git a/lectures/inequality.md b/lectures/inequality.md index fde7fa22..429146fd 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -303,6 +303,8 @@ ax.plot(f_vals_nw[-1], l_vals_nw[-1], label=f'net wealth') ax.plot(f_vals_ti[-1], l_vals_ti[-1], label=f'total income') ax.plot(f_vals_li[-1], l_vals_li[-1], label=f'labor income') ax.plot(f_vals_nw[-1], f_vals_nw[-1], label=f'equality') +ax.set_xlabel("household percentile") +ax.set_ylabel("income/wealth percentile") ax.legend() plt.show() ``` @@ -356,23 +358,18 @@ mystnb: name: lorenz_gini --- fig, ax = plt.subplots() - f_vals, l_vals = lorenz_curve(sample) ax.plot(f_vals, l_vals, label=f'lognormal sample', lw=2) ax.plot(f_vals, f_vals, label='equality', lw=2) - -ax.legend() - ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--') ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--') - ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06) - ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) - ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area') - +ax.set_xlabel("household percentile") +ax.set_ylabel("income/wealth percentile") +ax.legend() plt.show() ``` @@ -391,21 +388,17 @@ mystnb: name: lorenz_gini2 --- fig, ax = plt.subplots() - f_vals, l_vals = lorenz_curve(sample) - ax.plot(f_vals, l_vals, label='lognormal sample', lw=2) ax.plot(f_vals, f_vals, label='equality', lw=2) - ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06) ax.fill_between(f_vals, l_vals, np.zeros_like(f_vals), alpha=0.06) - ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) - ax.text(0.55, 0.4, 'A') ax.text(0.75, 0.15, 'B') - +ax.set_xlabel("household percentile") +ax.set_ylabel("income/wealth percentile") ax.legend() plt.show() ``` @@ -711,7 +704,9 @@ We will smooth our data and take an average of the data either side of it for th ```{code-cell} 
ipython3 ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 -ginis["l_income"].plot() +ax = ginis["l_income"].plot() +ax.set_ylabel("Gini coefficient") +plt.show() ``` Now we can focus on US net wealth @@ -728,7 +723,7 @@ mystnb: fig, ax = plt.subplots() ax.plot(years, ginis["n_wealth"], marker='o') ax.set_xlabel("year") -ax.set_ylabel("gini coefficient") +ax.set_ylabel("Gini coefficient") plt.show() ``` @@ -747,7 +742,7 @@ fig, ax = plt.subplots() ax.plot(years, ginis["l_income"], marker='o', label="labor income") ax.plot(years, ginis["t_income"], marker='o', label="total income") ax.set_xlabel("year") -ax.set_ylabel("gini coefficient") +ax.set_ylabel("Gini coefficient") ax.legend() plt.show() ``` @@ -759,7 +754,7 @@ fig, ax = plt.subplots() ax.plot(years, ginis["n_wealth"], marker='o', label="net wealth") ax.plot(years, ginis["l_income"], marker='o', label="labour income") ax.set_xlabel("year") -ax.set_ylabel("gini coefficient") +ax.set_ylabel("Gini coefficient") ax.legend() plt.show() ``` From 2f43f2642f8579269ad60845e95b289668d1ca49 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 13:41:16 +1100 Subject: [PATCH 23/40] tweaks from review of preview --- lectures/inequality.md | 45 +++++++++++++++++++++++------------------- 1 file changed, 25 insertions(+), 20 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 429146fd..8d1777c9 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -698,27 +698,12 @@ ginis.head(n=5) Let's plot the Gini coefficients for net wealth, labor income and total income. -Looking at each data series we see an outlier in Gini coefficient computed for 1965 for `labour income`. - -We will smooth our data and take an average of the data either side of it for the time being. 
- -```{code-cell} ipython3 -ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 -ax = ginis["l_income"].plot() -ax.set_ylabel("Gini coefficient") -plt.show() -``` - -Now we can focus on US net wealth - ```{code-cell} ipython3 --- mystnb: figure: caption: Gini coefficients of US net wealth name: gini_wealth_us - image: - alt: gini_wealth_us --- fig, ax = plt.subplots() ax.plot(years, ginis["n_wealth"], marker='o') @@ -727,7 +712,15 @@ ax.set_ylabel("Gini coefficient") plt.show() ``` -and look at US income for both labour and a total income (excl. capital gains) +Looking at each data series we see an outlier in Gini coefficient computed for 1965 for `labour income`. + +We will smooth our data and take an average of the data either side of it for the time being. + +```{code-cell} ipython3 +ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 +``` + +Now looking at US income for both labour and a total income. ```{code-cell} ipython3 --- @@ -735,8 +728,6 @@ mystnb: figure: caption: Gini coefficients of US income name: gini_income_us - image: - alt: gini_income_us --- fig, ax = plt.subplots() ax.plot(years, ginis["l_income"], marker='o', label="labor income") @@ -750,6 +741,12 @@ plt.show() Now we can compare net wealth and labour income. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients of US net wealth and labour income + name: gini_wealth_us +--- fig, ax = plt.subplots() ax.plot(years, ginis["n_wealth"], marker='o', label="net wealth") ax.plot(years, ginis["l_income"], marker='o', label="labour income") @@ -791,11 +788,14 @@ data[['NOR']].dropna().head(n=5) The data for Norway in this dataset goes back to 1979 but there are gaps in the time series and matplotlib is not showing those data points. 
-We can use `dataframe.ffill()` to copy and bring forward the last known value in a series to fill in these gaps +We can use the `.ffill()` method to copy and bring forward the last known value in a series to fill in these gaps ```{code-cell} ipython3 data['NOR'] = data['NOR'].ffill() -data[['USA','GBR', 'NOR']].plot(ylabel='gini coefficient') +ax = data[['USA','GBR', 'NOR']].plot() +ax.set_xlabel('year') +ax.set_ylabel('Gini coefficient') +plt.show() ``` From this plot we can observe that the USA has a higher Gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. @@ -854,6 +854,7 @@ labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + list(range(min_year, plot_data.year = plot_data.year.map(lambda x: x if x in labels else None) ``` +(fig:plotly-gini-gdppc-years)= ```{code-cell} ipython3 fig = px.line(plot_data, x = "gini", @@ -867,6 +868,10 @@ fig.update_traces(textposition="bottom right") fig.show() ``` +```{only} latex +This figure is built using `plotly` and is {ref}` available on the website <fig:plotly-gini-gdppc-years>` +``` + This plot shows that all three western economies gdp per capita has grown over time with some fluctuations in the Gini coefficient. However the appears to be significant structural differences between Norway and the USA. From 31a0f135d0aa23c1f9101614c43f2dad93f22914 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 14:03:28 +1100 Subject: [PATCH 24/40] remove incorrect statement --- lectures/inequality.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 8d1777c9..7e83e990 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -583,8 +583,6 @@ plt.show() We can see that across 50 years of data and all countries (including low and high income countries) the measure varies between 20 and 65. -This variation would be even smaller for the subset of wealthy countries. - Let us zoom in a little on the US data and add some trendlines. 
{numref}`gini_usa1` suggests there is a change in trend around the year 1981 From db3a8dcb004f6ef9b420cc382562052444a43e1c Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 14:07:23 +1100 Subject: [PATCH 25/40] minor edit --- lectures/inequality.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 7e83e990..91e3be7d 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -798,7 +798,7 @@ plt.show() From this plot we can observe that the USA has a higher Gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. -Norway has the lowest Gini coefficient over the three economies from the year 2003, and it is consistently substantially lower than the USA. +Norway has the lowest Gini coefficient over the three economies it is substantially lower than the US. ### Gini Coefficient and GDP per capita (over time) From 2707b5d14d84f367042e5095f2e4aaab4dacb7c2 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 14:12:36 +1100 Subject: [PATCH 26/40] improve narrative --- lectures/inequality.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 91e3be7d..23cd9623 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -870,8 +870,12 @@ fig.show() This figure is built using `plotly` and is {ref}` available on the website <fig:plotly-gini-gdppc-years>` ``` -This plot shows that all three western economies gdp per capita has grown over time with some fluctuations -in the Gini coefficient. However the appears to be significant structural differences between Norway and the USA. +This plot shows that all three western economies GDP per capita has grown over time with some fluctuations +in the Gini coefficient. From the early 80's the United Kingdom and the US economies both saw increases in income +inequality. 
+ +Interestingly, since the year 2000, the United Kingdom saw a decline in income inequality while +the US exhibits persistent but stable levels around a Gini coefficient of 40. ## Top shares From cdf794dfa314ea0ef3fbd506989547159ee9aaa2 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 12 Mar 2024 15:48:57 +1100 Subject: [PATCH 27/40] incorporate some of @jstac comments --- .../usa-gini-nwealth-tincome-lincome.csv | 40 +++++++++---------- lectures/inequality.md | 29 ++++++++------ 2 files changed, 36 insertions(+), 33 deletions(-) diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv index a80120ae..7339431e 100644 --- a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -1,21 +1,21 @@ year,n_wealth,t_income,l_income -1950,0.8257332034366344,0.44248654139458693,0.5342948198773417 -1953,0.8059487586599325,0.42645440609359514,0.5158978980963705 -1956,0.8121790488050629,0.4442694287339931,0.5349293526208135 -1959,0.7952068741637919,0.4374934807706156,0.5213985948309419 -1962,0.8086945076579375,0.4435843103853644,0.5345127915054356 -1965,0.7904149225687939,0.43763715466663433,0.7487860020887759 -1968,0.7982885066993506,0.42086207944389026,0.5242396427381537 -1971,0.7911574835420256,0.42333442460902515,0.5576454812313486 -1977,0.7571418922185218,0.46187678800902543,0.5704448110072071 -1983,0.749433540064304,0.43934561846446973,0.5662220844385909 -1989,0.7715705301674298,0.51152495816542,0.6013995687471444 -1992,0.7508126614055317,0.4740650672076807,0.5983592657979544 -1995,0.7569492388110282,0.48965523558400603,0.596977951671693 -1998,0.7603291991801175,0.4911744158516888,0.5774462841723299 -2001,0.7816118750507037,0.5239092994681126,0.6042739644967319 -2004,0.7700355469522371,0.48843503839032426,0.5981432201792735 
-2007,0.782141377648699,0.5197156312086187,0.6263452195753223 -2010,0.8250825295193419,0.5195972120145633,0.6453653328291933 -2013,0.8227698931835327,0.5314001749843346,0.6498682917772663 -2016,0.8342975903562247,0.5541400068900854,0.670684679337527 +1950,0.8257332034366353,0.4424865413945867,0.5342948198773424 +1953,0.8059487586599343,0.42645440609359475,0.5158978980963699 +1956,0.8121790488050622,0.4442694287339929,0.5349293526208142 +1959,0.7952068741637924,0.43749348077061573,0.5213985948309421 +1962,0.8086945076579368,0.4435843103853639,0.5345127915054342 +1965,0.790414922568795,0.43763715466663367,0.7487860020887751 +1968,0.7982885066993514,0.42086207944388976,0.5242396427381543 +1971,0.7911574835420238,0.4233344246090258,0.5576454812313485 +1977,0.7571418922185226,0.461876788009026,0.5704448110072052 +1983,0.7494335400643025,0.4393456184644705,0.5662220844385915 +1989,0.7715705301674318,0.511524958165423,0.6013995687471408 +1992,0.7508126614055309,0.4740650672076755,0.5983592657979545 +1995,0.7569492388110265,0.4896552355840044,0.5969779516716882 +1998,0.760329199180118,0.4911744158516898,0.5774462841723345 +2001,0.7816118750507034,0.5239092994681134,0.6042739644967283 +2004,0.7700355469522369,0.4884350383903255,0.5981432201792665 +2007,0.7821413776486987,0.5197156312086179,0.6263452195753251 +2010,0.825082529519343,0.5195972120145644,0.6453653328291921 +2013,0.8227698931835268,0.5314001749843339,0.6498682917772639 +2016,0.8342975903562239,0.5541400068900838,0.6706846793375301 diff --git a/lectures/inequality.md b/lectures/inequality.md index 23cd9623..316aa790 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -102,18 +102,18 @@ The curve $L$ is just a function $y = L(x)$ that we can plot and interpret. 
To create it we first generate data points $(x_i, y_i)$ according to +```{prf:definition} $$ x_i = \frac{i}{n}, \qquad y_i = \frac{\sum_{j \leq i} w_j}{\sum_{j \leq n} w_j}, \qquad i = 1, \ldots, n $$ +``` Now the Lorenz curve $L$ is formed from these data points using interpolation. -```{tip} If we use a line plot in `matplotlib`, the interpolation will be done for us. -``` The meaning of the statement $y = L(x)$ is that the lowest $(100 \times x)$\% of people have $(100 \times y)$\% of all wealth. @@ -337,11 +337,15 @@ smallest to largest. The Gini coefficient is defined for the sample above as +```{prf:definition} +:label: define-gini + $$ G := \frac{\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} {2n\sum_{i=1}^n w_i}. -$$ (eq:gini) +$$ () +``` The Gini coefficient is closely related to the Lorenz curve. @@ -529,6 +533,7 @@ Let us fetch the data for the USA and request for it to be returned as a `DataFr ```{code-cell} ipython3 data = wb.data.DataFrame("SI.POV.GINI", "USA") data.head(n=5) +data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int ``` ```{tip} @@ -559,8 +564,9 @@ plt.show() As can be seen in {numref}`gini_usa1` the Gini coefficient: -1. moves slowly over time, and -2. does not have significant variation in the full range from 0 to 100. +1. trended upward from 1980 to 2020 and then dropped slightly following the COVID pandemic +1. moves slowly over time +3. does not have significant variation in the full range from 0 to 100 Using `pandas` we can take a quick look across all countries and all years in the World Bank dataset. 
@@ -569,6 +575,7 @@ By leaving off the `"USA"` this function returns all Gini data that is available ```{code-cell} ipython3 # Fetch gini data for all countries gini_all = wb.data.DataFrame("SI.POV.GINI") +gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int # Create a long series with a multi-index of the data to get global min and max values gini_all = gini_all.unstack(level='economy').dropna() @@ -588,7 +595,6 @@ Let us zoom in a little on the US data and add some trendlines. {numref}`gini_usa1` suggests there is a change in trend around the year 1981 ```{code-cell} ipython3 -data_usa.index = data_usa.index.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int # Use pandas filters to find data before 1981 pre_1981 = data_usa[data_usa.index <= 1981] # Use pandas filters to find data after 1981 @@ -808,7 +814,9 @@ Let's take another look at the USA, Norway, and the United Kingdom. ```{code-cell} ipython3 countries = ['USA', 'NOR', 'GBR'] -gdppc = wb.data.DataFrame("NY.GDP.PCAP.KD", countries).T +gdppc = wb.data.DataFrame("NY.GDP.PCAP.KD", countries) +gdppc.columns = gdppc.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int +gdppc = gdppc.T ``` We can rearrange the data so that we can plot gdp per capita and the Gini coefficient across years @@ -829,12 +837,6 @@ plot_data = plot_data.merge(pgdppc, left_index=True, right_index=True) plot_data.reset_index(inplace=True) ``` -We will transform the year column to remove the 'YR' text and return an integer. - -```{code-cell} ipython3 -plot_data.year = plot_data.year.map(lambda x: int(x.replace('YR',''))) -``` - Now using plotly to build a plot with gdp per capita on the y-axis and the Gini coefficient on the x-axis. 
```{code-cell} ipython3 @@ -853,6 +855,7 @@ plot_data.year = plot_data.year.map(lambda x: x if x in labels else None) ``` (fig:plotly-gini-gdppc-years)= + ```{code-cell} ipython3 fig = px.line(plot_data, x = "gini", From ec2a8d6080d5f888f4f9e52088b60a54ccb50542 Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 18 Mar 2024 14:49:02 +1100 Subject: [PATCH 28/40] fix equation in prf:definition --- lectures/inequality.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 316aa790..93b7db56 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -344,7 +344,7 @@ $$ G := \frac{\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} {2n\sum_{i=1}^n w_i}. -$$ () +$$ ``` The Gini coefficient is closely related to the Lorenz curve. From 0d8a64a57bf5db961d9ac64839b9bb2fe7a3abf7 Mon Sep 17 00:00:00 2001 From: mmcky Date: Thu, 21 Mar 2024 11:34:50 +1100 Subject: [PATCH 29/40] address remaining @jstac direct comments --- lectures/inequality.md | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 93b7db56..1756a81a 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -103,6 +103,8 @@ The curve $L$ is just a function $y = L(x)$ that we can plot and interpret. To create it we first generate data points $(x_i, y_i)$ according to ```{prf:definition} +:label: define-lorenz + $$ x_i = \frac{i}{n}, \qquad @@ -137,7 +139,6 @@ income or wealth data into the cumulative share of individuals (or households) and the cumulative share of income (or wealth). ```{code-cell} ipython3 -:tags: [hide-input] def lorenz_curve(y): """ @@ -350,7 +351,7 @@ $$ The Gini coefficient is closely related to the Lorenz curve. In fact, it can be shown that its value is twice the area between the line of -equality and the Lorenz curve (e.g., the shaded area in the following Figure below). 
+equality and the Lorenz curve (e.g., the shaded area in {numref}`lorenz_gini`). The idea is that $G=0$ indicates complete equality, while $G=1$ indicates complete inequality. @@ -524,9 +525,7 @@ wb.search("gini") We now know the series ID is `SI.POV.GINI`. -```{tip} Another, and often useful way to find series ID, is to use the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data. -``` Let us fetch the data for the USA and request for it to be returned as a `DataFrame`. @@ -536,9 +535,7 @@ data.head(n=5) data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int ``` -```{tip} -This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting -``` +**Note:** This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting ```{code-cell} ipython3 data = data.T # transpose to get data series as columns and years as rows @@ -631,7 +628,7 @@ plt.scatter(x,y) plt.plot(x1, a1*x1+b1) plt.plot(x2, a2*x2+b2) plt.title("US Gini coefficient dynamics") -plt.legend(['Gini coefficient', 'Trend (before 1981)', 'Trend (after 1981)']) +plt.legend(['Gini coefficient', 'trend (before 1981)', 'trend (after 1981)']) plt.ylabel("Gini coefficient") plt.xlabel("year") plt.show() @@ -781,7 +778,7 @@ There are 167 countries represented in this dataset. 
Let us compare three western economies: USA, United Kingdom, and Norway ```{code-cell} ipython3 -data[['USA','GBR', 'NOR']].plot(ylabel='gini coefficient') +data[['USA','GBR', 'NOR']].plot(ylabel='Gini coefficient') ``` We see that Norway has a shorter time series so let us take a closer look at the underlying data From 25414fcce6829428714c2e66fecdd905a41176ff Mon Sep 17 00:00:00 2001 From: mmcky Date: Thu, 21 Mar 2024 11:37:22 +1100 Subject: [PATCH 30/40] save branch conflict --- lectures/inequality.md | 7 ------- 1 file changed, 7 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index a344ebb6..1756a81a 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -66,11 +66,7 @@ We will need to install the following packages ```{code-cell} ipython3 :tags: [hide-output] -<<<<<<< HEAD !pip install wbgapi plotly -======= -!pip install quantecon ->>>>>>> main ``` We will also use the following imports. @@ -80,11 +76,8 @@ import pandas as pd import numpy as np import matplotlib.pyplot as plt import random as rd -<<<<<<< HEAD import wbgapi as wb import plotly.express as px -======= ->>>>>>> main ``` ## The Lorenz curve From de9b5d22fac45e2f5fe3002ed3bbfa6a0b1f4128 Mon Sep 17 00:00:00 2001 From: mmcky Date: Thu, 21 Mar 2024 12:36:22 +1100 Subject: [PATCH 31/40] @mmcky final review --- .../usa-gini-nwealth-tincome-lincome.csv | 40 +++--- lectures/inequality.md | 118 +++++++----------- 2 files changed, 65 insertions(+), 93 deletions(-) diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv index 7339431e..bf820364 100644 --- a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -1,21 +1,21 @@ year,n_wealth,t_income,l_income -1950,0.8257332034366353,0.4424865413945867,0.5342948198773424 
-1953,0.8059487586599343,0.42645440609359475,0.5158978980963699 -1956,0.8121790488050622,0.4442694287339929,0.5349293526208142 -1959,0.7952068741637924,0.43749348077061573,0.5213985948309421 -1962,0.8086945076579368,0.4435843103853639,0.5345127915054342 -1965,0.790414922568795,0.43763715466663367,0.7487860020887751 -1968,0.7982885066993514,0.42086207944388976,0.5242396427381543 -1971,0.7911574835420238,0.4233344246090258,0.5576454812313485 -1977,0.7571418922185226,0.461876788009026,0.5704448110072052 -1983,0.7494335400643025,0.4393456184644705,0.5662220844385915 -1989,0.7715705301674318,0.511524958165423,0.6013995687471408 -1992,0.7508126614055309,0.4740650672076755,0.5983592657979545 -1995,0.7569492388110265,0.4896552355840044,0.5969779516716882 -1998,0.760329199180118,0.4911744158516898,0.5774462841723345 -2001,0.7816118750507034,0.5239092994681134,0.6042739644967283 -2004,0.7700355469522369,0.4884350383903255,0.5981432201792665 -2007,0.7821413776486987,0.5197156312086179,0.6263452195753251 -2010,0.825082529519343,0.5195972120145644,0.6453653328291921 -2013,0.8227698931835268,0.5314001749843339,0.6498682917772639 -2016,0.8342975903562239,0.5541400068900838,0.6706846793375301 +1950,0.8257332034366338,0.44248654139458626,0.5342948198773412 +1953,0.8059487586599329,0.4264544060935945,0.5158978980963702 +1956,0.8121790488050616,0.44426942873399283,0.5349293526208142 +1959,0.795206874163792,0.43749348077061573,0.5213985948309416 +1962,0.8086945076579359,0.4435843103853645,0.5345127915054341 +1965,0.7904149225687935,0.43763715466663444,0.7487860020887753 +1968,0.7982885066993497,0.4208620794438902,0.5242396427381545 +1971,0.7911574835420259,0.4233344246090255,0.5576454812313466 +1977,0.7571418922185215,0.46187678800902543,0.5704448110072049 +1983,0.7494335400643013,0.439345618464469,0.5662220844385915 +1989,0.7715705301674302,0.5115249581654197,0.601399568747142 +1992,0.7508126614055308,0.4740650672076798,0.5983592657979563 
+1995,0.7569492388110265,0.48965523558400603,0.5969779516716903 +1998,0.7603291991801185,0.49117441585168614,0.5774462841723305 +2001,0.7816118750507056,0.5239092994681135,0.6042739644967272 +2004,0.7700355469522361,0.4884350383903255,0.5981432201792727 +2007,0.7821413776486978,0.5197156312086187,0.626345219575322 +2010,0.8250825295193438,0.5195972120145615,0.6453653328291903 +2013,0.8227698931835303,0.531400174984336,0.6498682917772644 +2016,0.8342975903562234,0.5541400068900825,0.6706846793375284 diff --git a/lectures/inequality.md b/lectures/inequality.md index 1756a81a..87072ddf 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -139,7 +139,6 @@ income or wealth data into the cumulative share of individuals (or households) and the cumulative share of income (or wealth). ```{code-cell} ipython3 - def lorenz_curve(y): """ Calculates the Lorenz Curve, a graphical representation of @@ -216,9 +215,9 @@ ax.plot(f_vals, f_vals, label='equality', lw=2) ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--') ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--') ax.set_xlim((0, 1)) -ax.set_xlabel("Cumulative share of households (%)") +ax.set_xlabel("share of households (%)") ax.set_ylim((0, 1)) -ax.set_ylabel("Cumulative share of income (%)") +ax.set_ylabel("share of income (%)") ax.legend() plt.show() ``` @@ -304,8 +303,8 @@ ax.plot(f_vals_nw[-1], l_vals_nw[-1], label=f'net wealth') ax.plot(f_vals_ti[-1], l_vals_ti[-1], label=f'total income') ax.plot(f_vals_li[-1], l_vals_li[-1], label=f'labor income') ax.plot(f_vals_nw[-1], f_vals_nw[-1], label=f'equality') -ax.set_xlabel("household percentile") -ax.set_ylabel("income/wealth percentile") +ax.set_xlabel("share of households (%)") +ax.set_ylabel("share of income/wealth (%)") ax.legend() plt.show() ``` @@ -372,18 +371,14 @@ ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06) ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area') -ax.set_xlabel("household 
percentile") -ax.set_ylabel("income/wealth percentile") +ax.set_xlabel("share of households (%)") +ax.set_ylabel("share of income/wealth (%)") ax.legend() plt.show() ``` Another way to think of the Gini coefficient is as a ratio of the area between the 45-degree line of -perfect equality and the Lorenz curve (A) divided by the total area below the 45-degree line (A+B). - -```{seealso} -The World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient]) -``` +perfect equality and the Lorenz curve (A) divided by the total area below the 45-degree line (A+B) as shown in {numref}`lorenz_gini2`. ```{code-cell} ipython3 --- @@ -402,8 +397,8 @@ ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) ax.text(0.55, 0.4, 'A') ax.text(0.75, 0.15, 'B') -ax.set_xlabel("household percentile") -ax.set_ylabel("income/wealth percentile") +ax.set_xlabel("share of households (%)") +ax.set_ylabel("share of income/wealth (%)") ax.legend() plt.show() ``` @@ -414,6 +409,10 @@ $$ It is an average measure of deviation from the line of equality. +```{seealso} +The World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient]) +``` + ### Gini coefficient of simulated data Let's examine the Gini coefficient in some simulations. @@ -463,10 +462,8 @@ In each case we set $\mu = - \sigma^2 / 2$. This implies that the mean of the distribution does not change with $\sigma$. -```{note} You can check this by looking up the expression for the mean of a lognormal distribution. -``` ```{code-cell} ipython3 k = 5 @@ -504,18 +501,18 @@ fix, ax = plot_inequality_measures(σ_vals, ginis, 'simulated', '$\sigma$', - 'gini coefficients') + 'Gini coefficients') plt.show() ``` The plots show that inequality rises with $\sigma$, according to the Gini coefficient. 
-### Gini coefficient dynamics for US data (income) +### Gini coefficient for US data (income) Now let's look at the Gini coefficient using US data. -We will get pre-computed Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). +We will get pre-computed Gini coefficients (based on income) from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). Let's use the `wbgapi` package we imported earlier to search the world bank data for Gini to find the Series ID. @@ -578,63 +575,28 @@ gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) # rem gini_all = gini_all.unstack(level='economy').dropna() # Build a histogram -gini_all.plot(kind="hist", - bins=20, - title="Gini coefficient" - ) +ax = gini_all.plot(kind="hist", bins=20) +ax.set_xlabel("Gini coefficient") +ax.set_ylabel("frequency") plt.show() ``` -We can see that across 50 years of data and all countries (including low and high income countries) the measure varies between 20 and 65. - -Let us zoom in a little on the US data and add some trendlines. - -{numref}`gini_usa1` suggests there is a change in trend around the year 1981 +We can see that across 50 years of data and all countries (including low and high income countries) the measure only varies between 20 and 65. -```{code-cell} ipython3 -# Use pandas filters to find data before 1981 -pre_1981 = data_usa[data_usa.index <= 1981] -# Use pandas filters to find data after 1981 -post_1981 = data_usa[data_usa.index > 1981] -``` - -We can use `numpy` to compute a linear line of best fit. 
- -```{code-cell} ipython3 -# Pre 1981 Data Trend -x1 = pre_1981.dropna().index.values -y1 = pre_1981.dropna().values -a1, b1 = np.polyfit(x1, y1, 1) - -# Post 1981 Data Trend -x2 = post_1981.dropna().index.values -y2 = post_1981.dropna().values -a2, b2 = np.polyfit(x2, y2, 1) -``` +{numref}`gini_usa1` suggests there is a change in trend around the year 1980. -We can now built a plot that includes trend and a range that offers a closer -look at the dynamics over time in the Gini coefficient for the USA. +Let us zoom on the US data so we can more clearly observe trends. ```{code-cell} ipython3 ---- -mystnb: - figure: - caption: Gini coefficients (USA) with trend - name: gini_usa_trend ---- -x = data_usa.dropna().index.values -y = data_usa.dropna().values -plt.scatter(x,y) -plt.plot(x1, a1*x1+b1) -plt.plot(x2, a2*x2+b2) -plt.title("US Gini coefficient dynamics") -plt.legend(['Gini coefficient', 'trend (before 1981)', 'trend (after 1981)']) -plt.ylabel("Gini coefficient") -plt.xlabel("year") +fig, ax = plt.subplots() +ax = data_usa.plot(ax=ax) +ax.set_ylim(data_usa.min()-1, data_usa.max()+1) +ax.set_ylabel("Gini coefficient") +ax.set_xlabel("year") plt.show() ``` -{numref}`gini_usa_trend` shows inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time. +{numref}`gini_usa_trend` shows inequality was falling in the USA until 1980 when it appears to have started to change course and steadily rise over time. (compare-income-wealth-usa-over-time)= ### Comparing income and wealth inequality (the US case) @@ -766,7 +728,7 @@ The wealth time series exhibits a strong U-shape. 
As we saw earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all` -In this section we will compare a few countries and the evolution in their respective Gini coefficients +In this section we will compare a few western economies and look at the evolution in their respective Gini coefficients ```{code-cell} ipython3 data = gini_all.unstack() # Obtain data for all countries as a table @@ -778,7 +740,11 @@ There are 167 countries represented in this dataset. Let us compare three western economies: USA, United Kingdom, and Norway ```{code-cell} ipython3 -data[['USA','GBR', 'NOR']].plot(ylabel='Gini coefficient') +ax = data[['USA','GBR', 'NOR']].plot() +ax.set_xlabel('year') +ax.set_ylabel('Gini coefficient') +ax.legend(title="") +plt.show() ``` We see that Norway has a shorter time series so let us take a closer look at the underlying data @@ -796,12 +762,13 @@ data['NOR'] = data['NOR'].ffill() ax = data[['USA','GBR', 'NOR']].plot() ax.set_xlabel('year') ax.set_ylabel('Gini coefficient') +ax.legend(title="") plt.show() ``` From this plot we can observe that the USA has a higher Gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. -Norway has the lowest Gini coefficient over the three economies it is substantially lower than the US. +Norway has the lowest Gini coefficient over the three economies and is substantially lower than the US. ### Gini Coefficient and GDP per capita (over time) @@ -841,10 +808,9 @@ min_year = plot_data.year.min() max_year = plot_data.year.max() ``` -```{note} -The time series for all three countries start and stop in different years. We will add a year mask to the data to + +**Note:** The time series for all three countries start and stop in different years. We will add a year mask to the data to improve clarity in the chart including the different end years associated with each countries time series. 
-``` ```{code-cell} ipython3 labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + list(range(min_year,max_year,5)) @@ -871,7 +837,9 @@ This figure is built using `plotly` and is {ref}` available on the website Date: Thu, 21 Mar 2024 12:49:53 +1100 Subject: [PATCH 32/40] update figures with mystnb figure settings --- lectures/inequality.md | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 87072ddf..45c732d0 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -567,6 +567,12 @@ Using `pandas` we can take a quick look across all countries and all years in th By leaving off the `"USA"` this function returns all Gini data that is available. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Histogram of Gini coefficients + name: gini_histogram +--- # Fetch gini data for all countries gini_all = wb.data.DataFrame("SI.POV.GINI") gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int @@ -588,6 +594,12 @@ We can see that across 50 years of data and all countries (including low and hig Let us zoom on the US data so we can more clearly observe trends. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients (USA) + name: gini_usa_trend +--- fig, ax = plt.subplots() ax = data_usa.plot(ax=ax) ax.set_ylim(data_usa.min()-1, data_usa.max()+1) @@ -740,6 +752,12 @@ There are 167 countries represented in this dataset. 
Let us compare three western economies: USA, United Kingdom, and Norway ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients (USA, United Kingdom, and Norway) + name: gini_usa_gbr_nor1 +--- ax = data[['USA','GBR', 'NOR']].plot() ax.set_xlabel('year') ax.set_ylabel('Gini coefficient') @@ -758,6 +776,12 @@ The data for Norway in this dataset goes back to 1979 but there are gaps in the We can use the `.ffill()` method to copy and bring forward the last known value in a series to fill in these gaps ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients (USA, United Kingdom, and Norway) + name: gini_usa_gbr_nor2 +--- data['NOR'] = data['NOR'].ffill() ax = data[['USA','GBR', 'NOR']].plot() ax.set_xlabel('year') @@ -820,6 +844,12 @@ plot_data.year = plot_data.year.map(lambda x: x if x in labels else None) (fig:plotly-gini-gdppc-years)= ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients and GDP per capita (USA, United Kingdom, and Norway) + name: gini_gdppc_usa_gbr_nor1 +--- fig = px.line(plot_data, x = "gini", y = "gdppc", @@ -928,8 +958,6 @@ mystnb: figure: caption: US top shares name: top_shares_us - image: - alt: top_shares_us --- fig, ax = plt.subplots() ax.plot(years, df_topshares["topshare_l_income"], From 3b65128e433a313fcadeb2ee853275b420d78af9 Mon Sep 17 00:00:00 2001 From: mmcky Date: Thu, 21 Mar 2024 13:06:33 +1100 Subject: [PATCH 33/40] review from html --- lectures/inequality.md | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 45c732d0..d3265c74 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -420,7 +420,6 @@ Let's examine the Gini coefficient in some simulations. First the code below enables us to compute the Gini coefficient. 
```{code-cell} ipython3 -:tags: [hide-input] def gini_coefficient(y): r""" @@ -529,14 +528,15 @@ Let us fetch the data for the USA and request for it to be returned as a `DataFr ```{code-cell} ipython3 data = wb.data.DataFrame("SI.POV.GINI", "USA") data.head(n=5) -data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int +# remove 'YR' in index and convert to integer +data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) ``` **Note:** This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting ```{code-cell} ipython3 -data = data.T # transpose to get data series as columns and years as rows -data_usa = data['USA'] # obtain a simple series of USA data +data = data.T # Obtain years as rows +data_usa = data['USA'] # Series of US data ``` The `data_usa` series can now be plotted using the pandas `.plot` method. @@ -575,7 +575,8 @@ mystnb: --- # Fetch gini data for all countries gini_all = wb.data.DataFrame("SI.POV.GINI") -gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int +# remove 'YR' in index and convert to integer +gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) # Create a long series with a multi-index of the data to get global min and max values gini_all = gini_all.unstack(level='economy').dropna() @@ -743,7 +744,7 @@ As we saw earlier in this lecture we used `wbgapi` to get Gini data across many In this section we will compare a few western economies and look at the evolution in their respective Gini coefficients ```{code-cell} ipython3 -data = gini_all.unstack() # Obtain data for all countries as a table +data = gini_all.unstack() data.columns ``` @@ -803,7 +804,8 @@ Let's take another look at the USA, Norway, and the United Kingdom. 
```{code-cell} ipython3 countries = ['USA', 'NOR', 'GBR'] gdppc = wb.data.DataFrame("NY.GDP.PCAP.KD", countries) -gdppc.columns = gdppc.columns.map(lambda x: int(x.replace('YR',''))) # remove 'YR' in index and convert to int +# remove 'YR' in index and convert to integer +gdppc.columns = gdppc.columns.map(lambda x: int(x.replace('YR',''))) gdppc = gdppc.T ``` @@ -837,19 +839,14 @@ max_year = plot_data.year.max() improve clarity in the chart including the different end years associated with each countries time series. ```{code-cell} ipython3 -labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + list(range(min_year,max_year,5)) +labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] / + + list(range(min_year,max_year,5)) plot_data.year = plot_data.year.map(lambda x: x if x in labels else None) ``` (fig:plotly-gini-gdppc-years)= ```{code-cell} ipython3 ---- -mystnb: - figure: - caption: Gini coefficients and GDP per capita (USA, United Kingdom, and Norway) - name: gini_gdppc_usa_gbr_nor1 ---- fig = px.line(plot_data, x = "gini", y = "gdppc", From 0ee53aad92924aedba87729ddcd1cea20bdedbf7 Mon Sep 17 00:00:00 2001 From: mmcky Date: Thu, 21 Mar 2024 13:09:33 +1100 Subject: [PATCH 34/40] fix line continuation --- lectures/inequality.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index d3265c74..d1ec05c8 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -839,8 +839,8 @@ max_year = plot_data.year.max() improve clarity in the chart including the different end years associated with each countries time series. 
```{code-cell} ipython3 -labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] / - + list(range(min_year,max_year,5)) +labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + \ + list(range(min_year,max_year,5)) plot_data.year = plot_data.year.map(lambda x: x if x in labels else None) ``` From e5d34b4c14dc58a73c48f26c00fa31c878dcfa3f Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 25 Mar 2024 09:58:23 +1100 Subject: [PATCH 35/40] Incorporate @jstac feedback and comments --- lectures/inequality.md | 114 ++++++++++++++--------------------------- 1 file changed, 38 insertions(+), 76 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index d1ec05c8..ad0ed300 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -410,7 +410,7 @@ $$ It is an average measure of deviation from the line of equality. ```{seealso} -The World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient]) +The World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient) ``` ### Gini coefficient of simulated data @@ -507,9 +507,9 @@ plt.show() The plots show that inequality rises with $\sigma$, according to the Gini coefficient. -### Gini coefficient for US data (income) +### Gini coefficient for income (US data) -Now let's look at the Gini coefficient using US data. +Let's look at the Gini coefficient for the distribution of income in the US. We will get pre-computed Gini coefficients (based on income) from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data). @@ -523,49 +523,8 @@ We now know the series ID is `SI.POV.GINI`. Another, and often useful way to find series ID, is to use the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data. 
-Let us fetch the data for the USA and request for it to be returned as a `DataFrame`. - -```{code-cell} ipython3 -data = wb.data.DataFrame("SI.POV.GINI", "USA") -data.head(n=5) -# remove 'YR' in index and convert to integer -data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) -``` - -**Note:** This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting - -```{code-cell} ipython3 -data = data.T # Obtain years as rows -data_usa = data['USA'] # Series of US data -``` - -The `data_usa` series can now be plotted using the pandas `.plot` method. - -```{code-cell} ipython3 ---- -mystnb: - figure: - caption: Gini coefficients (USA) - name: gini_usa1 ---- -fig, ax = plt.subplots() -ax = data_usa.plot(ax=ax) -ax.set_ylim(0, data_usa.max() + 5) -ax.set_ylabel("Gini coefficient") -ax.set_xlabel("year") -plt.show() -``` - -As can be seen in {numref}`gini_usa1` the Gini coefficient: - -1. trended upward from 1980 to 2020 and then dropped slightly following the COVID pandemic -1. moves slowly over time -3. does not have significant variation in the full range from 0 to 100 - Using `pandas` we can take a quick look across all countries and all years in the World Bank dataset. -By leaving off the `"USA"` this function returns all Gini data that is available. - ```{code-cell} ipython3 --- mystnb: @@ -588,9 +547,24 @@ ax.set_ylabel("frequency") plt.show() ``` -We can see that across 50 years of data and all countries (including low and high income countries) the measure only varies between 20 and 65. +We can see in {numref}`gini_histogram` that across 50 years of data and all countries +the measure only varies between 20 and 65. -{numref}`gini_usa1` suggests there is a change in trend around the year 1980. +Let us fetch the data `DataFrame` for the USA. 
+ +```{code-cell} ipython3 +data = wb.data.DataFrame("SI.POV.GINI", "USA") +data.head(n=5) +# remove 'YR' in index and convert to integer +data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) +``` + +**Note:** This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting + +```{code-cell} ipython3 +data = data.T # Obtain years as rows +data_usa = data['USA'] # pd.Series of US data +``` Let us zoom on the US data so we can more clearly observe trends. @@ -598,33 +572,39 @@ Let us zoom on the US data so we can more clearly observe trends. --- mystnb: figure: - caption: Gini coefficients (USA) - name: gini_usa_trend + caption: Gini coefficients for income distribution (USA) + name: gini_usa1 --- fig, ax = plt.subplots() ax = data_usa.plot(ax=ax) ax.set_ylim(data_usa.min()-1, data_usa.max()+1) -ax.set_ylabel("Gini coefficient") +ax.set_ylabel("Gini coefficient (income)") ax.set_xlabel("year") plt.show() ``` -{numref}`gini_usa_trend` shows inequality was falling in the USA until 1980 when it appears to have started to change course and steadily rise over time. +As can be seen in {numref}`gini_usa1` the Gini coefficient: + +1. trended upward from 1980 to 2020 and then dropped slightly following at the start of the COVID pandemic +2. moves slowly over time (compare-income-wealth-usa-over-time)= -### Comparing income and wealth inequality (the US case) +### Gini coefficient for wealth (US data) + +In the previous section we looked at the Gini coefficient for income using US data. -As we have discussed the Gini coefficient can also be computed over different distributions such as *income* and *wealth*. +Now let's look at the Gini coefficient for the distribution of wealth. 
-We can use the data collected above {ref}`survey of consumer finances ` to look at the Gini coefficient when using income when compared to wealth data. +We can use the data collected above {ref}`survey of consumer finances ` to look at the Gini coefficient +computed over the wealth distribution. -We can compute the Gini coefficient for net wealth, total income, and labour income over many years. +The Gini coefficient for net wealth and labour income is computed over many years. ```{code-cell} ipython3 df_income_wealth.year.describe() ``` -This code can be used to compute this information over the full dataset. +**Note:** This code can be used to compute this information over the full dataset. ```{code-cell} ipython3 :tags: [skip-execution, hide-input, hide-output] @@ -672,7 +652,7 @@ ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincom ginis.head(n=5) ``` -Let's plot the Gini coefficients for net wealth, labor income and total income. +Let's plot the Gini coefficients for net wealth. ```{code-cell} ipython3 --- @@ -696,24 +676,6 @@ We will smooth our data and take an average of the data either side of it for th ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 ``` -Now looking at US income for both labour and a total income. - -```{code-cell} ipython3 ---- -mystnb: - figure: - caption: Gini coefficients of US income - name: gini_income_us ---- -fig, ax = plt.subplots() -ax.plot(years, ginis["l_income"], marker='o', label="labor income") -ax.plot(years, ginis["t_income"], marker='o', label="total income") -ax.set_xlabel("year") -ax.set_ylabel("Gini coefficient") -ax.legend() -plt.show() -``` - Now we can compare net wealth and labour income. 
```{code-cell} ipython3 @@ -756,7 +718,7 @@ Let us compare three western economies: USA, United Kingdom, and Norway --- mystnb: figure: - caption: Gini coefficients (USA, United Kingdom, and Norway) + caption: Gini coefficients for income (USA, United Kingdom, and Norway) name: gini_usa_gbr_nor1 --- ax = data[['USA','GBR', 'NOR']].plot() @@ -780,7 +742,7 @@ We can use the `.ffill()` method to copy and bring forward the last known value --- mystnb: figure: - caption: Gini coefficients (USA, United Kingdom, and Norway) + caption: Gini coefficients for income (USA, United Kingdom, and Norway) name: gini_usa_gbr_nor2 --- data['NOR'] = data['NOR'].ffill() From 50cbfbef45559f3536539ea8fb42ec79d865a05f Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 26 Mar 2024 10:58:24 +1100 Subject: [PATCH 36/40] reinstate hardline error on warning for html build on CI --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 306b454c..5e9c5e60 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -64,7 +64,7 @@ jobs: shell: bash -l {0} run: | rm -r _build/.doctrees - jb build lectures --path-output ./ + jb build lectures --path-output ./ -nW --keep-going - name: Upload Execution Reports (HTML) uses: actions/upload-artifact@v2 if: failure() From 4b63d81a582bbd13d4f262e0cb49fb005c12b098 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 26 Mar 2024 11:21:32 +1100 Subject: [PATCH 37/40] Fix: duplicate label --- lectures/inequality.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index ad0ed300..db5197cc 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -683,7 +683,7 @@ Now we can compare net wealth and labour income. 
mystnb: figure: caption: Gini coefficients of US net wealth and labour income - name: gini_wealth_us + name: gini_wealth_us2 --- fig, ax = plt.subplots() ax.plot(years, ginis["n_wealth"], marker='o', label="net wealth") @@ -1058,8 +1058,6 @@ Plot the top shares generated from Lorenz curve and the top shares approximated :class: dropdown ``` -+++ - Here is one solution: ```{code-cell} ipython3 From a15098bb6b2f056bb038069193656fa37fe9fede Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 26 Mar 2024 11:45:42 +1100 Subject: [PATCH 38/40] add ignore on myst.domain warnings --- lectures/_config.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/_config.yml b/lectures/_config.yml index d309fc39..1d5a36e6 100644 --- a/lectures/_config.yml +++ b/lectures/_config.yml @@ -44,7 +44,7 @@ sphinx: nb_render_image_options: width: 80% nb_code_prompt_show: "Show {type}" - suppress_warnings: [mystnb.unknown_mime_type] + suppress_warnings: [mystnb.unknown_mime_type, myst.domains] # ------------- html_js_files: - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js From 44c69022ecdea6ea04106c9e68c4ed331ef858e6 Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 2 Apr 2024 10:04:07 +1100 Subject: [PATCH 39/40] address @jstac feedback --- lectures/inequality.md | 39 +++++---------------------------------- 1 file changed, 5 insertions(+), 34 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index db5197cc..0c26c7ee 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -566,7 +566,7 @@ data = data.T # Obtain years as rows data_usa = data['USA'] # pd.Series of US data ``` -Let us zoom on the US data so we can more clearly observe trends. +Let us take a look at the data for the US. ```{code-cell} ipython3 --- @@ -668,42 +668,13 @@ ax.set_ylabel("Gini coefficient") plt.show() ``` -Looking at each data series we see an outlier in Gini coefficient computed for 1965 for `labour income`. 
- -We will smooth our data and take an average of the data either side of it for the time being. - -```{code-cell} ipython3 -ginis["l_income"][1965] = (ginis["l_income"][1962] + ginis["l_income"][1968]) / 2 -``` - -Now we can compare net wealth and labour income. - -```{code-cell} ipython3 ---- -mystnb: - figure: - caption: Gini coefficients of US net wealth and labour income - name: gini_wealth_us2 ---- -fig, ax = plt.subplots() -ax.plot(years, ginis["n_wealth"], marker='o', label="net wealth") -ax.plot(years, ginis["l_income"], marker='o', label="labour income") -ax.set_xlabel("year") -ax.set_ylabel("Gini coefficient") -ax.legend() -plt.show() -``` - -We see that, by this measure, inequality in both wealth and income has risen -substantially since 1980. - The wealth time series exhibits a strong U-shape. ### Cross-country comparisons of income inequality -As we saw earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all` +Earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all` -In this section we will compare a few western economies and look at the evolution in their respective Gini coefficients +In this section we will compare a few Western economies and look at the evolution in their respective Gini coefficients ```{code-cell} ipython3 data = gini_all.unstack() @@ -712,7 +683,7 @@ data.columns There are 167 countries represented in this dataset. 
-Let us compare three western economies: USA, United Kingdom, and Norway +Let us compare three Western economies: USA, United Kingdom, and Norway ```{code-cell} ipython3 --- @@ -825,7 +796,7 @@ fig.show() This figure is built using `plotly` and is {ref}` available on the website ` ``` -This plot shows that all three western economies GDP per capita has grown over time with some fluctuations +This plot shows that all three Western economies GDP per capita has grown over time with some fluctuations in the Gini coefficient. From the early 80's the United Kingdom and the US economies both saw increases in income From bb24ba7580a306f8944d4e06ed778267e82e6602 Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 3 Apr 2024 12:53:48 +1100 Subject: [PATCH 40/40] misc --- lectures/inequality.md | 135 ++++++++++++++++++++++------------------- 1 file changed, 73 insertions(+), 62 deletions(-) diff --git a/lectures/inequality.md b/lectures/inequality.md index 0c26c7ee..acfded55 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -25,8 +25,8 @@ In this section we Many historians argue that inequality played a key role in the fall of the Roman Republic. -After defeating Carthage and invading Spain, money flowed into Rome and -greatly enriched those in power. +Following the defeat of Carthage and the invasion of Spain, money flowed into +Rome from across the empire, greatly enriching those in power. Meanwhile, ordinary citizens were taken from their farms to fight for long periods, diminishing their wealth. @@ -40,26 +40,23 @@ with Octavian (Augustus) in 27 BCE. This history is fascinating in its own right, and we can see some parallels with certain countries in the modern world. -Many recent political debates revolve around inequality. +Let's now look at inequality in some of these countries. -Many economic policies, from taxation to the welfare state, are -aimed at addressing inequality. ### Measurement + +Political debates often revolve around inequality.
+ One problem with these debates is that inequality is often poorly defined. Moreover, debates on inequality are often tied to political beliefs. -This is dangerous for economists because allowing political beliefs to -shape our findings reduces objectivity. - -To bring a truly scientific perspective to the topic of inequality we must -start with careful definitions. +This is dangerous for economists because allowing political beliefs to shape our findings reduces objectivity. -In this lecture we discuss standard measures of inequality used in economic research. +To bring a truly scientific perspective to the topic of inequality we must start with careful definitions. -For each of these measures, we will look at both simulated and real data. +Hence we begin by discussing ways that inequality can be measured in economic research. We will need to install the following packages @@ -91,7 +88,7 @@ In this section we define the Lorenz curve and examine its properties. The Lorenz curve takes a sample $w_1, \ldots, w_n$ and produces a curve $L$. -We suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest. +We suppose that the sample has been sorted from smallest to largest. To aid our interpretation, suppose that we are measuring wealth @@ -224,10 +221,10 @@ plt.show() ### Lorenz curves for US data -Next let's look at data, focusing on income and wealth in the US in 2016. +Next let's look at US data for both income and wealth. (data:survey-consumer-finance)= -The following code block imports a subset of the dataset `SCF_plus`, +The following code block imports a subset of the dataset `SCF_plus` for 2016, which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF). ```{code-cell} ipython3 @@ -240,7 +237,7 @@ df_income_wealth = df.dropna() df_income_wealth.head(n=5) ``` -The following code block uses data stored in dataframe `df_income_wealth` to generate the Lorenz curves. 
+The next code block uses data stored in dataframe `df_income_wealth` to generate the Lorenz curves. (The code is somewhat complex because we need to adjust the data according to population weights supplied by the SCF.) @@ -289,6 +286,10 @@ l_vals_nw, l_vals_ti, l_vals_li = L_vals Now we plot Lorenz curves for net wealth, total income and labor income in the US in 2016. +Total income is the sum of all of a household's income sources, including labor income but excluding capital gains. + +(All income measures are pre-tax.) + ```{code-cell} ipython3 --- mystnb: @@ -309,31 +310,26 @@ ax.legend() plt.show() ``` -Here all the income and wealth measures are pre-tax. -Total income is the sum of households' all income sources, including labor income but excluding capital gains. +One key finding from this figure is that wealth inequality is more extreme than income inequality. + -One key finding from this figure is that wealth inequality is significantly -more extreme than income inequality. -We will take a look at this trend over time {ref}`in a later section`. ## The Gini coefficient The Lorenz curve is a useful visual representation of inequality in a distribution. -Another popular measure of income and wealth inequality is the Gini coefficient. - -The Gini coefficient is just a number, rather than a curve. +Another way to study income and wealth inequality is via the Gini coefficient. In this section we discuss the Gini coefficient and its relationship to the Lorenz curve. + ### Definition -As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from -smallest to largest. +As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest.
The Gini coefficient is defined for the sample above as @@ -377,8 +373,14 @@ ax.legend() plt.show() ``` -Another way to think of the Gini coefficient is as a ratio of the area between the 45-degree line of -perfect equality and the Lorenz curve (A) divided by the total area below the 45-degree line (A+B) as shown in {numref}`lorenz_gini2`. +In fact the Gini coefficient can also be expressed as + +$$ +G = \frac{A}{A+B} +$$ + +where $A$ is the area between the 45-degree line of +perfect equality and the Lorenz curve, while $B$ is the area below the Lorenz curve -- see {numref}`lorenz_gini2`. ```{code-cell} ipython3 --- @@ -403,11 +405,7 @@ ax.legend() plt.show() ``` -$$ -G = \frac{A}{A+B} -$$ -It is an average measure of deviation from the line of equality. ```{seealso} The World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient) @@ -417,7 +415,7 @@ The World in Data project has a [nice graphical exploration of the Lorenz curve Let's examine the Gini coefficient in some simulations. -First the code below enables us to compute the Gini coefficient. +The code below computes the Gini coefficient from a sample. ```{code-cell} ipython3 @@ -521,9 +519,9 @@ wb.search("gini") We now know the series ID is `SI.POV.GINI`. -Another, and often useful way to find series ID, is to use the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data. +(Another way to find the series ID is to use the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data.) -Using `pandas` we can take a quick look across all countries and all years in the World Bank dataset. +To get a quick overview, let's histogram Gini coefficients across all countries and all years in the World Bank dataset.
```{code-cell} ipython3 --- @@ -547,8 +545,7 @@ ax.set_ylabel("frequency") plt.show() ``` -We can see in {numref}`gini_histogram` that across 50 years of data and all countries -the measure only varies between 20 and 65. +We can see in {numref}`gini_histogram` that across 50 years of data and all countries the measure varies between 20 and 65. Let us fetch the data `DataFrame` for the USA. @@ -559,7 +556,8 @@ data.head(n=5) data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) ``` -**Note:** This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting +(This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting.) + ```{code-cell} ipython3 data = data.T # Obtain years as rows @@ -583,10 +581,8 @@ ax.set_xlabel("year") plt.show() ``` -As can be seen in {numref}`gini_usa1` the Gini coefficient: - -1. trended upward from 1980 to 2020 and then dropped slightly following at the start of the COVID pandemic -2. moves slowly over time +As can be seen in {numref}`gini_usa1`, the income Gini +trended upward from 1980 to 2020 and then dropped following the start of the COVID pandemic. (compare-income-wealth-usa-over-time)= ### Gini coefficient for wealth (US data) @@ -595,10 +591,9 @@ In the previous section we looked at the Gini coefficient for income using US da Now let's look at the Gini coefficient for the distribution of wealth. -We can use the data collected above {ref}`survey of consumer finances ` to look at the Gini coefficient +We can use the {ref}`Survey of Consumer Finances data ` to look at the Gini coefficient computed over the wealth distribution. -The Gini coefficient for net wealth and labour income is computed over many years.
```{code-cell} ipython3 df_income_wealth.year.describe() @@ -668,13 +663,24 @@ ax.set_ylabel("Gini coefficient") plt.show() ``` -The wealth time series exhibits a strong U-shape. +The time series for the wealth Gini exhibits a U-shape, falling until the early +1980s and then increasing rapidly. + + +One possibility is that this change is mainly driven by technology. + +However, we will see below that not all advanced economies experienced similar growth of inequality. + + + + ### Cross-country comparisons of income inequality Earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all` -In this section we will compare a few Western economies and look at the evolution in their respective Gini coefficients +In this section we will use this data to compare several advanced economies, and +to look at the evolution in their respective income Ginis. ```{code-cell} ipython3 data = gini_all.unstack() @@ -683,7 +689,7 @@ data.columns There are 167 countries represented in this dataset. -Let us compare three Western economies: USA, United Kingdom, and Norway +Let us compare three advanced economies: the US, the UK, and Norway ```{code-cell} ipython3 --- @@ -699,7 +705,9 @@ ax.legend(title="") plt.show() ``` -We see that Norway has a shorter time series so let us take a closer look at the underlying data +We see that Norway has a shorter time series. + +Let us take a closer look at the underlying data and see if we can rectify this. ```{code-cell} ipython3 data[['NOR']].dropna().head(n=5) @@ -724,15 +732,19 @@ ax.legend(title="") plt.show() ``` -From this plot we can observe that the USA has a higher Gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. +From this plot we can observe that the US has a higher Gini coefficient (i.e. +higher income inequality) when compared to the UK and Norway. 
+ +Norway has the lowest Gini coefficient over the three economies and, moreover, +the Gini coefficient shows no upward trend. + -Norway has the lowest Gini coefficient over the three economies and is substantially lower than the US. ### Gini Coefficient and GDP per capita (over time) We can also look at how the Gini coefficient compares with GDP per capita (over time). -Let's take another look at the USA, Norway, and the United Kingdom. +Let's take another look at the US, Norway, and the UK. ```{code-cell} ipython3 countries = ['USA', 'NOR', 'GBR'] @@ -742,7 +754,7 @@ gdppc.columns = gdppc.columns.map(lambda x: int(x.replace('YR',''))) gdppc = gdppc.T ``` -We can rearrange the data so that we can plot gdp per capita and the Gini coefficient across years +We can rearrange the data so that we can plot GDP per capita and the Gini coefficient across years ```{code-cell} ipython3 plot_data = pd.DataFrame(data[countries].unstack()) @@ -750,7 +762,7 @@ plot_data.index.names = ['country', 'year'] plot_data.columns = ['gini'] ``` -Now we can get the gdp per capita data into a shape that can be merged with `plot_data` +Now we can get the GDP per capita data into a shape that can be merged with `plot_data` ```{code-cell} ipython3 pgdppc = pd.DataFrame(gdppc.unstack()) @@ -760,15 +772,14 @@ plot_data = plot_data.merge(pgdppc, left_index=True, right_index=True) plot_data.reset_index(inplace=True) ``` -Now using plotly to build a plot with gdp per capita on the y-axis and the Gini coefficient on the x-axis. +Now we use Plotly to build a plot with GDP per capita on the y-axis and the Gini coefficient on the x-axis. ```{code-cell} ipython3 min_year = plot_data.year.min() max_year = plot_data.year.max() ``` - -**Note:** The time series for all three countries start and stop in different years. We will add a year mask to the data to +The time series for all three countries start and stop in different years. 
We will add a year mask to the data to improve clarity in the chart including the different end years associated with each countries time series. ```{code-cell} ipython3 @@ -796,24 +807,24 @@ fig.show() This figure is built using `plotly` and is {ref}` available on the website ` ``` -This plot shows that all three Western economies GDP per capita has grown over time with some fluctuations -in the Gini coefficient. +This plot shows that GDP per capita in all three Western economies has grown over +time with some fluctuations in the Gini coefficient. -From the early 80's the United Kingdom and the US economies both saw increases in income -inequality. +From the early 80's the United Kingdom and the US economies both saw increases +in income inequality. Interestingly, since the year 2000, the United Kingdom saw a decline in income inequality while the US exhibits persistent but stable levels around a Gini coefficient of 40. + ## Top shares Another popular measure of inequality is the top shares. -Measuring specific shares is less complex than the Lorenz curve or the Gini -coefficient. In this section we show how to compute top shares. + ### Definition As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest.