[inequality] Review of lecture and incorporate updates #384

Merged
mmcky merged 42 commits into main from review-inequality
Apr 3, 2024

Conversation

mmcky
Contributor

@mmcky mmcky commented Feb 22, 2024

This PR is a full review of the inequality lecture

It incorporates the following feedback from RA reading groups (#380, #379, #297, #295)

General

  • review figures and add appropriate x-axis and y-axis labels
  • ensure figures are captioned and numbered using mystnb metadata

Code

Note: this has been done but it gives the lecture a long run time when calculating gini coefficients for the US data across 50 years.

  • switch to compute 1 year from the data (and then import the remaining years that are pre-computed from a data file) for comparing wealth and income.
  • the lecture makes use of data from https://github.com/QuantEcon/high_dim_data but that is external to the lecture. It looks like that repository makes use of git-lfs for larger file storage so there may be a good reason to keep this as a data repository. I recommend we update its name to data-lecture-python or lecture-python-data perhaps? This will be done in a separate PR due to complexity of changing names and the live site at the same time
  • update this to use pandas to replace outliers, as the code is more intuitive.
```{code-cell} ipython3
# use an average to replace an outlier in labor income gini
ginis_li_new = ginis_li.copy()  # copy so the original series keeps the outlier
ginis_li_new[5] = (ginis_li[4] + ginis_li[6]) / 2
```
  • find out why there is such a big outlier in 1965 in the computation of Gini from labour income for the USA
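
A minimal pandas sketch of the outlier replacement proposed above, assuming the labour income Gini values are held in a `pd.Series` indexed by year. The numbers are invented for illustration, and the lecture data may be shaped differently.

```python
import numpy as np
import pandas as pd

# Hypothetical labour income Gini series; the values are invented for illustration.
ginis_li = pd.Series(
    [0.50, 0.51, 0.52, 0.53, 0.54, 0.90, 0.56, 0.57],
    index=pd.Index(range(1960, 1968), name="year"),
)

# Work on a copy so the original series keeps the outlier.
ginis_li_new = ginis_li.copy()

# Mark the 1965 value as missing, then fill it by linear interpolation,
# which for a single gap is the average of the two neighbouring years.
ginis_li_new.loc[1965] = np.nan
ginis_li_new = ginis_li_new.interpolate(method="linear")

print(ginis_li_new.loc[1964:1966])
```

Selecting by year label rather than by position (`[5]`) makes it clearer which observation is being replaced.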

@mmcky mmcky added the in-work label Feb 22, 2024

netlify bot commented Feb 22, 2024

Deploy Preview for taupe-gaufre-c4e660 ready!

Name Link
🔨 Latest commit bb24ba7
🔍 Latest deploy log https://app.netlify.com/sites/taupe-gaufre-c4e660/deploys/660cb6b6b640a40008e70097
😎 Deploy Preview https://deploy-preview-384--taupe-gaufre-c4e660.netlify.app


github-actions bot commented Feb 23, 2024

@github-actions github-actions bot temporarily deployed to pull request February 23, 2024 00:55 Inactive
@mmcky
Contributor Author

mmcky commented Feb 23, 2024

idea from @jstac re: gini coefficient and high compute costs

How about we show how to compute the gini on small simulated data and then download the gini time series from the world bank or world inequality database?
That way we could get the gini time series for the UK and one or two Scandinavian countries as discussed in the meeting, and do a cross country comparison...

  • implement the above suggestion
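
One way the "small simulated data" part of this suggestion might look, as a sketch only. It assumes lognormal draws as the simulated incomes and uses the double-sum definition of the Gini coefficient; the sample size, seed and `sigma` values are arbitrary choices, not taken from the lecture.

```python
import math
import numpy as np

def gini(y):
    """Gini coefficient from the double-sum definition, using broadcasting."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    pairwise = np.abs(y[:, None] - y[None, :])  # (n, n) matrix of |y_i - y_j|
    return pairwise.sum() / (2 * n * y.sum())

rng = np.random.default_rng(1234)
n = 2_000  # small enough that the (n, n) matrix fits easily in memory

simulated = {}
for sigma in [0.5, 1.0, 1.5]:
    sample = rng.lognormal(mean=0.0, sigma=sigma, size=n)
    simulated[sigma] = gini(sample)
    # For a lognormal distribution the population Gini is erf(sigma / 2).
    print(f"sigma={sigma}: sample Gini={simulated[sigma]:.3f}, "
          f"theoretical={math.erf(sigma / 2):.3f}")
```

Because the sample is small, this runs quickly, which is the point of keeping the heavy multi-year computation out of the lecture build.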

@github-actions github-actions bot temporarily deployed to pull request March 1, 2024 02:35 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 1, 2024 02:56 Inactive
```{code-cell} ipython3
ginis_nw, ginis_ti, ginis_li = Ginis
ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year')
```
Contributor Author

  • this would need to be updated once the csv file is published on GitHub

@mmcky
Contributor Author

mmcky commented Mar 7, 2024

@jstac the last thing I need to do is compare USA with some other countries. This lecture is getting pretty long so I was wondering if we wanted to split this up into:

  • Inequality: How do we measure inequality?
  • Inequality: Closer look at income and wealth in the USA
  • Inequality: Cross country comparisons of income inequality

@github-actions github-actions bot temporarily deployed to pull request March 7, 2024 03:21 Inactive
@jstac
Contributor

jstac commented Mar 7, 2024

Thanks @mmcky . Perhaps I'll let you make those last additions and then we can review and discuss how to slice and dice...

@github-actions github-actions bot temporarily deployed to pull request March 8, 2024 04:18 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 8, 2024 04:31 Inactive
@mmcky
Contributor Author

mmcky commented Mar 8, 2024

@jstac I am about to time out in 20min for this afternoon (home duties) but wanted to give you an update.

I have added:

  • cross-country comparisons in gini between US, UK and Norway
  • a plot of GDP per capita vs gini (for US, UK and Norway) for all years (as a year-path scatter plot). I think it is an interesting graph as it plots a couple of dimensions, but I need to declutter the labels by adding a 5 year skip on the year labelling.
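
The label decluttering mentioned above could be sketched as follows. The data are invented, the axis assignment is an assumption, and the figure in the lecture may be built quite differently; the only point is annotating every fifth year along the path.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for one country; the values are invented for illustration.
years = np.arange(2000, 2021)
gdp_per_capita = np.linspace(36_000, 60_000, len(years))
gini = 40 + 0.1 * (years - 2000)

fig, ax = plt.subplots()
ax.plot(gdp_per_capita, gini, marker="o", markersize=3, lw=1)

# Label only every fifth year to keep the path readable.
labelled_years = []
for year, x, y in zip(years, gdp_per_capita, gini):
    if year % 5 == 0:
        ax.annotate(str(year), (x, y), textcoords="offset points", xytext=(4, 4))
        labelled_years.append(int(year))

ax.set_xlabel("GDP per capita")
ax.set_ylabel("Gini coefficient")
```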

No need for any detailed review but I would be interested in your thoughts on the additional charts in the new sections.

Preview: https://65ea94ae2e9af43492501c44--taupe-gaufre-c4e660.netlify.app/inequality.html

I need to:

  • add a 5 year skip to the year labels in the gdp per capita vs gini figure
  • update new additions to make them quantecon style-guide compliant (label figures etc.)
  • review all issues linked in the main comments section to make sure we address all comments
  • finalise narrative additions to make the flow of the lecture more cohesive with the new sections
  • proof-read.
  • hand over to @jstac

of people and the cumulative share of income (or wealth).

```{code-cell} ipython3
:tags: [hide-input]
```
Contributor

I vote for un-hiding this. It's nice clear code and not too long.

$$
G :=
\frac{\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|}
{2n\sum_{i=1}^n w_i}.
$$ (eq:gini)


The Gini coefficient is closely related to the Lorenz curve.

In fact, it can be shown that its value is twice the area between the line of
equality and the Lorenz curve (e.g., the shaded area in the following Figure below).
Contributor

(e.g., ...the following Figure below) -> , as illustrated in "Figure xx"
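
The "twice the area" claim in the quoted passage can be checked numerically. The sketch below assumes a piecewise-linear Lorenz curve built from sorted sample values and uses simulated data; the function names are illustrative and not taken from the lecture.

```python
import numpy as np

def lorenz_curve(w):
    """Return cumulative population shares F and cumulative wealth shares L."""
    w = np.sort(np.asarray(w, dtype=float))
    n = len(w)
    F = np.arange(n + 1) / n                              # 0, 1/n, ..., 1
    L = np.concatenate(([0.0], np.cumsum(w) / w.sum()))   # starts at 0, ends at 1
    return F, L

def gini_from_definition(w):
    """Gini coefficient from the double-sum formula."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    return np.abs(w[:, None] - w[None, :]).sum() / (2 * n * w.sum())

rng = np.random.default_rng(0)
w = rng.lognormal(size=500)  # simulated data, for illustration only

F, L = lorenz_curve(w)
# Area under the piecewise-linear Lorenz curve via the trapezoid rule.
area_under_lorenz = np.sum((L[1:] + L[:-1]) / 2 * np.diff(F))
# The area under the line of equality is 1/2.
area_between = 0.5 - area_under_lorenz

print(2 * area_between, gini_from_definition(w))
```

With this construction the two numbers agree up to floating-point error, not just approximately.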

@jstac
Contributor

jstac commented Mar 21, 2024

It occurred to me that the computation of the Gini coefficient could be vectorized:

def gini(y):
    n = len(y)
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    g_sum = np.sum(np.abs(y_1 - y_2))
    return g_sum / (2 * n * np.sum(y))

If that's not needed, it could still be an exercise asking the reader to produce a faster NumPy version that doesn't use loops and checking that it produces the same output.
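
A sketch of the check such an exercise might ask for. The loop version here is an assumed stand-in for the lecture's implementation, which is not shown in this thread, and the test data are simulated.

```python
import numpy as np

def gini_loops(y):
    """Gini coefficient with explicit loops over all pairs (i, j)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(y[i] - y[j])
    return total / (2 * n * sum(y))

def gini_vectorized(y):
    """Same quantity, following the reshape-and-broadcast version above."""
    n = len(y)
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    return np.sum(np.abs(y_1 - y_2)) / (2 * n * np.sum(y))

rng = np.random.default_rng(42)
data = rng.exponential(size=300)  # simulated data, for illustration only

print(np.isclose(gini_loops(data), gini_vectorized(data)))
```

Note that the vectorized version builds an n × n array, so memory use grows quadratically with the sample size.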

@jstac
Contributor

jstac commented Mar 21, 2024

Please eliminate fig 5.6 --- the lecture is already long and I don't think it adds much value over 5.8. In 5.8, please add "income" to the figure label.

Please change

"Gini coefficient for US data (income)
Now let’s look at the Gini coefficient using US data."

to

"Gini coefficient for income (US data)"
Let's look at the Gini coefficient for the distribution of income in the US."

@jstac
Contributor

jstac commented Mar 21, 2024

I think fig 5.10 is a little confusing when it follows on from fig 5.8. Why does the Gini coefficient for total income look so different? Why are we looking at the income Gini twice? I suggest we drop the comparison and this data, and change the title and start of section 5.3.4 to

"Gini coefficient for wealth (US data)"

In the previous section we looked at the Gini coefficient for income using US data.

Now let's look at the Gini coefficient for the distribution of wealth."

Please then delete

"As we have discussed the Gini coefficient can also be computed over different distributions such as income and wealth."

and suitably modify

"We can use the data collected above survey of consumer finances to look at the Gini coefficient when using income when compared to wealth data."

@jstac
Contributor

jstac commented Mar 21, 2024

Please add "for income" to the figure titles for figs 5.12 and 5.13.

@jstac
Contributor

jstac commented Mar 21, 2024

Hey @mmcky , many thanks. The lecture is looking very good. I have some more suggestions above. Sorry that they haven't come all at once. It's mainly a case of clarifying and cutting to emphasize the clearest results.

@mmcky
Contributor Author

mmcky commented Mar 24, 2024

> It occurred to me that the computation of the Gini coefficient could be vectorized:
>
>     def gini(y):
>         n = len(y)
>         y_1 = np.reshape(y, (n, 1))
>         y_2 = np.reshape(y, (1, n))
>         g_sum = np.sum(np.abs(y_1 - y_2))
>         return g_sum / (2 * n * np.sum(y))
>
> If that's not needed, it could still be an exercise asking the reader to produce a faster NumPy version that doesn't use loops and checking that it produces the same output.

Thanks @jstac I think keeping the code simpler is a nice thing -- but I love the idea of making this an exercise.

@github-actions github-actions bot temporarily deployed to pull request March 24, 2024 23:08 Inactive
@mmcky
Contributor Author

mmcky commented Mar 25, 2024

@jstac I wrote an exercise but it had a long run time when using US data. So I want to think about making it better using simulated data. I have opened #410 to track the addition so it doesn't have to hold this PR up.

@mmcky mmcky added the ready label Mar 25, 2024
@jstac
Contributor

jstac commented Mar 25, 2024

I'm surprised this is green. The build is failing after the code cell starting # Fetch gini data for all countries

@mmcky
Contributor Author

mmcky commented Mar 25, 2024

Well that is fascinating. It is working locally -- so I suspect the wb database thought some IP was requesting too much data and returned the 400. But fascinating that an API error doesn't get recognised as a Python error.
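
One defensive pattern for this kind of silent failure, sketched under assumptions: it supposes the download step hands back an empty table (or `None`) when the API refuses the request, which may not match what the client used in the lecture actually does. `require_data` is a hypothetical helper, not part of any library.

```python
import pandas as pd

def require_data(df, source):
    """Raise if a download produced no usable rows (hypothetical helper)."""
    if df is None or df.empty:
        raise RuntimeError(f"No data returned from {source}")
    return df

# Stand-in for a failed request that came back as an empty table.
empty_response = pd.DataFrame(columns=["country", "year", "gini"])

try:
    require_data(empty_response, "World Bank API")
except RuntimeError as err:
    print(f"Build should stop here: {err}")
```

Raising an explicit exception in the code cell would turn a quiet API problem into a failed cell that the build can detect.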

@github-actions github-actions bot temporarily deployed to pull request March 25, 2024 23:52 Inactive
@mmcky
Contributor Author

mmcky commented Mar 25, 2024

@jstac the new run worked OK

https://66020e1f7cfe606404b17ca6--taupe-gaufre-c4e660.netlify.app/inequality

I have re-enabled build failure on warnings, which takes a harder line on these types of warnings and failures, so any build failure will prevent the CI from running fully.

@github-actions github-actions bot temporarily deployed to pull request March 26, 2024 00:54 Inactive
@jstac
Contributor

jstac commented Mar 30, 2024

Thanks @mmcky ! It's looking great. Minor comments:

  • This sentence needs to be adjusted: "Let us zoom on the US data so we can more clearly observe trends."
  • I wonder if we should remove the discussion starting "Looking at each data series we see an outlier in Gini coefficient computed for 1965 for labour income." It's not clear why the labor income distribution is so different to the income distribution in fig 5.7. I feel like we either need to explain that or get rid of the labor income / wealth comparison.
  • "The wealth time series exhibits a strong U-shape." should probably be after fig 5.8.
  • "As we saw earlier in this lecture " -> "Earlier in this lecture"
  • "western economies" -> "Western economies"

@mmcky
Contributor Author

mmcky commented Apr 1, 2024

> I wonder if we should remove the discussion starting "Looking at each data series we see an outlier in Gini coefficient computed for 1965 for labour income." It's not clear why the labor income distribution is so different to the income distribution in fig 5.7. I feel like we either need to explain that or get rid of the labor income / wealth comparison.

@jstac I would agree with this comment - I think we should get rid of the labour income comparison. I don't have a definitive reason why the series looks so different to Fig 5.7 (income). #412

@github-actions github-actions bot temporarily deployed to pull request April 1, 2024 23:13 Inactive
@mmcky
Contributor Author

mmcky commented Apr 1, 2024

thanks @jstac for your review and comments. I have addressed the latest round of feedback.

@jstac
Contributor

jstac commented Apr 3, 2024

Many thanks @mmcky . The lecture looks great. I've pushed some minor edits. If it builds correctly then please go ahead and merge.

@mmcky
Contributor Author

mmcky commented Apr 3, 2024

thanks @jstac

@github-actions github-actions bot temporarily deployed to pull request April 3, 2024 02:02 Inactive
@mmcky
Contributor Author

mmcky commented Apr 3, 2024

thanks @jstac for all your excellent comments and feedback. Will incorporate lessons learnt re: style in the next review.

@mmcky mmcky merged commit bce65a8 into main Apr 3, 2024
6 checks passed
@mmcky mmcky deleted the review-inequality branch April 3, 2024 03:11
Development

Successfully merging this pull request may close these issues.

[inequality] Update lecture to include Lorenz Curve (as a function) rather than from quantecon package
2 participants