If 80% of data science work is data wrangling, 80% of your impact is through visualization.
Hans Rosling is one of the most popular data scientists on the web. His original TED talk went viral among my friends when it came out. We are going to create some graphics using his formatted data as our weekly case study. Note that we need to remove Kuwait from the data (discussion on this).
- Complete a review of 2-3 different data visualizations used to answer specific questions. Some fun websites are pudding.cool, Wonkblog, FiveThirtyEight, and Priceonomics (but you can use any website, blog, or article with a good visualization).
- Make sure you are in our Slack workspace.
- Finish setting up VS Code for programming in R and Python.
- Finish setting up RStudio.
- Finish installing Git.
- Finish creating your GitHub account and connecting to our organization.
- Recreate the two graphics in this repo using the `gapminder` dataset from `library(gapminder)` (get them to match as closely as you can).
  - Use `library(tidyverse)` to load ggplot2 and dplyr, and use `theme_bw()` to duplicate the first plot.
  - Use `scale_y_continuous(trans = "sqrt")` to get the correct scale on the y-axis.
  - Build a weighted average data set using `weighted.mean()` and GDP with `summarise()` and `group_by()` that will be the black continent average line on the second plot.
  - Use `theme_bw()` to duplicate the second plot. You will need to use the new data to make the black lines and dots showing the continent average.
  - Use `ggsave()` and save each plot as a .png with a width of 15 inches.
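
The weighted-average and saving steps in the task list can be sketched roughly as follows. This is a minimal sketch, not the reference solution: it assumes population (`pop`) is the weight, that the average is computed per continent and year, and that the second plot shows GDP per capita over time. The names `gap`, `continent_avg`, and `continent_average.png`, the axis mappings, and the plot height are illustrative choices, so adjust them until your plot matches the one in the repo.

```r
library(tidyverse)
library(gapminder)

# Remove Kuwait, as noted in the case study description.
gap <- gapminder %>%
  filter(country != "Kuwait")

# Assumption: GDP per capita is averaged within each continent and year,
# weighting every country by its population.
continent_avg <- gap %>%
  group_by(continent, year) %>%
  summarise(
    gdpPercap = weighted.mean(gdpPercap, w = pop),
    # as.numeric() avoids integer overflow when summing large populations
    pop = sum(as.numeric(pop)),
    .groups = "drop"
  )

# Assumed layout: one panel per continent, country lines in colour,
# continent average drawn on top in black.
p <- ggplot(gap, aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country, colour = continent)) +
  geom_line(data = continent_avg, colour = "black") +
  geom_point(data = continent_avg, colour = "black") +
  facet_wrap(~continent, nrow = 1) +
  theme_bw()

# Width of 15 inches comes from the task; the height is an arbitrary choice.
ggsave("continent_average.png", plot = p, width = 15, height = 7)
```

Note that `gdpPercap` is summarised before `pop` inside `summarise()`, so `weighted.mean()` still sees the per-country populations rather than the continent total.
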
- Slack quick start guide
- VS Code and interactive Python in VS Code
- Git, Python, and R installation
- RStudio and using Git within RStudio
- GitHub (Please think carefully about your GitHub username. It is for business use.)