U.S. Pollution

A Python notebook for analyzing pollution CSV data.

About the Data:

  • Aims to analyze the change in the air quality index (AQI) of 4 major pollutants (Nitrogen Dioxide, Sulphur Dioxide, Carbon Monoxide, Ozone) across all 50 states of the United States from 2000 to 2016
  • Procured from the Environmental Protection Agency (EPA)
  • Consists of daily pollution data for the 4 major pollutants over 16 years for all 50 states (382 MB, 1,746,661 lines)
  • Key points:
    • Organized by date; each date repeats once per county where the source data was acquired
    • NO2 and SO2 are measured in parts per billion, while O3 and CO are measured in parts per million
    • The max hour describes the hour of the day at which the AQI was highest
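
Because the pollutants are reported in two different units, values need a common scale before they can be compared directly. The following is a minimal sketch of that conversion, not code from the notebook; the function name `to_ppb` and the sample values are hypothetical.

```python
PPB_PER_PPM = 1000.0  # 1 part per million equals 1000 parts per billion


def to_ppb(value, unit):
    """Convert a concentration to parts per billion.

    `unit` is assumed to be either "ppb" or "ppm"; the dataset's actual
    unit labels may be spelled differently.
    """
    if unit == "ppb":
        return float(value)
    if unit == "ppm":
        return float(value) * PPB_PER_PPM
    raise ValueError(f"unsupported unit: {unit!r}")


# Hypothetical readings, not taken from the dataset.
no2_ppb = to_ppb(19.0, "ppb")  # NO2 is already in ppb
o3_ppb = to_ppb(0.025, "ppm")  # O3 is converted from ppm to ppb
```

Note that AQI values are unitless, so this only matters when comparing raw concentrations rather than AQI.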

Methods:

  • Reading Data:
    • Pandas to read the CSV
    • Data was relatively unorganized¹ but contained diverse information
      • Grouped data by date
      • Calculated the means for each state
  • Visualization:
    • Matplotlib - Graphs
    • Cartopy - Maps
    • Types of Visualizations:
      • Multi-line charts
      • Bubble maps
      • Multivariate linear regressions
      • Heat maps
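
The reading and grouping steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the column names (`State`, `County`, `Date Local`, `NO2 AQI`), the helper name `daily_state_means`, and the sample rows are all hypothetical stand-ins for the real CSV.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the real CSV; column names are assumptions.
SAMPLE_CSV = """State,County,Date Local,NO2 AQI
Arizona,Maricopa,2000-01-01,40
Arizona,Pima,2000-01-01,20
Arizona,Maricopa,2000-01-02,30
Texas,Harris,2000-01-01,50
"""


def daily_state_means(csv_source, value_col="NO2 AQI"):
    """Average `value_col` over counties for each (date, state) pair."""
    df = pd.read_csv(csv_source, parse_dates=["Date Local"])
    return (
        df.groupby(["Date Local", "State"])[value_col]
        .mean()
        .unstack("State")  # one row per date, one column per state
    )


means = daily_state_means(io.StringIO(SAMPLE_CSV))
print(means)
```

With the real file, a path string would replace the `io.StringIO` object. Since the result has one column per state, calling `means.plot()` would draw one line per state through Matplotlib, which is one plausible route to the multi-line charts listed above. Dates with no reading for a state appear as `NaN`.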

Results:

  • Line charts: [image] [image]
  • Map visualizations: [image] [image]

Discussion:

  • CO and NO2 levels have decreased, while O3 and SO2 levels have remained largely stagnant
  • In recent years, O3 and NO2 have been more prevalent than CO and SO2
  • When looking at the results, be aware that:
    • Results may be affected by holes in the data, as shown in the multi-line charts and the heat maps
    • Interpreting the actual AQI values is different from interpreting how they change in response to various factors
    • Analysis on a case-by-case basis per pollutant is necessary to understand trends and how factors such as laws may affect pollution
  • Public awareness of air pollution is one major factor in the slow decline of some pollutants
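
Since holes in the data can skew the results, it may help to list the missing date ranges for a state or county before interpreting a trend. The following is a minimal sketch of one way to do that, not the notebook's method; `find_gaps` is a hypothetical helper and the dates are made up.

```python
from datetime import date, timedelta


def find_gaps(observed_dates):
    """Return (first_missing, last_missing) pairs for runs of missing days."""
    days = sorted(set(observed_dates))
    gaps = []
    for prev, cur in zip(days, days[1:]):
        # More than one day between consecutive readings means a hole.
        if (cur - prev).days > 1:
            gaps.append((prev + timedelta(days=1), cur - timedelta(days=1)))
    return gaps


# Hypothetical observation dates: 3 and 4 January are missing.
observed = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 5)]
gaps = find_gaps(observed)
```

Here `gaps` holds a single pair covering 3 to 4 January 2000. Long gaps found this way could be excluded or flagged on the charts rather than interpolated across.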
¹ In comparison to other existing datasets