ITAQA is a framework that aggregates Italian air quality data, automatically collecting measurements from different sources. It provides scripts and utilities to query data, analyze it and create interactive graphs.
ITAQA was created at the beginning of 2020 to answer this question:
As a consequence of the COVID-19 pandemic in Italy, will there be a measurable effect on the air pollution due to the abrupt lifestyle change (more people working from home, less traffic) after the enforcement of lockdowns? If yes, can this reduction be correlated with precision to these measures?
To verify this conjecture it would be preferable to have a single uniform data set to use as input
The main providers of pollution data in Italy are the ARPA agencies (Agenzie Regionali per la Protezione Ambientale). Unfortunately, they are regional agencies, each providing data only for its own region of competence (meaning 21 different agencies, providing data in different formats)
The main purpose of ITAQA is to automatically download and aggregate national air pollution data in a uniform and accessible way, and to provide data analysis tools
(You can read more on ITAQA's origin in this post)
- Create a single place to orchestrate the download of pollution data for all Italian regions (from ARPA websites) and to aggregate the collected data in a uniform data structure (the same for every region) that can be saved/loaded quickly
- Create a tool to plot pollution data that can also produce intuitive visualizations and comparisons between regions
- Search for correlations between big "behavior-changing" events and air pollution
- Make the entire framework usable also by non-technical users (GUI creation)
The core concepts of ITAQA are:
- AirQualityStation (or AQS) objects: single sensors holding time-indexed pollution data for specific locations
- AirQualityStationCollection (or AQSC) objects: collections of AQS easy to update, save, query
- Crawlers: scripts that obtain pollution data for a specific region and time, downloading and aggregating it from ARPA websites directly into AQSC
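To make the relationship between these concepts more concrete, here is a minimal sketch of how an AQS and an AQSC could be modeled. It is not ITAQA's actual implementation: the class fields, the `add`, `between` and `query` methods, the station names and the readings are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class AirQualityStation:
    """Hypothetical AQS: one sensor holding time-indexed readings."""
    name: str
    region: str
    pollutant: str
    # Maps a timestamp to the measured value (unit depends on the pollutant)
    data: Dict[datetime, float] = field(default_factory=dict)

    def add(self, when: datetime, value: float) -> None:
        """Store (or overwrite) the reading taken at `when`."""
        self.data[when] = value

    def between(self, start: datetime, end: datetime) -> Dict[datetime, float]:
        """Return readings with start <= timestamp < end, sorted by time."""
        return {t: v for t, v in sorted(self.data.items()) if start <= t < end}


@dataclass
class AirQualityStationCollection:
    """Hypothetical AQSC: a named group of AQS objects that can be queried."""
    name: str
    stations: List[AirQualityStation] = field(default_factory=list)

    def add(self, station: AirQualityStation) -> None:
        self.stations.append(station)

    def query(self, region: Optional[str] = None,
              pollutant: Optional[str] = None) -> List[AirQualityStation]:
        """Return the stations matching every filter that was given."""
        return [
            s for s in self.stations
            if (region is None or s.region == region)
            and (pollutant is None or s.pollutant == pollutant)
        ]


# Illustrative usage with made-up stations and values
station_a = AirQualityStation("Example station A", "lombardia", "NO2")
station_a.add(datetime(2020, 1, 1, 12), 41.0)
station_a.add(datetime(2020, 1, 2, 12), 38.5)
station_b = AirQualityStation("Example station B", "lombardia", "PM10")

collection = AirQualityStationCollection("lombardia_202001")
collection.add(station_a)
collection.add(station_b)

no2_stations = collection.query(pollutant="NO2")
first_day = station_a.between(datetime(2020, 1, 1), datetime(2020, 1, 2))
```

In this sketch a crawler would be the piece of code that builds the stations from a regional data source and adds them to the collection.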
Clone the repository (git clone git@github.com:albertosantagostino/ITAQA-air-quality-aggregator.git), check that you have Python 3.8 (python --version) and install all the needed packages (directly via poetry install; refer to Poetry's documentation)
The main entrypoint is the script itaqa.py. Run it using the -h parameter to see the built-in help
itaqa.py [MODE] [parameters] [-h]
Available modes
download Download new data, save the AQSC in dump/REGION/
update Update existing AQS collection with new data
view Enter interactive GUI mode to view and plot AQS data
test Run unit tests (pytest)
sandbox Run sandbox
If you want to perform a data download to check if everything is working, you can perform the following steps:
Data download
Download data from Lombardia for the first month of 2020:
python3 itaqa.py download --region lombardia --min_date 20200101 --max_date 20200201
The message "Download completed!" indicates the successful download and serialization of all the requested data. Air quality information for the specified period has been stored in an AQSC object (a special collection that encapsulates multiple AQS, saved as a .msgpack file)
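The download command above passes the region and the date range as command-line parameters, with dates written as YYYYMMDD. As a rough sketch of how such a command line could be parsed with Python's standard argparse module (this is an assumption for illustration, not the code in itaqa.py; the `yyyymmdd` and `build_parser` helpers are hypothetical, and whether the real parameters are mandatory is not stated in this README):

```python
import argparse
from datetime import datetime


def yyyymmdd(text: str) -> datetime:
    """Parse a date written as YYYYMMDD, e.g. 20200101."""
    try:
        return datetime.strptime(text, "%Y%m%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYYMMDD, got {text!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a 'download' mode (the other modes are omitted)."""
    parser = argparse.ArgumentParser(prog="itaqa.py")
    modes = parser.add_subparsers(dest="mode", required=True)

    download = modes.add_parser("download", help="Download new data")
    # Marking these as required is an assumption of this sketch
    download.add_argument("--region", required=True)
    download.add_argument("--min_date", type=yyyymmdd, required=True)
    download.add_argument("--max_date", type=yyyymmdd, required=True)
    return parser


# Parse the same parameters used in the example command
args = build_parser().parse_args(
    ["download", "--region", "lombardia",
     "--min_date", "20200101", "--max_date", "20200201"]
)
days_requested = (args.max_date - args.min_date).days
```

Converting the dates while parsing means a malformed value is rejected with a usage error before any download starts.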
Data analysis and visualization
While ITAQA is currently meant for developers, a basic GUI to explore and view data is ready to be used. Start the GUI with the command:
python3 itaqa.py view
Select the folder where the data is stored to see the AQS objects it contains. Select and plot the desired ones using the corresponding button.
As an alternative, data can be loaded and explored by directly editing the sandbox section of itaqa.py and invoking the main script in this way:
python3 itaqa.py sandbox
Why make this? There are already air quality websites that collect this kind of data
First of all: there is no public place (as far as I know) where air pollution data is collected in a uniform way for the entire country
The main use case is to investigate the question above ("Did the lockdown have a measurable effect on air quality? To what extent?"), meaning that I want to produce plots where this correlation is clearly visible
Nevertheless, the aim of this project is broader: create a set of reusable air quality analysis tools that unify all sources from the regions of Italy
You said ARPAs provide data in different formats. Why?
I assume that each ARPA can decide how to handle and distribute data to the public and unfortunately it seems they never talked to each other
Even the websites are all completely different (ARPA Piemonte vs ARPA Lombardia vs ARPA Emilia-Romagna vs ARPA Toscana)
Maybe there is some "internal uniform data format" they use to exchange data, but if it exists, it is not available to the public
Do you plan to publish visualizations/plots produced? Where?
Yes! As soon as I have something worth showing, I will write an article on my blog
Can I contribute? What could I do?
Yes! Refer to CONTRIBUTING to find all the needed information (setup, developer environment, workflow)
The area that needs the most work right now is the creation of region-specific crawlers: only a couple of them are fully implemented. Refer to the table below to see the status
This project and its source code are distributed under the GNU General Public License v3.0