I made this small Python utility to keep an up-to-date record of historical Air Quality Index (AQI) data for the >180 Japanese air monitoring stations, as I needed this data for my PhD research. I am currently scraping the site aqicn.org, which collects data from over 12,000 air monitoring stations.
Since I couldn't find an API to access the historical data (at the time of writing, the official API only exposes current AQI values for a given location), and I had been wanting to test the web scraping capabilities of the `selenium` package for a while, I developed a (quite hacky) way of automatically fetching all of the individual CSV files with the complete historical data, which can be found in the `data/japan-aqi` directory.
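The general idea is to drive a headless browser to each station's historical-data page and trigger the CSV export. The snippet below is only a minimal sketch of that approach: the station URL, the CSS selector for the download link, and the headless Chrome setup are illustrative assumptions, not the exact logic used in `japan_aqi.py`.

```python
# Minimal sketch of the selenium-based download flow.
# The URL and the CSS selector below are hypothetical placeholders.
from pathlib import Path

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

DOWNLOAD_DIR = Path("data/japan-aqi").resolve()

options = Options()
options.add_argument("--headless=new")
# Send browser downloads to the data directory instead of ~/Downloads.
options.add_experimental_option(
    "prefs", {"download.default_directory": str(DOWNLOAD_DIR)}
)

driver = webdriver.Chrome(options=options)
try:
    # Hypothetical historical-data page for a single monitoring station.
    driver.get("https://aqicn.org/historical/#city:japan/tokyo")
    # Hypothetical selector for the CSV export link on that page.
    driver.find_element(By.CSS_SELECTOR, "a.csv-export").click()
finally:
    driver.quit()
```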
I also wanted to test the CI/CD capabilities of GitHub Actions (see the `.github/workflows/actions.yml` file for the workflow definition), so I set up a scheduled trigger that runs the workflow every Sunday at 2:00 AM UTC and updates the datasets with new data.
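For reference, a scheduled trigger for that cadence looks roughly like the following; the actual `actions.yml` in this repo may define additional triggers and jobs.

```yaml
on:
  schedule:
    # Every Sunday at 02:00 UTC (minute hour day-of-month month day-of-week).
    - cron: '0 2 * * 0'
```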
If you want to run a local instance, you will need to clone the repo first and create a `.env` file in the root directory with the following variables, which are then used when making requests to the site:

```
USER_FULL_NAME = 'Your name'
USER_EMAIL = 'Your email'
USER_ORGANIZATION = 'Your org'
```
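As a rough illustration of how these values could be picked up by the script (the actual implementation in `japan_aqi.py` may differ, and this assumes the `python-dotenv` package is available):

```python
# Sketch only: assumes python-dotenv is installed; japan_aqi.py may load
# the variables differently.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

# Contact details sent along with requests to identify the scraper.
contact_info = {
    "name": os.environ["USER_FULL_NAME"],
    "email": os.environ["USER_EMAIL"],
    "organization": os.environ["USER_ORGANIZATION"],
}
```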
To reproduce the environment you will need `poetry` to install the dependencies. You can install `poetry` either by running (recommended):

```bash
curl -sSL https://install.python-poetry.org | python3 -
```
or, if you prefer to use `pipx`:

```bash
pipx install poetry
```

Check the official `poetry` docs for up-to-date installation instructions.
With a working `poetry` installation on your system, just run:

```bash
poetry install
```

which will install the dependencies defined in `pyproject.toml`. You should now be able to run:
```bash
poetry run python japan_aqi.py
```

to launch the scraping script, which will download the files available at the time.