Product based application for measuring urban expansion using three machine learning algorithms: Random Forest (RF), Extratrees (ET), and Logistic Regression with regularization

andreanr/UrbanExpansion

UrbanExpansion

In this study we propose the application of three machine learning algorithms: Random Forest (RF), Extratrees (ET), and Logistic Regression with regularization. Extremely Randomized Trees, or Extratrees, are a variant of the RF classifier (Geurts et al. 2006) that grow each tree on the entire sample, rather than on a bootstrap replica, and pick the split cut-points at random. Compared with RF, ET (1) have lower computational cost, (2) produce smoother decision boundaries thanks to the extra randomization, and (3) tend to overfit less.
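The difference in how the two ensembles choose a split can be sketched in plain Python. This is a minimal illustration for a single numeric feature and binary labels, not the code used in this repository; the function names are hypothetical, and a real ET implementation draws one random cut per candidate feature and keeps the best of those.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def rf_style_threshold(values, labels, rng):
    """RF-style: bootstrap the sample, then search all midpoints for the best cut."""
    idx = [rng.randrange(len(values)) for _ in values]
    xs = [values[i] for i in idx]
    ys = [labels[i] for i in idx]
    uniq = sorted(set(xs))
    candidates = [(a + b) / 2.0 for a, b in zip(uniq, uniq[1:])]
    if not candidates:
        # The bootstrap drew a single distinct value; no real split is possible.
        return uniq[0]
    return min(candidates, key=lambda t: split_impurity(xs, ys, t))


def et_style_threshold(values, rng):
    """ET-style: keep the full sample and draw one cut uniformly at random."""
    return rng.uniform(min(values), max(values))
```

The ET-style function never evaluates candidate cuts, which is where the lower computational cost comes from in this simplified picture.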

Data Sources

Built Up Grid
Contains an information layer on built-up presence derived from Sentinel-1 image collections.

  • Source: Global Human Settlements
  • Temporality: 1990, 2000 and 2014
  • Format: raster with 250 m resolution

Population Grid
Generated from census data combined with the built-up index and areal weights to produce the spatial distribution of population, expressed as the number of people per cell.

  • Source: Global Human Settlements
  • Temporality: 1990, 2000 and 2015
  • Format: raster with 250 m resolution

Digital Elevation model (DEM)
SRTM 90m Digital Elevation Database v4.1

  • Source: NASA
  • Format: raster with 90 m resolution
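The preprocessing stage of the pipeline derives a slope layer from this DEM. How the repository computes it is not shown here, so the following is only a sketch of the usual central-difference calculation on a small in-memory grid. The function name is hypothetical, and the sketch assumes elevations and cell size share the same linear unit (that is, the raster has been projected to metres).

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells of a row-major elevation grid.

    Border cells are left as None because central differences need
    neighbours on both sides.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per unit distance along each axis.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2.0 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out
```

For a surface that rises 90 m per 90 m cell in one direction, the interior slope comes out at 45 degrees.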

City Lights

  • Source: NOAA
  • Temporality: 1995, 2000 and 2013
  • Format: raster with 250 m resolution

Highways

  • Source: OpenStreetMap
  • Temporality: starting from 2008
  • Format: line geometry

Geolocations: airports, schools, universities, places of worship and hospitals

  • Source: OpenStreetMap
  • Temporality: starting from 2008
  • Format: point geometry

Water Bodies
Provides a basemap for the lakes, seas, oceans, large rivers, and dry salt flats of the world.

  • Source: Esri Data and Maps
  • Format: polygon geometry

Dependencies

  • Python 3.5.2
  • luigi
  • psql (PostgreSQL) 9.4
  • PostGIS 2.1.4
  • geos
  • gdal
  • geopandas
  • ...and many Python packages (see requirements.txt)

Repo Structure and How to Run

To run the pipeline, update the following configuration files with your own values and then run the commands listed after them.

Configuration Files:

  • pipeline/luigi.cfg will need to be configured to run luigi
  • pipeline/experiment.yaml will need to be configured for the models and features to run
  • pipeline/.env will need to be configured to connect to the databases (make a copy from pipeline/_env)
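As an illustration of how database settings in a .env file can be consumed, here is a minimal sketch that parses KEY=VALUE lines and builds a PostgreSQL connection URL. The key names (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE) are assumptions made for the example; check pipeline/_env for the names the pipeline actually expects.

```python
from urllib.parse import quote


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def postgres_url(settings):
    """Build a PostgreSQL URL from the hypothetical PG* keys."""
    return "postgresql://{user}:{password}@{host}:{port}/{db}".format(
        user=quote(settings["PGUSER"], safe=""),
        password=quote(settings["PGPASSWORD"], safe=""),
        host=settings["PGHOST"],
        port=settings["PGPORT"],
        db=settings["PGDATABASE"],
    )
```

Percent-encoding the user and password keeps the URL valid when credentials contain characters such as @ or /.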

Run the following commands:

To run locally (choose the number of workers):

python -m luigi --local-scheduler --workers 10 --module UrbanExpansion RunUrbanExpansion

To run on a luigi server:

python3 -m luigi --workers 10 --module UrbanExpansion RunUrbanExpansion
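If the pipeline needs to be launched from another script, the two invocations above can be assembled programmatically. This is a sketch only: the helper name is hypothetical, and using sys.executable in place of python or python3 is an assumption that the current interpreter has luigi installed.

```python
import sys


def luigi_command(workers, local=True):
    """Return the argument list for running the RunUrbanExpansion task."""
    cmd = [sys.executable, "-m", "luigi"]
    if local:
        # Use luigi's in-process scheduler instead of a central server.
        cmd.append("--local-scheduler")
    cmd += ["--workers", str(workers),
            "--module", "UrbanExpansion", "RunUrbanExpansion"]
    return cmd
```

The returned list can be passed to subprocess.run(luigi_command(10), check=True) from the directory where the UrbanExpansion module is importable.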

Data Pipeline

Once you have set up the environment, you can start using the pipeline. The general process of the pipeline is:

  • Downloading data
  • Preprocessing (to generate slope and city center)
  • Inserting into the database
  • Generating Grids
  • Generating Feature Grids
  • Generating Urban Clusters
  • Generating Urban Feature Grids
  • Generating Features and Labels
  • Running Models
  • Storing Models in the results schema
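The stages above can be pictured as an ordered chain in which finished stages are skipped on a rerun, similar in spirit to how luigi treats a task whose output already exists as complete. The sketch below is a plain-Python stand-in, not the repository's luigi tasks: the stage identifiers are hypothetical, and a strictly linear order is assumed.

```python
# Hypothetical stage identifiers mirroring the list above.
STAGES = [
    "download_data",
    "preprocess",
    "insert_to_db",
    "generate_grids",
    "generate_feature_grids",
    "generate_urban_clusters",
    "generate_urban_feature_grids",
    "generate_features_and_labels",
    "run_models",
    "store_models",
]


def run_pipeline(stages, actions, completed):
    """Run stages in order, skipping any whose name is already in `completed`.

    `actions` maps a stage name to a zero-argument callable.
    Returns the list of stage names that were actually executed.
    """
    executed = []
    for name in stages:
        if name in completed:
            continue
        actions[name]()
        completed.add(name)
        executed.append(name)
    return executed
```

Calling run_pipeline a second time with the same completed set executes nothing, which is the behaviour that makes an interrupted run cheap to resume.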

The results schema is populated in this stage. The schema includes the tables:

  • evaluations: metrics and values for each model (e.g. precision@100)
  • feature_importances: for each model, gives feature importance values as well as rank (abs and pct)
  • models: stores all information pertinent to each model
  • predictions: for each model, stores the value for each cell
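A metric such as precision@100 in the evaluations table can be computed from per-cell scores and labels. The following is a minimal sketch that assumes k is an absolute number of top-ranked cells and that labels are 0/1; the function name is hypothetical and the repository may define the metric differently (for example, with k as a percentage).

```python
def precision_at_k(scores, labels, k):
    """Fraction of positive labels among the k highest-scoring cells."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    # Rank cells from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)
```

If k exceeds the number of cells, the sketch falls back to the precision over all cells.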
