UNIFIED GLOBAL LANDSLIDE CATALOGUE

Point catalogue

🔴 Authors

@Saverio Mancino - PhD Student (Dept. Geo-enviromental science - University of Bari).
@Anna Sblano - Research Fellow (Dept. Geo-enviromental science - University of Bari).
@Francesco Paolo Lovergine - Researcher (Institute for the Electromagnetic Survey of the Atmosphere - National Research Council of Italy).
@Giuseppe Amatulli - PhD Researcher (School of the Environment - Yale University).
Domenico Capolongo - PhD Professor (Dept. Geo-enviromental science - University of Bari).

🔴 Project description

The main purpose of this project is to generate a single global-scale landslide dataset that collects and standardizes within it all open source global, national, and sub-national landslide datasets, provided with spatial and temporal accuracy details, along with several general information about each single record.

🔴 License

The whole code is published under the MIT License.

🔴 Publication Notice

The catalogue, along with its associated analyses, methodologies, and results, is planned for publication by Spring 2025.
This release will include comprehensive documentation and datasets, making the information fully accessible to the scientific community and the public.
Stay tuned for updates as we approach the release date.

🔴 Data availability

The catalogues data are available in these googledrive repositories, as full catalogue in .csv format (with '|' as separator) and as a tile in .gpkg format (see tiling scheme). They're distributed under the licence Creative Commons Attribution 4.0 International (CC BY 4.0)

CATALOGUE TYPE	NUMBER OF RECORDS	FILES REPOSITORY
UGLC Point Catalogue files	1061450 points	FULL CATALOGUE download ⬇️
		TILED CATALOGUE download ⬇️
UGLC Polygonal Catalogue files	984126 polygons	POLYGONAL CATALOGUE

🔴 Attribute fields summary

ATTRIBUTE	TYPE
WKT_GEOM	Well known text
NEW DATASET	String
ID	Int
OLD DATASET	String
OLD ID	String
VERSION	String
COUNTRY	String
ACCURACY	Int
START DATE	Date
END DATE	Date
TYPE	String
PHYSICAL FACTORS	String
RELIABILITY	Int
RECORD TYPE	String
FATALITIES	Int
INJURIES	Int
NOTES	String
LINK	String

🔴 Attributes description

WKT_GEOM: The contents of this field contain information about the georeferencing of each point described in the dataframe using the WGS84 reference system.
NEW DATASET: the content of this field represents the name of the new dataframe's identifying abbreviation: "UGLC".
ID: the content of this field contains a unique ID for each landslide event included into the UGLC dataset.

OLD DATASET: the contents of this field represent the name of the native dataset used into the UGLC creation:

POINT DATASET

REFERING	NAME	N° POINTS	LICENSE	DOWNLOAD	IMPLEMENTED
01_COOLR	Cooperative Open Online Landslide Repository (NASA) Event + Report points (with no duplicates)	49718	LICENSE	free	✔️
02_GFLD	Global fatal landslide occurrence from 2004 to 2016	5490	LICENSE	free	✔️
03_ITALICA	ITAlian rainfall-induced LandslIdes CAtalogue (CNR - IRPI)	6312	LICENSE	free	✔️
04_UAP	Landslide Inventories across the United States version2 (USGS)	176427	LICENSE	free	✔️
05_ALC	Australia Landslide Catalogue	1653	LICENSE	free	✔️
06_PCLD	Preliminary Canadian Landslide Database	10134	LICENSE	free	✔️
07_RBR	Shallow Landslide Inventory for 2000-2019 (eastern DRC, Rwanda, Burundi)	7945	LICENSE	free	✔️
08_NZK	Map of co-seismic Landslides for the 7.8 Kaikoura earthquake, New Zealand	7355	LICENSE	free	✔️
09_CA	Mass Movements Information System (SIMMA) of the Colombian Geological Service	1065	LICENSE	free	✔️
10_BGS	National Landslide Database - Index data (BGS)	15050	LICENSE	on demand	✔️
11_NTMI	Landslide Events Data (GSI)	2811	LICENSE	free	✔️
12_VLS	Vermont Geological Survey's preliminary landslide inventory	3049	LICENSE	free	✔️
13_SLIDO	Statewide Landslide Information Database for Oregon (DOGAMI)	15378	LICENSE	free	✔️
14_1N	1N (2015-2027): French Landslide Observatory – OMIV (Temporary data)	194	LICENSE	free	✔️
15_CAFLAG	The CAmpi Flegrei LAndslide Geodatabase	2302	LICENSE	free	✔️
16_ETGFI	ETGFI - Earthquake-Triggered Ground-Failure Inventories (POINTS) - USGS	115402	LICENSE	free	✔️
17_IFFI	IFFI - Inventario fenomeni franosi in Italia (ISPRA)	622447	LICENSE	free	✔️

OLD ID: the contents of this field represent the identifying id assigned to this row in the source dataset (if any)
VERSION: the contents of this field represent the latest updated version of the original dataset used (if specified)
COUNTRY: the content of this field represents the country where the record was located (where missing it was derived using its coordinates)
ACCURACY: the content of this field represents the precision in meters of the relative deviation of the geo-referenced point from the actual landslide (if there is one), where it is not clearly specified is inferred based on the information present in the record. While the total absence of accuracy information becomes a NaN value for identify spatially uncertain records, represented by the value '-99999'.
START DATE: the contents of this field represent the date of the record (if specified exactly in the source dataset) and in that case it will coincide with the END DATE field (format:ISO 8601:YYYY/MM/DD). In case the record date is not present or clearly explicit, this field will contain the start date of the dataset acquisition time range; so the date inside this field will not be coincident with the END DATE field, implying the temporal uncertainty of that record. In case of records where start date could not be derived at all, or if the record start date is before '1677/12/31', this field will be set as '1678/01/01' due to pandas time limit.
END DATE: the contents of this field represent the date of the record (if specified exactly in the source dataset) and in that case it will coincide with the START DATE field (format:ISO 8601:YYYY/MM/DD). In case the record date is not present or clearly explicit, this field will contain the end date of the dataset acquisition time range; so the date inside this field will not be coincident with the START DATE field, implying the temporal uncertainty of that record.

TYPE: Contains information about the geological and kinematic type of the landslide record, standardized using the extended classification of Varnes including also other common gravitational surface instability phenomena (Hungr et al., 2014). These type categories are standardized using this reference table:

LANDSLIDE CATEGORY
(description)
complex
soil creep
debris flow
earth flow
lahar
earth slide
mudslide
riverbank collapse
rock slide
rock fall
rotational sliding
translational sliding
earth spreading
rock spreading
mud flow
sinkhole
ND

PHYSICAL FACTORS: This field encompasses the physical factors contributing actively to the landslide activation, categorized into predisposing (PR), preparatory (P) and triggering (T) factors. Predisposing factors include invariant characteristics such as geology, topography, and land use; preparatory factors refer to monitorable cyclical changes like seasonal variations in saturation, weathering, or fire-induced alterations while triggering factors involve impulsive events such as earthquakes, intense rainfall, or volcanic activity. The category of Predisposing factors (PR) was not considered in our classification because it was absent in the native data. Therefore, only the categories of Preparatory (P) and Triggering (T) factors were considered in the classification of physical factors of landslides in this catalog. These categories are standardized using this reference table:

PHYSICAL FACTORS	IDENTIFYING ABBREVIATION
(description)	(value)
Rainfall activity	rainfall (T)
Seismic activity	seismic (T)
Volcanic activity	volcanic (T)
Human-induced factors	anthropic (T,P)
Climatic factors	climate (T,P)
Post-fire conditions	postfire (P)
Post-deforestation processes conditions	deforestation (P)
Erosional and biological factors	natural (T,P)
Not defined	ND

RELIABILITY: the content of this field represents the reliability of the data based on a decision table that takes into account spatial accuracy (ACCURACY) and temporal accuracy (START DATE, END DATE):

SPATIAL RELIABILITY	TEMPORAL RELIABILITY	RELIABILITY DESCRIPTION	CLASS
(meters)	(START DATE = END DATE)	(Description)	(value)
( <100 m )	TRUE	Exact point	1
( <100 m )	FALSE	Almost exact point	2
( >100 m and <250 m )	TRUE	Very high reliability point	3
( >100 m and <250 m )	FALSE	High reliability point	4
( >250 m and <500 m )	TRUE	Medium reliability point	5
( >250 m and <500 m )	FALSE	Low reliability point	6
( >500 m and <1000 m )	TRUE	Very low reliability point	7
( >500 m and <1000 m )	FALSE	Poor reliability point	8
( >1000 m )	TRUE and FALSE	Point with uncertain reliability	9
( -99999)	TRUE and FALSE	Unreliable point	10

RECORD TYPE: The contents of this field contain information regarding the record type: report, event.
- Report catalogs are usually landslide reports that typically collect a lot of detailed technical information about individual landslide events.
- Event catalogs, on the other hand, generally focus on summarizing landslide events triggered by episodic events (such as heavy rains, earthquakes, eruptions, etc.) with less technical information and more statistical details, without delving into the specifics of each event.
FATALITIES: the content of this field contains the number of fatalities related to the record (if explicit), where the NaN values are represented by the value -99999
INJURIES: the content of this field contains the number of injuries related to the record (if explicit), where the NaN values are represented by the value -99999
NOTES: the content of this field contains the notes and information relate to the record (if explicit)
LINKS: the content of this field contains the link to the source of the record report or study (if explicit)

🔴 Folder Structure

Folder Structure Scheme

The entire UGLC structure is allocated in 2 main repositories:

GitHub Scripts Repository (GSR)
Drive Files Repository (DFR)

The GSR contains 5 main folders :

/input

This folder contains the "native_datasets" subfolder, which contains the standardizer scripts ("N_DATAFRAME_standardizer.py") which read the downloaded files into the DFR 'input/download' subfolder (containing native datasets as .csv/.shp/.gpkg etc. downloaded from the source sites (Entities, Government agencies, Universities, Various repositories etc.) and create a standardized .csv ready to be converted into the UGLC format, and save it (as "N_DATAFRAME_native.csv") into the DFR 'input/native_dataset' subfolder.
/csv

This folder contains one subfolder named after each different native datasets ("N_DATAFRAME") contains the converting scripts ("N_DATAFRAME_converter.py") and the lookup tables ("NN_DATAFRAME_lookuptables.json") which read the native datasets from the DFR 'input/native_dataset' subfolder, then filter and convert each one into the UGLC standard format, using also the functions from the 'lib' folder, and save them (as "N_DATAFRAME_converted.csv") into the DFR 'output/converted_csv' subfolder.
/output

This folder contains the unifier script ("unifier.py") that read all the converted datasets from the DFR 'output/converted_csv' subfolder, then merge and filter them for generating the final UGLC dataframe ("UGLC_point_full.csv") and the tiled verion ("UGLC_point_tile_i_j.gpkg"), saving everything into the DFR 'output' folder.
/lib

This folder contains the functions script ("function_collection.py") which are called from the converter scripts into the GPR for various data conversion.
/files

This folder contains all the files used by this readme file, like pictures and the license file.

🔴 Tiling system

The UGLC catalog is also available in GeoPackage format, divided into 105 tiles that cover the entire Earth's surface. Each tile includes a Tile_ID attribute (_i_j) for unique identification within the grid:

i (longitude step) = [0-15]
j (latitude step) = [0-7]

Empty tiles are automatically excluded from storage, ensuring optimized file management and performance.

🔴 Catalogue Data Analysis

In order to better understand the information content of both catalogues (point and polygonal), several statistical analysis were conducted to explore key aspects of the contained data. This information is essential to ensure appropriate and targeted use of the catalogues, highlighting their potential for future scientific developments.
The analysis demonstrates a pronounced disparity in the geographical distribution of landslide records across continents. Within the point catalogue, Europe exhibits the highest representation (61.55%) followed by North America (19.63%) and Asia (10.17%). Africa, South America and Oceania collectively constitute a really low share (below 3.97%).
While, the polygonal catalogue presents a different distribution pattern, with Asia leading with Europe (45.09% and 43.40%), followed by North America (8.73%). Also in this case, Africa, South America and Oceania collectively constitute a negligible portion (1.43%).

This imbalance becomes more apparent by going into more detail with a state-by-state analysis, showing how native datasets represent landslide records with an unbalanced distribution in both density and geographic distribution.
Particularly from the state-wise density data, it can be seen that some relatively small states like Italy, UK, New Zealand, etc. lead the landslide data collection along with large countries such as the USA and China (sometimes heavily surpassing them, as in the case of Italy, which alone contributes more than 57% of the whole catalogue).
This shows a different attention to landslide phenomena in more affected countries, also highlighting a different socioeconomic influence devoted to the study and analysis of landslides in different countries. Although, in contrast to the high density of studies available for these regions, much of the data (particularly from European and Asian areas) are not openly accessible.
Consequently, the analysis was affected by restrictions applied to certain datasets that are not publicly available.

Temporal consistency analysis highlighted the heterogeneous data time consistency across datasets, that required a significant effort to standardize and interpret temporal data while addressing discrepancies in formatting and granularity.
Native datasets varied widely in their time precision, ranging from exact event dates to broader temporal ranges (e.g., decades or centuries). For records with incomplete or poorly formatted temporal data, standardization efforts involved assigning representative time ranges based on the available context, ensuring logical alignment with the recorded phenomena. This approach not only improved temporal consistency but also enhanced the utility of the catalogue by preserving valuable, albeit imprecise, historical data. This aims to mitigate the risk of data misinterpretation resulting from inconsistent native data formats, providing a temporal reliable catalogue.

Along with temporal accuracy, spatial accuracy is a critical factor in cataloguing these phenomena, as it determines the geospatial reliability of each record.
Native data sets often presented difficulties, including poorly formatted coordinates, varying levels of precision, and inconsistencies in georeferencing methods. To address these issues, a standardized spatial accuracy parameter was established, allowing for a consistent representation on a meter scale of the spatial reliability of each record.
Accuracy was converted when native data provided were on other scales, while for records with incomplete or ambiguous location data, an expert interpretation was employed to estimate the probable accuracy. This process involved cross-referencing auxiliary information, such as nearby landmarks or descriptive metadata, to determine coordinates that closely approximated the event's actual location.
This methodology ensured that even imprecise data could be meaningfully integrated, significantly reducing the proportion of records categorized as no-data in spatial accuracy. The resulting accuracy distribution ranges from highly precise values (<10 meters) to broader approximations (>10 kilometers), reflecting the inherent variability in quality and reporting practices of the source

Therefore, the reliability attribute introduced in this catalogue, calculated on the basis of spatial and temporal accuracy, reflects the general robustness of each individual record after the standardization processes.
Showing for both catalogues (point and polygonal), an extremely high record reliability (class 1 and 2), whereas only in the point catalogue, the data with a lower reliability class together do not exceed 15% of the catalogue.
However, all the spatial and temporal standardization process establishes a reliable framework, summarized by the reliability class parameter.
Making the catalogue suitable for future precise applications such as spatial modelling, risk assessment, and policy development.

From further data analysis, it was also possible to highlight the distribution of the different standardized landslide types found within the unified catalogue with detail also on the variance of each type based on the information in the native record.
A major difficulty in the creation of this huge standardized catalogue was the condensation of heterogeneous data to achieve information consistency. Especially in a context such as geology, where extreme variance in the nomenclature of different types is often a stumbling block in data intercommunication.
In fact, the observed variance reflects the extent to which native data sources contributed to the different interpretation of each standardized landslide type.
This diversity comes from the consolidation of extremely heterogeneous datasets, in which different terminologies, classification schemes, and levels of granularity were harmonized into standardized categories, while also recovering data on the large amount of typos and data entry errors. Types with greater variance, such as “complex” or “earth flow,” therefore indicate the presence of a higher rate of interpreted data than data on the natively more unambiguous and consistent and therefore easily interpreted typology such as for “rockfall” or “sinkhole” types.

It was also possible to analyse the distribution of various physical factors associated to each landslide catalogued record.
The graph reveals a higher prevalence of missing informations about physical factors, followed by Triggering factors (T) and Preparatory factors (P), without representation of Predisposing factors (PR).
The dominance of common Triggering factors like rainfall and seismic activity, highlights the statistical prevalence of these phenomena in native catalogues. However, this distribution is also clearly influenced by the uneven geographical coverage of the data, where landslides tend to occur more frequently in regions where these triggering factors are more prominent, underscoring the need to address spatial heterogeneity in future data collection to enhance global representativeness.

Analyzing the overall distribution of the various standardized landslide types in the catalogue shows how the frequencies of each type of landslide vary widely.
The graph reveals how the undefined categories ('ND') are the majority, showing native datasets lack of information on the kinematics for each landslide record.
However, for non-null categories, the types 'complex', 'earth slide', 'rock fall' and 'soil creep' are the most prevalent, while types such as 'lahar' and 'earth spreading' are minimally represented.
The spatial heterogeneity of the dataset is evident, with dense clusters in regions widely studied as more climatically and geologically active, such as South Asia and Central America, and under-representation in areas such as Africa and Russia due to data gaps related to likely difficulty in mapping or restrictions in data availability.
The impact of data availability and uneven data resolution on the global representation of landslides is highlighted even more.

Name		Name	Last commit message	Last commit date
Latest commit History 382 Commits
.idea		.idea
csv		csv
files		files
input		input
lib		lib
output		output
README.md		README.md
config.env		config.env

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UNIFIED GLOBAL LANDSLIDE CATALOGUE

Point catalogue

🔴 Authors

🔴 Project description

🔴 License

🔴 Publication Notice

🔴 Data availability

🔴 Attribute fields summary

🔴 Attributes description

🔴 Folder Structure

🔴 Tiling system

🔴 Catalogue Data Analysis

About

Releases

Packages

Contributors 2

Languages

UnibaGEO/UGLC_point

Folders and files

Latest commit

History

Repository files navigation

UNIFIED GLOBAL LANDSLIDE CATALOGUE

Point catalogue

🔴 Authors

🔴 Project description

🔴 License

🔴 Publication Notice

🔴 Data availability

🔴 Attribute fields summary

🔴 Attributes description

🔴 Folder Structure

🔴 Tiling system

🔴 Catalogue Data Analysis

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages