This repository contains the code, data, and figures for the Internet Intelligence Research Lab's SIGCOMM '23 paper, "Destination Unreachable: Characterizing Internet Outages and Shutdowns," by Zachary S. Bischof, Kennedy Pitcher, Esteban Carisimo, Amanda Meng, Rafael Bezerra Nunes, Ramakrishna Padmanabhan, Molly Roberts, Alex C. Snoeren, and Alberto Dainotti. The full paper is publicly available on the ACM Digital Library.
The ./data/ directory contains the data files required for analysis. Currently, only the dataset created by the paper's authors is available in this repository. All other datasets mentioned below were created and are managed by other groups. If you require specific versions of the datasets for the purposes of replication, please refer to our read-only snapshot of the project's notebooks and datasets on the lab's Deepnote page here. All dataset snapshots are available in the data folder (described below). If you have any issues accessing this data, please feel free to open an issue on this repository or contact the lead author via email.
Description of files/directories:
./data/ioda/ioda_investigated_outages_cleaned_phase1+2.csv
: contains our manually curated list of IODA outages and shutdowns used throughout the paper. This includes all data validation work performed by DataWorks. (Included in this repository)
./data/kio
: contains CSV files for each publication of AccessNow's #KeepItOn dataset (available here). This data is used throughout the full paper. To access current versions of this dataset, visit AccessNow's #KeepItOn page and scroll down to the button labelled "DOWNLOAD THE 2016-2022 STOP DATA" (note: the exact wording may change as additional annual updates are posted). This should open a Google Spreadsheet with multiple sheets, each representing one annual dataset publication (e.g., 2016-2018, 2019, 2020). Our paper used a snapshot of this data, retrieved on July 12, 2022, by exporting each sheet as a CSV file.
./data/polisci
: contains the political and economic indicators that were used throughout the analysis in Sec 5.1 and 5.2 of the paper. Democratic indices are made available by V-Dem, macroeconomic data (e.g., GDP per capita, prevalence of broadband Internet access) was obtained from The World Bank, information on coups was collected as part of the Global Instances of Coups dataset, data on elections is made available by the International Foundation for Electoral Systems-Election Guide, and data identifying protest events was obtained from the Mass Mobilization in Autocracies Database. For our work, all datasets were downloaded on January 23, 2023.
- The V-Dem dataset can be downloaded here. We used the "Country-Year: V-Dem Full+Others" dataset. Downloading the dataset requires that you provide an email. Our analysis used Version 12, which can be downloaded via the dataset archive page.
- Macroeconomic data and data on the prevalence of broadband Internet access can be obtained from the World Development Indicators in The World Bank's DataBank. For our analysis, we used the DataBank portal to select the series "GDP per capita (current US$)" (code: NY.GDP.PCAP.CD) and "Fixed broadband subscriptions (per 100 people)" (code: IT.NET.BBND.P2) for all countries during the period of study (2018-2021). After selecting the appropriate filters in the portal, data can be downloaded via the "Download options" button.
- Data on coups can be downloaded on the Global Instances of Coups page. The dataset is available via the "List of coups by country" link.
- Data on elections can be obtained from the International Foundation for Electoral Systems-Election Guide. Downloading the full dataset requires that users click "Request Data Access" and complete a brief survey. Data can be downloaded as a spreadsheet or via an API. For our analysis, we obtained a spreadsheet of the full dataset. Additional information on accessing the data is available on Election Guide's about page.
- Data on protest events can be obtained from Mass Mobilization in Autocracies Database's downloads page. Our analysis used Version 4.0, via the link "MMAD Reports, Version 4.0". At the time of our work, this version was available for direct download. However, access to previous versions of the dataset requires users to create a free account, so once the dataset is updated, downloading Version 4.0 may require an account.
./data/access-networks
: contains datasets related to autonomous system (AS) characteristics. This includes: CAIDA's prefix to AS mappings (available here), MaxMind's geolocation data (available here), and APNIC's eyeball dataset (available here). These datasets are used in Sec 5.1 of the paper.
- CAIDA's prefix to AS mappings are publicly available here. Our analysis used a snapshot of CAIDA's prefix2as data from 2022-03-01 (available here).
- For MaxMind's geolocation data, we used MaxMind's GeoLite Free version, with a snapshot from January 11, 2022. Though GeoLite Free is available at no cost, accessing the data does require creating an account. More information on this process is available here.
- APNIC's eyeball dataset is updated regularly (available here). Our analysis uses data scraped from this page on March 22, 2022. Unfortunately, APNIC does not publish historical versions of this dataset. However, the specific snapshot used in our analysis is available on our Deepnote project.
./data/soe
: contains data on state ownership of ASes from Carisimo et al.'s IMC '21 paper (available here). The version used in our analysis was downloaded on 2023-01-23 and is used in part of Sec 5.1 of the paper.
If any of the external links or information on this page are broken or out of date, please feel free to create an issue on this repository.
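For readers who prefer a scripted download of the two World Bank series named above (NY.GDP.PCAP.CD and IT.NET.BBND.P2), the following is a minimal sketch that uses the World Bank Indicators API instead of the DataBank portal. This is an alternative route, not the method used for the paper: the function names and the paging loop are illustrative assumptions, the "all" country selector also returns regional aggregates, and current values may differ from the January 23, 2023 download if the World Bank has since revised its data.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.worldbank.org/v2"

# Series codes and names as listed in this README.
INDICATORS = {
    "NY.GDP.PCAP.CD": "GDP per capita (current US$)",
    "IT.NET.BBND.P2": "Fixed broadband subscriptions (per 100 people)",
}


def indicator_url(code, start=2018, end=2021, page=1, per_page=1000):
    """Build the API URL for one indicator, all countries, one result page."""
    query = urlencode(
        {
            "date": f"{start}:{end}",
            "format": "json",
            "per_page": per_page,
            "page": page,
        },
        safe=":",  # keep the year range readable instead of percent-encoding it
    )
    return f"{API_ROOT}/country/all/indicator/{code}?{query}"


def fetch_indicator(code, start=2018, end=2021):
    """Download every result page for one indicator (requires network access).

    A successful response is a two-element JSON array: paging metadata
    followed by a list of observations.
    """
    rows, page = [], 1
    while True:
        with urlopen(indicator_url(code, start, end, page=page), timeout=30) as resp:
            meta, data = json.load(resp)
        rows.extend(data or [])
        if page >= meta["pages"]:
            return rows
        page += 1
```

Calling fetch_indicator("NY.GDP.PCAP.CD") should return one dictionary per country (or aggregate) and year. For strict replication of the paper's figures, the snapshot on the Deepnote project remains the reference.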
The ./notebooks directory contains the Jupyter notebook files used for preparing, cleaning, and merging the KIO and IODA datasets, as well as the notebook files for the paper's analysis.
Description of files:
merge-kio-ioda.ipynb
: Contains the code to load and merge the IODA and KIO datasets. It also contains code to standardize/unify the different conventions used across different versions of the KIO dataset. Running it creates the merged dataset required by each of the analysis notebooks and saves it in multiple formats (SQLite database file, JSON, and CSV).
polisci-analysis.ipynb
: Contains code to generate the majority of the figures in Sec. 5.1 and 5.2 of the paper. This includes analysis of the distribution of Internet shutdowns and spontaneous outages across multiple political and socioeconomic indicators.
state-ownership-analysis.ipynb
: Contains code to generate figures specifically in Sec 5.1.1. This includes the analysis of the relationship between Internet shutdowns and state ownership of the address space/eyeball networks.
summary-and-technical-analysis.ipynb
: Contains code for generating figures related to a high-level summary of the KIO dataset (Sec 3.2), examples of multiple IODA outages being mapped to a KIO shutdown (Sec 4), and the temporal and technical indicators of Internet shutdowns (Sec 5.3).
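To illustrate what saving one dataset in the three formats mentioned for merge-kio-ioda.ipynb (SQLite database file, JSON, and CSV) can look like, here is a minimal standard-library sketch. It is not taken from the notebook: the function name save_records, the table name events, the output file names, and the flat list-of-dicts input are all assumptions made for illustration.

```python
import csv
import json
import sqlite3
from pathlib import Path


def save_records(records, out_dir, stem="merged"):
    """Write a list of flat dicts as <stem>.csv, <stem>.json, and <stem>.db."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = list(records[0].keys())  # assumes every record has the same keys

    # CSV: a header row followed by one row per record.
    csv_path = out / f"{stem}.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(records)

    # JSON: a single array of objects.
    json_path = out / f"{stem}.json"
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

    # SQLite: one table named "events" (hypothetical), recreated on each run.
    db_path = out / f"{stem}.db"
    conn = sqlite3.connect(db_path)
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute("DROP TABLE IF EXISTS events")
        conn.execute(f"CREATE TABLE events ({col_defs})")
        conn.executemany(
            f"INSERT INTO events VALUES ({placeholders})",
            [tuple(r[c] for c in columns) for r in records],
        )
        conn.commit()
    finally:
        conn.close()

    return {"csv": csv_path, "json": json_path, "sqlite": db_path}
```

Writing all three outputs from the same in-memory records keeps them consistent with one another, so an analysis notebook can load whichever format is most convenient.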