Movement Analysis Addition

datapartnership · May 28, 2024 · a6db77c
1 parent a78f0cd
Showing 4 changed files with 625 additions and 1 deletion.
diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -13,7 +13,11 @@ parts:
         sections:
           - file: notebooks/conflict/conflictAnalysis.ipynb
       - file: notebooks/agriculture/report.ipynb 
-      - file: notebooks/nighttime-lights/ntl_analysis.html
+      - url: https://raw.githack.com/datapartnership/niger-economic-monitoring/main/notebooks/nighttime-lights/ntl_analysis.html
+        title: Nighttime Lights
+      - file: notebooks/movement/README
+        sections:
+          - file: notebooks/movement/colocation.ipynb
   - caption: Acknowledgements
     chapters:
       - file: docs/team
diff --git a/notebooks/movement/README.md b/notebooks/movement/README.md
@@ -0,0 +1,37 @@
+# Movement Analysis
+
+The movement of people is a key input to understanding economic activity. Movement data is useful in determining where people live, where they work, and where the most activity takes place in a country. To understand this better, the Data Lab aims to use three movement datasets: Colocation Maps and Commuting Zones from Meta, and Movement data from Mapbox.
+
+The West African region has been politically unstable in recent years. The coup d'état in Niger in July 2023 was followed in September 2023 by the creation of the Alliance of Sahel States (AES) by Niger, Mali and Burkina Faso, which in January 2024 decided to withdraw from the Economic Community of West African States (ECOWAS). Given this regional interdependence, the movement analysis considers movement across borders as far as data is available.
+
+## Data
+
+### Meta Colocation Maps
+
+Colocation Maps estimate how often people from different regions are in the same area at the same time, or “colocated.” In particular, for a pair of geographic regions x and y, these maps estimate the rate at which a randomly chosen person from x and a randomly chosen person from y are simultaneously located in the same general area during a randomly chosen time in a given week.
+
+- **Weekly measured coobservation rate (weekly_measured_coobservation_rate)**: The rate at which a randomly chosen person from polygon 1 and a randomly chosen person from polygon 2 were simultaneously observed in a randomly chosen 5-minute period, regardless of where they are in the world. This is different from the weekly measured colocation rate, in which the randomly chosen pair of people must have been observed not only in the same 5-minute period, but also in the same level-16 Bing tile.
+- **Weekly measured colocation rate (weekly_measured_colocation_rate)**: The rate at which a randomly chosen person from polygon 1 and a randomly chosen person from polygon 2 were simultaneously observed in the same level-16 Bing tile in a randomly chosen 5-minute period.
+- **Weekly colocation rate (weekly_colocation_rate)**: Of all the times that people from polygon 1 and polygon 2 could have been observed to be colocated, how often were they colocated? Mathematically, this is the weekly measured colocation rate (weekly_measured_colocation_rate) divided by the weekly measured coobservation rate (weekly_measured_coobservation_rate). This adjusts the colocation rate for the fact that people from polygon 1 and polygon 2 were simultaneously observed only some fraction of the time, and only in those cases can we determine whether they were colocated. This adjusted value is the one Meta proposes that partners use in their analysis.
+- **Is home tile colocation (is_home_tile_colocation)**: The value in this column represents the response to this statement: These pairs of people are in their shared home tile at the same time. If the value is true, the rate of colocation in the shared home tile is reported in the weekly measured colocation rate (weekly_measured_colocation_rate) and weekly colocation rate (weekly_colocation_rate) columns, which report raw and corrected rates, respectively. If the value is false, the weekly measured colocation rate (weekly_measured_colocation_rate) and weekly colocation rate (weekly_colocation_rate) columns report raw and corrected rates, respectively, at which pairs of people are colocated but at least one person in each pair is outside their home tile.
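As a minimal sketch of the adjustment described above, the snippet below computes the weekly colocation rate as the ratio of the two measured rates. The dictionary keys reuse the column names from the definitions; the function name `colocation_rate`, the sample values, and the choice to return `None` when the coobservation rate is zero are assumptions made for illustration and do not come from the Meta data or documentation.

```python
def colocation_rate(measured_colocation_rate, measured_coobservation_rate):
    """Adjusted colocation rate: measured colocation over measured coobservation."""
    if measured_coobservation_rate == 0:
        # The pair was never observed in the same 5-minute period,
        # so the adjusted rate cannot be determined (assumed handling).
        return None
    return measured_colocation_rate / measured_coobservation_rate


# Hypothetical values for one polygon pair in one week
row = {
    "weekly_measured_colocation_rate": 2.0e-7,
    "weekly_measured_coobservation_rate": 4.0e-3,
}
adjusted = colocation_rate(
    row["weekly_measured_colocation_rate"],
    row["weekly_measured_coobservation_rate"],
)
print(adjusted)
```

Comparing a value recomputed this way with the published `weekly_colocation_rate` column can serve as a quick consistency check on a downloaded file.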
+
+### Meta Commuting Zones
+
+Commuting zones represent the areas where people spend most of their time and conduct most of their economic activity. These areas of economic integration are independent from political boundaries and can illustrate how economic communities and commute patterns transcend regional boundaries.
+
+#### Methodology and Data Collection
+
+- **Population sample**: Commuting Zones draws from a sample of people who use the Facebook mobile app and have enabled the Location Services setting. 
+- **Spatial methods**: The Commuting Zones spatial shapes are built by defining a community network. Meta starts with a node for each city or town where people who use Facebook live, and defines the edge between each pair of nodes i and j using the formula below. It then reduces the complexity of this graph using the Louvain clustering algorithm. Once the graph is simplified, Voronoi shapes are used to define the polygon for each commuting zone.
+
+    (#residents moving from i to j + #residents moving from j to i) / (#residents in i + #residents in j)
+
+
+- **Temporal span**: Commuting zones are rebuilt at most every 3 months, when the commuting zone shapes are generated using the previous few weeks of Location Services data.
+- **Minimum counts/size**: To generate a commuting zone, there must be at least 50 people estimated to live within its boundary and a minimum size of at least 1 kilometer by 1 kilometer.
+- **Estimated population (win_population)**: Estimated population within the zone, calculated from the publicly available Facebook High-Resolution Population Density Maps or WorldPop estimates. These population estimates are provided as counts per grid tile on the earth’s surface. Each tile is mapped to a commuting zone, and population is aggregated by summing all the tiles within the commuting zone polygon defined in the geometry field. The bottom and top 5% of commuting zones are then winsorized: population counts below the 5th percentile are replaced by the 5th percentile value, and those above the 95th percentile are replaced by the 95th percentile value. All commuting zones have a population of at least 50.
+- **Local business count (local_business_locations_count)**: A scaled score of 0-1,000 calculated by counting all the Facebook Pages of this type within each commuting zone globally. The counts are then winsorized: Page counts per commuting zone below the 2.5th percentile are replaced by the 2.5th percentile value, and those above the 97.5th percentile are replaced by the 97.5th percentile value. The winsorized count of Pages in each commuting zone is then scaled linearly between 0 and 1,000, so that the smallest winsorized value receives a score of 0 and the largest a score of 1,000.
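The edge formula given under Spatial methods can be illustrated with a small worked example. This is only a sketch of the arithmetic: the function name `edge_weight`, its parameter names, and the counts are hypothetical, and the zero-population check is an assumption rather than part of Meta's published method.

```python
def edge_weight(moves_i_to_j, moves_j_to_i, residents_i, residents_j):
    """Movement between i and j as a share of their combined residents."""
    total_residents = residents_i + residents_j
    if total_residents == 0:
        # Assumed handling: the ratio is undefined without any residents.
        raise ValueError("at least one of the two places must have residents")
    return (moves_i_to_j + moves_j_to_i) / total_residents


# Hypothetical counts for two towns i and j
weight = edge_weight(moves_i_to_j=120, moves_j_to_i=80, residents_i=1500, residents_j=500)
print(weight)  # (120 + 80) / (1500 + 500) = 0.1
```

Because the numerator sums movement in both directions, the value is symmetric in i and j, which suits an undirected graph.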
+
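The winsorizing and scaling steps described for the population and local business fields can be sketched as follows, using the 2.5th and 97.5th percentiles from the local business count. The source does not state which percentile definition Meta uses, so linear interpolation between ranks is an assumption here. The function names and the Page counts are likewise hypothetical.

```python
def percentile(sorted_values, q):
    """Percentile q (0-100) of pre-sorted values, using linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def winsorize(values, lower_q, upper_q):
    """Replace values outside the given percentiles with the percentile values."""
    ordered = sorted(values)
    low, high = percentile(ordered, lower_q), percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


def scale_to_score(values, top=1000):
    """Linearly rescale so the smallest value maps to 0 and the largest to `top`."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # Assumed handling: with no spread, every zone gets the lowest score.
        return [0.0 for _ in values]
    return [top * (v - smallest) / (largest - smallest) for v in values]


# Hypothetical Page counts for 41 commuting zones: 0, 10, ..., 390, plus one outlier
page_counts = list(range(0, 400, 10)) + [100000]
clipped = winsorize(page_counts, 2.5, 97.5)
scores = scale_to_score(clipped)
print(min(scores), max(scores))
```

In this example the outlier is pulled down to the 97.5th percentile value before scaling, so it no longer compresses every other zone toward a score of 0.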
+### Mapbox Movement
+
+Mapbox Movement data has been requested from Mapbox through the Development Data Partnership. The analysis will be published as soon as the data becomes available.
diff --git a/notebooks/movement/colocation.ipynb b/notebooks/movement/colocation.ipynb
diff --git a/notebooks/movement/travel_patterns.ipynb b/notebooks/movement/travel_patterns.ipynb
@@ -0,0 +1,144 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import geopandas as gpd"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 71,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import glob\n",
+    "\n",
+    "# Keep only the files that contain at least one edge starting at longitude 8.0\n",
+    "frames = []\n",
+    "for file in glob.glob('../../data/movement/meta/travel_patterns_edges/*.csv'):\n",
+    "    df = pd.read_csv(file)\n",
+    "    if (df['longitude1'] == 8.0).any():\n",
+    "        frames.append(df)\n",
+    "\n",
+    "travel_patterns = pd.concat(frames) if frames else pd.DataFrame()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 74,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array(['norway', 'nigeria'], dtype=object)"
+      ]
+     },
+     "execution_count": 74,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "travel_patterns[travel_patterns['longitude1']==8.0]['polygon1_name'].unique()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "KeyError",
+     "evalue": "'polygon1_name'",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[1;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
+      "Cell \u001b[1;32mIn[70], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m travel_patterns[\u001b[43mtravel_patterns\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mpolygon1_name\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[38;5;241m==\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mchad\u001b[39m\u001b[38;5;124m'\u001b[39m][[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mlatitude1\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mlongitude1\u001b[39m\u001b[38;5;124m'\u001b[39m]]\u001b[38;5;241m.\u001b[39mdrop_duplicates()\n",
+      "File \u001b[1;32mc:\\WBG\\Anaconda3\\envs\\data-goods\\Lib\\site-packages\\pandas\\core\\frame.py:4102\u001b[0m, in \u001b[0;36mDataFrame.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m   4100\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumns\u001b[38;5;241m.\u001b[39mnlevels \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[0;32m   4101\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_getitem_multilevel(key)\n\u001b[1;32m-> 4102\u001b[0m indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcolumns\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m   4103\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m is_integer(indexer):\n\u001b[0;32m   4104\u001b[0m     indexer \u001b[38;5;241m=\u001b[39m [indexer]\n",
+      "File \u001b[1;32mc:\\WBG\\Anaconda3\\envs\\data-goods\\Lib\\site-packages\\pandas\\core\\indexes\\range.py:417\u001b[0m, in \u001b[0;36mRangeIndex.get_loc\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m    415\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n\u001b[0;32m    416\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(key, Hashable):\n\u001b[1;32m--> 417\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key)\n\u001b[0;32m    418\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_indexing_error(key)\n\u001b[0;32m    419\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key)\n",
+      "\u001b[1;31mKeyError\u001b[0m: 'polygon1_name'"
+     ]
+    }
+   ],
+   "source": [
+    "travel_patterns[travel_patterns['polygon1_name']=='chad'][['latitude1', 'longitude1']].drop_duplicates()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array(['philippines'], dtype=object)"
+      ]
+     },
+     "execution_count": 68,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "travel_patterns[travel_patterns['polygon1_name']=='mali']['polygon2_name'].unique()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.read_csv('../../data/movement/meta/travel_patterns_edges/11622655397921190_2023-01-02.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df[df['polygon1_name']=='niger'].shape[0]"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "data-goods",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}