diff --git a/README.rst b/README.rst
index c03886b..a87601f 100644
--- a/README.rst
+++ b/README.rst
@@ -35,15 +35,15 @@ Sensitivity and Epsilon Analysis
* Sensitivity: In a single time stamp, ``1`` merchant can appear only once in a particular zip code but can appear in up to ``3`` zip codes. So, if we wanted to release measures about a single zip code, the sensitivity would be ``1``, but since we want to release data for all zip codes, the sensitivity used for each zip code is ``3``.
* Scaling with Time: For multiple time stamps, sensitivity is ``3 * no_of_time_stamps``.
* Epsilon Budget: The epsilon spent for each query is ``∈``.
-* Scale Calculation: ``Scale = (sqrt(3) * no_of_time_stamps) / ∈``.
+* Scale Calculation: ``Scale = (3 * no_of_time_stamps * upper_bound) / ∈``.
-Mobility Detection (Airline Merch Category)
--------------------------------------------
+Mobility Detection
+------------------
Description
-This analysis tracks mobility by monitoring differential private time series release of financial transactions in the "Airlines" category, which reflects the transportation sector.
+This analysis tracks mobility by monitoring a differentially private time series release of financial transactions in the ``retail_and_recreation``, ``grocery_and_pharmacy`` and ``transit_stations`` super categories, which match the categories in Google mobility data for easy validation.
Assumptions
@@ -54,17 +54,18 @@ Assumptions
Algorithm
#. Add City Column: A new ``city`` column is added based on postal codes (``make_preprocess_location``).
+#. Add Super Category Column: A new ``merch_super_category`` column is added to classify transactions into the ``retail_and_recreation``, ``grocery_and_pharmacy`` and ``transit_stations`` categories (``make_preprocess_merchant_mobility``).
#. Filter for City: Data for the selected city is filtered (``make_filter``).
-#. Filter for Airline Category: Only transactions in the ``Airline`` category are considered (``make_filter``).
+#. Filter for Super Category: Data is filtered for the selected super category, one of ``retail_and_recreation``, ``grocery_and_pharmacy`` or ``transit_stations`` (``make_filter``).
#. Filter by Time Frame: Data is filtered for the selected time frame (``make_truncate_time``).
#. Transaction Summing & Noise Addition: Sum the number of transactions by postal code for each timestep and add Laplace noise (``make_private_sum_by``).
Sensitivity and Epsilon Analysis
-* Sensitivity per Merchant: Sensitivity is 3 for each merchant in the ``Airline`` category.
+* Sensitivity per Merchant: Sensitivity is 3 for each merchant.
* Scaling with Time: For multiple timesteps, sensitivity is ``3 * no_of_time_steps``.
* Epsilon Budget: The epsilon spent per timestep is ∈.
-* Scale Calculation: ``Scale = (3 * no_of_time_steps) / ∈``.
+* Scale Calculation: ``Scale = (3 * no_of_time_steps * upper_bound) / ∈``.
Validation
@@ -100,7 +101,7 @@ Sensitivity and Epsilon Analysis
* Sensitivity per Category : Sensitivity is ``3`` for each category (essential or luxurious goods).
* Scaling with Time : For multiple timesteps, sensitivity is ``3 * no_of_time_steps``.
* Epsilon Budget : The epsilon spent per timestep is ∈.
-* Scale Calculation : ``Scale = (3 * no_of_time_steps) / ∈``.
+* Scale Calculation : ``Scale = (3 * no_of_time_steps * upper_bound) / ∈``.
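As a worked illustration of the scale formula in the README changes above, here is a minimal sketch that mirrors the weekly time-step count used in the analyzers (``(end_date - start_date).days // 7``). The function name ``laplace_scale``, the parameter ``zip_codes_per_merchant``, and the value ``upper_bound=600`` are assumptions made for this example only; they do not come from the package.

```python
from datetime import datetime


def laplace_scale(start_date, end_date, upper_bound, epsilon, zip_codes_per_merchant=3):
    """Scale = (3 * no_of_time_steps * upper_bound) / epsilon, with weekly time steps."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Number of whole weeks in the selected time frame.
    nb_timesteps = (end_date - start_date).days // 7
    return (zip_codes_per_merchant * nb_timesteps * upper_bound) / epsilon


# Hypothetical inputs: 211 days -> 30 weekly steps, upper_bound=600, epsilon=10.
scale = laplace_scale(datetime(2020, 9, 1), datetime(2021, 3, 31), upper_bound=600, epsilon=10)
print(scale)  # 3 * 30 * 600 / 10 = 5400.0
```

The scale grows linearly with the number of time steps, so a longer time frame at the same epsilon produces noisier releases.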
diff --git a/dist/dp_epidemiology-0.0.8-py3-none-any.whl b/dist/dp_epidemiology-0.0.8-py3-none-any.whl
deleted file mode 100644
index 70fa40f..0000000
Binary files a/dist/dp_epidemiology-0.0.8-py3-none-any.whl and /dev/null differ
diff --git a/dist/dp_epidemiology-0.0.8.tar.gz b/dist/dp_epidemiology-0.0.8.tar.gz
deleted file mode 100644
index 11e66c3..0000000
Binary files a/dist/dp_epidemiology-0.0.8.tar.gz and /dev/null differ
diff --git a/dist/dp_epidemiology-0.0.9-py3-none-any.whl b/dist/dp_epidemiology-0.0.9-py3-none-any.whl
new file mode 100644
index 0000000..f58ac60
Binary files /dev/null and b/dist/dp_epidemiology-0.0.9-py3-none-any.whl differ
diff --git a/dist/dp_epidemiology-0.0.9.tar.gz b/dist/dp_epidemiology-0.0.9.tar.gz
new file mode 100644
index 0000000..03eb266
Binary files /dev/null and b/dist/dp_epidemiology-0.0.9.tar.gz differ
diff --git a/docs/requirements.txt b/docs/requirements.txt
index e69de29..ffba590 100644
Binary files a/docs/requirements.txt and b/docs/requirements.txt differ
diff --git a/docs/usage.rst b/docs/usage.rst
index d6cf342..46413c6 100644
--- a/docs/usage.rst
+++ b/docs/usage.rst
@@ -61,13 +61,14 @@ For example:
To do mobility inference,
-you can use the ``mobility_analyzer.mobility_analyzer()`` function to generate differential private time series of trnsactional data in the "Airlines" category:
+you can use the ``mobility_analyzer.mobility_analyzer()`` function to generate a differentially private time series of transactional data in the ``retail_and_recreation``, ``grocery_and_pharmacy`` and ``transit_stations`` super categories:
.. autofunction:: mobility_analyzer.mobility_analyzer
The ``df`` parameter takes a pandas DataFrame as input with columns ``[ "ID", "date", "merch_category", "merch_postal_code", "transaction_type", "spendamt", "nb_transactions"]``.
The ``start_date`` and ``end_date`` parameters take the start and end date of the time frame for which the analysis is to be done.
The ``city`` parameter takes the name of the city for which the analysis is to be done.
+The ``category`` parameter takes the super category to analyze: ``retail_and_recreation``, ``grocery_and_pharmacy`` or ``transit_stations``.
The ``epsilon`` parameter takes the value of epsilon for differential privacy.
For example:
@@ -75,7 +76,7 @@ For example:
>>> from DP_epidemiology import mobility_analyzer
>>> from datetime import datetime
>>> df = pd.read_csv('data.csv')
->>> mobility_analyzer.mobility_analyzer(df,datetime(2020, 9, 1),datetime(2021, 3, 31),"Medellin",10)
+>>> mobility_analyzer.mobility_analyzer(df, datetime(2020, 9, 1), datetime(2021, 3, 31), "Medellin", "retail_and_recreation", 10)
nb_transactions date
0 1258 2020-09-01
1 1328 2020-09-08
diff --git a/pyproject.toml b/pyproject.toml
index d052481..e100188 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
[project]
name = "DP_epidemiology"
-version = "0.0.8"
+version = "0.0.9"
dependencies = [
"pandas>=2.1.4",
@@ -16,7 +16,8 @@ dependencies = [
"dash",
"nbformat",
"scipy",
- "matplotlib"
+ "matplotlib",
+ "dtw",
]
authors = [
diff --git a/src/DP_epidemiology/__pycache__/contact_matrix.cpython-310.pyc b/src/DP_epidemiology/__pycache__/contact_matrix.cpython-310.pyc
index 2bf2272..5b18607 100644
Binary files a/src/DP_epidemiology/__pycache__/contact_matrix.cpython-310.pyc and b/src/DP_epidemiology/__pycache__/contact_matrix.cpython-310.pyc differ
diff --git a/src/DP_epidemiology/__pycache__/hotspot_analyzer.cpython-310.pyc b/src/DP_epidemiology/__pycache__/hotspot_analyzer.cpython-310.pyc
index e4e7260..1037b28 100644
Binary files a/src/DP_epidemiology/__pycache__/hotspot_analyzer.cpython-310.pyc and b/src/DP_epidemiology/__pycache__/hotspot_analyzer.cpython-310.pyc differ
diff --git a/src/DP_epidemiology/__pycache__/mobility_analyzer.cpython-310.pyc b/src/DP_epidemiology/__pycache__/mobility_analyzer.cpython-310.pyc
index cab8470..06fa566 100644
Binary files a/src/DP_epidemiology/__pycache__/mobility_analyzer.cpython-310.pyc and b/src/DP_epidemiology/__pycache__/mobility_analyzer.cpython-310.pyc differ
diff --git a/src/DP_epidemiology/__pycache__/pandemic_adherence_analyzer.cpython-310.pyc b/src/DP_epidemiology/__pycache__/pandemic_adherence_analyzer.cpython-310.pyc
index 7a71a7e..354cf23 100644
Binary files a/src/DP_epidemiology/__pycache__/pandemic_adherence_analyzer.cpython-310.pyc and b/src/DP_epidemiology/__pycache__/pandemic_adherence_analyzer.cpython-310.pyc differ
diff --git a/src/DP_epidemiology/__pycache__/utilities.cpython-310.pyc b/src/DP_epidemiology/__pycache__/utilities.cpython-310.pyc
index 651b5c4..182b321 100644
Binary files a/src/DP_epidemiology/__pycache__/utilities.cpython-310.pyc and b/src/DP_epidemiology/__pycache__/utilities.cpython-310.pyc differ
diff --git a/src/DP_epidemiology/hotspot_analyzer.py b/src/DP_epidemiology/hotspot_analyzer.py
index 45a702c..4804d76 100644
--- a/src/DP_epidemiology/hotspot_analyzer.py
+++ b/src/DP_epidemiology/hotspot_analyzer.py
@@ -25,7 +25,7 @@ def hotspot_analyzer(df:pd.DataFrame, start_date:datetime,end_date:datetime,city
nb_timesteps = (end_date - start_date).days // 7
"""scale calculation"""
- scale=(np.sqrt(3.0)*nb_timesteps*upper_bound)/epsilon
+ scale=(3.0*nb_timesteps*upper_bound)/epsilon
new_df=df.copy()
diff --git a/src/DP_epidemiology/mobility_analyzer.py b/src/DP_epidemiology/mobility_analyzer.py
index 7ca1cd2..e59144a 100644
--- a/src/DP_epidemiology/mobility_analyzer.py
+++ b/src/DP_epidemiology/mobility_analyzer.py
@@ -5,6 +5,8 @@
from datetime import datetime
import scipy.stats as stats
import opendp.prelude as dp
+import matplotlib.pyplot as plt
+from dtw import dtw,accelerated_dtw
dp.enable_features("contrib", "floating-point", "honest-but-curious")
@@ -28,7 +30,7 @@ def mobility_analyzer_airline(df:pd.DataFrame,start_date:datetime,end_date:datet
nb_timesteps = (end_date - start_date).days // 7
"""scale calculation"""
- scale=(np.sqrt(3.0)*nb_timesteps*upper_bound)/epsilon
+ scale=(3.0*nb_timesteps*upper_bound)/epsilon
new_df=df.copy()
@@ -60,7 +62,7 @@ def mobility_analyzer(df:pd.DataFrame,start_date:datetime,end_date:datetime,city
nb_timesteps = (end_date - start_date).days // 7
"""scale calculation"""
- scale=(np.sqrt(3.0)*nb_timesteps*upper_bound)/epsilon
+ scale=(3.0*nb_timesteps*upper_bound)/epsilon
new_df=df.copy()
@@ -85,4 +87,15 @@ def mobility_validation_with_google_mobility(df_transactional_data:pd.DataFrame,
# print(df_transactional_mobility.head())
# print(df_google_mobility.head())
r, p = stats.pearsonr(df_transactional_mobility['nb_transactions'][:length], df_google_mobility[category][:length])
- print(f"Scipy computed Pearson r: {r} and p-value: {p}")
\ No newline at end of file
+ print(f"Scipy computed Pearson r: {r} and p-value: {p}")
+
+ d1 = df_transactional_mobility['nb_transactions'][:length].interpolate().values
+ d2 = df_google_mobility[category][:length].interpolate().values
+ d, cost_matrix, acc_cost_matrix, path = accelerated_dtw(d1,d2, dist='euclidean')
+
+ plt.imshow(acc_cost_matrix.T, origin='lower', cmap='gray', interpolation='nearest')
+ plt.plot(path[0], path[1], 'w')
+ plt.xlabel('Transactional mobility')
+ plt.ylabel('Google mobility')
+ plt.title(f'DTW Minimum Path with minimum distance: {np.round(d,2)}')
+ plt.show()
\ No newline at end of file
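The validation code above compares the two series with dynamic time warping (DTW) in addition to Pearson correlation. For readers unfamiliar with DTW, the following is a rough, dependency-free sketch of the accumulated-cost recursion that such an alignment is based on. It is not the ``dtw`` package's implementation; the function name ``dtw_distance`` is hypothetical, and absolute difference is assumed as the pointwise cost for 1-D series.

```python
import numpy as np


def dtw_distance(a, b):
    """Return (distance, accumulated_cost_matrix) for two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = minimal cost of aligning a[:i] with b[:j]; padded with inf.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m], acc[1:, 1:]


# The second series repeats one value, so warping aligns them at zero cost.
distance, acc_cost = dtw_distance([1, 2, 3], [1, 2, 2, 3])
print(distance)
```

The two series compared in the validation are on different scales (transaction counts versus percent change from baseline), so standardizing each series before alignment may make the resulting distance easier to interpret. That is a suggestion, not something this change does.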
diff --git a/src/DP_epidemiology/pandemic_adherence_analyzer.py b/src/DP_epidemiology/pandemic_adherence_analyzer.py
index 75e966c..64507c0 100644
--- a/src/DP_epidemiology/pandemic_adherence_analyzer.py
+++ b/src/DP_epidemiology/pandemic_adherence_analyzer.py
@@ -26,7 +26,7 @@ def pandemic_adherence_analyzer(df:pd.DataFrame,start_date:datetime,end_date:dat
nb_timesteps = (end_date - start_date).days // 7
"""scale calculation"""
- scale=(np.sqrt(3.0)*nb_timesteps*upper_bound)/epsilon
+ scale=(3.0*nb_timesteps*upper_bound)/epsilon
new_df=df.copy()
diff --git a/src/DP_epidemiology/utilities.py b/src/DP_epidemiology/utilities.py
index 99555c4..3c663c7 100644
--- a/src/DP_epidemiology/utilities.py
+++ b/src/DP_epidemiology/utilities.py
@@ -118,15 +118,15 @@ def function(df):
def make_private_sum_by(column, by, bounds, scale):
"""Create a measurement that computes the grouped bounded sum of `column`"""
- space = dp.vector_domain(dp.atom_domain(T=int)), dp.l2_distance(T=float)
- m_gauss = space >> dp.m.then_gaussian(scale)
+ space = dp.vector_domain(dp.atom_domain(T=int)), dp.l1_distance(T=int)
+ m_lap = space >> dp.m.then_laplace(scale)
t_sum = make_sum_by(column, by, bounds)
def function(df):
exact = t_sum(df)
# print(exact)
noisy_sum = pd.Series(
- np.maximum(m_gauss(exact.to_numpy().flatten()), 0),
+ np.maximum(m_lap(exact.to_numpy().flatten()), 0),
)
# print(noisy_sum)
noisy_sum=noisy_sum.to_frame(name=column)
@@ -138,7 +138,7 @@ def function(df):
input_metric=dp.symmetric_distance(),
output_measure=dp.max_divergence(T=float),
function=function,
- privacy_map=lambda d_in: m_gauss.map(t_sum.map(d_in)),
+ privacy_map=lambda d_in: m_lap.map(t_sum.map(d_in)),
)
def make_filter(column,entry):
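To illustrate what the switch from Gaussian to Laplace noise in ``make_private_sum_by`` amounts to, here is a plain NumPy/pandas sketch of a grouped, clamped sum with Laplace noise and the same clamp-at-zero post-processing. It is only an illustration under stated assumptions: the function name ``noisy_sum_by`` and the ``rng`` parameter are hypothetical, and NumPy's sampler is not a substitute for OpenDP's measurement, which also handles privacy accounting and floating-point-safe sampling.

```python
import numpy as np
import pandas as pd


def noisy_sum_by(df, column, by, bounds, scale, rng=None):
    """Clamp `column` to `bounds`, sum it per group, add Laplace noise, floor at 0."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = bounds
    # Clamp each record so a single row contributes at most `upper`.
    clipped = df.assign(**{column: df[column].clip(lower, upper)})
    exact = clipped.groupby(by)[column].sum()
    # One independent Laplace draw per group.
    noise = rng.laplace(loc=0.0, scale=scale, size=len(exact))
    # Post-processing: counts cannot be negative.
    noisy = np.maximum(exact.to_numpy() + noise, 0)
    return pd.Series(noisy, index=exact.index, name=column)


# Toy data with hypothetical column names.
toy = pd.DataFrame({
    "merch_postal_code": ["a", "a", "b"],
    "date": ["2020-09-01", "2020-09-01", "2020-09-01"],
    "nb_transactions": [3, 10, 2],
})
print(noisy_sum_by(toy, "nb_transactions", ["merch_postal_code", "date"], (0, 5), scale=1.0))
```

Laplace noise is calibrated to L1 sensitivity, which is consistent with the scale formula dropping the ``sqrt(3)`` factor in favor of ``3``.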
diff --git a/src/DP_epidemiology/viz.py b/src/DP_epidemiology/viz.py
index 8369230..4e62d24 100644
--- a/src/DP_epidemiology/viz.py
+++ b/src/DP_epidemiology/viz.py
@@ -96,60 +96,62 @@ def update_graph(start_date, end_date, epsilon, city):
return app
-def create_mobility_dash_app(df:pd.DataFrame):
+def create_mobility_dash_app(df: pd.DataFrame):
cities = {
"Medellin": (6.2476, -75.5658),
"Bogota": (4.7110, -74.0721),
"Brasilia": (-15.7975, -47.8919),
"Santiago": (-33.4489, -70.6693)
- }
+ }
+
app = dash.Dash(__name__)
- category_list = ['grocery_and_pharmacy', 'transit_stations', 'retail_and_recreation',"other"]
+ category_list = ['grocery_and_pharmacy', 'transit_stations', 'retail_and_recreation', "other"]
+
app.layout = html.Div([
- dcc.DatePickerSingle(
- id='start-date-picker',
- date='2019-01-01'
- ),
- dcc.DatePickerSingle(
- id='end-date-picker',
- date='2019-12-31'
- ),
- dcc.Slider(
- id='epsilon-slider',
- min=0,
- max=10,
- step=0.1,
- value=1,
- marks={i: str(i) for i in range(11)}
- ),
- dcc.Dropdown(
- id='city-dropdown',
- options=[{'label': city, 'value': city} for city in cities.keys()],
- value='Medellin'
- ),
- dcc.Dropdown(
- id='category-list-dropdown',
- options=[{'label': category, 'value': category} for category in category_list],
- value='transit_stations'
- ),
- dcc.Graph(id='mobility-graph')
- ])
+ dcc.DatePickerSingle(
+ id='start-date-picker',
+ date='2019-01-01'
+ ),
+ dcc.DatePickerSingle(
+ id='end-date-picker',
+ date='2019-12-31'
+ ),
+ dcc.Slider(
+ id='epsilon-slider',
+ min=0,
+ max=10,
+ step=0.1,
+ value=1,
+ marks={i: str(i) for i in range(11)}
+ ),
+ dcc.Dropdown(
+ id='city-dropdown',
+ options=[{'label': city, 'value': city} for city in cities.keys()],
+ value='Medellin'
+ ),
+ dcc.Dropdown(
+ id='category-list-dropdown',
+ options=[{'label': category, 'value': category} for category in category_list],
+ value='transit_stations'
+ ),
+ dcc.Graph(id='mobility-graph')
+ ])
# Callback to update the graph based on input values
@app.callback(
Output('mobility-graph', 'figure'),
[Input('start-date-picker', 'date'),
- Input('end-date-picker', 'date'),
- Input('city-dropdown', 'value'),
- Input('category-list-dropdown', 'value'),
- Input('epsilon-slider', 'value')]
+ Input('end-date-picker', 'date'),
+ Input('city-dropdown', 'value'),
+ Input('category-list-dropdown', 'value'),
+ Input('epsilon-slider', 'value')]
)
- def update_graph(start_date, end_date, city_filter,category, epsilon):
+ def update_graph(start_date, end_date, city_filter, category, epsilon):
# Convert date strings to datetime objects
start_date = datetime.strptime(start_date, '%Y-%m-%d')
end_date = datetime.strptime(end_date, '%Y-%m-%d')
- # Call the mobility_analyser function
+ # Call the mobility_analyzer function
filtered_df = mobility_analyzer(df, start_date, end_date, city_filter, category, epsilon)
# Plot using Plotly Express
@@ -161,76 +163,186 @@ def update_graph(start_date, end_date, city_filter,category, epsilon):
labels={'nb_transactions': 'Number of Transactions', 'date': 'Date'}
)
+ # Add events for Bogotá
+ if city_filter == "Bogota":
+ events = [
+ ("Isolation Start Drill", "2020-03-20"),
+ ("National Quarantine", "2020-03-26"),
+ ("Gender Restriction", "2020-04-16"),
+ ("Day Without VAT (IVA)", "2020-06-19"),
+ ("Lockdown 1", "2020-07-15"),
+ ("Lockdown 2", "2020-07-30"),
+ ("Lockdown 3", "2020-08-13"),
+ ("Lockdown 4", "2020-08-20"),
+ ("End of National Quarantine", "2020-09-04"),
+ ("Day Without VAT", "2020-11-19"),
+ ("Candle Day", "2020-12-07"),
+ ("Start of Novenas", "2020-12-16"),
+ ("Lockdown 1 (2021)", "2021-01-05"),
+ ("Lockdown 2 (2021)", "2021-01-12"),
+ ("Lockdown 3 (2021)", "2021-01-18"),
+ ("Lockdown 4 (2021)", "2021-01-28"),
+ ("Holy Week", "2021-03-28"),
+ ("Model 4x3", "2021-04-06"),
+ ("Model 4x3 (Extension)", "2021-04-06"),
+ ("Vaccination Stage 1", "2021-02-18"),
+ ("Vaccination Stage 2", "2021-03-08"),
+ ("Vaccination Stage 3", "2021-05-22"),
+ ("Vaccination Stage 4", "2021-06-17"),
+ ("Vaccination Stage 5", "2021-07-17"),
+ ("Riots and Social Unrest", "2021-05-01")
+ ]
+
+ for event, date in events:
+ fig.add_shape(
+ type="line",
+ x0=date,
+ y0=0,
+ x1=date,
+ y1=1,
+ xref='x',
+ yref='paper',
+ line=dict(color="Red", width=2, dash="dash")
+ )
+ fig.add_annotation(
+ x=date,
+ y=1,
+ xref='x',
+ yref='paper',
+ text=event,
+ showarrow=True,
+ arrowhead=1,
+ ax=-10,
+ ay=-40,
+ font=dict(color="Red")
+ )
+
return fig
+
return app
-def create_pandemic_adherence_dash_app(df:pd.DataFrame):
+def create_pandemic_adherence_dash_app(df: pd.DataFrame):
cities = {
"Medellin": (6.2476, -75.5658),
"Bogota": (4.7110, -74.0721),
"Brasilia": (-15.7975, -47.8919),
"Santiago": (-33.4489, -70.6693)
- }
- entry_types=["luxury","essential","other"]
+ }
+ entry_types = ["luxury", "essential", "other"]
app = dash.Dash(__name__)
-
+
app.layout = html.Div([
- dcc.DatePickerSingle(
- id='start-date-picker',
- date='2019-01-01'
- ),
- dcc.DatePickerSingle(
- id='end-date-picker',
- date='2019-12-31'
- ),
- dcc.Slider(
- id='epsilon-slider',
- min=0,
- max=10,
- step=0.1,
- value=1,
- marks={i: str(i) for i in range(11)}
- ),
- dcc.Dropdown(
- id='city-dropdown',
- options=[{'label': city, 'value': city} for city in cities.keys()],
- value='Medellin'
- ),
- dcc.Dropdown(
- id='entry-type-dropdown',
- options=[{'label': entry_type, 'value': entry_type} for entry_type in entry_types],
- value='luxury'
- ),
- dcc.Graph(id='pandemic-adherence-graph')
- ])
+ dcc.DatePickerSingle(
+ id='start-date-picker',
+ date='2019-01-01'
+ ),
+ dcc.DatePickerSingle(
+ id='end-date-picker',
+ date='2019-12-31'
+ ),
+ dcc.Slider(
+ id='epsilon-slider',
+ min=0,
+ max=10,
+ step=0.1,
+ value=1,
+ marks={i: str(i) for i in range(11)}
+ ),
+ dcc.Dropdown(
+ id='city-dropdown',
+ options=[{'label': city, 'value': city} for city in cities.keys()],
+ value='Medellin'
+ ),
+ dcc.Dropdown(
+ id='entry-type-dropdown',
+ options=[{'label': entry_type, 'value': entry_type} for entry_type in entry_types],
+ value='luxury'
+ ),
+ dcc.Graph(id='pandemic-adherence-graph')
+ ])
# Callback to update the graph based on input values
@app.callback(
Output('pandemic-adherence-graph', 'figure'),
[Input('start-date-picker', 'date'),
- Input('end-date-picker', 'date'),
- Input('city-dropdown', 'value'),
- Input('entry-type-dropdown', 'value'),
- Input('epsilon-slider', 'value')]
+ Input('end-date-picker', 'date'),
+ Input('city-dropdown', 'value'),
+ Input('entry-type-dropdown', 'value'),
+ Input('epsilon-slider', 'value')]
)
- def update_graph(start_date, end_date, city_filter,essential_or_luxury, epsilon):
+ def update_graph(start_date, end_date, city_filter, essential_or_luxury, epsilon):
# Convert date strings to datetime objects
start_date = datetime.strptime(start_date, '%Y-%m-%d')
end_date = datetime.strptime(end_date, '%Y-%m-%d')
- # Call the mobility_analyser function
- filtered_df = pandemic_adherence_analyzer(df, start_date, end_date, city_filter,essential_or_luxury, epsilon)
+ # Call the pandemic_adherence_analyzer function
+ filtered_df = pandemic_adherence_analyzer(df, start_date, end_date, city_filter, essential_or_luxury, epsilon)
# Plot using Plotly Express
fig = px.line(
filtered_df,
x='date',
y='nb_transactions',
- title=f"Pandemic Stage Analysis for {city_filter} from {start_date.date()} to {end_date.date()} with epsilon={epsilon}",
+ title=f"Pandemic Adherence Analysis for {city_filter} from {start_date.date()} to {end_date.date()} with epsilon={epsilon}",
labels={'nb_transactions': 'Number of Transactions', 'date': 'Date'}
)
+ # Add events for Bogotá
+ if city_filter == "Bogota":
+ events = [
+ ("Isolation Start Drill", "2020-03-20"),
+ ("National Quarantine", "2020-03-26"),
+ ("Gender Restriction", "2020-04-16"),
+ ("Day Without VAT (IVA)", "2020-06-19"),
+ ("Lockdown 1", "2020-07-15"),
+ ("Lockdown 2", "2020-07-30"),
+ ("Lockdown 3", "2020-08-13"),
+ ("Lockdown 4", "2020-08-20"),
+ ("End of National Quarantine", "2020-09-04"),
+ ("Day Without VAT", "2020-11-19"),
+ ("Candle Day", "2020-12-07"),
+ ("Start of Novenas", "2020-12-16"),
+ ("Lockdown 1 (2021)", "2021-01-05"),
+ ("Lockdown 2 (2021)", "2021-01-12"),
+ ("Lockdown 3 (2021)", "2021-01-18"),
+ ("Lockdown 4 (2021)", "2021-01-28"),
+ ("Holy Week", "2021-03-28"),
+ ("Model 4x3", "2021-04-06"),
+ ("Model 4x3 (Extension)", "2021-04-06"),
+ ("Vaccination Stage 1", "2021-02-18"),
+ ("Vaccination Stage 2", "2021-03-08"),
+ ("Vaccination Stage 3", "2021-05-22"),
+ ("Vaccination Stage 4", "2021-06-17"),
+ ("Vaccination Stage 5", "2021-07-17"),
+ ("Riots and Social Unrest", "2021-05-01")
+ ]
+
+ for event, date in events:
+ fig.add_shape(
+ type="line",
+ x0=date,
+ y0=0,
+ x1=date,
+ y1=1,
+ xref='x',
+ yref='paper',
+ line=dict(color="Red", width=2, dash="dash")
+ )
+ fig.add_annotation(
+ x=date,
+ y=1,
+ xref='x',
+ yref='paper',
+ text=event,
+ showarrow=True,
+ arrowhead=1,
+ ax=-10,
+ ay=-40,
+ font=dict(color="Red")
+ )
+
return fig
+
return app
def create_contact_matrix_dash_app(df:pd.DataFrame):
@@ -375,7 +487,7 @@ def update_graph(start_date, end_date, city_filter, category, epsilon):
offset = filtered_df_transactional["date"].iloc[0]
filtered_df_google = preprocess_google_mobility(df_google_mobility_data, start_date, end_date, city_filter, category, offset)
- # Create the plot
+ # Create the plot with two y-axes
fig = go.Figure()
# Add transactional mobility data
@@ -383,7 +495,8 @@ def update_graph(start_date, end_date, city_filter, category, epsilon):
x=filtered_df_transactional['date'],
y=filtered_df_transactional['nb_transactions'],
mode='lines',
- name='Transactional Mobility'
+ name='Transactional Mobility',
+ yaxis='y1'
))
# Add Google mobility data
@@ -391,15 +504,14 @@ def update_graph(start_date, end_date, city_filter, category, epsilon):
x=filtered_df_google['date'],
y=filtered_df_google[category],
mode='lines',
- name='Google Mobility'
+ name='Google Mobility',
+ yaxis='y2'
))
- # Update layout
+ # Update layout for two y-axes
fig.update_layout(
title=f"Mobility Analysis for {city_filter} and category {category} from {start_date.date()} to {end_date.date()} with epsilon={epsilon}",
xaxis_title='Date',
- # yaxis_title='Mobility Change',
- # legend_title='Data Source'
yaxis=dict(
title='Transactional Mobility',
titlefont=dict(color='blue'),
diff --git a/tests/test_viz.ipynb b/tests/test_viz.ipynb
index 3e3d84a..60fe32d 100644
--- a/tests/test_viz.ipynb
+++ b/tests/test_viz.ipynb
@@ -28,21 +28,261 @@
"outputs": [],
"source": [
"path = \"C:\\\\Users\\kshub\\\\OneDrive\\\\Documents\\\\PET_phase_2\\\\Technical_Phase_Data\\\\technical_phase_data.csv\"\n",
- "df_tran = pd.read_csv(path)\n",
- "df_mobility = pd.read_csv(\"C:\\\\Users\\\\kshub\\\\OneDrive\\\\Documents\\\\PET_phase_2\\\\Global_Mobility_Report (1).csv\", low_memory=False)\n"
+ "df_tran = pd.read_csv(path)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " country_region_code country_region sub_region_1 sub_region_2 \\\n",
+ "0 AE United Arab Emirates NaN NaN \n",
+ "1 AE United Arab Emirates NaN NaN \n",
+ "2 AE United Arab Emirates NaN NaN \n",
+ "3 AE United Arab Emirates NaN NaN \n",
+ "4 AE United Arab Emirates NaN NaN \n",
+ "\n",
+ " metro_area iso_3166_2_code census_fips_code place_id \\\n",
+ "0 NaN NaN NaN ChIJvRKrsd9IXj4RpwoIwFYv0zM \n",
+ "1 NaN NaN NaN ChIJvRKrsd9IXj4RpwoIwFYv0zM \n",
+ "2 NaN NaN NaN ChIJvRKrsd9IXj4RpwoIwFYv0zM \n",
+ "3 NaN NaN NaN ChIJvRKrsd9IXj4RpwoIwFYv0zM \n",
+ "4 NaN NaN NaN ChIJvRKrsd9IXj4RpwoIwFYv0zM \n",
+ "\n",
+ " date retail_and_recreation_percent_change_from_baseline \\\n",
+ "0 2020-02-15 0.0 \n",
+ "1 2020-02-16 1.0 \n",
+ "2 2020-02-17 -1.0 \n",
+ "3 2020-02-18 -2.0 \n",
+ "4 2020-02-19 -2.0 \n",
+ "\n",
+ " grocery_and_pharmacy_percent_change_from_baseline \\\n",
+ "0 4.0 \n",
+ "1 4.0 \n",
+ "2 1.0 \n",
+ "3 1.0 \n",
+ "4 0.0 \n",
+ "\n",
+ " parks_percent_change_from_baseline \\\n",
+ "0 5.0 \n",
+ "1 4.0 \n",
+ "2 5.0 \n",
+ "3 5.0 \n",
+ "4 4.0 \n",
+ "\n",
+ " transit_stations_percent_change_from_baseline \\\n",
+ "0 0.0 \n",
+ "1 1.0 \n",
+ "2 1.0 \n",
+ "3 0.0 \n",
+ "4 -1.0 \n",
+ "\n",
+ " workplaces_percent_change_from_baseline \\\n",
+ "0 2.0 \n",
+ "1 2.0 \n",
+ "2 2.0 \n",
+ "3 2.0 \n",
+ "4 2.0 \n",
+ "\n",
+ " residential_percent_change_from_baseline \n",
+ "0 1.0 \n",
+ "1 1.0 \n",
+ "2 1.0 \n",
+ "3 1.0 \n",
+ "4 1.0 "
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Define the path to the CSV file\n",
+ "file_path = \"C:\\\\Users\\\\kshub\\\\OneDrive\\\\Documents\\\\PET_phase_2\\\\Global_Mobility_Report (1).csv\"\n",
+ "\n",
+ "# Initialize an empty list to store the chunks\n",
+ "chunks = []\n",
+ "\n",
+ "# Read the CSV file in chunks\n",
+ "chunk_size = 10000 # Adjust the chunk size as needed\n",
+ "for chunk in pd.read_csv(file_path, chunksize=chunk_size, low_memory=False):\n",
+ " chunks.append(chunk)\n",
+ "\n",
+ "# Concatenate the chunks into a single DataFrame\n",
+ "df_mobility = pd.concat(chunks, ignore_index=True)\n",
+ "\n",
+ "# Display the first few rows of the DataFrame\n",
+ "df_mobility.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df_mobility = pd.read_csv(\"C:\\\\Users\\\\kshub\\\\OneDrive\\\\Documents\\\\PET_phase_2\\\\Global_Mobility_Report (1).csv\", low_memory=False)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "Scipy computed Pearson r: 0.3056463348470299 and p-value: 0.03886009367628722\n"
+ "Scipy computed Pearson r: 0.2726605353538447 and p-value: 0.06675963402694846\n"
]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ "