Revise export ports data #1175
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -478,6 +478,7 @@ custom_data: | |
add_existing: false | ||
custom_sectors: false | ||
gas_network: false # If "True" then a custom .csv file must be placed in "resources/custom_data/pipelines.csv" , If "False" the user can choose btw "greenfield" or Model built-in datasets. Please refer to ["sector"] below. | ||
export_data: false # If "True" then a custom .csv file must be placed in "data/custom/export_ports.csv" | ||
I would suggest to name it: |
||
|
||
industry: | ||
reference_year: 2015 | ||
|
Here it is better to have a non-empty example file that shows the needed columns and includes at least one example row. This makes it easier for users to insert their custom data in the right format.
Ah, we should also delete export_ports.csv from the data folder, as it is not needed anymore. Maybe just move it here as it is.
I have moved the export_ports.csv file to the custom folder, where it should serve as an example. Let me know your thoughts on this. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,6 +31,36 @@ def download_ports(): | |
return wpi_csv | ||
|
||
|
||
def filter_ports(dataframe): | ||
""" | ||
Filters ports by harbor size and returns a DataFrame containing, for each | ||
country, only the ports in its largest available size class (Large, then Medium, then Small). | ||
""" | ||
# Filter large sized ports | ||
large_ports = dataframe[dataframe["Harbor Size"] == "Large"] | ||
countries_with_large_ports = large_ports["country"].unique() | ||
|
||
# Filter out countries with large ports | ||
remaining_ports = dataframe[~dataframe["country"].isin(countries_with_large_ports)] | ||
|
||
# Filter medium sized ports from remaining ports | ||
medium_ports = remaining_ports[remaining_ports["Harbor Size"] == "Medium"] | ||
countries_with_medium_ports = medium_ports["country"].unique() | ||
|
||
# Filter out countries with medium ports | ||
remaining_ports = remaining_ports[ | ||
~remaining_ports["country"].isin(countries_with_medium_ports) | ||
] | ||
|
||
# Filter small sized ports from remaining ports | ||
small_ports = remaining_ports[remaining_ports["Harbor Size"] == "Small"] | ||
|
||
# Combine all filtered ports | ||
filtered_ports = pd.concat([large_ports, medium_ports, small_ports]) | ||
|
||
return filtered_ports | ||
|
||
|
||
if __name__ == "__main__": | ||
if "snakemake" not in globals(): | ||
from _helpers import mock_snakemake | ||
|
@@ -102,3 +132,6 @@ def download_ports(): | |
ports["fraction"] = ports["Harbor_size_nr"] / ports["Total_Harbor_size_nr"] | ||
|
||
ports.to_csv(snakemake.output[0], sep=",", encoding="utf-8", header="true") | ||
filter_ports(ports).to_csv( | ||
Here we can put an if clause that I suggested in my Snakefile comment. |
||
snakemake.output[1], sep=",", encoding="utf-8", header="true" | ||
) |
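To make the selection cascade in filter_ports easier to review, here is a minimal, self-contained sketch that runs the same logic on a toy DataFrame. Only the "country" and "Harbor Size" columns come from the diff; the "name" column, the country codes, and the port names are made up for illustration.

```python
import pandas as pd


def filter_ports(dataframe):
    """Keep, per country, only the ports in its largest available size class."""
    # Countries that have at least one Large port keep only those
    large_ports = dataframe[dataframe["Harbor Size"] == "Large"]
    countries_with_large_ports = large_ports["country"].unique()

    # Drop every port of a country that is already covered by Large ports
    remaining_ports = dataframe[~dataframe["country"].isin(countries_with_large_ports)]

    # Among the rest, countries with Medium ports keep only those
    medium_ports = remaining_ports[remaining_ports["Harbor Size"] == "Medium"]
    countries_with_medium_ports = medium_ports["country"].unique()

    remaining_ports = remaining_ports[
        ~remaining_ports["country"].isin(countries_with_medium_ports)
    ]

    # Whatever is left can only contribute Small ports
    small_ports = remaining_ports[remaining_ports["Harbor Size"] == "Small"]

    return pd.concat([large_ports, medium_ports, small_ports])


# Toy data: names and country codes are invented for illustration only.
toy_ports = pd.DataFrame(
    {
        "name": ["A1", "A2", "A3", "B1", "B2", "C1"],
        "country": ["AA", "AA", "AA", "BB", "BB", "CC"],
        "Harbor Size": ["Large", "Large", "Medium", "Medium", "Small", "Small"],
    }
)

selected = filter_ports(toy_ports)
selected_names = sorted(selected["name"])
print(selected_names)
```

In this toy case, country AA keeps both of its Large ports, so the result is one size class per country rather than a single port per country. Ports with any other "Harbor Size" value are never selected.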
I personally prefer that all files in resources be the ones that are actually used in the workflow. This way the user can track a file easily: if the file is not in resources, then the workflow used the file from the data folder.
Therefore, I suggest having this rule read only
export_ports="resources/" + SECDIR + "export_ports.csv",
and keeping the decision on using custom data or the workflow-generated data in prepare_ports.py. Thus, in prepare_ports.py, either:
Please note that all my comments are subject to discussion if needed.
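As a rough illustration of the idea of keeping the custom-versus-generated decision inside the script, one possible shape is sketched below. This is not the snippet from the review comment. The helper name select_export_ports and all of its parameters are hypothetical, and how the export_data flag would be read from the Snakemake config is left out because it is not shown in the diff.

```python
import shutil
import tempfile
from pathlib import Path


def select_export_ports(use_custom, custom_path, generated_path, output_path):
    """Copy the chosen export ports file to output_path and return the source used.

    Hypothetical helper: use_custom stands in for the export_data config flag.
    """
    source = Path(custom_path) if use_custom else Path(generated_path)
    if not source.is_file():
        raise FileNotFoundError(f"Expected export ports file at {source}")
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source, output_path)
    return source


# Demonstration in a temporary directory with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    custom = tmp / "data" / "custom" / "export_ports.csv"
    generated = tmp / "generated_export_ports.csv"
    custom.parent.mkdir(parents=True)
    custom.write_text("placeholder custom content\n")
    generated.write_text("placeholder generated content\n")
    target = tmp / "resources" / "export_ports.csv"

    # Flag set: the custom file ends up in resources
    select_export_ports(True, custom, generated, target)
    content_when_custom = target.read_text()

    # Flag unset: the workflow-generated file ends up in resources
    select_export_ports(False, custom, generated, target)
    content_when_generated = target.read_text()
```

With this shape, the rule would only ever read the single file in resources, and the file found there is always the one the workflow actually used.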