Commit

handling GCP links
callahantiff committed Oct 30, 2023
1 parent 8082267 commit 36fd2d1
Showing 4 changed files with 13 additions and 14 deletions.
13 changes: 6 additions & 7 deletions main.ipynb
@@ -43,16 +43,15 @@
"\n",
"<br>\n",
"\n",
"**Assumptions:**\n",
"1. Hyperlinks to all downloaded and generated data sources are provided through [this](https://console.cloud.google.com/storage/browser/pheknowlator/release_v2.0.0?project=pheknowlator) dedicated Google Cloud Storage Bucket. <u>This includes examples, from prior builds, of the required input documents mentioned below</u>. \n",
"2. Make sure that the following input documents have been constructed (see the [Dependencies Wiki](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) for more information): \n",
"**Assumptions:** \n",
"1. Make sure that the following input documents have been constructed (see the [Dependencies Wiki](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies) for more information): \n",
" - [`resource_info.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/resource_info.txt)\n",
" - [`ontology_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/ontology_source_list.txt)\n",
" - [`edge_source_list.txt`](https://github.com/callahantiff/PheKnowLator/blob/master/resources/edge_source_list.txt) \n",
"\n",
"3. Prepare [relations](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) and [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) files prior to running the scripts. \n",
"2. Prepare [relations](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#relations-data) and [node metadata](https://github.com/callahantiff/PheKnowLator/wiki/Dependencies#node-metadata) files prior to running the scripts. \n",
"\n",
"4. Select a knowledge graph build type (i.e. `full`, `partial`, or `post-closure`) and construction method (i.e. `instance-based` or `subclass-based`). \n",
"3. Select a knowledge graph build type (i.e. `full`, `partial`, or `post-closure`) and construction method (i.e. `instance-based` or `subclass-based`). \n",
"\n",
"<br>\n",
"\n",
@@ -125,7 +124,7 @@
"\n",
"**Assumptions:** \n",
"- All sources used to construct our knowledge graph need to be preprocessed and ready to download prior to running this code. All mapping, filtering, and label data have been generated prior to this step. For assistance with creating these datasets, see the [`Data_Preparation.ipynb`](https://github.com/callahantiff/PheKnowLator/blob/master/notebooks/Data_Preparation.ipynb) Jupyter Notebook. \n",
"- All downloaded and generated data sources for all PheKnowLator builds can be accessed through [this](https://console.cloud.google.com/storage/browser/pheknowlator/release_v2.0.0?project=pheknowlator) dedicated Google Cloud Storage Bucket.\n"
"- All downloaded and generated data sources for all PheKnowLator builds can be accessed through [Zenodo](https://zenodo.org/doi/10.5281/zenodo.7030039).\n"
]
},
{
@@ -436,7 +435,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
"version": "3.6.8"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
8 changes: 4 additions & 4 deletions notebooks/Data_Preparation.ipynb
@@ -32,7 +32,7 @@
"\n",
"**Dependencies:** \n",
"- **Scripts**: This notebook utilizes several helper functions, which are stored in the [`data_utils.py`](https://github.com/callahantiff/PheKnowLator/blob/master/pkt_kg/utils/data_utils.py) and [`kg_utils.py`](https://github.com/callahantiff/PheKnowLator/blob/master/pkt_kg/utils/kg_utils.py) scripts. \n",
"- **Data**: Hyperlinks to all downloaded and generated data sources are provided through [this](https://console.cloud.google.com/storage/browser/pheknowlator/release_v2.0.0?project=pheknowlator) dedicated Google Cloud Storage Bucket. <u>This notebook will download everything that is needed for you</u>. \n",
"- **Data**: Hyperlinks to all downloaded and generated data sources are provided through [Zenodo](https://zenodo.org/doi/10.5281/zenodo.7030039). <u>This notebook will download everything that is needed for you</u>. \n",
"_____\n",
"***"
]
@@ -966,7 +966,7 @@
"outputs": [],
"source": [
"# download data\n",
"url = 'https://storage.googleapis.com/pheknowlator/curated_data/genomic_typing_dict.pkl'\n",
"url = 'https://zenodo.org/records/10056198/files/genomic_typing_dict.pkl.zip?download=1'\n",
"if not os.path.exists(unprocessed_data_location + 'genomic_typing_dict.pkl'):\n",
" data_downloader(url, unprocessed_data_location)\n",
"\n",
@@ -2583,7 +2583,7 @@
"outputs": [],
"source": [
"# download data\n",
"url='https://storage.googleapis.com/pheknowlator/curated_data/zooma_tissue_cell_mapping_04JAN2020.xlsx'\n",
"url='https://zenodo.org/records/10056198/files/zooma_tissue_cell_mapping_04JAN2020.xlsx.zip?download=1'\n",
"if not os.path.exists(unprocessed_data_location + 'zooma_tissue_cell_mapping_04JAN2020.xlsx'):\n",
" data_downloader(url, unprocessed_data_location)\n",
" \n",
@@ -3102,7 +3102,7 @@
"outputs": [],
"source": [
"# download data\n",
"url='https://storage.googleapis.com/pheknowlator/curated_data/genomic_sequence_ontology_mappings.xlsx'\n",
"url='https://zenodo.org/records/10056198/files/genomic_sequence_ontology_mappings.xlsx.zip?download=1'\n",
"if not os.path.exists(unprocessed_data_location + 'genomic_sequence_ontology_mappings.xlsx'):\n",
" data_downloader(url, unprocessed_data_location)\n",
"\n",
4 changes: 2 additions & 2 deletions notebooks/Ontology_Cleaning.ipynb
@@ -38,7 +38,7 @@
"\n",
"<br>\n",
"\n",
"Details on the data utilized in this script can be found on the [Data Sources](https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) Wiki. Data can be downloaded from [this](https://console.cloud.google.com/storage/browser/pheknowlator/release_v2.0.0?project=pheknowlator) dedicated Google Cloud Storage Bucket. Please note that all build data are freely available and organized by release and build date. "
"Details on the data utilized in this script can be found on the [Data Sources](https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) Wiki. Data can be downloaded from [Zenodo](https://zenodo.org/doi/10.5281/zenodo.7030039). Please note that all build data are freely available and organized by release and build date. "
]
},
{
@@ -108,7 +108,7 @@
"<u>Normalize Existing Ontology Classes</u> \n",
" - **Description:** Checks for inconsistencies in ontology classes that overlap with non-ontology entity identifiers (e.g. if HP includes `HGNC` identifiers, but PheKnowLator utilizes `Entrez` identifiers). \n",
"\n",
" - **Solution:** While there are other types of identifiers, we currently focus primarily on resolving errors involving the genomic identifiers, since we have a master dictionary we can use([`Merged_gene_rna_protein_identifiers.pkl`](https://storage.googleapis.com/pheknowlator/release_v2.0.0/current_build/data/processed_data/Merged_gene_rna_protein_identifiers.pkl)). This check can be updated in future iterations to include other types of identifiers, but given our detailed examination of the `v2.0.0` ontologies, these were the identifier types that needed repair.\n",
" - **Solution:** While there are other types of identifiers, we currently focus primarily on resolving errors involving the genomic identifiers, since we have a master dictionary we can use (`Merged_gene_rna_protein_identifiers.pkl` -- which is generated during the data preprocessing steps of the build). This check can be updated in future iterations to include other types of identifiers, but given our detailed examination of the `v2.0.0` ontologies, these were the identifier types that needed repair.\n",
"\n",
"<u>Normalize Duplicate Ontology Concepts</u> \n",
" - **Description:** Make sure that all classes that represent the same entity are connected to each other. For example, consider the following: the [Sequence Ontology](http://www.sequenceontology.org/), [ChEBI](https://www.ebi.ac.uk/chebi), and [PRotein Ontology](https://proconsortium.org/) all include terms for protein, but none of these classes are connected to each other.\n",
2 changes: 1 addition & 1 deletion notebooks/RDF_Graph_Processing_Example.ipynb
@@ -140,7 +140,7 @@
"# download data to the temp_dir directory\n",
"data_urls = [\n",
" 'http://purl.obolibrary.org/obo/vo.owl',\n",
" 'https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/instance_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_instance_relationsOnly_noOWL_OWLNETS.nt'\n",
" 'https://zenodo.org/records/10055990/files/PheKnowLator_v2.0.0_full_instance_relationsOnly_noOWL_OWLNETS.nt.tar.gz?download=1'\n",
"]\n",
"\n",
"for url in data_urls:\n",
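
Note on the new URLs: the Zenodo links end in `.zip` or `.tar.gz` and carry a `?download=1` query string, while the `os.path.exists` checks in the `Data_Preparation.ipynb` hunks still look for the unarchived file names. The following is a minimal sketch, assuming nothing about how `data_downloader` treats archives, of deriving the expected unpacked file name from one of these URLs. `local_name_from_url` and `ARCHIVE_SUFFIXES` are hypothetical names used for illustration only; they are not part of `pkt_kg`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list of archive suffixes seen in the URLs of this commit.
# '.tar.gz' is listed first so it is stripped as a whole rather than as '.gz'.
ARCHIVE_SUFFIXES = ('.tar.gz', '.zip', '.gz')


def local_name_from_url(url: str) -> str:
    """Return the file name a download URL should yield once unpacked.

    The query string (e.g. '?download=1') is ignored, and a single trailing
    archive suffix is removed if present.
    """
    # urlparse separates the path from the query string.
    name = PurePosixPath(urlparse(url).path).name
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name


if __name__ == '__main__':
    url = ('https://zenodo.org/records/10056198/files/'
           'genomic_typing_dict.pkl.zip?download=1')
    print(local_name_from_url(url))
```

A helper along these lines could keep the existence check and the download URL in agreement, so the check does not need a hand-typed file name next to each URL.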
