GitHub - lozuponelab/knowledge-source-mappings

Visualize the integrated resource relationships as a network diagram

The following steps will create the output necessary to visualize the relationships among integrated resources and primary sources as a network diagram. In order to finish the figure, Cytoscape must be installed. Manual instructions to create the figure are included.

Create network where size of primary sources and aggregated DBs represent number of integrated resources that use them as mappings

In Cytoscape:

Import network from file: Resource Interaction Table.xlsx, Sheet 1 (set as source, interaction, target)
Import table from file: Resource Interaction Table.xlsx, Sheet 2 (set as node, category)
Set node style, fill color, discrete mapping to unique colors for each category
Position integrated DB nodes in following order: mdad, gutmgene, gutmdisorder, disbiome, amadis, gimica, bugsigdb, dbbact, mikg4md, preprobiotickg, kg-microbe, biochem4j, unifuncnet
Remove labels of edges
Change label size to circle, inDegree, continuous mapping
    a. Go to tools, analyze network, analyze as directed graph to change node size
    b. Toggle with Continuous Mapping Editor for node size to make peak ~10 up to ~70-80
    c. Select integrated databases, set bypass for shape (rectangle) and size (15)
Only include edges between integrated DBs, aggregated DBs, and primary sources
    a. Select all integrated DB nodes, select - edges - select all edges, then select - nodes - deselect all nodes to remove edges
    b. Select all aggregated DB nodes, select - edges - select all edges, then select - nodes - deselect all nodes to remove edges
Save figure as Network_sizeByDegree.svg
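
The node sizing in the steps above comes from Cytoscape's in-degree statistic passed through a continuous mapping. As a rough illustration of what that mapping computes, here is a minimal Python sketch. The edge rows, the `maps_to` interaction label, and the helper names `in_degree` and `scale_sizes` are all hypothetical, and the linear 10 to 80 range is only an assumption based on the "~10 up to ~70-80" guidance; Cytoscape itself performs this step in the figure workflow.

```python
from collections import Counter

# Hypothetical (source, interaction, target) rows, mirroring the column roles
# used when importing Sheet 1. These edges are made up for illustration.
edges = [
    ("mdad", "maps_to", "ChEBI"),
    ("gutmgene", "maps_to", "ChEBI"),
    ("gutmgene", "maps_to", "NCBIGene"),
    ("disbiome", "maps_to", "ChEBI"),
]


def in_degree(edge_rows):
    """Count how many edges point at each target node."""
    return Counter(target for _source, _interaction, target in edge_rows)


def scale_sizes(degrees, min_size=10.0, max_size=80.0):
    """Linearly map in-degree onto the range [min_size, max_size]."""
    lo, hi = min(degrees.values()), max(degrees.values())
    if hi == lo:
        # All nodes have the same in-degree, so give them the same size.
        return {node: min_size for node in degrees}
    span = hi - lo
    return {
        node: min_size + (deg - lo) / span * (max_size - min_size)
        for node, deg in degrees.items()
    }


degrees = in_degree(edges)
sizes = scale_sizes(degrees)
```

In this toy input, the node referenced by three integrated resources receives the largest size and the node referenced once receives the smallest.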

Create network where size of primary sources and aggregated DBs represent number of integrated resources that use them as mappings

Run the following:

cd ./scripts/

python db_expansion.py
Rscript integrated_db_plotting.R
Rscript collapse_categories.R

In Cytoscape:

Import network from file: ~/data/category_edges.tsv, Sheet 1 (set as source, interaction, target)
Import table from file: Resource Interaction Table.xlsx, Sheet 2 (set as node, category)
Align integrated DBs in order above categories
    a. Select integrated DBs only, Layout Tools, align and distribute
Select all categories, set size to 100
Change line width to 1 and ensure there is no arrowhead (the arrowhead will be added in Adobe Illustrator)
Save as Network_categories.svg
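
The category-level network above is built from an edge file in which individual sources have been grouped by category. The repository produces that file with collapse_categories.R, whose logic is not shown here. The sketch below is therefore only an assumed illustration of what such a collapse could look like, not the script's actual method. The category names, the `maps_to` label, the sample edges, and the function `collapse_to_categories` are all hypothetical.

```python
# Hypothetical node-to-category table (the role Sheet 2 plays on import).
category_of = {
    "NCBIGene": "functional",
    "UniProt": "functional",
    "ChEBI": "chemical",
}

# Hypothetical (source, interaction, target) rows; not taken from the real data.
edges = [
    ("gutmgene", "maps_to", "NCBIGene"),
    ("gutmgene", "maps_to", "UniProt"),
    ("gutmgene", "maps_to", "ChEBI"),
    ("mdad", "maps_to", "ChEBI"),
]


def collapse_to_categories(edge_rows, categories):
    """Replace each target with its category and drop duplicate edges.

    Targets without a category are kept unchanged. First-seen order is preserved.
    """
    seen = set()
    collapsed = []
    for source, interaction, target in edge_rows:
        row = (source, interaction, categories.get(target, target))
        if row not in seen:
            seen.add(row)
            collapsed.append(row)
    return collapsed


category_edges = collapse_to_categories(edges, category_of)
```

Here the two functional targets of gutmgene merge into a single gutmgene-to-functional edge, so each integrated DB is linked to a category at most once.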

Create network with edges

In Adobe Illustrator:

Open Network_sizeByDegree.svg
Open Network_categories.svg
    a. Update colors to chosen palette
    b. For large category circles, make 50% opacity
    c. Rotate rectangle, text for integrated DB rectangles
Change arrowhead to shape to edit colors
    a. Add target arrowhead
    b. Select same, fill & stroke
    c. Object, path, outline stroke

Visualize the Reference Matrix

The following code will create Figure 2b, the matrix of integrated resource relationships.

Environment Installation

mamba env create -f db_review.yml

Generate the Reference Matrix Visualization

snakemake --cores 1

Child Database Expansion

The db_expansion.py script generates the edge distance between a given database i and all child databases that it references. An example case for WikiPathways is given below (note that this diagram may not render in some versions of Safari).

---
title: Order example
---
erDiagram
    WikiPathways ||--o{ "NCBIGene" : "functional link"
    WikiPathways ||--o{ "ChEBI" : "chemical link"
    WikiPathways ||--o{ "HMDB" : "chemical link"
    HMDB ||--o{ "GenBank" : "taxonomic link"
    HMDB ||--o{ "ChEBI" : "chemical link"
    HMDB ||--o{ "PubChem" : "chemical link"
    HMDB ||--o{ "UniProt" : "functional link"
    HMDB ||--o{ "PDB" : "functional link"
    HMDB ||--o{ "OMIM" : "disease link"

Source DB	Target DB	Edge Distance
WikiPathways	NCBIGene	1
WikiPathways	ChEBI	1
WikiPathways	HMDB	1
WikiPathways	UniProt	2
WikiPathways	PDB	2
WikiPathways	OMIM	2
WikiPathways	PubChem	2
WikiPathways	GenBank	2
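
The edge distances in this table match what a breadth-first search over the reference graph would give. ChEBI is listed at distance 1 even though it is also reachable through HMDB, which suggests the shortest path is the one reported. The sketch below reproduces the table under that assumption. It is not the code in db_expansion.py, and the `references` dictionary and `edge_distances` function are illustrative names.

```python
from collections import deque

# Direct references taken from the WikiPathways diagram above.
references = {
    "WikiPathways": ["NCBIGene", "ChEBI", "HMDB"],
    "HMDB": ["GenBank", "ChEBI", "PubChem", "UniProt", "PDB", "OMIM"],
}


def edge_distances(graph, source):
    """Return the shortest edge distance from source to every reachable child."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            # The first visit in a breadth-first search is always the shortest path.
            if child not in distances:
                distances[child] = distances[node] + 1
                queue.append(child)
    del distances[source]  # the source itself is not one of its own children
    return distances


wikipathways_distances = edge_distances(references, "WikiPathways")
for target, dist in wikipathways_distances.items():
    print(f"WikiPathways\t{target}\t{dist}")
```

Tracking visited nodes in `distances` also keeps the search from looping if two databases reference each other.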

Reference Matrix Visualization

We then use our expanded reference table to hierarchically cluster the source databases (plotted along the y-axis) based on their edge distance to the child nodes. This can be done with the Resource Interaction Table-withIntegrated.xlsx file to include all integrated databases, or with the Resource Interaction Table.xlsx file for only aggregate databases (as shown here):
