
Edit Data Table Then Load

Barry Demchak edited this page Dec 15, 2020 · 7 revisions

Intent

Show how a Cytoscape table can be loaded from a data file that needs editing before loading.

Motivation

Cytoscape can load table data from a number of file formats, provided that the data file is already formatted for Cytoscape consumption. If it is, see the Load Data Table From File recipe. If not, Python can make the needed edits before loading, as described below.

Applicability

The data table must be in a table-oriented format that Pandas' CSV reader can load directly, and it must have a column whose values can be used as a key into the Cytoscape table (i.e., they correspond to values in the Cytoscape table).

Consequences

Data tables that don't quite match the standard Cytoscape table formats can be loaded into Cytoscape tables anyway.

Implications

Because the data table is first loaded and manipulated in Python's memory, it must then be transferred to Cytoscape via an API call. This costs both Python memory and wall-clock transfer time. Contrast this with Cytoscape loading the table file directly (see Load Data Table From File), which requires neither.

Sample Code

Suppose the data is a tab-separated table in Barabasi/supplementary_tablesS2.txt with the column names as the second line of the file, and data in subsequent lines:

Supporting Information Table 2. Network characteristics of diseases.
Disease ID	Name	                                                Disorder class	Size (s) Degree (k)
1	        "17,20-lyase_deficiency"	                        Endocrine	1	 0
3	        2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency	Metabolic	1	 0
...

Suppose, too, that there is a Cytoscape node table into which this table should be merged:

shared name    name
1              1
2              2
3              3

Assume that the values in the Cytoscape node table's name column correspond to the Disease ID column values in the new table. Three issues must be resolved before loading the new table into Cytoscape's node table:

  1. The first line is meaningless, and should be discarded.
  2. The new table's Disease ID column looks numeric, but it will be used as a key to match name values in the Cytoscape node table, which are of type String. It must therefore be loaded as a string.
  3. The new table's Name column (named in the second line) conflicts with the name column already present in Cytoscape's node table.
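
The second issue can be illustrated with a minimal sketch. The values below are hypothetical stand-ins for the node names and Disease ID values shown above, not data read from Cytoscape or from the file; the point is only that integer keys never equal string keys, so no rows would match until the keys are converted.

```python
import pandas as pd

# Hypothetical stand-ins: node names as Cytoscape stores them (strings),
# and Disease ID values as pandas would infer them by default (integers).
node_names = ['1', '2', '3']
disease_ids = pd.Series([1, 3])

# Integer keys compared against string names: nothing matches.
matched_as_int = set(disease_ids.tolist()) & set(node_names)

# After converting the keys to strings, 1 matches '1' and 3 matches '3'.
matched_as_str = set(disease_ids.astype(str).tolist()) & set(node_names)

print(matched_as_int)
print(sorted(matched_as_str))
```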

The following code resolves all three issues, and then transfers the table to Cytoscape as node table data:

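import pandas as pd

# header=1: column names are on the second line; dtype: keep Disease ID as a string key
disease_table = pd.read_csv('Barabasi/supplementary_tablesS2.txt', sep='\t', header=1, dtype={'Disease ID': str})
# Avoid a clash with the node table's existing name column
disease_table.rename({'Name': 'Disease Name'}, axis=1, inplace=True)
disease_table  # displays the table when run in a notebook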

import py4cytoscape as p4c
p4c.load_table_data(disease_table, data_key_column='Disease ID')

  1. The sep='\t' parameter parses the file as tab-separated, and the header=1 parameter takes the column names from the second line (index 1), so the first line (index 0) is discarded.
  2. The dtype= parameter loads Disease ID as a string instead of a number.
  3. The .rename() function renames the Name column to Disease Name.
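
The effect of these three parameters can be checked without the file or a running Cytoscape. The sketch below is only an illustration: it parses an in-memory copy of the first lines of the sample, assuming that every field (including Size (s) and Degree (k)) is separated by a single tab. The variable names sample and table are introduced here for the example.

```python
import io
import pandas as pd

# In-memory copy of the first lines of the sample file.
# Assumption: all fields are separated by single tab characters.
sample = (
    "Supporting Information Table 2. Network characteristics of diseases.\n"
    "Disease ID\tName\tDisorder class\tSize (s)\tDegree (k)\n"
    '1\t"17,20-lyase_deficiency"\tEndocrine\t1\t0\n'
    "3\t2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency\tMetabolic\t1\t0\n"
)

# Same parameters as in the recipe: tab-separated, header on the second
# line, and Disease ID kept as a string.
table = pd.read_csv(io.StringIO(sample), sep='\t', header=1,
                    dtype={'Disease ID': str})
table = table.rename(columns={'Name': 'Disease Name'})

print(list(table.columns))
print(table['Disease ID'].tolist())
```

The title line does not appear in the result, Disease ID holds strings rather than integers, and the Name column now appears as Disease Name.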

Finally, the load_table_data() function transfers the new disease_table to Cytoscape and matches its Disease ID column values (per the data_key_column= parameter) with the Cytoscape node table's name values. The result is a node table containing the new data values; node 2 has no matching Disease ID, so its new columns stay empty:

shared name    name    Disease ID    Disease Name                                              Disorder class    Size (s)    Degree (k)
1              1       1             "17,20-lyase_deficiency"                                  Endocrine         1           0
2              2
3              3       3             2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency    Metabolic         1           0
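
The key matching can be emulated in pandas alone, which may help when previewing a result before contacting Cytoscape. This is a rough sketch, not a description of how Cytoscape or py4cytoscape performs the match internally: it assumes the match behaves like a left join of the node table's name column against Disease ID, and it uses small hand-built tables in place of the real ones.

```python
import pandas as pd

# Hand-built stand-in for the Cytoscape node table shown above.
node_table = pd.DataFrame({
    'shared name': ['1', '2', '3'],
    'name': ['1', '2', '3'],
})

# Hand-built stand-in for the edited disease table (string keys, renamed column).
disease_table = pd.DataFrame({
    'Disease ID': ['1', '3'],
    'Disease Name': ['17,20-lyase_deficiency',
                     '2-methyl-3-hydroxybutyryl-CoA_dehydrogenase_deficiency'],
    'Disorder class': ['Endocrine', 'Metabolic'],
})

# Assumption: a left join keeps every node and fills unmatched nodes with
# missing values, mirroring the empty row for node 2.
merged = node_table.merge(disease_table, how='left',
                          left_on='name', right_on='Disease ID')

print(merged)
```

All three nodes are kept, and only the rows whose name equals a Disease ID receive the new values.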

Related Recipes

Load Data Table From File