Calculate semantic distance for sets of Gene Ontology terms.
These instructions will get you a copy of the project up and running on your local machine.
Scripts are written in Python 3. One easy way to get started is to install Miniconda 3.
On Linux:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Clone the repository:
git clone https://github.com/MEGA-GO/Mega-Go.git
Install package:
cd Mega-Go
pip install -U .
Execute example analysis:
megago sample7.txt sample8.txt
These files can be found here:
MegaGO calculates the similarity between two GO terms with the Lin semantic similarity (simLin) metric [1]:

$$
sim_{Lin}(go_1, go_2) = \frac{2 \cdot IC(MICA)}{IC(go_1) + IC(go_2)}
$$

where:
- MICA: most informative common ancestor.
- IC(goi): information content of the term goi.
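
As an illustration of the definition above, the Lin similarity can be sketched in a few lines of Python. This is a minimal sketch, not MegaGO's own code: the function name `sim_lin` is hypothetical, the information content values are assumed to be precomputed, and returning 0 when both terms have zero information content is an assumption made here to avoid a division by zero.

```python
def sim_lin(ic_a: float, ic_b: float, ic_mica: float) -> float:
    """Lin similarity from the IC of two terms and the IC of their MICA."""
    denominator = ic_a + ic_b
    if denominator == 0:
        # Assumption: two terms without any information content score 0.
        return 0.0
    return 2 * ic_mica / denominator


# Hypothetical IC values for two terms and their most informative common ancestor.
print(sim_lin(4.0, 6.0, 3.0))  # 0.6
```

Because the MICA is an ancestor of both terms, its information content cannot exceed that of either term, which keeps the result between 0 and 1.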
The information content of a GO term is calculated as follows:

$$
IC(go) = -\log p(go)
$$

The frequency p of a term go is defined as:

$$
p(go) = \frac{\sum_{go' \in \{go\} \cup c} n_{go'}}{N}
$$

where:
- c: children of go.
- N: total number of terms in GO corpus.
- ngo': number of occurrences of a term go' in a reference data set.
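
The frequency and information content described above could be computed along these lines. This is a sketch under stated assumptions rather than MegaGO's implementation: the function names, the toy term identifiers (`GO:A`, `GO:B`, `GO:C`) and the counts are hypothetical, and the base-10 logarithm is an assumption (the base cancels out in the simLin ratio, so it does not affect the final similarity).

```python
import math


def term_frequency(term, children, counts, total):
    """p(go): occurrences of the term plus those of its children, divided by N."""
    n = counts.get(term, 0) + sum(counts.get(child, 0) for child in children.get(term, ()))
    return n / total


def information_content(p):
    """IC(go) = -log10(p(go)). The base-10 logarithm is an assumption of this sketch."""
    if p <= 0:
        raise ValueError("frequency must be positive to compute an information content")
    return -math.log10(p)


# Hypothetical toy reference data: GO:A has two children, GO:B and GO:C.
children = {"GO:A": ["GO:B", "GO:C"]}
counts = {"GO:A": 10, "GO:B": 60, "GO:C": 30}
total = 1000  # N

p = term_frequency("GO:A", children, counts, total)
print(p, information_content(p))  # frequency 0.1, information content close to 1
```

Rare terms get a low frequency and therefore a high information content, while terms near the root of the ontology approach a frequency of 1 and an information content of 0.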
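To calculate the similarity of two sets of terms, the best match average (BMA) [1] is used:

$$
BMA(g_i, g_j) = \frac{\sum_{i=1}^{m} \max_{1 \le j \le n} sim(go_{1i}, go_{2j}) + \sum_{j=1}^{n} \max_{1 \le i \le m} sim(go_{1i}, go_{2j})}{m + n}
$$

where: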
- m,n: number of terms in set gi and gj, respectively
- sim(go1i,go2j): similarity between two GO terms
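
The best match average can be sketched as follows. This is an illustrative sketch, not MegaGO's own code: `best_match_average` and the toy score table are hypothetical, the pairwise similarity is passed in as a function, and returning 0 for an empty set is an assumption made here.

```python
def best_match_average(set_a, set_b, sim):
    """Average of the best pairwise match of every term in both sets."""
    if not set_a or not set_b:
        # Assumption: an empty set has no similarity to anything.
        return 0.0
    # Best partner in set_b for each term of set_a, and the other way around.
    best_a = sum(max(sim(a, b) for b in set_b) for a in set_a)
    best_b = sum(max(sim(a, b) for a in set_a) for b in set_b)
    return (best_a + best_b) / (len(set_a) + len(set_b))


# Hypothetical pairwise similarities between two small sets of terms.
scores = {
    ("x1", "y1"): 1.0, ("x1", "y2"): 0.2, ("x1", "y3"): 0.4,
    ("x2", "y1"): 0.3, ("x2", "y2"): 0.5, ("x2", "y3"): 0.1,
}
bma = best_match_average(["x1", "x2"], ["y1", "y2", "y3"], lambda a, b: scores[(a, b)])
print(round(bma, 2))  # 0.68
```

In this toy example the best matches sum to 1.5 for the first set and 1.9 for the second, so the result is 3.4 / 5.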
1: Lin, Dekang. 1998. “An Information-Theoretic Definition of Similarity.” In Proceedings of the 15th International Conference on Machine Learning, 296–304.
The relative similarity ranges between 0 and 1.
| sim(go1i,go2j) value | Interpretation |
|---|---|
| >0.9 | highly similar functions |
| 0.3-0.9 | functionally related |
| <0.3 | not functionally similar |
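
For scripted post-processing, the thresholds in the table can be applied with a small helper. This is a hypothetical convenience function, not part of MegaGO; treating the boundary values 0.3 and 0.9 as "functionally related" is an assumption based on the range given in the table.

```python
def interpret_similarity(score: float) -> str:
    """Map a similarity score in [0, 1] to the interpretation from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity must lie between 0 and 1")
    if score > 0.9:
        return "highly similar functions"
    if score >= 0.3:
        return "functionally related"
    return "not functionally similar"


for value in (0.95, 0.5, 0.1):
    print(value, interpret_similarity(value))
```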
This project is licensed under the MIT License; see the LICENSE file for details.