



AC

AC: a lossless compression tool for amino acid sequences.


AC is a new lossless compressor designed to compress amino acid sequences (proteins) efficiently. It combines multiple finite-context models with substitution tolerant context models. The models cooperate through weights that favor the best-performing ones, and each model's weight is updated with its own forgetting function.
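The paragraph above does not spell out how the weights are computed. As a hedged illustration only, the hypothetical shell helpers below (`mix_prob` and `mix_update` are not part of AC) assume the rule used in the authors' related GeCo compressor: each weight is raised to its model's forgetting factor gamma, multiplied by the probability that model gave to the symbol that actually occurred, and then all weights are renormalised. The exact rule inside AC may differ.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AC) illustrating forgetting-based mixing.
# Assumed rule (GeCo-style): w_i <- w_i^gamma_i * p_i, then renormalise.

# mix_prob "w1 w2 ..." "p1 p2 ..."
# Mixed probability of a symbol: the weighted sum of w_i * p_i.
mix_prob() {
  awk -v ws="$1" -v ps="$2" 'BEGIN {
    n = split(ws, w, " ")
    if (split(ps, p, " ") != n) exit 1   # need one probability per model
    s = 0
    for (i = 1; i <= n; i++) s += w[i] * p[i]
    printf "%.4f\n", s
  }'
}

# mix_update "g1 g2 ..." "w1 w2 ..." "p1 p2 ..."
# g_i in [0,1) is model i's forgetting factor; p_i is the probability
# model i assigned to the symbol that actually occurred.
mix_update() {
  awk -v gs="$1" -v ws="$2" -v ps="$3" 'BEGIN {
    n = split(ws, w, " ")
    if (split(gs, g, " ") != n || split(ps, p, " ") != n) exit 1
    total = 0
    for (i = 1; i <= n; i++) { w[i] = (w[i] ^ g[i]) * p[i]; total += w[i] }
    if (total == 0) exit 1               # every model gave probability 0
    for (i = 1; i <= n; i++) printf "%s%.4f", (i == 1 ? "" : " "), w[i] / total
    printf "\n"
  }'
}
```

For example, with two equally weighted models where the first predicted the observed symbol with probability 0.8 and the second with 0.2, `mix_update "0.9 0.9" "0.5 0.5" "0.8 0.2"` prints `0.8000 0.2000`: the better model gains weight for the next symbol. A gamma close to 1 keeps more of the previous weights, so the mixture reacts more slowly to recent performance.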

1. INSTALLATION

Downloading and installing AC:

git clone https://github.com/pratas/ac.git
cd ac/src/
cmake .
make

CMake is required for the installation (http://www.cmake.org/). You can download it directly from http://www.cmake.org/cmake/resources/software.html or use an appropriate package manager, such as:

sudo apt-get install cmake

Alternatively, on Linux only, you can build AC without CMake using the following instructions:

cp Makefile.linux Makefile
make

2. USAGE

To see the available options of AC, type

./AC

or

./AC -h

Either command prints the following options:

Usage: AC [OPTION]... -r [FILE]  [FILE]:[...]
Compression of amino acid sequences.

Non-mandatory arguments:

  -h                           give this help,
  -s                           show AC compression levels,
  -v                           verbose mode (more information),
  -V                           display version number,
  -f                           force overwrite of output,
  -l <level>                   level of compression [1;7] (lazy -tm setup),
  -t <threshold>               threshold frequency to discard from alphabet,
  -e                           it creates a file with the extension ".iae"
                               with the respective information content.

  -rm <c>:<d>:<g>/<m>:<e>:<a>  reference model (-rm 1:10:0.9/0:0:0),
  -rm <c>:<d>:<g>/<m>:<e>:<a>  reference model (-rm 5:90:0.9/1:50:0.8),
  ...
  -tm <c>:<d>:<g>/<m>:<e>:<a>  target model (-tm 1:1:0.8/0:0:0),
  -tm <c>:<d>:<g>/<m>:<e>:<a>  target model (-tm 7:100:0.9/2:10:0.85),
  ...
                               target and reference templates use <c> for
                               context-order size, <d> for alpha (1/<d>), <g>
                               for gamma (decayment forgetting factor) [0;1),
                               <m> to the maximum sets the allowed mutations,
                               on the context without being discarded (for
                               deep contexts), under the estimator <e>, using
                               <a> for gamma (decayment forgetting factor)
                               [0;1) (tolerant model),

  -r <FILE>                    reference file ("-rm" are loaded here),

Mandatory arguments:

  <FILE>:<...>:<...>           file to compress (last argument). For more
                               files use splitting ":" characters.

Example:

  [Compress]   ./AC -v -tm 1:1:0.8/0:0:0 -tm 5:20:0.9/3:20:0.9 seq.txt
  [Decompress] ./AD -v seq.txt.co

Report bugs to <{pratas,seyedmorteza,ap}@ua.pt>.
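Each -tm/-rm template packs six numbers into one argument. The hypothetical shell helpers below (`describe_model` and `fcm_prob` are not part of AC) are a hedged sketch: the first splits a template into its named fields, and the second evaluates an additive probability estimator with alpha = 1/<d>. The estimator formula is an assumption, namely the form commonly used in the authors' context-model compressors; the help text itself only states that <d> sets alpha.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AC) for reading model templates.

# describe_model <c>:<d>:<g>/<m>:<e>:<a>
# Prints each field of a -tm/-rm template as name=value.
describe_model() {
  printf '%s\n' "$1" | awk -F'[:/]' '
    NF != 6 || $2 + 0 <= 0 { exit 1 }  # need six fields and d > 0
    {
      print "c=" $1                    # context-order size
      print "alpha=" (1 / $2)          # alpha = 1/<d>
      print "g=" $3                    # forgetting factor (gamma)
      print "m=" $4                    # max allowed substitutions (tolerant model)
      print "e=" $5                    # tolerant-model estimator parameter
      print "a=" $6                    # tolerant-model gamma
    }'
}

# fcm_prob COUNT TOTAL D ALPHABET_SIZE
# Assumed additive estimator: P(s) = (n_s + alpha) / (n + alpha * |A|),
# with alpha = 1/D (D = 1 gives Laplace, D = 2 gives Krichevsky-Trofimov).
fcm_prob() {
  awk -v n="$1" -v t="$2" -v d="$3" -v k="$4" 'BEGIN {
    a = 1 / d
    printf "%.6f\n", (n + a) / (t + a * k)
  }'
}
```

For instance, `describe_model 7:100:0.9/2:10:0.85` reports `alpha=0.01`, so a deep model with a large <d> assigns very little probability to symbols never seen in a context. With `fcm_prob 3 10 1 20`, a symbol seen 3 times out of 10 in a context over a 20-symbol alphabet gets (3 + 1) / (10 + 20) ≈ 0.133333.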

3. EXAMPLE

After installing AC, run the following:

wget http://sweet.ua.pt/pratas/datasets/AminoAcidsCorpus.zip
unzip AminoAcidsCorpus.zip
cp AminoAcidsCorpus/HI .
./AC -v -l 2 HI
./AD -v HI.co
cmp HI HI.de

This downloads a corpus of nine amino acid sequences, then compresses and decompresses one of the smallest (HI). Finally, cmp checks whether the decompressed file (HI.de) is identical to the original; it prints nothing when they match.
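A common way to report how well a protein sequence was compressed is bits per symbol. As a hedged sketch, the hypothetical helper below (`bps` is not part of AC) divides 8 times the compressed size by the original size. It assumes every byte of the original file, newlines included, counts as one symbol, which is only exact for files that contain nothing but the sequence.

```shell
#!/bin/sh
# Hypothetical helper (not part of AC): bits per symbol of a compressed file,
# treating every byte of the original file (newlines included) as one symbol.
# bps ORIGINAL COMPRESSED
bps() {
  orig=$(( $(wc -c < "$1") ))   # arithmetic expansion strips wc padding
  comp=$(( $(wc -c < "$2") ))
  if [ "$orig" -eq 0 ]; then
    echo "bps: empty input $1" >&2
    return 1
  fi
  awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.3f\n", 8 * c / o }'
}

# Usage after running the example above:
#   cmp HI HI.de && echo "HI restored losslessly"
#   bps HI HI.co
```

Values below log2(20) ≈ 4.32 bits per symbol indicate that the models captured some structure beyond a uniform 20-letter amino acid alphabet.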

4. CITATION

If you use this tool or method, please cite:

  • Hosseini, M., Pratas, D. and Pinho, A.J., 2019, Feb. AC: A Compression Tool for Amino Acid Sequences. Interdisciplinary Sciences: Computational Life Sciences. https://doi.org/10.1007/s12539-019-00322-1

  • Pratas, D., Hosseini, M. and Pinho, A.J., 2018, May. Compression of Amino Acid Sequences. In International Conference on Practical Applications of Computational Biology & Bioinformatics (pp. 105-113). Springer, Cham.

5. ISSUES

For any issue, please let us know through the repository's GitHub issues page.

6. LICENSE

GPL v3.

For more information:

http://www.gnu.org/licenses/gpl-3.0.html