pyMUSCLE5

Cython bindings and Python interface to MUSCLE v5, a highly efficient and accurate multiple sequence alignment software.

🗺️ Overview

MUSCLE is widely used software for making multiple alignments of biological sequences. Version 5 of MUSCLE achieves the highest scores on several benchmark tests and scales to thousands of sequences on a commodity desktop computer.

pyMUSCLE5 is a Python module that provides bindings to MUSCLE v5 using Cython. It directly interacts with the MUSCLE internals, which has the following advantages:

  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pymuscle5 as a dependency to your project, and stop worrying about the MUSCLE binaries being properly set up on the end-user machine.
  • no intermediate files: Everything happens in memory, in a Python object you fully control, so you don't have to invoke the MUSCLE CLI using a sub-process and temporary files. Sequences can be passed directly as strings or bytes, which avoids the overhead of formatting your input to FASTA for MUSCLE.
  • no OpenMP: The original MUSCLE code uses OpenMP to parallelize embarrassingly parallel tasks. In pyMUSCLE5 the dependency on OpenMP has been removed in favor of the Python threading module for better portability.
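
To illustrate what "embarrassingly parallel" means in the last point, here is a minimal sketch using only the Python standard library. It is not pyMUSCLE5's actual code: the `shared_kmers` function and the toy sequences are hypothetical stand-ins for an independent per-pair task. Keep in mind that Python threads only give a speed-up when the work releases the GIL, which compiled code can do but this pure-Python toy does not.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def shared_kmers(pair, k=3):
    """Toy stand-in for an independent per-pair task: count shared k-mers."""
    a, b = pair
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b)


# Hypothetical toy sequences, only used for this illustration.
sequences = [b"MTETLPPVTE", b"MTETLPPVSE", b"MSITSVPGVV"]
pairs = list(combinations(sequences, 2))

# No pair depends on the result of another, so the tasks can simply be
# mapped over a pool of threads; results come back in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    scores = list(pool.map(shared_kmers, pairs))

print(scores)
```

Because the tasks share no state, no locking is needed, which is what makes this kind of workload easy to move from OpenMP to a thread pool.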

This library is in a very experimental stage at the moment, and consistency of the results across versions or platforms is not guaranteed yet.

🔧 Installing

At the moment pyMUSCLE5 is not available on PyPI. You can however install it directly from GitHub with:

$ pip install git+https://github.com/althonos/pymuscle5

💡 Example

Let's load some sequences from a FASTA file, use an Aligner to align the proteins together, and print the alignment in two-line FASTA format.

import os

import Bio.SeqIO
import pymuscle5

path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(Bio.SeqIO.parse(path, "fasta"))

sequences = [
    pymuscle5.Sequence(record.id.encode(), bytes(record.seq))
    for record in records
]

aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())

The same example, using scikit-bio instead of Biopython to load the sequences:

import os

import skbio.io
import pymuscle5

path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(skbio.io.read(path, "fasta"))

sequences = [
    pymuscle5.Sequence(record.metadata["id"].encode(), record.values.view('B'))
    for record in records
]

aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())

In the scikit-bio version, the view method is needed so that Cython can access the sequence as an array of unsigned char.
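
What `view('B')` does can be checked with NumPy alone. The sketch below assumes the sequence values are stored as an array of single-byte characters (dtype `S1`), and uses a made-up protein fragment rather than a record from the test data.

```python
import numpy as np

# Assumption: one single-byte character per element (dtype "S1").
# The fragment itself is a hypothetical example.
values = np.frombuffer(b"MTETLPPV", dtype="S1")

# Reinterpret the same buffer as unsigned 8-bit integers, without copying.
codes = values.view("B")

print(values.dtype)    # |S1
print(codes.dtype)     # uint8
print(codes.tolist())  # [77, 84, 69, 84, 76, 80, 80, 86]
```

The view shares memory with the original array, so no data is duplicated before it is handed over to the compiled code.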

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the GNU General Public License v3.0. The MUSCLE code was written by Robert Edgar and is distributed under the terms of the GPLv3 as well. See vendor/muscle/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original MUSCLE authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
