Parse Massive Pfam Stockholm Alignment file

Purpose

Output the alignment of a queried domain family from the Pfam Stockholm alignment files.

Motivation

On the Pfam website, some domain family alignments are not downloadable, e.g. the FN3 domain alignment built against the NCBI sequence database. This data therefore has to be retrieved from the large files on the Pfam FTP site (ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/database_files/).

However, there is no existing tool that extracts all the alignments for a given domain ID from the Pfam files, each of which concatenates the alignments of many families. The closest tool, Biopython, can parse the Stockholm alignment format but cannot output the alignment for a queried domain accession ID.

This repository provides a handy script that extracts the alignment of a given Pfam domain ID from a Pfam file in Stockholm format.
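The core idea can be sketched in a few lines of Python. This is a minimal sketch, not the code of parse_pfam_stockholm.py: it assumes each family block starts with `# STOCKHOLM 1.0`, carries its accession on a `#=GF AC` line before any sequence lines, and ends with `//`. The names `open_text` and `extract_family` are hypothetical. The file is streamed line by line, so the large archive is never loaded into memory, and reading stops as soon as the queried family has been emitted.

```python
import gzip
from typing import Dict, Iterator, List, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file. latin-1 is an assumption
    chosen so that stray non-UTF-8 bytes in annotation lines never raise."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")


def extract_family(path: str, accession: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, aligned_sequence) for one family of a
    multi-family Stockholm file (hypothetical helper)."""
    query = accession.split(".")[0]  # ignore a version suffix such as ".23"
    in_family = False
    # name -> list of segments, so interleaved blocks are joined in order
    seqs: Dict[str, List[str]] = {}
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("# STOCKHOLM"):
                in_family = False  # a new family block begins
                seqs = {}
            elif line.startswith("#=GF AC"):
                family_ac = line.split()[2].split(".")[0]
                in_family = family_ac == query
            elif line.startswith("//"):
                if in_family:
                    for name, parts in seqs.items():
                        yield name, "".join(parts)
                    return  # accessions are unique, so stop reading here
                seqs = {}
            elif in_family and line.strip() and not line.startswith("#"):
                # sequence line: "<name> <aligned residues and gaps>"
                name, segment = line.split()[:2]
                seqs.setdefault(name, []).append(segment)
```

Printing each pair as `f"{seq_id}\t{aligned}"` reproduces the two-column output format described under Usage.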

Usage

Example:

python parse_pfam_stockholm.py Pfam-A.full.ncbi.gz PF00041 > output.file

Input 1: the Pfam file that contains the full alignments of all Pfam-A families, e.g. Pfam-A.full.ncbi.gz

Input 2: a Pfam domain ID, e.g. PF00041
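If only a family name such as fn3 is known, the matching accession can be looked up in the same input file. The sketch below is hypothetical and not part of the repository; it assumes each family's `#=GF ID` line precedes its `#=GF AC` line, and that Pfam-A accessions have the form "PF" plus five digits, optionally followed by a version such as ".23".

```python
import gzip
import re
from typing import Iterator, Tuple

# Pfam-A accession: "PF" + five digits, with an optional ".<version>" suffix
ACCESSION_RE = re.compile(r"^PF\d{5}(\.\d+)?$")


def is_pfam_accession(text: str) -> bool:
    """Return True if text looks like a Pfam-A accession (hypothetical check)."""
    return ACCESSION_RE.match(text) is not None


def list_families(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (accession, family_id) pairs from a gzip-compressed
    multi-family Stockholm file, streaming it line by line."""
    family_id = ""
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        for line in handle:
            if line.startswith("#=GF ID"):
                family_id = line.split()[2]  # remembered until the AC line
            elif line.startswith("#=GF AC"):
                yield line.split()[2], family_id
```

Scanning the whole file this way is slow for the full Pfam-A release, so it is best run once and its result saved.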

Output: one line per sequence belonging to that domain family, giving the sequence ID and the aligned sequence (including gaps):

Sequence_ID_1	Aligned_Sequence_1
Sequence_ID_2	Aligned_Sequence_2
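The example output appears to be tab-separated, so it can be loaded back for downstream analysis. A minimal sketch, assuming one `Sequence_ID<TAB>Aligned_Sequence` pair per line; `read_output` and `ungap` are hypothetical helpers. Because all rows come from one alignment, they should share the same width, which gives a cheap sanity check. In Stockholm alignments both "-" and "." mark gaps, so removing them recovers the raw residues.

```python
from typing import Dict


def read_output(path: str) -> Dict[str, str]:
    """Load 'Sequence_ID<TAB>Aligned_Sequence' lines into a dict and check
    that every row has the same width, as rows of one alignment should."""
    rows: Dict[str, str] = {}
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            seq_id, aligned = line.split("\t", 1)
            rows[seq_id] = aligned
    widths = {len(s) for s in rows.values()}
    if len(widths) > 1:
        raise ValueError(f"rows have differing widths: {sorted(widths)}")
    return rows


def ungap(aligned: str) -> str:
    """Strip the gap characters '-' and '.' to get the unaligned residues."""
    return aligned.replace("-", "").replace(".", "")
```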

XiaoleiZ/parse_pfam_stockholm

Folders and files

Latest commit

History

Repository files navigation

Parse Massive Pfam Stockholm Alignment file

Purpose

Motivation

Usage

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages