We can download a UniRef protein database (e.g., UniRef50, UniRef90, or UniRef100) from the UniProt downloads page (https://www.uniprot.org/downloads). After the download completes, we have to uncompress the file.
user@machine:~$ wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/unirefX/unirefX.fasta.gz ### (e.g., X = {50, 90, 100})
user@machine:~$ gunzip unirefX.fasta.gz ### (e.g., X = {50, 90, 100})
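Next, we build a BLAST protein database with makeblastdb. The -in option takes the input FASTA file, and -out sets the database name (Pluto here).
user@machine:~$ /home/user/ncbi-blast-2.12.0+/bin/makeblastdb -in protein.fa -dbtype prot -out Pluto -parse_seqids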
After that, we get the following files:
Pluto.pdb
Pluto.phr
Pluto.pin
Pluto.pog
Pluto.pos
Pluto.pot
Pluto.psq
Pluto.ptf
Pluto.pto
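Before running PSI-BLAST, it can help to confirm that the database was actually built. The following is a minimal Python sketch and not part of the original workflow: the helper name missing_db_files is hypothetical, and it assumes a single-volume database, checking only the header (.phr), index (.pin), and sequence (.psq) files from the listing above. makeblastdb may split large databases into numbered volumes, in which case the file names differ and this check does not apply as written.

```python
from pathlib import Path

# Core files of a single-volume protein BLAST database (taken from the listing above).
CORE_EXTENSIONS = (".phr", ".pin", ".psq")


def missing_db_files(prefix, directory="."):
    """Return the core database file names that do not exist for this prefix.

    `prefix` is the name passed to makeblastdb via -out (e.g., "Pluto").
    This helper is illustrative and assumes a single-volume database.
    """
    folder = Path(directory)
    return [
        prefix + ext
        for ext in CORE_EXTENSIONS
        if not (folder / (prefix + ext)).exists()
    ]
```

For example, missing_db_files('Pluto') returns an empty list when all three files are present in the current directory.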
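# Split a multi-sequence FASTA file into one file per sequence.
from Bio import SeqIO  # Install if needed: pip install biopython

File = '/home/user/Bioinformatics/multiSequences.fa'

C = 1  # Sequential number used as the output file name
for record in SeqIO.parse(File, 'fasta'):
    # Write each record to its own file: 1.fasta, 2.fasta, ...
    with open(str(C) + '.fasta', 'w') as openFile:
        SeqIO.write(record, openFile, 'fasta')
    C += 1
#end-for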
- I renamed the original FASTA sequences because it helps with tracking the implementation.
- I used sequential numbers rather than the original sequence names.
- Renaming the sequences is optional.
- The updated FASTA splitting procedure is available at the given URL.
- We can also use Colab to split a multi-sequence FASTA file into single sequences [Update Implementation].
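Because the files are named by number, it is useful to keep a record of which number belongs to which original sequence. The following is a minimal sketch of one way to do that without Biopython, under stated assumptions: the function names read_fasta_records and split_fasta are hypothetical, the input is a plain-text FASTA file, and each output file keeps its original header line. It is an illustration, not the author's implementation.

```python
from pathlib import Path


def read_fasta_records(path):
    """Yield (header, sequence_lines) for each record in a FASTA file."""
    header, lines = None, []
    with open(path) as source:
        for line in source:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line[1:], []
            elif header is not None and line:
                lines.append(line)
    if header is not None:
        yield header, lines


def split_fasta(path, out_dir="."):
    """Write each record to <n>.fasta and return {n: original header}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    id_map = {}
    for number, (header, lines) in enumerate(read_fasta_records(path), start=1):
        with open(out / f"{number}.fasta", "w") as target:
            target.write(f">{header}\n")
            target.write("\n".join(lines) + "\n")
        id_map[number] = header
    return id_map
```

The returned dictionary can be saved alongside the numbered files so that each PSSM can later be traced back to its original sequence name.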
# Run PSI-BLAST on every single-sequence FASTA file to generate its PSSM.
import glob
import os

### Parameters:
iteration = 3   # Number of PSI-BLAST iterations; more iterations generally give a better-quality PSSM.
evalue = 0.001  # E-value threshold
###

for file in sorted(glob.glob('*.fasta')):
    # psiblastout.txt is overwritten on each run; the PSSM is saved as <file>.pssm.
    os.system('/home/user/ncbi-blast-2.12.0+/bin/psiblast -query {} -db Pluto -num_iterations {} -evalue {} -out psiblastout.txt -out_ascii_pssm {}.pssm'.format(file, iteration, evalue, file))
#end-for
- The updated implementation is available at the given URL.
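The PSI-BLAST loop can also be written with subprocess instead of os.system, which avoids shell quoting problems with unusual file names and raises an error when a run fails. The following is a minimal sketch under stated assumptions: the function names build_psiblast_cmd and run_all are hypothetical, psiblast is assumed to be on the PATH unless a full path is passed, and the per-query report name (<query>.out.txt) is my own choice so that reports are not overwritten. Only options already used in the command above appear here.

```python
import glob
import subprocess


def build_psiblast_cmd(query, db="Pluto", iterations=3, evalue=0.001,
                       psiblast="psiblast"):
    """Return the psiblast command as an argument list for one query file."""
    return [
        psiblast,
        "-query", query,
        "-db", db,
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),
        "-out", f"{query}.out.txt",          # assumption: one report per query
        "-out_ascii_pssm", f"{query}.pssm",  # same PSSM naming as the loop above
    ]


def run_all(pattern="*.fasta", **options):
    """Run PSI-BLAST for every file matching the pattern, in sorted order."""
    for query in sorted(glob.glob(pattern)):
        # check=True raises CalledProcessError if psiblast exits with an error.
        subprocess.run(build_psiblast_cmd(query, **options), check=True)
```

Calling run_all(psiblast='/home/user/ncbi-blast-2.12.0+/bin/psiblast') would mirror the loop above with the full path to the executable.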