I recently had a chance to try out DIAMOND in a GitHub Codespace VM with 32GB of RAM, and I'm amazed by the program. I aligned the sequenced human genome (a file of around 3.2GB) against a gene (OCA2) whose proteins help control eye colour. It found identical matches for 25 of the 27 proteins. The alignment was written to a TSV file (the output format can be changed).
For instance, XP_047288567.1 is the name of one of the proteins in the OCA2 gene, and this protein aligns identically to part of NT_187660.1. Note that the NT_ prefix marks a RefSeq genomic contig, so NT_187660.1 is a piece of the genome input rather than another protein.
The entire alignment took 6 minutes on a 16-core machine with 32GB of RAM. That's a bit much, but I think we can substantially reduce the resource usage by following the documentation and changing the input to the program. The tutorial for running the program is located here. The DB file I used contained only the OCA2 proteins, and the input (query) file was the human genome. The command I used to run the program was:
./diamond blastx -q GRCh38_latest_genomic.fna -d eyecolorprotein_Database.dmnd -o out.tsv --threads 16 --very-sensitive --masking 0
in case you want to try it yourself. The files I used are attached below, and the human genome can be downloaded from here.
DIAMOND-Test-File.zip
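If you want to check which proteins were matched identically, here is a rough Python sketch for reading out.tsv. It assumes the file uses DIAMOND's default BLAST-style tabular layout of 12 columns, since the command above does not set an output format. The function name `identical_hits`, the `COLUMNS` list, and the sample rows are my own illustrations, not real results from this run.

```python
import csv
import io

# Assumed column layout: the default BLAST-style tabular output (12 fields).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def identical_hits(handle, min_identity=100.0):
    """Map each subject protein to the query sequences that hit it
    with percent identity >= min_identity."""
    hits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, row))
        if float(record["pident"]) >= min_identity:
            hits.setdefault(record["sseqid"], set()).add(record["qseqid"])
    return hits


# Made-up rows for illustration only; these are not results from the run above.
SAMPLE = (
    "contig_A\tprotein_1\t100.0\t50\t0\t0\t1\t150\t1\t50\t1e-30\t110\n"
    "contig_A\tprotein_2\t87.5\t40\t5\t0\t200\t319\t1\t40\t1e-10\t60\n"
    "contig_B\tprotein_1\t100.0\t50\t0\t0\t301\t450\t1\t50\t1e-30\t110\n"
)

if __name__ == "__main__":
    for protein, queries in sorted(identical_hits(io.StringIO(SAMPLE)).items()):
        print(protein, sorted(queries))
```

To run it on the real output, open the file and pass the handle instead of the sample, e.g. `with open("out.tsv") as fh: identical_hits(fh)`. With blastx the query column holds the genome sequences and the subject column holds the proteins from the database.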