Inconsistent InterProScan results between GAAS and manual run #110

Open
EmilieSmeets22 opened this issue Apr 9, 2024 · 10 comments
Labels
bug Something isn't working

@EmilieSmeets22
EmilieSmeets22 commented Apr 9, 2024

Hi,

I have different InterProScan results between running GAAS and running InterProScan manually, with the same input files.
I do not see any difference in the input arguments. As you know the backend very well, maybe you could help me identify what causes these differences.

My questions are:

  • Why can I not see the IPR, Pfam and GO codes/IDs in the merged GFF file?
  • Why are genes annotated differently between InterProScan (local install or web) and the install in GAAS?

Running InterProScan within GAAS:

[doutree@plop] $ module load Nextflow
[doutree@plop] $ cat ~/workspace/GFF/functional_annotation_param_chr01.yml
subworkflow: 'functional_annotation'
genome: '~/input/DAUCA_Kuroda_chr01.fa'
gff_annotation: '~/input/Daucus_carota.gene_chr_AGAT_chr01.gff'
blast_db_fasta: '~/input/uniprot_sprot.fasta'
outdir: '~/output/20240408_chr01'
[doutree@plop] $ cat ~/workspace/GFF/custom_config_chr01.txt
process {
    withName: 'INTERPROSCAN' {
        cpus     = 20
        memory   = 300.GB
        ext.args = [
            '--iprlookup',
            '--goterms',
            '-t p',
            '-dra',
            '-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam'
        ].join(" ").trim()
    }
    withName: 'BLAST_BLASTP' {
        ext.args = '-max_target_seqs 1 -evalue 1e-6 -outfmt 6'
    }
}
[doutree@plop] $ nextflow run NBISweden/pipelines-nextflow -profile conda -params-file functional_annotation_param_chr01.yml -c custom_config_chr01.txt

N E X T F L O W  ~  version 22.10.1
Launching `https://github.com/NBISweden/pipelines-nextflow` [maniac_spence] DSL2 - revision: 5f66ae3cf2 [master]

         _  _ ___ ___ ___
        | \| | _ )_ _/ __|
        | .` | _ \| |\__ \
        |_|\_|___/___|___/ Annotation Service



        Functional annotation workflow
        ===================================================
[f9/834f23] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 1 of 1 ✔
[15/7856b3] process > FUNCTIONAL_ANNOTATION:GFF2P... [100%] 1 of 1 ✔
[3a/fbbf8e] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 6 of 6 ✔
[49/81bd45] process > FUNCTIONAL_ANNOTATION:INTER... [100%] 6 of 6 ✔
[b0/e757df] process > FUNCTIONAL_ANNOTATION:MERGE... [100%] 1 of 1 ✔

        Workflow completed successfully.

        Thank you for using our workflow.
        Results are located in the folder: ~/output/20240408_chr01

Completed at: 08-Apr-2024 16:57:29
Duration    : 10m 10s
CPU hours   : 5.9
Succeeded   : 15

[doutree@plop] $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01   maker   gene    24795   31012   .       -       .       ID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   exon    24795   24945   .       -       .       ID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1
chr01   maker   exon    26435   26604   .       -       .       ID=NBISE00000000002;Parent=NBISM00000000001;makerName=nbis-exon-2
chr01   maker   exon    27851   27929   .       -       .       ID=NBISE00000000003;Parent=NBISM00000000001;makerName=nbis-exon-3
chr01   maker   exon    28302   28423   .       -       .       ID=NBISE00000000004;Parent=NBISM00000000001;makerName=nbis-exon-4
chr01   maker   exon    30953   31012   .       -       .       ID=NBISE00000000005;Parent=NBISM00000000001;makerName=nbis-exon-5
chr01   maker   CDS     24795   24945   .       -       1       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-5
chr01   maker   CDS     26435   26604   .       -       0       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-4
[doutree@plop] $ grep mRNA Daucus_carota.gene_chr_AGAT_chr01.gff | head
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   mRNA    33922   37446   .       +       .       ID=NBISM00000000002;Parent=NBISG00000000002;makerName=DcarChr1G00000020.1;product=hypothetical protein
chr01   exonerate       mRNA    45536   50728   .       -       .       ID=NBISM00000000003;Parent=NBISG00000000003;Name=At5g41760;makerName=DcarChr1G00000030.1;product=CMP-sialic acid transporter 1;uniprot_id=Q8LGE9
chr01   exonerate       mRNA    90633   141688  .       -       .       ID=NBISM00000000004;Parent=NBISG00000000004;Name=GIP;makerName=DcarChr1G00000040.1;product=Copia protein;uniprot_id=P04146
chr01   maker   mRNA    145015  147063  .       +       .       ID=NBISM00000000005;Parent=NBISG00000000005;Name=NAKR2;makerName=DcarChr1G00000050.1;product=Protein SODIUM POTASSIUM ROOT DEFECTIVE 2;uniprot_id=Q58FZ0
chr01   exonerate       mRNA    164172  286235  .       +       .       ID=NBISM00000000006;Parent=NBISG00000000006;Name=GIP;makerName=DcarChr1G00000060.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    395432  509234  .       -       .       ID=NBISM00000000007;Parent=NBISG00000000007;Name=GIP;makerName=DcarChr1G00000070.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    534035  534211  .       -       .       ID=NBISM00000000008;Parent=NBISG00000000008;makerName=DcarChr1G00000080.1;product=hypothetical protein
chr01   maker   mRNA    639615  642189  .       +       .       ID=NBISM00000000009;Parent=NBISG00000000009;makerName=DcarChr1G00000090.1;product=hypothetical protein
chr01   transdecoder    mRNA    655131  661114  .       +       .       ID=NBISM00000000010;Parent=NBISG00000000010;Name=GLYR1;makerName=DcarChr1G00000100.1;product=Glyoxylate/succinic semialdehyde reductase 1;uniprot_id=Q9LSV0

Run InterProScan manually:

# Create protein FASTA sequence
[doutree@plop] ~ $ module load AGAT
[doutree@plop] ~ $ agat_sp_extract_sequences.pl -p -cfs -cis -ct 1 --g ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff -f ~/input/DAUCA_Kuroda_chr01.fa -o ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta
# Run InterProScan
[doutree@plop] ~ $ module load InterProScan
[doutree@plop] ~ $ interproscan.sh -version
InterProScan version 5.62-94.0
InterProScan 64-Bit build  (requires Java 11)
[doutree@plop] ~ $ interproscan.sh -i ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta -f TSV -b ~/output/Daucus_carota.gene_chr_prot.fasta_interpro
# Merge annotation
[doutree@plop] ~ $ ipr_update_gff ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff ~/output/Daucus_carota.gene_chr_prot.fasta_interpro.tsv > ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
[doutree@plop] ~ $ head ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
##gff-version 3
chr01   maker   mRNA    24795   31012   .       -       .       ID=DcarChr1G00000010_1;Parent=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   CDS     24795   24945   .       -       1       ID=cds-5;Parent=DcarChr1G00000010_1
chr01   maker   exon    24795   24945   .       -       .       ID=nbis-exon-1;Parent=DcarChr1G00000010_1
chr01   maker   exon    26435   26604   .       -       .       ID=nbis-exon-2;Parent=DcarChr1G00000010_1
chr01   maker   CDS     26435   26604   .       -       0       ID=cds-4;Parent=DcarChr1G00000010_1
chr01   maker   exon    27851   27929   .       -       .       ID=nbis-exon-3;Parent=DcarChr1G00000010_1
chr01   maker   CDS     27851   27929   .       -       1       ID=cds-3;Parent=DcarChr1G00000010_1
chr01   maker   exon    28302   28423   .       -       .       ID=nbis-exon-4;Parent=DcarChr1G00000010_1
# Please note that we store functional annotation at the gene level so a slight difference here
[doutree@plop] ~ $ grep gene ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff | head
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   gene    33922   37446   .       +       .       ID=DcarChr1G00000020;Name=DcarChr1G00000020;Note=Protein of unknown function
chr01   exonerate       gene    45536   50728   .       -       .       ID=DcarChr1G00000030;Name=DcarChr1G00000030;Dbxref=InterPro:IPR007271,PFAM:PF04142,SUPERFAMILY:SSF103481,TIGRFAM:TIGR00803;Note=Similar to At5g41760: CMP-sialic acid transporter 1 (Arabidopsis thaliana)
chr01   exonerate       gene    90633   141688  .       -       .       ID=DcarChr1G00000040;Name=DcarChr1G00000040;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   maker   gene    145015  147063  .       +       .       ID=DcarChr1G00000050;Name=DcarChr1G00000050;Dbxref=InterPro:IPR006121,InterPro:IPR036163,PFAM:PF00403,PROSITE:PS50846,SUPERFAMILY:SSF55008;Note=Similar to NAKR2: Protein SODIUM POTASSIUM ROOT DEFECTIVE 2 (Arabidopsis thaliana)
chr01   exonerate       gene    164172  286235  .       +       .       ID=DcarChr1G00000060;Name=DcarChr1G00000060;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   exonerate       gene    395432  509234  .       -       .       ID=DcarChr1G00000070;Name=DcarChr1G00000070;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   exonerate       gene    534035  534211  .       -       .       ID=DcarChr1G00000080;Name=DcarChr1G00000080;Note=Protein of unknown function
chr01   maker   gene    639615  642189  .       +       .       ID=DcarChr1G00000090;Name=DcarChr1G00000090;Note=Protein of unknown function
chr01   transdecoder    gene    655131  661114  .       +       .       ID=DcarChr1G00000100;Name=DcarChr1G00000100;Dbxref=InterPro:IPR006115,InterPro:IPR008927,InterPro:IPR029154,InterPro:IPR036291,PFAM:PF03446,PFAM:PF14833,SUPERFAMILY:SSF48179,SUPERFAMILY:SSF51735;Note=Similar to GLYR1: Glyoxylate/succinic semialdehyde reductase 1 (Arabidopsis thaliana)

I am using one chromosome as a test (chr01) from a public source, a carrot reference genome. I can provide the input files if that helps to identify the discordance.

I have run the first gene sequence through the InterProScan web service, and here are the results:

DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	PANTHER	PTHR10231	NUCLEOTIDE-SUGAR TRANSMEMBRANE TRANSPORTER	61	152	5.8E-21	T	09-04-2024	IPR007271	Nucleotide-sugar transporter	GO:0000139(InterPro)|GO:0015136(PANTHER)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0030173(PANTHER)|GO:0090481(InterPro)	Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	TRANSMEMBRANE	Region of a membrane-bound protein predicted to be embedded in the membrane.	71	88	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	TRANSMEMBRANE	Region of a membrane-bound protein predicted to be embedded in the membrane.	108	128	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Pfam	PF04142	Nucleotide-sugar transporter	61	153	2.6E-9	T	09-04-2024	IPR007271	Nucleotide-sugar transporter	GO:0000139(InterPro)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0090481(InterPro)	Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	NON_CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the extracellular region.	89	107	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm.	129	193	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm.	1	70	-	T	09-04-2024	-	-	-	-

Sequence used:

>DcarChr1G00000010.1
MPMEECKAANHDEYFDGEIDGILTTLSQSDGSYKYDYATAPFLAEIFKVLNISRCPVSIDRLFLRRKLSN
LQWMAIFPLAIGTTTSQVKGCGEASCDSLFSSPISGYMLGVLSSCLSALAGIYTEFWLKKNNDDLYWKNV
QLYTCCIPSKTVLDFLLEEKTTKRLVFNQDTMPMEECKAANHDKYFDGEIDVA

Thank you for your cooperation.

Kind regards,
Emilie

@Juke34 Juke34 transferred this issue from NBISweden/GAAS Apr 18, 2024
@Juke34
Collaborator

Juke34 commented Apr 18, 2024

The tool/pipeline you ran does not originate from the https://github.com/NBISweden/GAAS repository but from this one (https://github.com/NBISweden/pipelines-nextflow/). I transferred your issue here so that it can be better monitored.

@mahesh-panchal
Collaborator

Looking at your command for running Interproscan manually, it's not the same as the command run in the workflow.
The command you're running in the workflow should look like:

    interproscan.sh \
        -cpu 20 \
        -i ${fasta_name} \
        -f tsv \
        -dp \
        --iprlookup  --goterms -t p -dra -appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam \
         \
        -o ${prefix}.tsv

The difference in output is likely due to the differences in options.
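To make the option mismatch explicit, the two command lines can be compared flag by flag. The following is only a rough sketch: `parse_options` and `diff_options` are hypothetical helper names, the file names in the demo are placeholders, and the parser simply assumes that a token following a flag is that flag's value unless it also starts with `-`.

```python
import shlex


def parse_options(command: str) -> dict:
    """Map each flag to its value, or to True for a flag with no value."""
    tokens = shlex.split(command)[1:]  # drop the program name
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            # Assumption: the next token is this flag's value unless it is a flag too.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
                options[token] = tokens[i + 1]
                i += 2
                continue
            options[token] = True
        i += 1
    return options


def diff_options(command_a: str, command_b: str) -> dict:
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs."""
    a, b = parse_options(command_a), parse_options(command_b)
    return {
        flag: (a.get(flag), b.get(flag))
        for flag in sorted(set(a) | set(b))
        if a.get(flag) != b.get(flag)
    }


if __name__ == "__main__":
    # Placeholder file names; the flags are the ones quoted in this thread.
    manual = "interproscan.sh -i proteins.fasta -f TSV -b out"
    workflow = (
        "interproscan.sh -cpu 20 -i proteins.fasta -f tsv -dp --iprlookup "
        "--goterms -t p -dra -appl TIGRFAM,Pfam -o out.tsv"
    )
    for flag, (left, right) in diff_options(manual, workflow).items():
        print(f"{flag}: manual={left!r} workflow={right!r}")
```

Values are compared as raw strings, so `-f TSV` and `-f tsv` are reported as different even if the tool treats them the same.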

@EmilieSmeets22
Author

I apologize for the delay; I had an issue with my local installation of InterProScan, but it is fixed now.

As you advised, I re-ran InterProScan manually with the additional parameters. As expected, we can now see these additional annotations in the gene entries of the GFF. However, this does not answer my original question regarding the missing InterProScan hits.

## Query with usual parameters:
# Run Interproscan
[doutree@plop] ~ $  interproscan.sh -i input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta -f TSV -o output/Daucus_carota.gene_chr_prot.fasta_interpro.tsv
# Add annotations to GFF file
[doutree@plop] ~ $  ipr_update_gff input/Daucus_carota.gene_chr_AGAT_chr01.gff output/Daucus_carota.gene_chr_prot.fasta_interpro.tsv > output/Daucus_carota.gene_chr_prot.fasta_IPS.gff
# Output GFF file
[doutree@plop] ~ $ head output/Daucus_carota.gene_chr_prot.fasta_IPS.gff
##gff-version 3
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PANTHER:PTHR10231,PFAM:PF04142;
chr01   maker   mRNA    24795   31012   .       -       .       ID=DcarChr1G00000010.1;Parent=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PANTHER:PTHR10231,PFAM:PF04142;
chr01   maker   exon    24795   24945   .       -       .       ID=nbis-exon-1;Parent=DcarChr1G00000010.1
chr01   maker   exon    26435   26604   .       -       .       ID=nbis-exon-2;Parent=DcarChr1G00000010.1
chr01   maker   exon    27851   27929   .       -       .       ID=nbis-exon-3;Parent=DcarChr1G00000010.1
chr01   maker   exon    28302   28423   .       -       .       ID=nbis-exon-4;Parent=DcarChr1G00000010.1
chr01   maker   exon    30953   31012   .       -       .       ID=nbis-exon-5;Parent=DcarChr1G00000010.1
chr01   maker   CDS     24795   24945   .       -       1       ID=cds-5;Parent=DcarChr1G00000010.1
chr01   maker   CDS     26435   26604   .       -       0       ID=cds-4;Parent=DcarChr1G00000010.1
## With additional parameters
# Run Interproscan
[doutree@plop] ~ $  interproscan.sh -cpu 20  -i input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta  -f TSV  -dp  --iprlookup --goterms -t p -dra -appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam -o output/Daucus_carota.gene_chr_prot.fasta_interpro_updateParam.tsv
# Add annotations to GFF file
[doutree@plop] ~ $   ipr_update_gff input/Daucus_carota.gene_chr_AGAT_chr01.gff output/Daucus_carota.gene_chr_prot.fasta_interpro_updateParam.tsv > output/Daucus_carota.gene_chr_prot.fasta_IPS_updateParam.gff
# Output GFF file
[doutree@plop] ~ $ head output/Daucus_carota.gene_chr_prot.fasta_IPS_updateParam.gff

##gff-version 3
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PANTHER:PTHR10231,PFAM:PF04142;Ontology_term=GO:0000139,GO:0015136,GO:0015165,GO:0016020,GO:0030173,GO:0090481;
chr01   maker   mRNA    24795   31012   .       -       .       ID=DcarChr1G00000010.1;Parent=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PANTHER:PTHR10231,PFAM:PF04142;Ontology_term=GO:0000139,GO:0015136,GO:0015165,GO:0016020,GO:0030173,GO:0090481;
chr01   maker   exon    24795   24945   .       -       .       ID=nbis-exon-1;Parent=DcarChr1G00000010.1
chr01   maker   exon    26435   26604   .       -       .       ID=nbis-exon-2;Parent=DcarChr1G00000010.1
chr01   maker   exon    27851   27929   .       -       .       ID=nbis-exon-3;Parent=DcarChr1G00000010.1
chr01   maker   exon    28302   28423   .       -       .       ID=nbis-exon-4;Parent=DcarChr1G00000010.1
chr01   maker   exon    30953   31012   .       -       .       ID=nbis-exon-5;Parent=DcarChr1G00000010.1
chr01   maker   CDS     24795   24945   .       -       1       ID=cds-5;Parent=DcarChr1G00000010.1
chr01   maker   CDS     26435   26604   .       -       0       ID=cds-4;Parent=DcarChr1G00000010.1

My concern is the potential difference in sensitivity or quality threshold between the InterProScan query run within GAAS and manual queries. My worry is that these differences may result in missing functional annotations.
For instance, in this particular example, the manual InterProScan queries return hits on the first gene:

chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Dbxref=InterPro:IPR007271

These hits are not returned by the InterProScan query of GAAS:

chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000000001;Name=CSTLP1;makerName=DcarChr1G00000010
chr01   maker   mRNA    24795   31012   .       -       .       ID=DcarChr1M00000000001;Parent=DcarChr1G00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9

While I appreciate the advantages of using GAAS as an automated tool, I am also cautious about compromising the quality of the annotation. If there is a parameter that can be adjusted to ensure consistent annotations, I would greatly appreciate your guidance in that regard.
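Because the two GFF files use different feature IDs (`NBISG...` versus `DcarChr1G...`), a comparison has to match genes by location rather than by ID. The sketch below does that and lists the `Dbxref` entries present in one file but absent from the other. It assumes tab-separated GFF3 with nine columns; `dbxrefs_by_location` and `missing_dbxrefs` are hypothetical names chosen for illustration.

```python
def parse_attributes(column9: str) -> dict:
    """Split the GFF3 attribute column into a key -> value dict."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def dbxrefs_by_location(gff_lines, feature_type="gene") -> dict:
    """Map (seqid, start, end, strand) to the set of Dbxref values of each feature."""
    result = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        refs = parse_attributes(cols[8]).get("Dbxref", "")
        result[key] = {ref for ref in refs.split(",") if ref}
    return result


def missing_dbxrefs(reference_lines, candidate_lines) -> dict:
    """Dbxref values found in the reference GFF but not in the candidate GFF."""
    reference = dbxrefs_by_location(reference_lines)
    candidate = dbxrefs_by_location(candidate_lines)
    report = {}
    for key, refs in reference.items():
        lost = refs - candidate.get(key, set())
        if lost:
            report[key] = lost
    return report


if __name__ == "__main__":
    manual = ["\t".join(["chr01", "maker", "gene", "24795", "31012", ".", "-", ".",
                         "ID=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;"])]
    pipeline = ["\t".join(["chr01", "maker", "gene", "24795", "31012", ".", "-", ".",
                           "ID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010"])]
    for location, lost in missing_dbxrefs(manual, pipeline).items():
        print(location, sorted(lost))
```

In practice the two line lists would come from `open(path)` on the manually annotated GFF and on the GFF written by the workflow.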

Thank you in advance
Emilie

@mahesh-panchal
Collaborator

Perhaps @Juke34 has a better understanding, but I'm confused here. The functional annotation workflow doesn't use GAAS. I'm not familiar with how you've reached this step, so I'm not sure where GAAS has come into it.

@EmilieSmeets22
Author

I am sorry if I was not clear: my issue concerns the way InterProScan is run via GAAS. I ran InterProScan manually with seemingly the same parameters, but I get different results/hits.
I also ran the same query for one gene sequence, as an example, on the InterProScan website; those results also differ from the hits that GAAS returns. The results are in the first comment.
Thank you in advance

@mahesh-panchal
Collaborator

GAAS doesn't run InterProScan though. You mean this Nextflow workflow, right (which isn't GAAS, although some subworkflows use the GAAS package scripts)?

Since you're using conda, are you sure the database versions in your local run are the same as the databases in the conda package?

@EmilieSmeets22
Author

I apologize, you are right: I ran the functional annotation step of the Nextflow workflow: https://github.com/NBISweden/pipelines-nextflow/blob/master/subworkflows/functional_annotation/README.md

I can see the version of InterProScan being run, but I am not sure how to find the database versions, either locally or via Nextflow. Both instances are run via conda.
I checked the InterProScan website, and there again I cannot see database version information: https://www.ebi.ac.uk/interpro/search/sequence/
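Independently of the version numbers, the two TSV outputs can be compared per member database to see which analyses actually disagree. This is a minimal sketch, assuming the standard InterProScan TSV layout seen in the first comment (protein accession in column 1, analysis in column 4, signature accession in column 5); `signatures_by_analysis` and `compare_runs` are hypothetical names.

```python
from collections import defaultdict


def signatures_by_analysis(tsv_lines) -> dict:
    """Group (protein, signature accession) pairs by analysis, e.g. Pfam or PANTHER."""
    hits = defaultdict(set)
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip blank or malformed lines
        protein, analysis, signature = cols[0], cols[3], cols[4]
        hits[analysis].add((protein, signature))
    return hits


def compare_runs(run_a_lines, run_b_lines) -> dict:
    """Return {analysis: (only_in_a, only_in_b)} for analyses whose hits differ."""
    a = signatures_by_analysis(run_a_lines)
    b = signatures_by_analysis(run_b_lines)
    report = {}
    for analysis in sorted(set(a) | set(b)):
        only_a = a.get(analysis, set()) - b.get(analysis, set())
        only_b = b.get(analysis, set()) - a.get(analysis, set())
        if only_a or only_b:
            report[analysis] = (only_a, only_b)
    return report


if __name__ == "__main__":
    manual = [
        "DcarChr1G00000010.1\tmd5\t193\tPANTHER\tPTHR10231\tdesc\t61\t152\t5.8E-21\tT\t09-04-2024",
        "DcarChr1G00000010.1\tmd5\t193\tPfam\tPF04142\tdesc\t61\t153\t2.6E-9\tT\t09-04-2024",
    ]
    workflow = []  # e.g. an empty or reduced TSV from the other run
    for analysis, (only_manual, only_workflow) in compare_runs(manual, workflow).items():
        print(analysis, len(only_manual), len(only_workflow))
```

An analysis that is entirely absent from one run points to a missing or unselected database, while partial differences within an analysis are more consistent with differing database versions.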

@mahesh-panchal
Collaborator

I just tested creating an InterProScan installation with conda to see what the databases are. It seems the version is tied to the build. However, something I didn't know is that the databases are not packaged with conda. It instructs you to download the databases, which our pipeline doesn't do, as I wasn't aware of this from the nf-core module.

$ conda create -n interproscan-env interproscan
Retrieving notices: ...working... done
Channels:
 - conda-forge
 - bioconda
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/interproscan-env

  added / updated specs:
    - interproscan


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |            2_gnu          23 KB  conda-forge
    alsa-lib-1.2.11            |       hd590300_1         542 KB  conda-forge
    blast-2.15.0               | pl5321h6f7f691_1       146.0 MB  bioconda
    bzip2-1.0.8                |       hd590300_5         248 KB  conda-forge
    c-ares-1.28.1              |       hd590300_0         165 KB  conda-forge
    ca-certificates-2024.2.2   |       hbcca054_0         152 KB  conda-forge
    cairo-1.18.0               |       h3faef2a_0         959 KB  conda-forge
    cath-tools-0.16.5          |       h78a066a_0        11.5 MB  bioconda
    curl-8.8.0                 |       he654da7_0         163 KB  conda-forge
    emboss-6.6.0               |       h6debe1e_0        94.5 MB  bioconda
    entrez-direct-21.6         |       he881be0_0        14.0 MB  bioconda
    expat-2.6.2                |       h59595ed_0         134 KB  conda-forge
    font-ttf-dejavu-sans-mono-2.37|       hab24e00_0         388 KB  conda-forge
    font-ttf-inconsolata-3.000 |       h77eed37_0          94 KB  conda-forge
    font-ttf-source-code-pro-2.038|       h77eed37_0         684 KB  conda-forge
    font-ttf-ubuntu-0.83       |       h77eed37_2         1.5 MB  conda-forge
    fontconfig-2.14.2          |       h14ed4e7_0         266 KB  conda-forge
    fonts-conda-ecosystem-1    |                0           4 KB  conda-forge
    fonts-conda-forge-1        |                0           4 KB  conda-forge
    freetype-2.12.1            |       h267a509_2         620 KB  conda-forge
    gettext-0.22.5             |       h59595ed_2         464 KB  conda-forge
    gettext-tools-0.22.5       |       h59595ed_2         2.6 MB  conda-forge
    giflib-5.2.2               |       hd590300_0          75 KB  conda-forge
    graphite2-1.3.13           |    h59595ed_1003          95 KB  conda-forge
    harfbuzz-8.5.0             |       hfac3d4d_0         1.5 MB  conda-forge
    hmmer-3.4                  |       hdbdd923_1        11.1 MB  bioconda
    hmmer2-2.3.2               |       h031d066_9         392 KB  bioconda
    icu-73.2                   |       h59595ed_0        11.5 MB  conda-forge
    interproscan-5.59_91.0     |       hec16e2b_1       167.0 MB  bioconda
    keyutils-1.6.1             |       h166bdaf_0         115 KB  conda-forge
    krb5-1.21.2                |       h659d440_0         1.3 MB  conda-forge
    lcms2-2.16                 |       hb7c19ff_0         239 KB  conda-forge
    ld_impl_linux-64-2.40      |       h55db66e_0         697 KB  conda-forge
    lerc-4.0.0                 |       h27087fc_0         275 KB  conda-forge
    libasprintf-0.22.5         |       h661eb56_2          42 KB  conda-forge
    libasprintf-devel-0.22.5   |       h661eb56_2          33 KB  conda-forge
    libcups-2.3.3              |       h4637d8d_4         4.3 MB  conda-forge
    libcurl-8.8.0              |       hca28451_0         396 KB  conda-forge
    libdeflate-1.20            |       hd590300_0          70 KB  conda-forge
    libedit-3.1.20191231       |       he28a2e2_2         121 KB  conda-forge
    libev-4.33                 |       hd590300_2         110 KB  conda-forge
    libexpat-2.6.2             |       h59595ed_0          72 KB  conda-forge
    libffi-3.4.2               |       h7f98852_5          57 KB  conda-forge
    libgcc-ng-13.2.0           |       h77fa898_7         758 KB  conda-forge
    libgd-2.3.3                |       h119a65a_9         219 KB  conda-forge
    libgettextpo-0.22.5        |       h59595ed_2         167 KB  conda-forge
    libgettextpo-devel-0.22.5  |       h59595ed_2          36 KB  conda-forge
    libgfortran-ng-7.5.0       |      h14aa051_20          23 KB  conda-forge
    libgfortran4-7.5.0         |      h14aa051_20         1.2 MB  conda-forge
    libglib-2.80.2             |       hf974151_0         3.7 MB  conda-forge
    libgomp-13.2.0             |       h77fa898_7         412 KB  conda-forge
    libiconv-1.17              |       hd590300_2         689 KB  conda-forge
    libidn2-2.3.7              |       hd590300_0         124 KB  conda-forge
    libjpeg-turbo-3.0.0        |       hd590300_1         604 KB  conda-forge
    libnghttp2-1.58.0          |       h47da74e_1         617 KB  conda-forge
    libnsl-2.0.1               |       hd590300_0          33 KB  conda-forge
    libpng-1.6.43              |       h2797004_0         281 KB  conda-forge
    libsqlite-3.45.3           |       h2797004_0         840 KB  conda-forge
    libssh2-1.11.0             |       h0841786_0         265 KB  conda-forge
    libstdcxx-ng-13.2.0        |       hc0a3c3a_7         3.7 MB  conda-forge
    libtiff-4.6.0              |       h1dd3fc0_3         276 KB  conda-forge
    libunistring-0.9.10        |       h7f98852_0         1.4 MB  conda-forge
    libuuid-2.38.1             |       h0b41bf4_0          33 KB  conda-forge
    libwebp-1.4.0              |       h2c329e2_0          90 KB  conda-forge
    libwebp-base-1.4.0         |       hd590300_0         429 KB  conda-forge
    libxcb-1.15                |       h0b41bf4_0         375 KB  conda-forge
    libxcrypt-4.4.36           |       hd590300_1          98 KB  conda-forge
    libzlib-1.2.13             |       hd590300_5          60 KB  conda-forge
    ncbi-vdb-3.1.1             |       h4ac6f70_0        10.7 MB  bioconda
    ncurses-6.5                |       h59595ed_0         867 KB  conda-forge
    openjdk-11.0.23            |       h24d6bf4_0       164.0 MB  conda-forge
    openssl-3.3.0              |       h4ab18f5_3         2.8 MB  conda-forge
    pcre-8.45                  |       h9c3ff4c_0         253 KB  conda-forge
    pcre2-10.43                |       hcad00b1_0         929 KB  conda-forge
    perl-5.32.1                | 7_hd590300_perl5        12.7 MB  conda-forge
    perl-archive-tar-2.40      | pl5321hdfd78af_0          33 KB  bioconda
    perl-carp-1.50             | pl5321hd8ed1ab_0          22 KB  conda-forge
    perl-common-sense-3.75     | pl5321hd8ed1ab_0          20 KB  conda-forge
    perl-compress-raw-bzip2-2.201| pl5321h166bdaf_0          54 KB  conda-forge
    perl-compress-raw-zlib-2.202| pl5321h166bdaf_0          83 KB  conda-forge
    perl-encode-3.21           | pl5321hd590300_0         1.7 MB  conda-forge
    perl-exporter-5.74         | pl5321hd8ed1ab_0          19 KB  conda-forge
    perl-exporter-tiny-1.002002| pl5321hd8ed1ab_0          28 KB  conda-forge
    perl-extutils-makemaker-7.70| pl5321hd8ed1ab_0         154 KB  conda-forge
    perl-io-compress-2.201     | pl5321hdbdd923_2          84 KB  bioconda
    perl-io-zlib-1.14          | pl5321hdfd78af_0          12 KB  bioconda
    perl-json-4.10             | pl5321hdfd78af_0          56 KB  bioconda
    perl-json-xs-2.34          | pl5321h4ac6f70_6          66 KB  bioconda
    perl-list-moreutils-0.430  | pl5321hdfd78af_0          32 KB  bioconda
    perl-list-moreutils-xs-0.430| pl5321h031d066_2          50 KB  bioconda
    perl-parent-0.241          | pl5321hd8ed1ab_0          13 KB  conda-forge
    perl-pathtools-3.75        | pl5321h166bdaf_0          49 KB  conda-forge
    perl-scalar-list-utils-1.63| pl5321h166bdaf_0          50 KB  conda-forge
    perl-storable-3.15         | pl5321h166bdaf_0          70 KB  conda-forge
    perl-types-serialiser-1.01 | pl5321hdfd78af_0          13 KB  bioconda
    pftools-2.3.5              |       h4333106_0         263 KB  bioconda
    pip-24.0                   |     pyhd8ed1ab_0         1.3 MB  conda-forge
    pixman-0.43.2              |       h59595ed_0         378 KB  conda-forge
    pthread-stubs-0.4          |    h36c2ea0_1001           5 KB  conda-forge
    python-3.12.3              |hab00c5b_0_cpython        30.5 MB  conda-forge
    readline-8.2               |       h8228510_1         275 KB  conda-forge
    setuptools-70.0.0          |     pyhd8ed1ab_0         472 KB  conda-forge
    sfld-1.1                   |       h031d066_3         196 KB  bioconda
    tk-8.6.13                  |noxft_h4845f30_101         3.2 MB  conda-forge
    tzdata-2024a               |       h0c530f3_0         117 KB  conda-forge
    wget-1.21.4                |       hda4d442_0         752 KB  conda-forge
    wheel-0.43.0               |     pyhd8ed1ab_1          57 KB  conda-forge
    xorg-fixesproto-5.0        |    h7f98852_1002           9 KB  conda-forge
    xorg-inputproto-2.3.2      |    h7f98852_1002          19 KB  conda-forge
    xorg-kbproto-1.0.7         |    h7f98852_1002          27 KB  conda-forge
    xorg-libice-1.1.1          |       hd590300_0          57 KB  conda-forge
    xorg-libsm-1.2.4           |       h7391055_0          27 KB  conda-forge
    xorg-libx11-1.8.9          |       h8ee46fc_0         809 KB  conda-forge
    xorg-libxau-1.0.11         |       hd590300_0          14 KB  conda-forge
    xorg-libxdmcp-1.1.3        |       h7f98852_0          19 KB  conda-forge
    xorg-libxext-1.3.4         |       h0b41bf4_2          49 KB  conda-forge
    xorg-libxfixes-5.0.3       |    h7f98852_1004          18 KB  conda-forge
    xorg-libxi-1.7.10          |       h7f98852_0          46 KB  conda-forge
    xorg-libxrender-0.9.11     |       hd590300_0          37 KB  conda-forge
    xorg-libxt-1.3.0           |       hd590300_1         370 KB  conda-forge
    xorg-libxtst-1.2.3         |    h7f98852_1002          31 KB  conda-forge
    xorg-recordproto-1.14.2    |    h7f98852_1002           8 KB  conda-forge
    xorg-renderproto-0.11.1    |    h7f98852_1002           9 KB  conda-forge
    xorg-xextproto-7.3.0       |    h0b41bf4_1003          30 KB  conda-forge
    xorg-xproto-7.0.31         |    h7f98852_1007          73 KB  conda-forge
    xz-5.2.6                   |       h166bdaf_0         409 KB  conda-forge
    zlib-1.2.13                |       hd590300_5          91 KB  conda-forge
    zstd-1.5.6                 |       ha6fb4c9_0         542 KB  conda-forge
    ------------------------------------------------------------
                                           Total:       725.5 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  alsa-lib           conda-forge/linux-64::alsa-lib-1.2.11-hd590300_1 
  blast              bioconda/linux-64::blast-2.15.0-pl5321h6f7f691_1 
  bzip2              conda-forge/linux-64::bzip2-1.0.8-hd590300_5 
  c-ares             conda-forge/linux-64::c-ares-1.28.1-hd590300_0 
  ca-certificates    conda-forge/linux-64::ca-certificates-2024.2.2-hbcca054_0 
  cairo              conda-forge/linux-64::cairo-1.18.0-h3faef2a_0 
  cath-tools         bioconda/linux-64::cath-tools-0.16.5-h78a066a_0 
  curl               conda-forge/linux-64::curl-8.8.0-he654da7_0 
  emboss             bioconda/linux-64::emboss-6.6.0-h6debe1e_0 
  entrez-direct      bioconda/linux-64::entrez-direct-21.6-he881be0_0 
  expat              conda-forge/linux-64::expat-2.6.2-h59595ed_0 
  font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0 
  font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0 
  font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0 
  font-ttf-ubuntu    conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_2 
  fontconfig         conda-forge/linux-64::fontconfig-2.14.2-h14ed4e7_0 
  fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0 
  fonts-conda-forge  conda-forge/noarch::fonts-conda-forge-1-0 
  freetype           conda-forge/linux-64::freetype-2.12.1-h267a509_2 
  gettext            conda-forge/linux-64::gettext-0.22.5-h59595ed_2 
  gettext-tools      conda-forge/linux-64::gettext-tools-0.22.5-h59595ed_2 
  giflib             conda-forge/linux-64::giflib-5.2.2-hd590300_0 
  graphite2          conda-forge/linux-64::graphite2-1.3.13-h59595ed_1003 
  harfbuzz           conda-forge/linux-64::harfbuzz-8.5.0-hfac3d4d_0 
  hmmer              bioconda/linux-64::hmmer-3.4-hdbdd923_1 
  hmmer2             bioconda/linux-64::hmmer2-2.3.2-h031d066_9 
  icu                conda-forge/linux-64::icu-73.2-h59595ed_0 
  interproscan       bioconda/linux-64::interproscan-5.59_91.0-hec16e2b_1 
  keyutils           conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 
  krb5               conda-forge/linux-64::krb5-1.21.2-h659d440_0 
  lcms2              conda-forge/linux-64::lcms2-2.16-hb7c19ff_0 
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.40-h55db66e_0 
  lerc               conda-forge/linux-64::lerc-4.0.0-h27087fc_0 
  libasprintf        conda-forge/linux-64::libasprintf-0.22.5-h661eb56_2 
  libasprintf-devel  conda-forge/linux-64::libasprintf-devel-0.22.5-h661eb56_2 
  libcups            conda-forge/linux-64::libcups-2.3.3-h4637d8d_4 
  libcurl            conda-forge/linux-64::libcurl-8.8.0-hca28451_0 
  libdeflate         conda-forge/linux-64::libdeflate-1.20-hd590300_0 
  libedit            conda-forge/linux-64::libedit-3.1.20191231-he28a2e2_2 
  libev              conda-forge/linux-64::libev-4.33-hd590300_2 
  libexpat           conda-forge/linux-64::libexpat-2.6.2-h59595ed_0 
  libffi             conda-forge/linux-64::libffi-3.4.2-h7f98852_5 
  libgcc-ng          conda-forge/linux-64::libgcc-ng-13.2.0-h77fa898_7 
  libgd              conda-forge/linux-64::libgd-2.3.3-h119a65a_9 
  libgettextpo       conda-forge/linux-64::libgettextpo-0.22.5-h59595ed_2 
  libgettextpo-devel conda-forge/linux-64::libgettextpo-devel-0.22.5-h59595ed_2 
  libgfortran-ng     conda-forge/linux-64::libgfortran-ng-7.5.0-h14aa051_20 
  libgfortran4       conda-forge/linux-64::libgfortran4-7.5.0-h14aa051_20 
  libglib            conda-forge/linux-64::libglib-2.80.2-hf974151_0 
  libgomp            conda-forge/linux-64::libgomp-13.2.0-h77fa898_7 
  libiconv           conda-forge/linux-64::libiconv-1.17-hd590300_2 
  libidn2            conda-forge/linux-64::libidn2-2.3.7-hd590300_0 
  libjpeg-turbo      conda-forge/linux-64::libjpeg-turbo-3.0.0-hd590300_1 
  libnghttp2         conda-forge/linux-64::libnghttp2-1.58.0-h47da74e_1 
  libnsl             conda-forge/linux-64::libnsl-2.0.1-hd590300_0 
  libpng             conda-forge/linux-64::libpng-1.6.43-h2797004_0 
  libsqlite          conda-forge/linux-64::libsqlite-3.45.3-h2797004_0 
  libssh2            conda-forge/linux-64::libssh2-1.11.0-h0841786_0 
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-13.2.0-hc0a3c3a_7 
  libtiff            conda-forge/linux-64::libtiff-4.6.0-h1dd3fc0_3 
  libunistring       conda-forge/linux-64::libunistring-0.9.10-h7f98852_0 
  libuuid            conda-forge/linux-64::libuuid-2.38.1-h0b41bf4_0 
  libwebp            conda-forge/linux-64::libwebp-1.4.0-h2c329e2_0 
  libwebp-base       conda-forge/linux-64::libwebp-base-1.4.0-hd590300_0 
  libxcb             conda-forge/linux-64::libxcb-1.15-h0b41bf4_0 
  libxcrypt          conda-forge/linux-64::libxcrypt-4.4.36-hd590300_1 
  libzlib            conda-forge/linux-64::libzlib-1.2.13-hd590300_5 
  ncbi-vdb           bioconda/linux-64::ncbi-vdb-3.1.1-h4ac6f70_0 
  ncurses            conda-forge/linux-64::ncurses-6.5-h59595ed_0 
  openjdk            conda-forge/linux-64::openjdk-11.0.23-h24d6bf4_0 
  openssl            conda-forge/linux-64::openssl-3.3.0-h4ab18f5_3 
  pcre               conda-forge/linux-64::pcre-8.45-h9c3ff4c_0 
  pcre2              conda-forge/linux-64::pcre2-10.43-hcad00b1_0 
  perl               conda-forge/linux-64::perl-5.32.1-7_hd590300_perl5 
  perl-archive-tar   bioconda/noarch::perl-archive-tar-2.40-pl5321hdfd78af_0 
  perl-carp          conda-forge/noarch::perl-carp-1.50-pl5321hd8ed1ab_0 
  perl-common-sense  conda-forge/noarch::perl-common-sense-3.75-pl5321hd8ed1ab_0 
  perl-compress-raw~ conda-forge/linux-64::perl-compress-raw-bzip2-2.201-pl5321h166bdaf_0 
  perl-compress-raw~ conda-forge/linux-64::perl-compress-raw-zlib-2.202-pl5321h166bdaf_0 
  perl-encode        conda-forge/linux-64::perl-encode-3.21-pl5321hd590300_0 
  perl-exporter      conda-forge/noarch::perl-exporter-5.74-pl5321hd8ed1ab_0 
  perl-exporter-tiny conda-forge/noarch::perl-exporter-tiny-1.002002-pl5321hd8ed1ab_0 
  perl-extutils-mak~ conda-forge/noarch::perl-extutils-makemaker-7.70-pl5321hd8ed1ab_0 
  perl-io-compress   bioconda/linux-64::perl-io-compress-2.201-pl5321hdbdd923_2 
  perl-io-zlib       bioconda/noarch::perl-io-zlib-1.14-pl5321hdfd78af_0 
  perl-json          bioconda/noarch::perl-json-4.10-pl5321hdfd78af_0 
  perl-json-xs       bioconda/linux-64::perl-json-xs-2.34-pl5321h4ac6f70_6 
  perl-list-moreuti~ bioconda/noarch::perl-list-moreutils-0.430-pl5321hdfd78af_0 
  perl-list-moreuti~ bioconda/linux-64::perl-list-moreutils-xs-0.430-pl5321h031d066_2 
  perl-parent        conda-forge/noarch::perl-parent-0.241-pl5321hd8ed1ab_0 
  perl-pathtools     conda-forge/linux-64::perl-pathtools-3.75-pl5321h166bdaf_0 
  perl-scalar-list-~ conda-forge/linux-64::perl-scalar-list-utils-1.63-pl5321h166bdaf_0 
  perl-storable      conda-forge/linux-64::perl-storable-3.15-pl5321h166bdaf_0 
  perl-types-serial~ bioconda/noarch::perl-types-serialiser-1.01-pl5321hdfd78af_0 
  pftools            bioconda/linux-64::pftools-2.3.5-h4333106_0 
  pip                conda-forge/noarch::pip-24.0-pyhd8ed1ab_0 
  pixman             conda-forge/linux-64::pixman-0.43.2-h59595ed_0 
  pthread-stubs      conda-forge/linux-64::pthread-stubs-0.4-h36c2ea0_1001 
  python             conda-forge/linux-64::python-3.12.3-hab00c5b_0_cpython 
  readline           conda-forge/linux-64::readline-8.2-h8228510_1 
  setuptools         conda-forge/noarch::setuptools-70.0.0-pyhd8ed1ab_0 
  sfld               bioconda/linux-64::sfld-1.1-h031d066_3 
  tk                 conda-forge/linux-64::tk-8.6.13-noxft_h4845f30_101 
  tzdata             conda-forge/noarch::tzdata-2024a-h0c530f3_0 
  wget               conda-forge/linux-64::wget-1.21.4-hda4d442_0 
  wheel              conda-forge/noarch::wheel-0.43.0-pyhd8ed1ab_1 
  xorg-fixesproto    conda-forge/linux-64::xorg-fixesproto-5.0-h7f98852_1002 
  xorg-inputproto    conda-forge/linux-64::xorg-inputproto-2.3.2-h7f98852_1002 
  xorg-kbproto       conda-forge/linux-64::xorg-kbproto-1.0.7-h7f98852_1002 
  xorg-libice        conda-forge/linux-64::xorg-libice-1.1.1-hd590300_0 
  xorg-libsm         conda-forge/linux-64::xorg-libsm-1.2.4-h7391055_0 
  xorg-libx11        conda-forge/linux-64::xorg-libx11-1.8.9-h8ee46fc_0 
  xorg-libxau        conda-forge/linux-64::xorg-libxau-1.0.11-hd590300_0 
  xorg-libxdmcp      conda-forge/linux-64::xorg-libxdmcp-1.1.3-h7f98852_0 
  xorg-libxext       conda-forge/linux-64::xorg-libxext-1.3.4-h0b41bf4_2 
  xorg-libxfixes     conda-forge/linux-64::xorg-libxfixes-5.0.3-h7f98852_1004 
  xorg-libxi         conda-forge/linux-64::xorg-libxi-1.7.10-h7f98852_0 
  xorg-libxrender    conda-forge/linux-64::xorg-libxrender-0.9.11-hd590300_0 
  xorg-libxt         conda-forge/linux-64::xorg-libxt-1.3.0-hd590300_1 
  xorg-libxtst       conda-forge/linux-64::xorg-libxtst-1.2.3-h7f98852_1002 
  xorg-recordproto   conda-forge/linux-64::xorg-recordproto-1.14.2-h7f98852_1002 
  xorg-renderproto   conda-forge/linux-64::xorg-renderproto-0.11.1-h7f98852_1002 
  xorg-xextproto     conda-forge/linux-64::xorg-xextproto-7.3.0-h0b41bf4_1003 
  xorg-xproto        conda-forge/linux-64::xorg-xproto-7.0.31-h7f98852_1007 
  xz                 conda-forge/linux-64::xz-5.2.6-h166bdaf_0 
  zlib               conda-forge/linux-64::zlib-1.2.13-hd590300_5 
  zstd               conda-forge/linux-64::zstd-1.5.6-ha6fb4c9_0 


Proceed ([y]/n)? y


Downloading and Extracting Packages:
                                                                                                                                                                                                                                                                                                     
Preparing transaction: done
Verifying transaction: done
Executing transaction: |
######################################
# First time usage please README !!! #
######################################

The databases are huge and consequently not shipped within this installation.
Please download and install the Databases manually by following the commands below:
!!! /!\ Edit the 2 first lines to match the wished version of the DB /!\ !!!

Commands:
=========
# See here for latest db available: https://github.com/ebi-pf-team/interproscan or http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/
# Set versions
version_major=5.59
version_minor=91.0
CONDA_PREFIX=/the/path/to/your/interproscan/conda/env/

# get the md5 of the databases
wget http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/${version_major}-${version_minor}/interproscan-${version_major}-${version_minor}-64-bit.tar.gz.md5
# get the databases (with core because much faster to download)
wget http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/${version_major}-${version_minor}/interproscan-${version_major}-${version_minor}-64-bit.tar.gz
# checksum
md5sum -c interproscan-${version_major}-${version_minor}-64-bit.tar.gz.md5
# untar gz
tar xvzf interproscan-${version_major}-${version_minor}-64-bit.tar.gz
# remove the sample DB bundled by default
rm -rf $CONDA_PREFIX/share/InterProScan/data/
# copy the new db
cp -r interproscan-${version_major}-${version_minor}/data $CONDA_PREFIX/share/InterProScan/


INFO:
====
Phobius (licensed software), SignalP, SMART (licensed components) and TMHMM use
licensed code and data provided by third parties. If you wish to run these
analyses it will be necessary for you to obtain a licence from the vendor and
configure your local InterProScan installation to use them.
(see more information in $CONDA_PREFIX/share/InterProScan/data/<db>)




done
#
# To activate this environment, use
#
#     $ conda activate interproscan-env
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Did you also follow the extra instructions when setting up your local InterProScan conda installation?

@mahesh-panchal mahesh-panchal added the bug Something isn't working label May 24, 2024
@mahesh-panchal
Collaborator

I've made a pull request to update the InterProScan module on nf-core to fix the missing database issue. Once that's merged, I can include it here.

nf-core/modules#5688

@EmilieSmeets22
Author

Thank you for looking into this.

I run the Nextflow workflow with conda, but my local InterProScan install does not use conda. It is, however, similar to what you described above:

We download the following tarballs from http://ftp.ebi.ac.uk/pub/software/unix/iprscan
interproscan-core-5.67-99.0.tar.gz
interproscan-data-5.67-99.0.tar.gz
We unpack interproscan-core-5.67-99.0.tar.gz to $INSTALLDIR
We unpack interproscan-data-5.67-99.0.tar.gz to another location: /data/prod/Tools/InterProScan/5.67-99.0/data

Then we run the following commands:

sed -i "s@EASEL_DIR=@EASEL_DIR=$INSTALLDIRHMMER_interproscan/3.1b2/easel@" $INSTALLDIR/src/sfld/1.1/Makefile
cd $INSTALLDIR/src/sfld/1.1/ && make
cp -f $INSTALLDIR/src/sfld/1.1/sfld_postprocess $INSTALLDIR/bin/sfld/
cp -f $INSTALLDIR/src/sfld/1.1/sfld_preprocess.py $INSTALLDIR/bin/sfld/

We adapt the following line in $INSTALLDIR/interproscan.properties so that it uses the data from interproscan-data-5.67-99.0.tar.gz:
data.directory=/data/prod/Tools/InterProScan/5.67-99.0/data
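
This property edit can also be scripted. Below is a minimal sketch, assuming GNU sed and a properties file that already contains a line starting with `data.directory=`; the function name `set_data_directory` is hypothetical and not part of InterProScan:

```shell
# Hypothetical helper (not part of InterProScan): point a properties file
# at an external data directory.
set_data_directory() {
    props_file="$1"   # path to interproscan.properties
    data_dir="$2"     # directory unpacked from the data tarball
    # Replace the whole data.directory line; '@' is the sed delimiter
    # because the path itself contains '/'. Assumes GNU sed for -i.
    sed -i "s@^data\.directory=.*@data.directory=${data_dir}@" "$props_file"
    # Print the resulting line so the change can be checked by eye.
    grep '^data\.directory=' "$props_file"
}

# Example call, using the paths from this comment:
# set_data_directory "$INSTALLDIR/interproscan.properties" /data/prod/Tools/InterProScan/5.67-99.0/data
```

Printing the line after the edit makes it easy to confirm that the install really points at the 5.67-99.0 data and not at a bundled sample database.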

I understand that I am using a newer version of InterProScan than your pipeline, so that is indeed a nice bug catch.
However, I do not think this is the reason why different InterProScan hits are found for the same gene sequence. I will be curious to test it when it is ready.
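
For that test, the hits from the two runs could be compared directly. Here is a rough sketch, assuming both runs produced InterProScan TSV output, where column 1 is the protein accession, column 4 the analysis and column 5 the signature accession; the function name `diff_ipr_hits` and the file names in the example call are hypothetical:

```shell
# Hypothetical helper: list (protein, analysis, signature) triples that
# appear in only one of two InterProScan TSV outputs.
diff_ipr_hits() {
    a_tsv="$1"
    b_tsv="$2"
    a_keys=$(mktemp)
    b_keys=$(mktemp)
    # Keep protein accession, analysis and signature accession, de-duplicated.
    # LC_ALL=C keeps the sort order consistent with what comm expects.
    cut -f1,4,5 "$a_tsv" | LC_ALL=C sort -u > "$a_keys"
    cut -f1,4,5 "$b_tsv" | LC_ALL=C sort -u > "$b_keys"
    # comm -3 hides triples common to both files; triples found only in the
    # second file are printed with a leading tab.
    LC_ALL=C comm -3 "$a_keys" "$b_keys"
    rm -f "$a_keys" "$b_keys"
}

# Example call (file names are placeholders):
# diff_ipr_hits gaas_run.tsv manual_run.tsv
```

Lines without a leading tab are hits found only in the first file, and tab-indented lines are hits found only in the second, which shows whether the differences cluster in particular member databases.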

Thank you

@mahesh-panchal mahesh-panchal linked a pull request May 28, 2024 that will close this issue
@mahesh-panchal mahesh-panchal removed a link to a pull request May 28, 2024