ftp.ebi.ac.uk available, but GTEx_ge_brain_frontal_cortex.all.tsv.gz times out #5

Open
paul-shannon opened this issue Nov 23, 2021 · 10 comments

@paul-shannon

Thanks for this fine package - very useful in our work on Alzheimer's Disease.

I find intermittent - sometimes lasting - problems with the ftp service the package uses.
Here is an example, establishing first that connectivity is good, then showing the error.

ping ftp.ebi.ac.uk
PING ftp.g.ebi.ac.uk (193.62.197.74): 56 data bytes
64 bytes from 193.62.197.74: icmp_seq=0 ttl=53 time=164.786 ms
64 bytes from 193.62.197.74: icmp_seq=1 ttl=53 time=180.432 ms
64 bytes from 193.62.197.74: icmp_seq=2 ttl=53 time=166.346 ms
64 bytes from 193.62.197.74: icmp_seq=3 ttl=53 time=169.450 ms

The specific file request times out:

[E::hts_open_format] Failed to open file "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz" :
Operation timed out
Couldn't open "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz": Operation timed out
zcat: (stdin): unexpected end of file
@paul-shannon
Author

I think this is a better problem report:

eQTL_Catalogue.fetch(unique_id="GTEx.brain_frontal_cortex", chrom="8", bp_lower=27610984, bp_upper=27610987)
[1] "CONDA:: Could not identify tabix executable in echoR env. Defaulting to generic 'tabix' command"
[1] "tabix ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz 8:27610984-27610987"
[E::hts_open_format] Failed to open file "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz" : Operation timed out
Couldn't open "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz": Operation timed out

My tabix is Cellar/htslib/1.14/bin/tabix

@paul-shannon
Author

More info: running on Ubuntu with a different tabix, I see the same problem:

tabix ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz 8:27610984-276109801
[E::hts_open_format] Failed to open file "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz" : Operation timed out
Couldn't open "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz": Operation timed out

any thoughts? It's clear this problem is outside of catalogueR!

@bschilder
Member

bschilder commented Nov 28, 2021

Hi @paul-shannon, glad you're finding this tool useful. Thanks for pointing out this issue. I'll look into this and try to figure out what's going on here.

Some potential sources:

  • File names have changed.
  • I need to raise the timeout limit.
  • The FTP server (or connections to it) is unstable

Potentially related: eQTL-Catalogue/eQTL-Catalogue-resources#15

@bschilder
Member

@kauralasoo is there anything on eQTL Catalogue's end that might be causing unstable connections to the FTP server?

I just confirmed that the file paths haven't changed, so they do indeed seem to exist.

@kauralasoo

Hi @paul-shannon and @bschilder,

We just received confirmation from the EBI helpdesk that the root cause was that Paul's IP address had been blocked by the EBI firewall. Paul's IP has been whitelisted now, but unfortunately there is no good solution to prevent it from happening to other users, because tabix requests over FTP (incomplete downloads) look a lot like DDoS attacks to the firewall. The REST API is much more robust, because it is able to rate limit the number of requests by IP address on its own.

Best,
Kaur

@bschilder
Member

bschilder commented Nov 30, 2021

Thanks so much for the response @kauralasoo! This is all really helpful info. I'll make some adjustments to catalogueR and may make it so that the REST API is the default method.

Update in dev branch

  • Changed eQTL_Catalogue.query so that the default is use_tabix=FALSE due to
    instability of using tabix with the EBI server.
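
For readers who want to see what a REST-by-default switch might look like, here is a minimal sketch. It is not the actual catalogueR code: `query_sketch`, `query_rest`, and `query_tabix` are hypothetical names, and the two backends are stubs that only report which path was chosen.

```r
# Hypothetical stand-ins for the real REST and tabix code paths.
query_rest <- function(unique_id, chrom, bp_lower, bp_upper) {
  list(method = "REST", unique_id = unique_id)
}
query_tabix <- function(unique_id, chrom, bp_lower, bp_upper) {
  list(method = "tabix", unique_id = unique_id)
}

# Illustrative wrapper: REST is the default, tabix is opt-in.
query_sketch <- function(unique_id, chrom, bp_lower, bp_upper,
                         use_tabix = FALSE) {
  backend <- if (isTRUE(use_tabix)) query_tabix else query_rest
  backend(unique_id, chrom, bp_lower, bp_upper)
}

# Default call goes through the REST stub.
res <- query_sketch("GTEx.brain_frontal_cortex", "8", 27610984, 27610987)
res$method
```

Passing `use_tabix = TRUE` selects the tabix stub instead, so users whose IP is whitelisted could still opt in.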

@paul-shannon
Author

paul-shannon commented Nov 30, 2021 via email

@bschilder
Member

Thanks for the helpful info @paul-shannon, hadn't realized this!

catalogueR::eQTL_Catalogue.list_datasets currently relies on the metadata provided here, tabix_ftp_paths.tsv:
https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/tabix/tabix_ftp_paths.tsv

It looks like there is another file called tabix_ftp_paths_imported.tsv:
https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/tabix/tabix_ftp_paths_imported.tsv

I'll modify catalogueR::eQTL_Catalogue.list_datasets to integrate this second file as well (with a tryCatch in case it doesn't exist in the future).
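
A rough sketch of the tryCatch idea, under stated assumptions: `list_datasets_sketch` and `read_paths` are hypothetical names, not the package's real internals, and `fake_read` is an in-memory stand-in so the example runs without network access. In practice the reader might wrap `utils::read.delim()` on the raw URL of each TSV.

```r
# Merge the main metadata with the imported metadata, tolerating a
# missing or unreadable second file.
list_datasets_sketch <- function(read_paths, main_src, imported_src) {
  meta <- read_paths(main_src)
  imported <- tryCatch(
    read_paths(imported_src),
    error = function(e) {
      message("Skipping imported paths: ", conditionMessage(e))
      NULL
    }
  )
  if (!is.null(imported)) meta <- rbind(meta, imported)
  meta
}

# Hypothetical reader used only for this demo.
fake_read <- function(src) {
  if (identical(src, "missing")) stop("file not found")
  data.frame(unique_id = src, stringsAsFactors = FALSE)
}

both <- list_datasets_sketch(fake_read, "main", "imported")
main_only <- list_datasets_sketch(fake_read, "main", "missing")
```

If the second file disappears, the function falls back to the main metadata and prints a message rather than failing.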

@bschilder
Member

bschilder commented Dec 1, 2021

I've just updated the metadata to include GTEX_V8. I also added a new arg to eQTL_Catalogue.list_datasets called include_imported. Setting this to TRUE (default) will integrate the additional datasets in /tabix_ftp_paths_imported.tsv

Currently implemented in the dev branch.

@bschilder
Member

bschilder commented Sep 12, 2022

I'm in the process of overhauling catalogueR to make it compatible with (and take advantage of) the rest of the echoverse, which has expanded quite a bit and is much more robust now.

@kauralasoo has anything changed regarding using tabix to query the eQTL Catalogue? If not, I'm going to add the following instructions whenever someone tries to use the fetch_tabix() function:

WARNING: Querying eQTL Catalogue with tabix will only work 
if your IP address has been whitelisted by an EMBL-EBI server administrator. 
Please request access via this form: 
https://www.ebi.ac.uk/about/contact/support/
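
One possible way to wire such a notice into a function, as a sketch only: `fetch_tabix_sketch` and `tabix_notice` are hypothetical names, the real `fetch_tabix()` signature is not shown in this thread, and `run_query` is a placeholder for whatever actually calls tabix.

```r
# Build the notice text once so it can be reused and tested.
tabix_notice <- function() {
  paste(
    "WARNING: Querying eQTL Catalogue with tabix will only work",
    "if your IP address has been whitelisted by an EMBL-EBI server administrator.",
    "Please request access via this form:",
    "https://www.ebi.ac.uk/about/contact/support/",
    sep = "\n"
  )
}

# Illustrative wrapper: print the notice, then run the supplied query function.
fetch_tabix_sketch <- function(run_query, ..., verbose = TRUE) {
  if (isTRUE(verbose)) message(tabix_notice())
  run_query(...)
}

# Demo with a stub query that never touches the network.
out <- fetch_tabix_sketch(
  function(region) paste("queried", region),
  "8:27610984-27610987"
)
```

Using `message()` keeps the notice on stderr and lets callers silence it with `suppressMessages()` or `verbose = FALSE`.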

@bschilder bschilder moved this from Todo to In Progress in 🦇🦇 echoverse 🦇🦇 Sep 13, 2022