unusable #81

Closed
T35R6braPwgDJKq opened this issue Jan 26, 2021 · 2 comments

T35R6braPwgDJKq commented Jan 26, 2021

Hi, I tried to use spidy because it looked promising.
Is it dead?

First:
sudo pip install -r requirements.txt
doesn't work; reppy is not installable on Python 3.9.

Second:
Docker is a pain...
Please look into ConfigArgParse if you need config files, BUT make sure that command-line arguments can be used as well.
With Docker, there is no error log...
I ended up with
docker run --rm -it -v $PWD:/data -w /data --entrypoint /src/app/spidy/crawler.py spidy
so that the error log is accessible (why is there no config option for this?!)
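The request above (config files, but with command-line arguments still usable) boils down to a precedence rule: built-in defaults, then values from the config file, then explicit flags. ConfigArgParse is one library for this; the sketch below instead shows the same idea with only the standard-library `argparse`, as a swapped-in illustration. The flag names (`--start`, `--max-depth`) and the JSON config format are hypothetical and are not spidy's actual options or file format.

```python
import argparse
import json
import os
import tempfile


def parse_args(argv):
    """Config file supplies defaults; explicit command-line flags win.

    Hypothetical flags and JSON format, for illustration only (not spidy's).
    """
    # Stage 1: pick out --config only, leaving all other arguments untouched.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", help="path to a JSON config file")
    known, _ = pre.parse_known_args(argv)

    # Stage 2: the full parser inherits --config from the pre-parser.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--start", help="URL to start crawling from")
    parser.add_argument("--max-depth", type=int, default=3)

    if known.config:
        # Values from the file replace the built-in defaults, but anything
        # given explicitly on the command line still overrides them.
        with open(known.config) as fh:
            parser.set_defaults(**json.load(fh))

    return parser.parse_args(argv)


# Usage: write a small config file, then parse with and without an override.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"start": "http://example.com/", "max_depth": 5}, tmp)
    config_path = tmp.name

from_file = parse_args(["--config", config_path])
overridden = parse_args(["--config", config_path, "--max-depth", "7"])
os.remove(config_path)

print(from_file.start, from_file.max_depth)    # http://example.com/ 5
print(overridden.start, overridden.max_depth)  # http://example.com/ 7
```

The two-stage parse keeps `--help` complete while letting the file's values become ordinary argparse defaults, so an explicit flag always takes precedence over the file.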

Why is a suffix enforced on the config file name? What is that, a Windows thing?

Third:
My config contained either an IP or a hostname (resolved via /etc/hosts).
Spidy crawled neither.
For the hostname option it gave:

ERROR: OSError
EXT: HTTPConnectionPool(host='example.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4ae176ecc0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

It seems it doesn't respect /etc/hosts?!
But the IP option didn't work either,
e.g. '192.168.1.55/wiki/'

@rivermont
Owner

Until Reppy is updated on PyPI with support for Python 3.9, we will need to remove it as the robots.txt parser, at least for Python versions above 3.8 (see the discussion at scrapy/scrapy/issues/5230, possibly fixed in scrapy/scrapy/pull/4759?). I'm currently looking into possible replacements; in the meantime I may create a separate branch without Reppy.
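As one possible stand-in while Reppy is unavailable, Python's standard library ships a basic robots.txt parser, `urllib.robotparser`. This is a swapped-in option for illustration, not necessarily what spidy adopted, and it supports fewer robots.txt features than a dedicated library may. The sketch below assumes the crawler has already downloaded the robots.txt text and only needs an allow/deny check per URL.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt: str) -> RobotFileParser:
    """Build a checker from robots.txt text the crawler has already downloaded."""
    checker = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without calling read() (which would fetch over the network).
    checker.parse(robots_txt.splitlines())
    return checker


robots_txt = """\
User-agent: *
Disallow: /private/
"""

checker = make_robots_checker(robots_txt)
print(checker.can_fetch("spidy", "http://example.com/private/page.html"))  # False
print(checker.can_fetch("spidy", "http://example.com/index.html"))         # True
```

Passing already-fetched text to `parse()` keeps robots.txt retrieval under the crawler's own HTTP session (headers, timeouts, retries) instead of the parser's built-in `read()`.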

The Docker problem is a known issue, see #72. I'm hoping to look into it soon.
The config process could definitely be updated; if a rewrite happens in the future, that is one thing that needs to be done.
I'm not sure what config suffix you're referring to; do you mean the file path?

I'm not sure what the issue with your hosts file is; spidy doesn't make requests in any unusual way that would bypass the hosts file. It looks like your local wiki wasn't reachable for some reason, or maybe there was a separate connection issue?
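One way to test the point about the hosts file is to ask the system resolver directly, from the same environment spidy runs in. The `NewConnectionError` in the log comes from urllib3, which resolves names through the operating system's resolver; on typical Linux setups that checks /etc/hosts before DNS. If spidy runs inside the Docker container, the check should be run there too, since a container has its own /etc/hosts and entries from the host machine's file are not visible inside it unless added (for example with `docker run --add-host`). The helper name `resolve` below is hypothetical.

```python
import socket
from typing import List


def resolve(host: str, port: int = 80) -> List[str]:
    """Return the IP addresses the system resolver reports for host.

    An empty list means resolution failed, which corresponds to the
    "[Errno -2] Name or service not known" error in the log above.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # the name could not be resolved in this environment
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


print(resolve("192.168.1.55"))  # ['192.168.1.55']  (a literal IP needs no lookup)
print(resolve("localhost"))
```

If a hostname from /etc/hosts resolves on the host machine but returns an empty list inside the container, the problem is the container's name resolution rather than spidy itself.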

@rivermont
Owner

Closing, as the first two issues are covered by #89 and #72, and the third looks like a personal environment issue.
