unusable #81
Comments
Until Reppy is updated on PyPI with support for Python 3.9, we will need to remove it as a robots.txt parser, at least for versions >3.8 (see discussion at scrapy/scrapy/issues/5230, possibly fixed in scrapy/scrapy/pull/4759?). I'm currently looking into possible replacements; in the meantime I may create a separate branch without Reppy. The Docker problem is a known issue, see #72. I'm hoping to get it looked into soon. I'm not sure what the issue with your hosts file is; spidy doesn't make requests in any unusual way that would bypass the hosts file. It looks like your local wiki was not reachable for some reason, or maybe there was a separate connection issue?
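On the robots.txt parser replacement mentioned above: one candidate that needs no extra install is Python's standard-library `urllib.robotparser`. The sketch below is only an illustration of that module, not what spidy uses or will use; the `build_parser` helper, the sample rules, and the `"spidy"` user-agent string are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already downloaded.

    Hypothetical helper for illustration; it is not part of spidy.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser


# Sample rules, invented for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS)
print(parser.can_fetch("spidy", "http://example.com/wiki/"))
print(parser.can_fetch("spidy", "http://example.com/private/page"))
```

Parsing from text keeps the fetch step under the crawler's control, so the same session, timeouts, and error handling can be reused for robots.txt as for regular pages.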
Hi, I tried to use spidy because it looked promising.
Is it dead?
First:
sudo pip install -r requirements.txt
doesn't work; reppy is not installable (Python 3.9).
Second:
Docker is a PITA...
Please look into ConfigArgParse if you need config files, BUT make sure that arguments can be used as well.
With Docker, there is no error log...
I ended up with
docker run --rm -it -v $PWD:/data -w /data --entrypoint /src/app/spidy/crawler.py spidy
so that the error log is accessible (why is there no config option?!)
Why is a suffix on the config file enforced? What is that? Windows?
Third:
My config contained either an IP or a hostname (resolved via /etc/hosts).
Spidy did not crawl either one.
For the hostname option it gave
ERROR: OSError
EXT: HTTPConnectionPool(host='example.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4ae176ecc0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
It seems that it doesn't respect /etc/hosts?!
But the IP option did not work either...
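A quick way to narrow down the `Name or service not known` error is to ask the system resolver directly, from the same environment the crawler runs in. This is a minimal diagnostic sketch; the `resolve` helper is hypothetical and not part of spidy. One thing to keep in mind: a Docker container has its own /etc/hosts, so an entry on the host machine is not visible inside the container unless it is passed in (for example with `docker run --add-host`). Whether that is the cause here is an assumption.

```python
import socket


def resolve(hostname: str, port: int = 80) -> list[str]:
    """Return the addresses the system resolver gives for hostname.

    An empty list means resolution failed in this environment, which is
    the same failure that produces '[Errno -2] Name or service not known'.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Replace with the hostname from the crawler config.
    print(resolve("example.com"))
```

Running this both on the host and inside the container shows whether the two environments resolve the name differently.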
e.g. '192.168.1.55/wiki/'
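The IP example above has no `http://` prefix. If the start URL was entered in the config exactly like that, HTTP libraries such as `requests` reject it as a URL without a scheme, which could explain why the IP variant failed too. That is a guess, since the issue does not show the error for this case. A minimal sketch of normalizing such input, using a hypothetical `ensure_scheme` helper:

```python
def ensure_scheme(url: str, default_scheme: str = "http") -> str:
    """Prepend a scheme when the URL was written without one.

    Hypothetical helper for illustration; it only checks for '://'
    and does not validate the rest of the URL.
    """
    url = url.strip()
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


print(ensure_scheme("192.168.1.55/wiki/"))
print(ensure_scheme("http://example.com/"))
```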