TimeOut when extracting a large dataset #7

Open
AnooshaCherukuri opened this issue Oct 10, 2017 · 5 comments

@AnooshaCherukuri

I have a large dataset with 800,000 records. When I run the extraction, it fails with the following error:

[2017-10-10 14:43:08,491: ERROR/MainProcess] Task extractor.extract[401c7ccc-7a3c-455e-a5f4-f23b804ae43d] raised unexpected: SearchIndexError('Solr returned an error: (u"Connection to server 'http://solr_server/solr/ckan/update/?commit=true' timed out: HTTPConnectionPool(host='#########', port=8983): Read timed out. (read timeout=60)",)',)
Traceback (most recent call last):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/celery/app/trace.py", line 438, in protected_call
    return self.run(*args, **kwargs)
  File "/usr/lib/ckan/default/src/ckanext-extractor/ckanext/extractor/tasks.py", line 94, in extract
    index_for('package').update_dict(pkg_dict)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 101, in update_dict
    self.index_package(pkg_dict, defer_commit)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 295, in index_package
    raise SearchIndexError(msg)
SearchIndexError: Solr returned an error: (u"Connection to server 'http://XXXXXXXXXXXXXXXXX/solr/ckan/update/?commit=true' timed out: HTTPConnectionPool(host='xxxxxxxxxx', port=8983): Read timed out. (read timeout=60)",)

Has anyone else had the same issue, or can anyone please let me know how to fix it?
Thanks in advance!

@torfsen torfsen self-assigned this Oct 23, 2017
@torfsen
Contributor

torfsen commented Oct 23, 2017

I'm currently out of office but will take a look at the problem once I'm back (around December).

@torfsen
Contributor

torfsen commented Nov 20, 2017

From the traceback, it looks like the search index update that runs after the extraction is failing on Solr's side. Does updating your search index manually on the command line work? Please try the following:

source /usr/lib/ckan/default/bin/activate
paster --plugin=ckan search-index rebuild --config=/etc/ckan/default/production.ini

(see the CKAN documentation for details)

@AnooshaCherukuri
Author

Yes, updating the search index manually works fine.

@torfsen
Contributor

torfsen commented Nov 27, 2017

Interesting. Is there anything in the Solr logs when the error happens?
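
Besides the logs, it might help to time a commit request against Solr directly, outside of CKAN and Celery, to see whether the commit itself approaches the 60-second read timeout from the error message. The following is only a rough sketch under several assumptions: it uses the Python 3 standard library rather than CKAN's own Solr client, the function names `commit_url` and `time_commit` are made up for illustration, and the core URL you pass in has to be replaced with your real one. Note that a commit with no pending changes may return quickly, so the measurement is probably only meaningful while or right after an extraction has added documents.

```python
import socket
import time
import urllib.error
import urllib.parse
import urllib.request


def commit_url(solr_core_url):
    """Build the commit URL for a Solr core, in the form shown in the traceback."""
    query = urllib.parse.urlencode({"commit": "true"})
    return solr_core_url.rstrip("/") + "/update/?" + query


def time_commit(solr_core_url, timeout=60.0):
    """Request a commit and return (status, elapsed_seconds).

    status is "ok", "timeout", or "error: <reason>".
    The default timeout mirrors the "read timeout=60" in the error message.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(commit_url(solr_core_url), timeout=timeout) as response:
            response.read()
        status = "ok"
    except socket.timeout:
        # Raised directly when reading the response takes too long.
        status = "timeout"
    except urllib.error.URLError as exc:
        # urllib wraps connect-phase timeouts in URLError; unwrap them here.
        if isinstance(exc.reason, socket.timeout):
            status = "timeout"
        else:
            status = "error: {}".format(exc.reason)
    return status, time.monotonic() - start
```

For example, `time_commit("http://localhost:8983/solr/ckan")` (a placeholder URL) returns a status string together with the elapsed time in seconds. If the status is "timeout" or the elapsed time is close to 60 seconds, that would suggest the commit itself is the slow part rather than the extraction.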

@Freakbrain

I have the same problem: huge datasets kill Solr for me as well.
