Write new indexing process #348
Comments
Two strategies discussed on slack for improving efficiency are:
|
See also this issue about 2 invocations: samvera/hydra-head#402 |
A single baseline run on staging took nearly 10.5 hrs:
note that to reindex correctly you must run it twice (so that would take 21 hrs) |
benchmarking a multithreaded run:
Well that didn't work.
|
My PRs have been merged into ActiveFedora to batch Solr adds, and to order permissions objects first, which should remove the need to index twice. Batching Solr adds speeds up reindexing significantly (by about an order of magnitude). I am going to put both of these fixes in locally -- not monkey patching, but making a new class that our [...]. @HackMasterA, does this make sense as an approach? Concurrency may speed it up even more, but I think it may be fast enough to be workable with these changes -- I can try working on concurrency if we want it even faster. I'm not totally positive I'll have avoided the need for the double index; we'll have to check that. |
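For readers following along, here is a minimal sketch of the two ideas in the comment above: sending documents to Solr in batches instead of one at a time, and indexing permissions objects before everything else. It is not the code from the ActiveFedora PRs. `FakeSolr`, `BatchReindexer`, the `:permissions` flag and the batch size are all hypothetical names chosen for illustration, and the Solr client is a stand-in so the example runs on its own.

```ruby
# Hypothetical stand-in for a Solr client. It only counts requests and
# remembers what was added, so the effect of batching is visible.
class FakeSolr
  attr_reader :add_calls, :docs

  def initialize
    @add_calls = 0
    @docs = []
  end

  # One call represents one request carrying a whole batch of documents.
  def add(batch)
    @add_calls += 1
    @docs.concat(batch)
  end

  def commit; end
end

# Hypothetical indexer, an assumption for illustration only.
class BatchReindexer
  def initialize(solr:, batch_size: 50)
    @solr = solr
    @batch_size = batch_size
  end

  # records: array of hashes such as { id: "abc", permissions: true }
  def reindex(records)
    # Permissions objects go first, so documents that depend on them
    # are built after the permissions are already indexed.
    permissions, others = records.partition { |r| r[:permissions] }

    (permissions + others).each_slice(@batch_size) do |slice|
      @solr.add(slice.map { |r| to_solr(r) })
    end
    @solr.commit
  end

  private

  # Placeholder for whatever builds the real Solr document.
  def to_solr(record)
    { "id" => record[:id] }
  end
end

records = [
  { id: "work1", permissions: false },
  { id: "perm1", permissions: true },
  { id: "work2", permissions: false },
  { id: "perm2", permissions: true },
  { id: "work3", permissions: false }
]

solr = FakeSolr.new
BatchReindexer.new(solr: solr, batch_size: 2).reindex(records)

puts solr.add_calls                          # 3 requests instead of 5
puts solr.docs.map { |d| d["id"] }.inspect   # permissions objects come first
```

With a realistic batch size the number of Solr requests drops by roughly that factor, which fits the speedup described above, though the real gain depends on the repository and the Solr setup.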
Sounds great. Feel free to close this ticket without doing any concurrency; the performance gain was definitely the goal, as opposed to the strategy for getting there. |
Okay, I can't close the ticket without doing that other stuff first; will do. It'll still be slow -- it takes 20 minutes just to get all the IDs out of Fedora, plus probably another 20-40 to actually index. But that'll still be an order of magnitude improvement! One step at a time, we'll do that first. |
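As a rough check of the "order of magnitude" estimate, the figures quoted in this thread can be compared directly. This is only arithmetic on the numbers above (10.5 hours per baseline run, two runs needed, 20 minutes to fetch IDs, 20-40 minutes to index), not a new measurement; the variable names are made up for the example.

```ruby
# Baseline from the thread: one run took about 10.5 hours, and two were needed.
baseline_single_run_min = 10.5 * 60
baseline_double_run_min = baseline_single_run_min * 2

# Estimate from the thread: 20 min to fetch IDs, plus 20-40 min to index.
fetch_ids_min = 20
index_min_range = [20, 40]
new_total_range = index_min_range.map { |m| fetch_ids_min + m }

# Speedup relative to the double run, for the best and worst estimates.
speedups = new_total_range.map { |t| (baseline_double_run_min / t).round(1) }

puts new_total_range.inspect
puts speedups.inspect
```

So an estimated 40-60 minute run would be roughly 20 to 30 times faster than the 21-hour double run, assuming the second pass really is no longer needed.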
ActiveFedora::Base.reindex_everything is inefficient and must be run twice. Given that it takes more than an entire workday to run even a single time, it would be worth rewriting so that we can run it only once and more efficiently.