Identify and remediate published items that are missing public cocina in the purl-fetcher db #5181
Comments
I haven't yet figured out how to determine whether an item has been previously published, since I'm running into problems querying the workflow service. I'm also still working out the best way to tell whether that previous publishing happened before the versioning work. In the meantime, here is a report on the druids that don't have public JSON in the purl-fetcher DB. It includes:
I spot-checked some druids and every one I checked had the problem with
@andrewjbtw I've updated the report to include each druid's version 1 "published" milestone date. It looks like all of these were published at least once before migration (the latest "first published" date is 2023-07-06). https://github.com/sul-dlss/dor-services-app/pull/5182/files
To actually remediate these (or a subset, such as those that are closed) we need to come up with a migration strategy for migrating public cocina in the future. Since we unmounted
Rather than migrating, we will allow these to proceed without erroring via sul-dlss/purl-fetcher#932, pending testing on stage.
@andrewjbtw these items are now possible to republish without raising an error. Republishing will create a public JSON record and update the cocina.json on purl.
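For readers following along, the behavior described above can be sketched roughly as follows. This is not the code from sul-dlss/purl-fetcher#932; the function name `republish`, the dict-based `public_store`, and the sample druid are hypothetical stand-ins, and the only assumption is that a missing previous record is treated as "everything is new" instead of raising.

```python
def republish(druid: str, current: dict, public_store: dict) -> set:
    """Publish `current` cocina, tolerating a missing previous public record.

    `public_store` is a hypothetical stand-in for the purl-fetcher DB,
    mapping druid -> previously published public cocina (as a dict).
    Returns the set of top-level keys considered changed.
    """
    previous = public_store.get(druid)
    if previous is None:
        # No prior public cocina: treat every key as new instead of erroring.
        changed = set(current)
    else:
        all_keys = previous.keys() | current.keys()
        changed = {k for k in all_keys if previous.get(k) != current.get(k)}
    # Republishing creates (or replaces) the public JSON record.
    public_store[druid] = dict(current)
    return changed
```

After the first republish the record exists, so later publishes can diff against it in the usual way.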
@lwrubel sorry for the delay in getting back to this, but I finally went through the druids in the report and something seems off. There are 11,686 druids in the list and all but 9 of them are Google scans, and those 9 were all in an Opened state until I closed them today (which led immediately to publish errors). I'm going to republish the items anyway, since that could turn up issues. But I don't recall Google items being among the items that have been problems. Maybe I don't understand what creates a "problem" druid. I'm likely to end up republishing literally every accessioned item in the next few months, as that appears to be the only way to approach certainty in SDR.
I agree it's odd that these are all Google Books, which have not typically had problems. I suspect these got into the state they're in on purl-fetcher not through any previous publishing problem but from a cocina.json migration problem that was undetected. But I'm not completely sure.
This comes out of the discussion in this Slack thread about an item that was failing publish because of missing public Cocina JSON in the purl-fetcher DB. There are probably more items with a similar problem.
More specifically, items that meet these criteria are likely to fail publish until remediated:
The publish problem seems to be that the publish step tries to diff the cocina for the currently accessioning version with the cocina for the previously-published version. But if there's no cocina in the purl-fetcher DB, the diff can't proceed.
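To make the failure mode concrete, here is a minimal sketch of a diff step that depends on the previously published cocina. All names (`diff_for_publish`, `MissingPublicCocinaError`, the dict-based `public_store`) are hypothetical and chosen for illustration; the real publish step may be structured quite differently.

```python
from typing import Optional


class MissingPublicCocinaError(Exception):
    """Raised when no previously published cocina is stored for a druid."""


def diff_for_publish(druid: str, current: dict, public_store: dict) -> set:
    """Return the top-level keys that differ from the previously published cocina.

    `public_store` is a hypothetical stand-in for the purl-fetcher DB,
    mapping druid -> public cocina (as a dict).
    """
    previous: Optional[dict] = public_store.get(druid)
    if previous is None:
        # Nothing to compare against, so the diff cannot proceed.
        raise MissingPublicCocinaError(f"no public cocina stored for {druid}")
    all_keys = previous.keys() | current.keys()
    return {k for k in all_keys if previous.get(k) != current.get(k)}
```

In this sketch, an item that was published before but has no stored record fails at the lookup, before any comparison happens.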
We think it's likely that items that lack public cocina in purl-fetcher ran into a migration problem when we were populating the purl-fetcher db during the versioning work.
Additional background
There are also "Opened" items that don't have public cocina in the purl-fetcher DB even though they have purls. We should treat those differently and not include them in this remediation. Those items will fail a standalone publish (triggered from Argo) but if they are closed, they should successfully be accessioned.
We had to treat "Opened" items differently when migrating to the new versioning model for reasons that I'm finding difficult to summarize concisely in this issue. I may file a separate issue about them but need to gather more information. Ideally, we can solve the issue by closing them but they may contain unfinished in-progress changes.