sha is a metadata field. As discussed, I've kicked off the function that populates it (it may take a few days to complete). It is hogging memory, so let me know if I need to kill it and write a better version.
Once populated, we can use the sha metadata field to collect duplicates. This can be done with a bit of shell magic as a one-off (I could build this into a web page if needed):
In [1]: import documents.models as dm
In [2]: from django.db.models import Count
In [3]: duplicates = dm.Document.objects.values('sha').annotate(dcount=Count('sha')).order_by().filter(dcount__gt=1).iterator()  # filter in the database; order_by() clears any default ordering
In [4]: next(duplicates)
...
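For reference, here is a rough sketch of the two steps outside Django: hashing a file in chunks so memory use stays flat, then grouping ids that share a sha. This is not the function currently running. The names `sha_of_file` and `group_duplicates` are hypothetical, and SHA-256 is an assumption, since the thread does not say which SHA variant is stored.

```python
import hashlib
from collections import defaultdict


def sha_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks (SHA-256 is an assumption here)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time instead of loading the whole file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(sha_by_id):
    """Return {sha: [ids]} for every sha shared by more than one id."""
    groups = defaultdict(list)
    for doc_id, sha in sha_by_id.items():
        groups[sha].append(doc_id)
    return {sha: ids for sha, ids in groups.items() if len(ids) > 1}
```

Reading in chunks keeps memory roughly constant regardless of file size, which may help if the memory problem comes from loading whole documents at once.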
It might be helpful during clean-up to see the duplicates.