quick tool to show shasum #180

Open
cccs-ip opened this issue Oct 11, 2014 · 2 comments

Member

cccs-ip commented Oct 11, 2014

It might be helpful during clean-up to see the duplicates.

Contributor

pwhipp commented Oct 14, 2014

sha is a metadata field. As discussed, I've kicked the function off (it may take a few days to complete). It is hogging memory so let me know if I need to kill it and write a better version.
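The thread does not say why the function uses so much memory. One common cause in hashing jobs is reading each file into memory in one go, so here is a minimal sketch of a chunked digest, assuming that is the problem. The helper name `file_sha1`, the chunk size, and the choice of SHA-1 (the default of the `shasum` command) are all assumptions, not taken from the project code.

```python
import hashlib


def file_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in fixed-size chunks.

    Hypothetical helper: only one chunk (1 MiB by default) is held in
    memory at a time, regardless of the file size.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For a given file this should produce the same hex string as running `shasum` on it, since both compute SHA-1 over the raw bytes.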

Once populated, we can use the sha metadata field to collect up duplicates. This can be done with a bit of shell magic as a one-off (I could build this into a web page if needed):

In [1]: import documents.models as dm

In [2]: from django.db.models import Count
In [3]: duplicates = (d for d in dm.Document.objects.values('sha').annotate(dcount=Count('sha')) if d['dcount'] > 1)

In [4]: next(duplicates)
...
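The query above yields each duplicated sha with its count, but not which documents share it. A rough sketch of that grouping step in plain Python follows. The function name `group_duplicates` and the sample records are hypothetical. In the real project the `(id, sha)` pairs could presumably come from something like `dm.Document.objects.values_list('pk', 'sha')`, which assumes `sha` can be queried like an ordinary field, as it is in the snippet above.

```python
from collections import defaultdict


def group_duplicates(records):
    """Map each sha that occurs more than once to the ids that share it.

    `records` is an iterable of (doc_id, sha) pairs (hypothetical shape).
    """
    by_sha = defaultdict(list)
    for doc_id, sha in records:
        by_sha[sha].append(doc_id)
    # Keep only the shas held by two or more documents.
    return {sha: ids for sha, ids in by_sha.items() if len(ids) > 1}


# Made-up sample data for illustration only.
records = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "ccc"), (5, "bbb"), (6, "aaa")]
print(group_duplicates(records))
```

During clean-up, each value in the returned mapping is a set of candidates to review together: keep one document and decide what to do with the rest.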

Member Author

cccs-ip commented Oct 14, 2014

Cool, thanks. I will leave this open and assigned to you to think about as we start work on the uploader / importer.

pwhipp added a commit that referenced this issue Oct 14, 2014