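The current implementation of `unique_content_id` is unstable because `JSONField` does not preserve the order of object keys. On PostgreSQL, Django's `JSONField` is stored as `jsonb`, which keeps neither the original key order nor insignificant whitespace. Since `unique_content_id` is derived from these JSON values, an advisory that is loaded from the database and saved again can hash to a different id even though its content is unchanged. For more information, see https://docs.djangoproject.com/en/5.1/ref/models/fields/#django.db.models.JSONField and https://www.postgresql.org/docs/16/datatype-json.html.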
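To see why key order alone is enough to change the id, the following sketch hashes two dictionaries that hold the same data in different key orders. It assumes an MD5 hex digest over plain `json.dumps` output, which matches the 32-character length of the ids in the reproduction but is not confirmed to be how `unique_content_id` is actually computed; `naive_digest` is a hypothetical helper.

```python
import hashlib
import json

# Same content, different key insertion order: a stand-in for a field value
# before and after a round trip through a store that does not keep key order.
before = {
    "package": {"type": "pypi", "name": "dummy"},
    "affected_version_range": "vers:pypi/>=1.0.0|<=2.0.0",
}
after = {
    "affected_version_range": "vers:pypi/>=1.0.0|<=2.0.0",
    "package": {"name": "dummy", "type": "pypi"},
}


def naive_digest(data):
    """Hypothetical helper: MD5 of json.dumps output, so key order leaks into the hash."""
    return hashlib.md5(json.dumps(data).encode("utf-8")).hexdigest()


print(before == after)  # dict equality ignores key order
print(naive_digest(before) == naive_digest(after))  # serialized text differs, so hashes differ
```

The two dictionaries compare equal in Python, yet their serialized forms, and therefore their digests, differ.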
This can lead to widespread duplication of advisory data, because the same advisory can end up stored under more than one `unique_content_id`, increasing storage usage.
The snippet below reproduces the bug: the same advisory data yields two different `unique_content_id` values.
    In [1]: from vulnerabilities import importer

    In [2]: from packageurl import PackageURL

    In [3]: from univers.version_range import VersionRange

    In [4]: from django.utils import timezone

    In [5]: from vulnerabilities.pipes.advisory import insert_advisory

    In [6]: from vulnerabilities.importer import AdvisoryData
       ...: from vulnerabilities.models import Advisory

    In [7]: advisory_data = importer.AdvisoryData(
       ...:     aliases=["CVE-2020-13371337"],
       ...:     summary="vulnerability description here",
       ...:     affected_packages=[
       ...:         importer.AffectedPackage(
       ...:             package=PackageURL(type="pypi", name="dummy"),
       ...:             affected_version_range=VersionRange.from_string("vers:pypi/>=1.0.0|<=2.0.0"),
       ...:         )
       ...:     ],
       ...:     references=[importer.Reference(url="https://example.com/with/more/info/CVE-2020-13371337")],
       ...:     date_published=timezone.now(),
       ...:     url="https://test.com",
       ...: )

    In [8]: r = insert_advisory(advisory_data, "test")

    In [9]: r.unique_content_id
    Out[9]: '2ececc550f7f6b5537e5f1a767ef0f25'

    In [10]: k = Advisory.objects.get(unique_content_id=r.unique_content_id)

    In [11]: k.unique_content_id
    Out[11]: '2ececc550f7f6b5537e5f1a767ef0f25'

    In [12]: k.date_imported = None  # Change any field not used for computing content id
        ...: k.save()

    In [13]: k.unique_content_id
    Out[13]: 'bf83e58fc8f7eb54d04a59c27f0680f8'

    In [14]: assert k.unique_content_id == r.unique_content_id
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    Cell In[14], line 1
    ----> 1 assert k.unique_content_id == r.unique_content_id
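One common way to make such an id independent of key order is to hash a canonical serialization: sort object keys and fix the separators before hashing. The sketch below illustrates that idea under stated assumptions and is not the project's actual fix. `compute_content_id` and `reorder_keys` are hypothetical helpers, the `advisory` dictionary only approximates the data used in the reproduction, and `reorder_keys` merely reverses key order rather than reproducing the order `jsonb` actually uses.

```python
import hashlib
import json


def compute_content_id(data):
    """Hypothetical helper: MD5 of a canonical JSON encoding.

    sort_keys=True makes object-key order irrelevant and fixed separators
    remove whitespace differences. List order is left untouched.
    """
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def reorder_keys(value):
    """Hypothetical helper: recursively reverse dict key order to mimic a store
    that does not preserve it (jsonb's real ordering is not modelled)."""
    if isinstance(value, dict):
        return {key: reorder_keys(value[key]) for key in reversed(list(value))}
    if isinstance(value, list):
        return [reorder_keys(item) for item in value]
    return value


# Assumed shape, loosely based on the AdvisoryData in the reproduction.
advisory = {
    "aliases": ["CVE-2020-13371337"],
    "summary": "vulnerability description here",
    "affected_packages": [
        {
            "package": {"type": "pypi", "name": "dummy"},
            "affected_version_range": "vers:pypi/>=1.0.0|<=2.0.0",
        }
    ],
    "references": [{"url": "https://example.com/with/more/info/CVE-2020-13371337"}],
}

roundtripped = reorder_keys(advisory)
assert list(roundtripped) != list(advisory)  # the key order really changed
assert compute_content_id(roundtripped) == compute_content_id(advisory)
print(compute_content_id(advisory))
```

The sketch normalizes only object keys, not lists: `jsonb` preserves array order, and order within lists such as `aliases` may carry meaning, so sorting them would be a separate design decision.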