ZODB blobstorage size growing at an uncontrollable rate #769

Open
AshRaghav opened this issue Oct 21, 2016 · 7 comments

@AshRaghav

Hi,

Sorry if this appears to be a vague question - has anyone ever experienced the blobstorage growing exponentially?

We seem to have a mysterious, tumour-like growth: the blobstorage currently stands at 40GB. We are unsure how to profile or analyse it to work out why it has reached this unsustainable size.

Can someone kindly recommend any profiling tools for blobstorage please? Is there any configuration that tells us what sort of elements are stored in the blobstorage?

Our environment is:
Plone 4.3.3 (4308)
CMF 2.2.7
Zope 2.13.22
Python 2.7.6 (default, Jun 2 2016, 08:43:38) [GCC 4.8.4]
PIL 2.3.0 (Pillow)

We have roughly 2000 users, about 50 Forms and a similar number of Views. The audit trail that records "Save" and "Delete" actions for documents is switched on.

Not sure if this helps, but the ZMI Index page shows around 13500 records. We do store documents, but they are considerably smaller in total (probably about 2-3GB), as they are PDFs, Word documents or JPEGs, none exceeding 5MB. The Data.fs is currently around 3.5GB in size.

I am hoping that the above information will be of some help but if I missed any useful information then please let me know.

Thanks

@AshRaghav
Author

@ebrehault @jean - any ideas on what could be the problem here?

Is there a way to find out what the largest blobstorage folders are pointing to in the site? The massive size is affecting backup schedules and causing some major concerns. Any direction would be appreciated.
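As a starting point for that kind of profiling, here is a minimal sketch that ranks the directories under a blob directory by the total size of the files they contain. It uses only the Python standard library and does not touch the ZODB itself, so it shows where the space goes on disk but not which site object owns it. The function names are made up for this example, and the assumption that each leaf directory corresponds to one stored object (with one file per revision) reflects my understanding of the default blob layout rather than anything confirmed in this thread.

```python
import os


def directory_sizes(blob_root):
    """Return {directory path: total bytes of the files directly inside it}."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(blob_root):
        total = 0
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file disappeared while walking (for example during a pack).
                continue
        if total:
            sizes[dirpath] = total
    return sizes


def largest_directories(blob_root, limit=20):
    """Return up to `limit` (bytes, path) pairs, largest directory first."""
    ranked = sorted(
        ((size, path) for path, size in directory_sizes(blob_root).items()),
        reverse=True,
    )
    return ranked[:limit]
```

Calling `largest_directories("var/blobstorage")` (the path is an assumption, adjust it to your buildout) returns the heaviest directories first. A directory holding many large files of similar size would be a hint that the same object is being rewritten over and over.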

I forgot to mention that we are also using replication between two databases, so not sure if that is causing some issue.

Thanks

@ebrehault
Member

Do you have File Attachment fields in your Plomino db? If you don't, your problem is not related to Plomino. If you do, try to export your db docs as XML on the server, and then see how big the export is.

Note: more generally, Plone might have a big blobstorage if:

  • you have enabled versioning on File contents
  • you never pack

@AshRaghav
Author

AshRaghav commented Nov 12, 2016

Thanks @ebrehault ! That was a brilliant insight.

I have managed to export the XML and found that a 5MB PDF file (as downloaded to my desktop) that was uploaded to our website has bloated into a 950MB XML file on the server. A closer look reveals that the PDF has 6 pages of screenshots of images, uploaded to us as evidence. The same goes for a 12MB Word document (as downloaded to my desktop) containing screenshots and photographs submitted as evidence: it shows up as a 1.2GB XML file.

I understand that it is a binary storage but - is this the way that blobs are always stored or are we doing something horribly wrong? Because what should've been an ideal storage size of around 5-7GB now looks to be 45GB on the server.

Regarding your suggestions

  1. Versioning on Files - no we haven't enabled that
  2. Packing - yes, that was not enabled either. When I ran the zeopack operation, the blobstorage shrank by around 2GB, but the reduction did not last long. I have now scheduled packing once a week; however, I think the underlying problem is the way the blobstorage is storing the data.

Secondly, there is a problem with exporting certain Plomino documents, where I get this error:

Traceback (innermost last):

Module ZPublisher.Publish, line 138, in publish
Module ZPublisher.mapply, line 77, in mapply
Module ZPublisher.Publish, line 48, in call_object
Module Products.CMFPlomino.PlominoReplicationManager, line 1241, in manage_exportAsXML
Module Products.CMFPlomino.PlominoReplicationManager, line 1296, in exportAsXML
Module Products.CMFPlomino.PlominoReplicationManager, line 1319, in exportDocumentAsXML
Module xmlrpclib, line 1085, in dumps
Module xmlrpclib, line 632, in dumps
Module xmlrpclib, line 654, in __dump
Module xmlrpclib, line 735, in dump_struct
Module xmlrpclib, line 652, in __dump
TypeError: cannot marshal <class 'stripe.resource.Charge'> objects

Not sure how to get rid of this error, as it comes from a Python package: "stripe-1.25.0-py2.7.egg"

Thanks again

@ebrehault
Member

That's because one of the items of the document is not serialisable. Regular Plomino items are supposed to be serialisable so I guess it is an item created programmatically by one of your formulas and it is not a simple type (like a string, integer, date, array, dict...).
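To locate the offending item, one option is to try marshalling each value on its own and record which ones fail. The sketch below assumes you have already pulled the document's items into a plain dict of name to value (how you do that in Plomino is left open here), and `find_unmarshallable` and `FakeCharge` are hypothetical names used only for illustration. `FakeCharge` is a dict subclass, which is enough to make `xmlrpclib` reject it in the same way as in the traceback above.

```python
try:
    import xmlrpclib  # Python 2, as in the traceback above
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3 name of the same module


def find_unmarshallable(items):
    """Return {item name: type name} for values that xmlrpclib cannot dump."""
    problems = {}
    for name, value in items.items():
        try:
            # dumps() expects a tuple of parameters, so wrap the single value.
            xmlrpclib.dumps((value,))
        except (TypeError, OverflowError):
            problems[name] = type(value).__name__
    return problems


class FakeCharge(dict):
    """Hypothetical stand-in for a third-party dict subclass."""


sample = {"subject": "Invoice", "amount": 1200, "charge": FakeCharge(id="ch_demo")}
print(find_unmarshallable(sample))  # {'charge': 'FakeCharge'}
```

Once the item is known, the formula that creates it could store a simple type instead, for example a plain `dict` copy of the object, provided the nested values are themselves simple types.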

@AshRaghav
Author

Thanks @ebrehault. That is less of a worry considering what is happening with the blob storage.

Do you know if anyone else has faced a similar issue, where files containing screenshots or images have different sizes in blob storage and on the Windows file system?

If I am able to fix the blob storage, then I might not have to worry too much with exporting the documents.

@ebrehault
Member

No, never heard of it before.

@AshRaghav
Author

No problem @ebrehault. I figured out the problem with blob storage bloating eventually.

It turns out that when we were replicating the data between two Plomino databases (not using the replication tab), each attachment was being recreated in the destination database on a daily basis along with the updated data from the source, which caused the bloat over time.
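For anyone who suspects the same kind of duplication, a rough way to check is to group the files under the blob directory by content hash. This is only a sketch under the assumption that duplicated attachments end up as byte-identical files on disk; the function names are invented for this example, and it is not the method used to find or clear the duplicates in this thread. It only reads files. Removing duplicates should still go through the application (deleting the redundant documents, then packing), not by deleting blob files by hand.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Return lists of paths under `root` that share the same SHA-256 digest."""
    # Group by size first so that files with a unique size are never hashed.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned list is one group of identical files. Many groups with one copy per day of replication would match the pattern described above.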

I have cleared out all the duplicates and am also looking at other options to clear the views once they have been processed into the destination database. I am not sure why the replication feature is not being used in our case, but I am hoping there was a good reason for it.

What was previously a blobstorage of size 45GB is now 12GB.

Thanks for your help and support with exporting documents, without which I would not have figured this out.
