Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[oracle] zodbconvert incredibly slow. Here is a fix. #450

Open
NicolasGoeddel opened this issue Mar 22, 2021 · 10 comments
Open

[oracle] zodbconvert incredibly slow. Here is a fix. #450

NicolasGoeddel opened this issue Mar 22, 2021 · 10 comments

Comments

@NicolasGoeddel
Copy link

Hi there,

for reference you can find the whole issue here: https://community.plone.org/t/zodbconvert-incredibly-slow-solved/13606

While converting a filestorage/blobstorage to a relstorage using the oracle backend it takes a lot if time even if the database is very small.

I found the culprit in this SQL statement:

SELECT
    zoid,
    tid,
    prev_tid,
    current_object_state.state
FROM
    temp_store
        JOIN current_object_state
            USING (zoid)
WHERE (tid <> prev_tid)

It joins with the view current_object_state which in turn joins the two tables current_object and object_state using the columns ZOID and `TID':

CREATE OR REPLACE FORCE EDITIONABLE VIEW "CURRENT_OBJECT_STATE" (
	"ZOID",
	"TID",
	"STATE"
) AS 
	SELECT
		zoid,
		tid,
		state
	FROM object_state
		JOIN current_object
			USING (zoid, tid);

The table object_state already has an unique index over (ZOID, TID) but current_object has not. So I created the index by myself:

CREATE UNIQUE INDEX "CURRENT_OBJECT_UI" ON "CURRENT_OBJECT" ("ZOID", "TID")

After that the SELECT statement I mentioned above only needs 50 ms instead of 35 s to evaluate. And also the zodbconvert script got 36 times faster than before. I was able to convert the data with 1.14 transactions per second:

2021-03-22 15:48:38,689 [relstorage.storage.copy] INFO   Copied transactions: 955/955,100.00%, 63.36 KB/s   1.14 True/s, 51.62 MB

And before it was more like 0.03 transactions per second.

Please add that index to the code base so everyone can feel the difference.

Version:

  • RelStorage[oracle] == 3.4
  • Plone == 5.2.2
  • Python = 3.8.8
@NicolasGoeddel
Copy link
Author

I think one could add the index right after this line if I am right:

CREATE INDEX current_object_tid ON current_object (tid);

@jamadden
Copy link
Member

Oracle is not tested or supported. I will not be adding an index that only benefits oracle.

Now, that could all change if there was some way to test oracle on ci. Given that there’s not, I am more likely to just remove the code.

@jamadden
Copy link
Member

More generally, as every DBA knows, indices are a trade off in several dimensions, chiefly of interest here is online performance versus batch performance. Since RelStorage is focused on online performance, and converting with zodbconvert is extremely rare, relatively speaking, the bar is very high to add features that only benefit such a rare use.

@NicolasGoeddel
Copy link
Author

There is also a guy who saw a similar culprit with Postgres. I did not test it but I wonder if it is the same reason but for restore.
https://community.plone.org/t/zodbconvert-filestorage-relstorage-much-slower-than-other-direction/13172

@jamadden
Copy link
Member

The version of RelStorage being used there was extremely old.

My group just converted a bunch of large-ish databases to Postgres using the current version of RelStorage and it actually went much faster than expected.

@NicolasGoeddel
Copy link
Author

Thanks for your answer. May I ask you one more thing?
What is the last version of RelStorage that was tested against an Oracle database server? I saw something in the Changelog to version 3.0a13. Does that mean version 3.0 is safe to use?

@jamadden
Copy link
Member

3.0a13 was probably the last time I tested it specifically. Oracle has never been regularly tested to my knowledge.

@NicolasGoeddel
Copy link
Author

Okay. And that all depends on the lack of an oracle database server where tests could be ran against?
I never ran tests on Github but shouldn't this docker image help you with that?
https://github.com/oracle/docker-images/tree/main/OracleDatabase
Before I can really help I would need to learn about writing a CI pipeline for Github and I guess I need to simulate it first on my computer to be able to debug it.

@tobiasherp
Copy link

There is also a guy who saw a similar culprit with Postgres. I did not test it but I wonder if it is the same reason but for restore.
https://community.plone.org/t/zodbconvert-filestorage-relstorage-much-slower-than-other-direction/13172

I'm that guy ;-)
It's true that both my PostgreSQL and RelStorage versions are quite old, but the performance difference between the RelDB-to-Data.fs and the Data.fs-to-RelDB conversion is a factor of 200, which makes the latter almost completely useless for backup/restore purposes.
It might be an option to add the index just for this purpose, before performing the conversion; so, nice to know.

But I'm not convinced by that "... only benefits oracle." argument.
I'd be interested in some little tests, performing the conversion in both directions, with and without the index, with newer versions of the package.
Is there really a considerable performance penalty while writing to the database?
(Of course it will take time to maintain the additional index, but this might be over-compensated by performance gains while reading, or it might be neglectably small.)

@mamico
Copy link
Contributor

mamico commented Aug 7, 2021

What is the last version of RelStorage that was tested against an Oracle database server?

@NicolasGoeddel I've been using Relstorage 3.4.0 and Oracle in the wild for months with no particular problems.
But actually moving to Postgresql is part of my medium-term plans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants