-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix behavior of FileStorage.restore
when provided wrong prev_txn
#397
base: master
Are you sure you want to change the base?
Conversation
According to restore comment: > the prev_txn should be considered just a hint, and is ignored if the > transaction doesn't exist. instead, this was raising an UndoError.
…txn_prev if restore is called with an existing txn_prev that does not contain the oid, the "8-byte redundant transaction length -8" and the beginning of the next transaction was read as a data record. The condition to stop was incorrect as discussed in [1], in practice this error would only happens when restore is passed a bad txn_prev so this should not be a problem in normal conditions because we are supposed to find the data record for that oid. [1] #395 (comment)
I'm not sure what should be the behavior if The docs are : Lines 978 to 981 in 6f88781
ZODB/src/ZODB/FileStorage/FileStorage.py Lines 715 to 722 in 6f88781
If But if |
Jérome Perrin wrote at 2024-2-14 07:20 -0800:
I'm not sure what should be the behavior if `restore` is called with a `prev_txn` which does not contain oid.
I have looked at the code:
The only purpose of `prev_txn` is to determine `prev_pos`.
If this succeeds, we know that we find *data* at `prev_pos`
and therefore we can write a backpointer to `prev_pos` instead
of writing *data* ourself.
This means, `prev_txn` is there only as an optimization.
If it does not contain the correct data, `prep_pos` will be `0`
and the optimazation will not be taken.
|
Jérome, thanks for doing this PR and adding tests for +8 and other problems. I agree that prev_txn semantically is just a hint for copy detection on storage side. It is optimization that, unfortunately, is exposed as public API. An UNDO is the same store operation, but usually it stores content that is already there in the database. Git detects this duplication via sha1 - all internally. But in ZODB world the API was setup to help storages handle copies. In simple cases that helps, but generally I think it hurts and constraints storages unneccessaryly. Anyway about this:
I agree it is an inconsistency if prev_txn exists but does not contain an oid. I think it is better report an error here, or at least log a warning. But given that frequently nobody read logs raising an error might be better. I say this because this inconsistency indicate a really potential corruption of data, and such events should not pass by silently. For the record ZODB/go exposes the hint as https://lab.nexedi.com/kirr/neo/-/blob/f7776fc1/go/zodb/zodb.go#L249-258 and it exposes it only because this prev_txn is already semantically there in IStorage and tools like https://lab.nexedi.com/nexedi/zodbtools/-/blob/513c296c/zodbtools/zodbdump.py#L49 Kirill |
Thank you for the feedback and sorry for being slow to reply here. That was also my feeling that restoring a transaction with a "simple" patch to change to an errordiff --git a/src/ZODB/FileStorage/FileStorage.py b/src/ZODB/FileStorage/FileStorage.py
index 5c76000a..869ae8ea 100644
--- a/src/ZODB/FileStorage/FileStorage.py
+++ b/src/ZODB/FileStorage/FileStorage.py
@@ -733,6 +733,11 @@ class FileStorage(
prev_txn_pos = self._txn_find(prev_txn, 0)
if prev_txn_pos:
prev_pos = self._data_find(prev_txn_pos, oid, data)
+ if not prev_pos:
+ panic(
+ "oid %s not found in prev_txn %s",
+ utils.oid_repr(oid),
+ utils.tid_repr(prev_txn))
old = self._index_get(oid, 0)
# Calculate the file position in the temporary file
here = self._pos + self._tfile.tell() + self._thl
diff --git a/src/ZODB/tests/testFileStorage.py b/src/ZODB/tests/testFileStorage.py
index 5f545e52..fb21b6d4 100644
--- a/src/ZODB/tests/testFileStorage.py
+++ b/src/ZODB/tests/testFileStorage.py
@@ -260,6 +260,7 @@ class FileStorageTests(
self.assertEqual(self._storage.load(oid)[0], b'data1b')
def testRestorePrevTxnWithoutOid(self):
+ from ZODB.FileStorage.FileStorage import CorruptedTransactionError
t = TransactionMetaData()
self._storage.tpc_begin(t)
self._storage.store(b'\1' * 8, b'\0' * 8, b'data1', b'', t)
@@ -274,14 +275,13 @@ class FileStorageTests(
self._storage.tpc_vote(t)
self._storage.tpc_finish(t)
- # restore with a transaction not containing the oid
+ # restore with a transaction not containing the oid is an error
t = TransactionMetaData()
self._storage.tpc_begin(t)
- self._storage.restore(b'\2' * 8, b'\0' * 8, b'data2b', b'', tid1, t)
- self._storage.tpc_vote(t)
- self._storage.tpc_finish(t)
- self.assertEqual(self._storage.load(b'\1' * 8)[0], b'data1')
- self.assertEqual(self._storage.load(b'\2' * 8)[0], b'data2b')
+ with self.assertRaisesRegex(
+ CorruptedTransactionError,
+ 'oid 0x0202020202020202 not found in prev_txn.*'):
+ self._storage.restore(b'\2' * 8, b'\0' * 8, b'data2b', b'', tid1, t)
def testCorruptionInPack(self):
# This sets up a corrupt .fs file, with a redundant transaction
I tried to change into an error and there is a legitimate case here in ZODB/src/ZODB/tests/RecoveryStorage.py Lines 41 to 69 in 6f88781
If this becomes an error, that test fails with an error like this:
What this test does is:
and this fails, because the prev_txn in "Dest" exists, but is a "p" (pack) transaction without records. zodb dump Source.fs> zodb dump /tmp/FileStorageTestsao2yt_a3/FileStorageTestsy80r5b7k/FileStorageHexTestsemjrpbn0/FileStorageHexTestsyucvq6sd/FileStorageRecoveryTest9ur7kq7d/Source.fs
Ignoring index for /tmp/FileStorageTestsao2yt_a3/FileStorageTestsy80r5b7k/FileStorageHexTestsemjrpbn0/FileStorageHexTestsyucvq6sd/FileStorageRecoveryTest9ur7kq7d/Source.fs
txn 03f6d00826efb611 " "
user ""
description "initial database creation"
extension ""
obj 0000000000000000 64 sha1:d4b0bf30c5132b1ec57da7fe61b080970eea3418
�cpersistent.mapping
PersistentMapping
q.�}qXdataq}qs.
txn 03f6d00826f06aaa " "
user ""
description ""
extension ""
obj 0000000000000000 118 sha1:bd566fcc3ce469e0d6590f4b69d45e2d3708b4fa
�cpersistent.mapping
PersistentMapping
q.�}qXdataq}qXobj1qqcZODB.tests.MinPO
MinPO
q�qQss.
obj 0000000000000001 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b
�cZODB.tests.MinPO
MinPO
q.�}qXvalueqKs.
txn 03f6d00826f11488 " "
user ""
description ""
extension ""
obj 0000000000000000 148 sha1:4c39467361ff1ab627e82e14f9c4158cc36979f8
�cpersistent.mapping
PersistentMapping
q.�}qXdataq}q(Xobj1qqcZODB.tests.MinPO
MinPO
q�qQXobj2q h�q
Qus.
obj 0000000000000002 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b
�cZODB.tests.MinPO
MinPO
q.�}qXvalueqKs.
txn 03f6d00826f4a100 " "
user ""
description "undo"
extension ""
obj 0000000000000000 from 03f6d00826f06aaa
obj 0000000000000002 delete
zodb dump Dest.fs> zodb dump zodb dump /tmp/FileStorageTestsao2yt_a3/FileStorageTestsy80r5b7k/FileStorageHexTestsemjrpbn0/FileStorageHexTestsyucvq6sd/FileStorageRecoveryTest9ur7kq7d/Dest.fs
txn 03f6d00826f06aaa "p"
user ""
description ""
extension ""
obj 0000000000000001 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b
�cZODB.tests.MinPO
MinPO
q.�}qXvalueqKs.
txn 03f6d00826f11488 "p"
user ""
description ""
extension ""
obj 0000000000000000 148 sha1:4c39467361ff1ab627e82e14f9c4158cc36979f8
�cpersistent.mapping
PersistentMapping
q.�}qXdataq}q(Xobj1qqcZODB.tests.MinPO
MinPO
q�qQXobj2q h�q
Qus.
obj 0000000000000002 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b
�cZODB.tests.MinPO
MinPO
q.�}qXvalueqKs.
I honestly don't fully understand what the test is doing with "copy the final transaction manually" and not using a higher level API, but this sounds like "this is more complex than just raising". Maybe this could be made a little bit more clever, tolerating the case where oid is not found because the transaction was packed, but for now, I believe we can merge this without trying to change to an error (or log or warning) - and then #395 . Do you agree with this ? I will use "request review" and if you agree I'll apply the commits. Thanks again for feedback. ( edit, I fixed the traceback, I used an old traceback different from the zodb dumps ) |
Jérome Perrin wrote at 2024-2-26 22:36 -0800:
...
I believe we can merge this without trying to change to an error (or log or warning) - and then #395 .
I agree.
|
Jérome, Dieter, thanks for feedback. Personally I would still carefully reconsider turning Kirill |
Kirill Smelkov wrote at 2024-3-4 00:25 -0800:
Personally I would still carefully reconsider turning `restore` inconsistencies into an error, but if you both think it is ok to go as it is please go ahead. I will be mostly away for the next 3 weeks and do not want to block progress. If you both think the patch is ok in its current form I trust you to make reasonable judgement even though personally I would try to do it differently.
As you know, I am less interested in potential problems with
low probability (than you). In may well be that a `restore` inconsistency
poses problems within some decades; I would then act -- with
a concrete case.
Unless you come with such a concrete case, I would not change
the current behavior - which apparently is check by a test case
(and thus likely carefully considered).
|
Dieter, thanks for feedback. The difference in between us is how we perceive importance of the bug here. For me a "data corruption" is the worst that could happen to a database. For an external observer data corruptions are usually hard to detect at the time when they happen, but they introduce a real change to the data. Some of such changes could be innocent, for example if we count a volume of some gas and it will be 17.3 litres instead of 17.2 that's quite tolerable and ok. Some of the changes are not so ok, for example if it will be 123 EUR in total after an exchange transaction instead of correct value of 321000 EUR. And some of the changes could really impact lives: for example if it will be the dose of radiation to emit, and due to corruption it will be 1000 Rentgens instead of 0.1 Rentgen, it makes a huge impact of the person in subject. There is no way to save life of that person with post-mortem analysis. So when it comes to databases, I try to make dedicated attention to correctness first. Still, as I said above, if both you and Jérome think it is ok to go as it is now, I trust you to make a reasonable choice and judgement. Kirill |
Kirill Smelkov wrote at 2024-3-4 01:21 -0800:
...
So when it comes to databases, I try to make dedicated attention to correctness first.
Still, as I said above, if both you and Jérome think it is ok to go as it is now, I trust you to make a reasonable choice and judgement.
I read: you will come again in a few weeks with a concrete
case to reproduce the `restore` inconsistency.
Then, we will look again (in a separate PR).
|
Initially, this pull request was about extracting the +8 fix from #395 . I added testRestorePrevTxnWithoutOid , which is a scenario where the +8 bug would appear (because I don't like changing code without having a test corresponding to the change) but what does not seems good now is that this test would define the behavior for a case where the behavior was not clearly defined: what should happen when restore is called with a Before all this, it is an undefined behavior, the data of the next transaction is read as a data header, which very likely causes an unexpected error to be raised. If the transaction is that last in the file, there is no error. In the current version of the patch, it is tolerated and ignored. From the discussions, @navytux suggested that this should cause a proper error. I tried to change into an error, this caused a test to fail. The test is using low level APIs, so maybe it's just the test doing something wrong, I did not try to understand more. Now I don't feel it would be good to merge testRestorePrevTxnWithoutOid as it is, because this would somehow be a specification of how FileStorage should behave when restore is called with such a transaction, but at this point I feel we don't really know what should be the behavior. The next step for this seems to be studying that failing test. I don't really have time to start doing this now. #395 can be merged without all this, everybody approved so let's merge #395 already. There was another commit in this pull request 7bef1b6 , I think there was no controversy for that one. |
See discussions from #395 (comment) about removing
+ 8
fromtend