Adds a type change utility for append data keys in lib_tool #1932
base: master
Conversation
```python
    lib.write(sym, get_df(1, 4, str), incomplete=True)
    lib.write(sym, get_df(1, 5, np.int64), incomplete=True)

    def read_append_data_keys_from_ref(symbol):
```
These helper functions could be added to the `library_tool` API, but I decided against it because they would require frequent tweaks (e.g. if you want to load only the last 20). So instead, after this is merged I'll include some examples in the lib tool docs.
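For reference, the kind of docs example meant here might look like the following minimal sketch (not part of this PR; it assumes `LibraryTool`-style methods such as `find_keys_for_id` and `read_to_dataframe`, and the "last 20" slice is exactly the per-use-case tweak mentioned above):

```python
# Minimal sketch of a docs example, not part of this PR. It assumes
# LibraryTool-style methods (find_keys_for_id, read_to_dataframe); the
# slice is the kind of per-use-case tweak that made a fixed API awkward.
from arcticdb_ext.storage import KeyType

def last_n_append_data_frames(lib, symbol, n=20):
    lib_tool = lib.library_tool()
    keys = lib_tool.find_keys_for_id(KeyType.APPEND_DATA, symbol)
    return [lib_tool.read_to_dataframe(k) for k in keys[-n:]]
```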
Force-pushed from 47cd9ef to 36a1032.
This works by adding C++-layer functionality to overwrite append data keys, plus a read + type change + overwrite flow in the Python layer. More specifically, this change involves:

- Using a `LocalVersionEngine` instead of an `AsyncStore` as the library tool state
- Exposing some more Python bindings to allow iteration over `APPEND_DATA` keys with the library tool
- Allowing the library tool to use normalization, so append data keys can be overwritten with a custom dataframe
- Providing the type change functionality by reading a dataframe, changing its type with pandas, and overwriting it
- Adding an elaborate test to verify that iterating and reading the append data linked list is fine with various overwrites
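In Python-layer terms, the intended flow is roughly the following sketch (hypothetical names: `overwrite_append_data` stands in for whatever binding this PR actually exposes, and `read_to_dataframe` is the assumed `LibraryTool` read path):

```python
# Rough sketch of the read + type change + overwrite flow described above.
# overwrite_append_data is a hypothetical stand-in for the new binding;
# read_to_dataframe is the assumed LibraryTool read path.
import numpy as np

def change_append_data_type(lib_tool, key, column, new_dtype=np.int64):
    df = lib_tool.read_to_dataframe(key)        # read the APPEND_DATA segment
    df[column] = df[column].astype(new_dtype)   # change the type with pandas
    lib_tool.overwrite_append_data(key, df)     # overwrite the same key
```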
Force-pushed from 36a1032 to 87731cb.
```cpp
        const py::object &norm,
        const py::object &user_meta) {
    if (!std::holds_alternative<AtomKey>(key) || std::get<AtomKey>(key).type() != KeyType::APPEND_DATA) {
        throw_error<ErrorCode::E_INVALID_USER_ARGUMENT>(fmt::format("Can only override APPEND_DATA keys. Received: {}", key));
```
`util::check`?
```python
def test_overwrite_append_data(object_and_mem_and_lmdb_version_store):
    lib = object_and_mem_and_lmdb_version_store
    if lib._lib_cfg.lib_desc.version.encoding_version == 1:
```
Just checking: this is definitely zero-indexed and we aren't skipping on v1 encoding by mistake?
```python
@@ -230,3 +232,92 @@ def iterate_through_version_chain(key):
    assert len(keys_by_key_type[KeyType.TABLE_DATA]) == (num_versions-1) % 3 + 1
    assert len(keys_by_key_type[KeyType.TOMBSTONE_ALL]) == num_versions // 3


def test_overwrite_append_data(object_and_mem_and_lmdb_version_store):
```
Why all 3 stores? Why not use one of the _v1 fixtures to avoid the skip below?
```python
    lib.write(sym, get_df(1, 5, np.int64), incomplete=True)

    def read_append_data_keys_from_ref(symbol):
        nonlocal lib_tool
```
You don't need the `nonlocal`, do you? Given that you aren't assigning to `lib_tool`.
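For illustration (not from the diff): Python only requires `nonlocal` when the inner function rebinds the closed-over name, not when it merely reads it.

```python
# Illustration only: nonlocal is needed to rebind a closed-over name,
# not to merely read or call it.
def outer():
    lib_tool = "tool"

    def reads():
        return lib_tool.upper()  # reading works without any declaration

    def rebinds():
        nonlocal lib_tool        # required only because we assign below
        lib_tool = "other tool"

    rebinds()
    return reads()  # -> "OTHER TOOL"
```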
```python
    # We assert that types are as we wrote them and we can't read or compact because of type mismatch
    append_keys = read_append_data_keys_from_ref(sym)
    assert len(append_keys) == 3
    # Different storages use either fixed or dynamic strings
```
Ah, this is a good reason for the different backends 👍
```python
    # And test that compaction now works with the new types
    lib.compact_incomplete(sym, append=True, convert_int_to_float=False, via_iteration=False)
    assert read_append_data_keys_from_ref(sym) == []
    assert_frame_equal(lib.read(sym).data, get_df(15, 0, np.int64))
```
Test for appending more incompletes (with the right types) and compacting?
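Something like the following sketch, perhaps (reusing this test's own helpers; the `get_df(rows, start, dtype)` argument order is an assumption based on its uses above):

```python
# Sketch of the suggested extra coverage, reusing the test's own helpers.
# The get_df argument order (rows, start, dtype) is assumed from its uses.
lib.write(sym, get_df(5, 15, np.int64), incomplete=True)  # right type this time
lib.compact_incomplete(sym, append=True, convert_int_to_float=False, via_iteration=False)
assert read_append_data_keys_from_ref(sym) == []
assert_frame_equal(lib.read(sym).data, get_df(20, 0, np.int64))
```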
```python
    # Deliberately write mismatching incomplete types
    lib.write(sym, get_df(3, 0, np.int64))
    lib.write(sym, get_df(1, 3, np.int64), incomplete=True)
```
Test where all the incompletes are the wrong type?
```cpp
auto old_segment_in_memory = decode_segment(std::move(old_segment));
const auto& tsd = old_segment_in_memory.index_descriptor();
std::optional<AtomKey> next_key = std::nullopt;
if (tsd.proto().has_next_key()) {
```
How does the testing for the next-key logic work, given that ArcticDB doesn't write it? Or am I wrong, and normal (non-streaming) incompletes also have the linked-list structure?
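For context, the traversal under test presumably walks the chain roughly like this hypothetical sketch (`read_next_key` stands in for however the `next_key` field is read back out of a segment's timeseries descriptor, and `find_keys_for_id` is an assumed `LibraryTool`-style lookup):

```python
# Hypothetical sketch of walking the APPEND_REF -> APPEND_DATA linked list.
# read_next_key stands in for however the next_key field is read back;
# find_keys_for_id is an assumed LibraryTool-style lookup.
def append_data_keys_from_ref(lib_tool, symbol):
    keys = []
    ref_key = lib_tool.find_keys_for_id(KeyType.APPEND_REF, symbol)[0]
    next_key = read_next_key(ref_key)        # head of the linked list
    while next_key is not None:
        keys.append(next_key)
        next_key = read_next_key(next_key)   # follow next_key to the tail
    return keys
```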