
Using the LibraryTool to look at a library's internal state

IvoDD edited this page Oct 16, 2024 · 3 revisions

What is the LibraryTool?

The LibraryTool is a tool for exploring what ArcticDB stores on disk.
It is very useful when you want or need to:

  • get a better understanding of how ArcticDB works under the hood
  • debug the state of ArcticDB on disk

How do I use it?

The LibraryTool can be used with both ArcticDB and Arcticc.
For the most part it is used in the same way; any differences in the interface are noted where they apply.

Initialize the LibraryTool

You can use the LibraryTool with any ArcticDB library. You obtain it from the library like so:

from arcticdb import Arctic
from arcticdb.toolbox.library_tool import KeyType

ac = Arctic(...)
lib = ac[...]

lib_tool = lib._nvs.library_tool()

If you are using the old Arcticc bindings, you pass the underlying library to the LibraryTool instead:

from arcticcxx.tools import LibraryTool
from arcticc.toolbox.storage import KeyType

lib_tool = LibraryTool(lib._nvs._library)

Finding all VREF keys

In [215]: lib_tool.find_keys(KeyType.VERSION_REF)
Out[215]: [r:my_symbol, r:test, r:test2]

Finding all Symbol List keys

In [216]: lib_tool.find_keys(KeyType.SYMBOL_LIST)
Out[216]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
 l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]
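
When debugging, it can help to see at a glance how many keys of each type a library holds. The helper below is a minimal sketch built only on find_keys as used above. The name count_keys_by_type is hypothetical, and the sketch assumes each key type exposes a .name attribute, as Python and pybind11 enums do.

```python
def count_keys_by_type(lib_tool, key_types):
    """Return {key type name: number of keys of that type}.

    `lib_tool` is a LibraryTool and `key_types` an iterable of KeyType
    members; both are supplied by the caller. Assumes `.name` exists on
    each key type (true for Python and pybind11 enums).
    """
    return {kt.name: len(lib_tool.find_keys(kt)) for kt in key_types}
```

A call such as count_keys_by_type(lib_tool, [KeyType.VERSION_REF, KeyType.SYMBOL_LIST]) returns one count per key type passed in. Listing every key can be slow on a large library, so it is probably best to pass only the types you care about.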

Finding all VREF keys for a symbol

In [220]: lib_tool.find_keys_for_symbol(KeyType.VERSION_REF, "test2")
Out[220]: [r:test2]
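
The printed keys pack several fields into one string. The sketch below splits that printed form into its parts, which can be handy when working from logs or pasted output. It is derived only from the output shown on this page, not from a documented format, so treat the field names (type, id, version, hash, ts, start, end) as assumptions. Reading ts as nanoseconds since the Unix epoch is also an assumption. Symbol names containing unusual characters may not parse.

```python
import re
from datetime import datetime, timezone

# Atom keys print as  <type>:<id>:<version>:<hash>@<creation ts>[<start>,<end>]
_ATOM_KEY = re.compile(
    r"^(?P<type>[^:]+):(?P<id>.+):(?P<version>\d+):(?P<hash>0x[0-9a-fA-F]+)"
    r"@(?P<ts>\d+)\[(?P<start>[^,\]]*),(?P<end>[^\]]*)\]$"
)


def parse_key_repr(text):
    """Split a printed key into its fields.

    Returns a dict. Ref keys such as "r:test2" only yield "type" and "id".
    """
    match = _ATOM_KEY.match(text)
    if match is None:
        # Not an atom key: treat it as a ref key of the form <type>:<id>.
        key_type, _, key_id = text.partition(":")
        return {"type": key_type, "id": key_id}
    fields = match.groupdict()
    fields["version"] = int(fields["version"])
    fields["ts"] = int(fields["ts"])
    return fields


def creation_time(ts_ns):
    """Convert a creation timestamp to a UTC datetime.

    Assumption: the timestamp counts nanoseconds since the Unix epoch.
    Sub-second precision is dropped.
    """
    return datetime.fromtimestamp(ts_ns // 1_000_000_000, tz=timezone.utc)
```

For example, parse_key_repr(repr(key)) on the __add__ symbol list key above gives id "__add__" and version 0. Under the nanosecond assumption, its creation timestamp falls on 2023-08-23 (UTC).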

Reading a specific key

In [221]: keys = lib_tool.find_keys(KeyType.SYMBOL_LIST)
 
In [222]: keys
Out[222]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
 l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]
 
In [223]: lib_tool.read_to_dataframe(keys[1])

If you are using Arcticc, you will need to recreate the DataFrame from the underlying segments. You can use this snippet to do so:

from arcticcxx_toolbox.codec import Buffer, decode_segment
from arcticc.version_store._normalization import FrameData
from arcticcxx.version_store import PythonOutputFrame
import pandas as pd
 
def read_to_df(lib_tool, key):
    segment = lib_tool.read(key).segment
    field_names = [f.name for f in segment.header.stream_descriptor.fields]
    frame_data = FrameData.from_cpp(PythonOutputFrame(decode_segment(segment)))
    cols = {}
    for idx, field_name in enumerate(field_names):
        cols[field_name] = frame_data.data[idx]
    return pd.DataFrame(cols, columns=field_names)

Following the version chain

You can use lib_tool.read_to_keys to read a key that contains links to other keys. This lets you iterate over the version chain and check whether anything about it looks surprising:

>>> # We find the version ref key for the symbol
>>> vref = lib_tool.find_keys_for_symbol(KeyType.VERSION_REF, "sym")[0]
>>> vref
r:sym
>>> # Reading the keys inside the version ref shows a link to the last version (which tombstones v0)
>>> vref_keys = lib_tool.read_to_keys(vref)
>>> vref_keys
[x:sym:0:0x599c329f212e9b1d@1729069015917091586[0,172800000000001], V:sym:0:0xbd95682775eb0561@1729069015917161585[0,0]]
>>> # Reading the keys inside the last version key in the chain shows the tombstone and the link to the previous version key
>>> version_key = vref_keys[-1]
>>> version_keys = lib_tool.read_to_keys(version_key)
>>> version_keys
[x:sym:0:0x599c329f212e9b1d@1729069015917091586[0,172800000000001], V:sym:1:0xf4de0df7f4a2664c@1729068970655744750[0,0]]
>>> # Reading the previous version key shows the link to the latest index key
>>> prev_version_key = version_keys[-1]
>>> prev_version_keys = lib_tool.read_to_keys(prev_version_key)
>>> prev_version_keys
[i:sym:1:0x40a6734b5581f255@1729068970646451036[0,172800000000001], V:sym:0:0x6fdab687b265d67b@1729068960386415821[0,0]]
>>> # Reading the index key we can find the data key
>>> index_key = prev_version_keys[0]
>>> index_keys = lib_tool.read_to_keys(index_key)
>>> index_keys
[d:sym:1:0x1c34a96809b98d75@1729068970639555561[0,172800000000001]]
>>> # And we can read the data key
>>> data_key = index_keys[0]
>>> lib_tool.read_to_dataframe(data_key)
            col
index
1970-01-01    1
1970-01-02    2
1970-01-03    3
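
The manual steps above can be wrapped in a loop. The following is a minimal sketch that uses only read_to_keys. The name walk_version_chain is hypothetical. It assumes that version keys print with a "V:" prefix and that the link to the next version key is the last entry returned by each read, as in the session above. It may need adjusting if your chain is laid out differently.

```python
def walk_version_chain(lib_tool, vref, max_steps=1000):
    """Follow the chain from a version ref key, newest link first.

    Returns a list of (key, keys_inside) pairs, one per key that was read.
    Assumptions: version keys print with a "V:" prefix and the link to the
    next version key is the last entry of each read, as in the session above.
    """
    steps = []
    current = vref
    for _ in range(max_steps):  # guard against a cyclic or very long chain
        inside = lib_tool.read_to_keys(current)
        steps.append((current, inside))
        if not inside or not repr(inside[-1]).startswith("V:"):
            break  # no further version key to follow
        current = inside[-1]
    return steps
```

Iterating over walk_version_chain(lib_tool, vref) and printing each pair gives the whole chain at once. Tombstones (keys printed with an "x:" prefix above) and index keys ("i:") appear in the keys_inside lists, so you can pass them on to read_to_keys or read_to_dataframe as needed.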

You can see other examples of LibraryTool usage in the tests.