-
Notifications
You must be signed in to change notification settings - Fork 76
Guide to the codebase
This page is to help developers get a sense of where to find things in the Uproot codebase.
The purpose of Uproot is to provide an array-oriented, pure Python way of reading and writing ROOT files. It doesn't address any part of ROOT other than I/O, and it provides basic, low-level I/O. It's not an environment/user interface: if someone wants an immersive experience, they can write packages on top of Uproot, such as uproot-browser. However, we do want to streamline the process of navigating through files with Uproot, to the point of being "not annoying."
Although the style of programming is almost entirely imperative, not array-oriented like Awkward Array, there's a wide range in "depth" of code editing. Some changes would be "surface level," making a function more friendly/ergonomic to use, while others are "deep," manipulating low-level byte streams.
All of the source code is in src/uproot. The tests are (roughly) numbered by PR or issue number, and the version number is controlled by src/uproot/version.py (not by pyproject.toml, even though this is a hatchling-based project). If there is no version.py or it has a placeholder version, the information in this paragraph may be out-of-date. (Please update!)
Within src/uproot, all of the files and directories are for reading except writing, sink, and serialization.py, which are for writing. A few are shared, such as models, _util.py, compression.py, and streamers.py.
Everything is for conventional ROOT I/O, not RNTuple, except for models/RNTuple.py with some shared utilities in compression.py and const.py.
So almost all of the code, per line and per file, is for reading conventional ROOT I/O.
A ROOT file consists of
- the TFile header, which starts at byte 0 of the file and has one of two fixed sizes (one uses 32-bit pointers and the other uses 64-bit pointers);
- the TStreamerInfo, which describes how to deserialize most classes—which byte means what—reading this is optional;
- the root (no pun) TDirectory, which describes where to find named objects, including other TDirectories;
- the named objects themselves, which can each be read separately, but must each be read in their entirety;
- a few non-TDirectory classes (TTree and RNTuple are the only ones I know of) point to data beyond themselves;
- chunks of data associated with a TTree (TBasket) or RNTuple (RBlob);
- the TFree list, which keeps track of which parts of the ROOT file are unoccupied by real data. This can be completely ignored when reading a ROOT file.
When we read a ROOT file with Uproot, we assume that it is in read-only mode and that other processes are not changing the file while we read it. C++ ROOT does not make this assumption, which is considerably more complicated.
When we write a ROOT file with Uproot, we assume that a single-threaded Uproot process is the only writer of that file. Again, C++ ROOT is more general, and thus more complex.
None of the objects listed above except the TFile header has a fixed location in the file. To know the byte location of any object, one must find it by following a chain from the TFile header to the root TDirectory to any subdirectory to the object and maybe to a TBasket if the object is a TTree.
RNTuple has its own system of navigation, starting at a ROOT::Experimental::RNTuple
instance, which is a conventional ROOT I/O object that can live in a TDirectory like any TTree or histogram, but its headers, footers, column metadata, etc., are all new, custom objects, exposed to the conventional ROOT I/O system as generic RBlobs.
TBaskets and RBlobs can't be (or at least, aren't in practice) stored in a TDirectory.
Addressing and reading/writing data in a ROOT file is like reading/writing data in RAM, but instead of pointers, we have seek positions. Most conventional ROOT I/O seek positions are 32-bit, but there are modes in which objects can point to other objects with 64-bit seek positions when necessary. A ROOT file can (and often does) have a mixture of 32-bit and 64-bit seek positions.