Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reading several rntuple files #708

Merged
merged 4 commits into from
Jan 9, 2025

Conversation

jmcarcell
Copy link
Member

@jmcarcell jmcarcell commented Nov 19, 2024

BEGINRELEASENOTES

  • Add support for reading several RNtuple files

ENDRELEASENOTES

Now it will work correctly for multiple files. For TTrees reading multiple files with random access (needed because we can read any entry) is provided through TChain and TBranch. For RNTuples there is not such thing, there is a RNTupleProcessor but it is intended for iterative access and not random access:
https://root.cern.ch/doc/master/classROOT_1_1Experimental_1_1RNTupleProcessor.html

So the idea is to save a vector with how many entries each file has, and then every time an entry is read do a lookup in the vector to find out which file we are at and get the corresponding reader for that file. Scales as log N in the number of files, which is probably OK for most cases.

Requires C++20 for std::ranges::upper_bound.

Comment on lines 151 to 153
// Map category to a vector that contains at how many entries each reader starts
// For example, if we have 3 readers and the first one has 10 entries, the second one 20 and the third one 30
// then the vector will be {0, 10, 30, 60}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I fully understand why we need to store the final sum of entries here? Is it because of the upper_bound below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be stored (and it isn't) but since it has to be computed anyway I think the code looks cleaner like it is (computing it and storing it in the vector and later removing it) than having to check if it's the last entry every time and not add it in that case. For the comment I can remove the last number

src/RNTupleReader.cc Show resolved Hide resolved
src/RNTupleReader.cc Show resolved Hide resolved
Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ubuntu CI is failing because the use of std::ranges effectively requires c++20 and gcc11 that is used there is not yet enough. That should resolve itself with #698 I think.

include/podio/RNTupleReader.h Outdated Show resolved Hide resolved
@tmadlener
Copy link
Collaborator

Rebased this after merging #698. Merging after CI passes.

@tmadlener tmadlener merged commit 2d1a409 into AIDASoft:master Jan 9, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants