[NFC][ntuple] fix spec of sharded clusters / cluster summary flags
jblomer committed Sep 24, 2024
1 parent a7f8615 commit bec1bd6
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions tree/ntuple/v7/doc/specifications.md
@@ -684,7 +684,7 @@ Followed by the page list envelope link.

To compute the minimum entry number, take first entry number from all clusters in the cluster group,
and take the minimum among these numbers.
-The entry span is the number of entries that are (partially for sharded clusters) covered by this cluster group.
+The entry span is the number of entries that are covered by this cluster group.
The entry range allows for finding the right page list for random access requests to entries.
The number of clusters information allows for using consistent cluster IDs even if cluster groups are accessed non-sequentially.
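
As a rough illustration of the paragraph above, the following Python sketch derives the minimum entry number and the entry span from per-cluster entry ranges, then uses them to pick a cluster group for a random access request. The `ClusterRange` and `GroupSummary` types and all function names are hypothetical helpers for this sketch, not part of the RNTuple API. The sketch also assumes that the clusters of a group do not overlap, and that cluster IDs are assigned consecutively across cluster groups.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClusterRange:
    """Entry range of one cluster (hypothetical helper type)."""
    first_entry: int
    n_entries: int


@dataclass(frozen=True)
class GroupSummary:
    """Entry range and cluster count of one cluster group (hypothetical helper type)."""
    min_entry: int
    entry_span: int
    n_clusters: int


def summarize_group(clusters: Sequence[ClusterRange]) -> GroupSummary:
    # Minimum entry number: the smallest first entry number among all clusters.
    min_entry = min(c.first_entry for c in clusters)
    # Entry span: assumes non-overlapping clusters, so the entry counts add up.
    entry_span = sum(c.n_entries for c in clusters)
    return GroupSummary(min_entry, entry_span, len(clusters))


def find_group(groups: Sequence[GroupSummary], entry: int) -> Optional[int]:
    """Return the index of the cluster group whose entry range contains `entry`."""
    for index, group in enumerate(groups):
        if group.min_entry <= entry < group.min_entry + group.entry_span:
            return index
    return None


def first_cluster_id(groups: Sequence[GroupSummary], index: int) -> int:
    # Assumption: cluster IDs run consecutively across groups, so the first ID
    # of a group is the total number of clusters in the groups before it.
    return sum(g.n_clusters for g in groups[:index])
```

Because every group summary carries its own cluster count, `first_cluster_id` needs only the summaries and never has to load the page lists of earlier groups.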

@@ -709,19 +709,19 @@ The cluster summary record frame contains the entry range of a cluster:
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                       Number of Entries                       |
-+                                                       +-+-+-+-+
-|                                                       | Flags |
++                                               +-+-+-+-+-+-+-+-+
+|                                               |     Flags     |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

-If flag 0x01 (sharded cluster) is set,
-an additional 32bit integer containing the column group ID follows the flags field.
-If flags is zero, the cluster stores the entry range of _all_ the original columns
-_including_ the columns from extension headers.

The order of the cluster summaries defines the cluster IDs,
starting from the first cluster ID of the cluster group that corresponds to the page list.

+Flag 0x01 is reserved for a future specification version that will support sharded clusters.
+The future use of sharded clusters will break forward compatibility and thus introduce a corresponding feature flag.
+For now, readers should abort when this flag is set.
+Other flags should be ignored.
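
The flag handling described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ROOT implementation. It assumes that the record payload consists of two little-endian 64-bit words: the first entry number, then a word holding the number of entries in the low 56 bits and the flags in the high 8 bits. It also skips the record frame header. The names `ClusterSummary`, `decode_cluster_summary`, and `encode_cluster_summary` are hypothetical.

```python
import struct
from typing import NamedTuple

N_ENTRIES_BITS = 56
N_ENTRIES_MASK = (1 << N_ENTRIES_BITS) - 1
FLAG_SHARDED = 0x01  # reserved for a future version; readers abort when set


class ClusterSummary(NamedTuple):
    first_entry: int
    n_entries: int
    flags: int


def decode_cluster_summary(payload: bytes) -> ClusterSummary:
    # Assumption: two little-endian 64-bit words, frame header already stripped.
    first_entry, packed = struct.unpack_from("<QQ", payload)
    n_entries = packed & N_ENTRIES_MASK
    flags = packed >> N_ENTRIES_BITS
    if flags & FLAG_SHARDED:
        raise ValueError("sharded cluster flag set; not supported by this reader")
    # All other flag bits are ignored, as the text above asks.
    return ClusterSummary(first_entry, n_entries, flags)


def encode_cluster_summary(first_entry: int, n_entries: int, flags: int = 0) -> bytes:
    if not 0 <= n_entries <= N_ENTRIES_MASK:
        raise ValueError("number of entries does not fit into 56 bits")
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags do not fit into 8 bits")
    return struct.pack("<QQ", first_entry, (flags << N_ENTRIES_BITS) | n_entries)
```

A reader built this way rejects a summary with flag 0x01 set and still accepts summaries that carry other, unknown flag bits.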

#### Page Locations

The page locations are stored in a nested list frame as follows.
@@ -1019,7 +1019,7 @@ The limits refer to a single RNTuple and do not consider combinations/joins such
| Maximum number of cluster groups | 4B (foreseen: <10k) | List frame limits |
| Maximum number of clusters per group | 4B (foreseen: <10k) | List frame limits, cluster group summary encoding |
| Maximum number of pages per cluster per column | 4B | List frame limits |
-| Maximum number of entries per cluster | 2^60 | Cluster summary encoding |
+| Maximum number of entries per cluster | 2^56 | Cluster summary encoding |
| Maximum string length (meta-data) | 4GB | String encoding |
| Maximum RBlob size | 128 PiB | 1GiB / 8B * 1GiB (with maxKeySize=1GiB, offsetSize=8B) |
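
Two of the limits in the table can be checked with a few lines of arithmetic. The sketch below assumes that the entry count shares one 64-bit word with an 8-bit flags field, and it evaluates the RBlob formula exactly as written in the table. The variable names are illustrative only.

```python
GIB = 1 << 30
PIB = 1 << 50

# Cluster summary: a 64-bit word minus 8 flag bits leaves 56 bits for the entry count.
WORD_BITS = 64
FLAG_BITS = 8
entry_bits = WORD_BITS - FLAG_BITS

# RBlob: formula from the table, maxKeySize / offsetSize * maxKeySize.
max_key_size = GIB
offset_size = 8
max_rblob_size = (max_key_size // offset_size) * max_key_size

print(entry_bits, max_rblob_size // PIB)  # prints "56 128"
```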
