Document node's store structure #1603

Open
wants to merge 15 commits into
base: main
4 changes: 4 additions & 0 deletions .vitepress/config.ts
@@ -467,6 +467,10 @@ function sidebarHome() {
},
],
},
{
text: "Datastore structure",
link: "how-to-guides/celestia-node-store-structure",
},
],
},
{
28 changes: 28 additions & 0 deletions how-to-guides/celestia-node-store-structure.md
@@ -0,0 +1,28 @@
---
description: This section contains information on the celestia-node datastore and its contents.
---

# celestia-node datastore

The node's datastore refers to the storage structure
used to manage the data that supports the node's operation.
It consists of directories and files that contain the node's state,
configuration, and other information relevant to the node.

The following are the directories and files found in the datastore:

- `/blocks`: This directory stores blocks. Each file contained in this directory
represents a block on Celestia and contains its associated data. This directory is present in the datastore for bridge and full nodes but not light nodes, as light nodes do not store blocks.

- `/data`: This directory contains block headers and various files belonging to node log-structured merge (LSM) storage system such as `DISCARD`, `KEYREGISTRY`, and `MANIFEST`.

🛠️ Refactor suggestion

Clarify LSM storage system files

Based on the discussion with @distractedm1nd, we should keep the LSM storage system description general.

-This directory contains block headers and various files belonging to node log-structured merge (LSM) storage system such as `DISCARD`, `KEYREGISTRY`, and `MANIFEST`.
+This directory contains block headers and various files belonging to the node's log-structured merge (LSM) storage system. The LSM files (such as `DISCARD`, `KEYREGISTRY`, and `MANIFEST`) manage the efficient storage and retrieval of data.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~17-~17: Loose punctuation mark.
Context: ...ht nodes do not store blocks. - /data: This directory contains block headers a...

(UNLIKELY_OPENING_PUNCTUATION)


- `/index`: This directory stores the index files that map specific keys, such as block heights, to the corresponding data. As with `/blocks`, a light node's datastore does not include this directory, because light nodes do not perform indexing.

- `/inverted_index`: This directory stores the inverted index files used for mapping queries to the corresponding data location, and various files belonging to node LSM storage system, such as `DISCARD`, `KEYREGISTRY`, `LOCK`, and `MANIFEST`. The light node's datastore does not contain this directory.

🛠️ Refactor suggestion

Remove redundant LSM file listing

The LSM files are already mentioned in the /data directory description.

-This directory stores the inverted index files used for mapping queries to the corresponding data location, and various files belonging to node LSM storage system, such as `DISCARD`, `KEYREGISTRY`, `LOCK`, and `MANIFEST`. The light node's datastore does not contain this directory.
+This directory stores the inverted index files used for mapping queries to the corresponding data location, along with associated LSM storage system files. The light node's datastore does not contain this directory.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~21-~21: Loose punctuation mark.
Context: ...t perform indexing. - /inverted_index: This directory stores the inverted inde...

(UNLIKELY_OPENING_PUNCTUATION)


- `/keys`: This directory stores the cryptographic key pairs that are used to operate the node.

- `/transients`: This directory contains temporary data such as cache files
that are used while the node is operating, but are not a part of the permanent blockchain state.

- `config.toml`: This is the node's primary configuration file. It defines the node's core settings, such as the network parameters.

🛠️ Refactor suggestion

Clarify config.toml location and purpose

Based on the PR discussion, there was some confusion about configuration files. Let's be more specific about the location and purpose of config.toml.

-`config.toml`: This is the node's primary configuration file. It defines the node's core settings, such as the network parameters.
+`config.toml`: Located in the node's root directory, this is the primary configuration file that defines core settings such as network parameters, P2P configuration, and API endpoints. This file is generated during node initialization.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~28-~28: Loose punctuation mark.
Context: ...anent blockchain state. - config.toml: This is the node's primary configuratio...

(UNLIKELY_OPENING_PUNCTUATION)
