Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clp-core: Refactor FileReader to use RAII. #496

Merged
merged 24 commits into from
Aug 14, 2024

Conversation

haiqi96
Copy link
Contributor

@haiqi96 haiqi96 commented Jul 26, 2024

Description

This PR refactors the FileReader so it does not use the open()/close() API. instead, it requires the FileReader to be constructed with a path to open. This prevents the potential risk of leaving the FileReader in an invalid state.

The PR also updates how FileReader is used in the code in multiple places. Originally, the code keeps a FileReader object and re-open and close it whenever a new file need to be read. With the new design, the code either create a static FileReader object with the target path and let it go out of scope by itself, or use a unique_ptr to keep manage the FileReader.

Validation performed

  1. Verified that all unit-tests still passed.
  2. Compressed and decompressed a single 64MB log. Confirmed that decompressed log matches original log
  3. Ran a simple search queries and ensured that clg doesn't return any error.

Copy link
Contributor Author

@haiqi96 haiqi96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post a few high level questions

m_segment_index_file_reader,
m_segment_index_decompressor
);
read_new_entries(dictionary_path, segment_index_path);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it to be a separate function so the code looks cleaner. we can also put code directly into open and remove the function?

components/core/src/clp/DictionaryReader.hpp Outdated Show resolved Hide resolved
components/core/src/clp/FileReader.cpp Outdated Show resolved Hide resolved
components/core/src/clp/FileReader.cpp Outdated Show resolved Hide resolved
@@ -262,38 +265,29 @@ void load_lexer_from_file(
}

if (contains_delimiter) {
FileReader schema_reader;
ErrorCode error_code = schema_reader.try_open(schema_ast->m_file_path);
if (ErrorCode_Success != error_code) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception throw here doesn't make sense. actually it would be hard to imagine that the file was opened once but failed to be opened again. so I would rather leave it to throw the default exception from File_reader


// Read dictionary header
auto num_dictionary_entries = read_dictionary_header(m_dictionary_file_reader);
uint64_t num_dictionary_entries{};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically we can turn this part into a helper function and put into the dictionary_utils.cpp. But given how simple the code is, I feel it's also ok to leave it as it is.
Let me know if you think it's necessary

) {
constexpr size_t cDecompressorFileReadBufferCapacity = 64 * 1024; // 64 KB

FileReader dictionary_file_reader{dictionary_path};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put them here instead of right in front of the code that uses them, so that if either file fails to open, we can skip the processing.

@haiqi96 haiqi96 changed the title File reader refactor clp-core: Refactor FileReader to use constructor initialization. Aug 13, 2024
@haiqi96 haiqi96 changed the title clp-core: Refactor FileReader to use constructor initialization. clp-core: Refactor FileReader to use constructor-style initialization. Aug 13, 2024
@haiqi96 haiqi96 marked this pull request as ready for review August 13, 2024 13:47
uint64_t read_dictionary_header(FileReader& file_reader) {
auto dictionary_file_reader_pos = file_reader.get_pos();
file_reader.seek_from_begin(0);
uint64_t num_dictionary_entries;
uint64_t num_dictionary_entries{};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbd

@@ -39,7 +16,7 @@ uint64_t read_segment_index_header(FileReader& file_reader) {
// Read segment index header
auto segment_index_file_reader_pos = file_reader.get_pos();
file_reader.seek_from_begin(0);
uint64_t num_segments;
uint64_t num_segments{};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbd same as above

unique_ptr<FileReader> grouped_file_path_reader;
try {
grouped_file_path_reader = make_unique<FileReader>(list_path);
} catch (FileReader::OperationFailed const& err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is minor, but we usually use e, ex, or exception for exceptions.

components/core/src/clp/FileReader.cpp Outdated Show resolved Hide resolved
components/core/src/clp/FileReader.cpp Outdated Show resolved Hide resolved
components/core/src/clp/FileReader.cpp Outdated Show resolved Hide resolved
components/core/src/clp/DictionaryWriter.hpp Outdated Show resolved Hide resolved
FileReader dictionary_file_reader;
FileReader segment_index_file_reader;
FileReader dictionary_file_reader{dictionary_path};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't spot this earlier. This method was written as a way to reload the dictionary so that we can continue writing an archive, so although we don't use it right now, to maintain correctness, we'd need to also load the segment index, right?

I think an alternative solution is to get rid of open_and_preload entirely until we add support for opening and rewriting archives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have to choose between the two, maybe removing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to remove the method prototype.

components/core/src/clp/DictionaryReader.hpp Outdated Show resolved Hide resolved
components/core/src/clp/DictionaryReader.hpp Outdated Show resolved Hide resolved
components/core/src/clp/DictionaryReader.hpp Outdated Show resolved Hide resolved
haiqi96 and others added 4 commits August 13, 2024 18:02
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

clp-core: Refactor FileReader to use RAII.

@haiqi96 haiqi96 changed the title clp-core: Refactor FileReader to use constructor-style initialization. clp-core: Refactor FileReader to use RAII. Aug 14, 2024
@haiqi96 haiqi96 merged commit a89ff14 into y-scope:main Aug 14, 2024
12 of 13 checks passed
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants