feat(clp): Add the write path for single-file archives. #646
base: main
Conversation
Walkthrough
This pull request introduces support for single-file archives in the CLP (Compressed Log Processing) system. The changes include the addition of new source files, modifications to command-line argument parsing, and updates to archive writing mechanisms. Users can now create archives as a single file instead of multiple files, with new configuration options and supporting infrastructure to manage this archiving method.

Changes
Possibly Related PRs
Suggested Reviewers
Actionable comments posted: 0
🧹 Nitpick comments (12)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (5)
19-20: Consider externalizing the read block size
The constant `cReadBlockSize` is set to 4096. For different environments or performance optimizations, externalizing this block size into a configuration or a compile-time parameter might improve flexibility.
133-147: Reassess large archive handling
A single-file archive exceeding `cFileSizeWarningThreshold` triggers a warning, but it might be beneficial to add user guidance or a more detailed strategy for dealing with very large archives (e.g. automatically switching to multi-file).
179-190: Potentially make version dynamic
Within `write_archive_header`, the `cArchiveVersion` is hard-coded. If changes are expected in future releases, consider introducing a mechanism to set the version at build time or to read it from project configuration.
192-195: Minor inefficiency with repeated `.str()` calls
Repeatedly calling `packed_metadata.str()` can lead to unnecessary string object creation. While not critical for smaller metadata, consider assigning `packed_metadata.str()` once to a local variable for more efficiency.
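As a rough illustration of this suggestion (the surrounding function and variable names are hypothetical; only `packed_metadata` comes from the review context), caching the result once avoids rebuilding the string on each use:

```cpp
#include <sstream>
#include <string>

// Sketch only: cache the stringstream contents in one local variable instead
// of calling .str() repeatedly, which constructs a new std::string each time.
void write_packed_metadata_example(std::stringstream const& packed_metadata) {
    std::string const metadata = packed_metadata.str();  // single copy
    auto const size = metadata.size();    // reuses the cached buffer
    auto const* data = metadata.data();   // reuses the cached buffer
    (void)size;
    (void)data;
}
```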
197-213: Graceful recovery on partial reads
The loop for reading from the file and writing to `archive_writer` is straightforward and robust. For future improvements, consider adding logging or partial recovery in case of transient errors (e.g. a network file) instead of throwing an exception outright.

components/core/src/clp/streaming_archive/writer/Archive.cpp (3)
23-24: Check necessity of newly added includes.
Please confirm that both `Defs.hpp` and `writer.hpp` are required here. If these headers are no longer needed, consider removing them to reduce compilation overhead.
249-252: Verify multi-file archive cleanup.
When single-file archive mode is enabled, the archive transitions to a single-file format at closure. Ensure that the multi-file artefacts are either cleaned up or that the user is aware they remain. Inadvertent leftover files could confuse users.
341-342: Revisit dictionary-driven splitting logic.
Currently, splitting only occurs if `false == m_use_single_file_archive`. Consider whether large dictionary sizes warrant splitting in single-file archive mode too.

components/core/src/clp/streaming_archive/single_file_archive/writer.hpp (2)
50-55: Assess return type for better clarity.
Returning a `std::stringstream` from `create_single_file_archive_metadata` is straightforward, but consider a custom struct or type alias for readability and future extension.
65-69: Recover from partial writes.
`write_single_file_archive` can remove the existing multi-file archive. Think about potential rollback or error-handling strategies if the write fails partway.

components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (1)
16-21: Confirm versioning approach.
Your major/minor/patch shift bits are standard. Confirm that the product's versioning policy aligns with these values if future releases require increments.
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)
115-120: Evolution of metadata.
The newly introduced fields (`m_variable_encoding_methods_version`, `m_variables_schema_version`, and `m_compression_type`) address single-file archival requirements. Consider placing them in a derived metadata class if multi-file archives remain unaffected.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- components/core/src/clp/clp/CMakeLists.txt (1 hunks)
- components/core/src/clp/clp/CommandLineArguments.cpp (1 hunks)
- components/core/src/clp/clp/CommandLineArguments.hpp (3 hunks)
- components/core/src/clp/clp/FileCompressor.cpp (3 hunks)
- components/core/src/clp/clp/compression.cpp (3 hunks)
- components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (3 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.hpp (1 hunks)
- components/core/src/clp/streaming_archive/writer/Archive.cpp (5 hunks)
- components/core/src/clp/streaming_archive/writer/Archive.hpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (10)
components/core/src/clp/clp/CommandLineArguments.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/clp/FileCompressor.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/clp/CommandLineArguments.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/clp/compression.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/writer/Archive.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/single_file_archive/writer.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
📓 Learnings (1)
components/core/src/clp/clp/FileCompressor.cpp (2)
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-11-10T16:46:53.300Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-11-10T16:46:58.543Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.
🔇 Additional comments (32)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (3)
100-112: Validate file existence before attempting to get its size
While the exception handling is robust, it may be helpful to add an explicit check if the file exists before calling `file_size`. This would provide a clearer error message if the file is missing and avoid filesystem errors.
114-147: Confirm directory existence when retrieving file info
When assembling the `file_infos`, consider verifying that the directory holding segments actually exists (e.g. `segment_dir_path`). If it does not, an early error message might clarify that no segments were found.
256-285: Ensure user awareness before deleting original archive
`std::filesystem::remove_all(multi_file_archive_path)` irreversibly deletes the multi-file archive after writing the single-file archive. For safety, either confirm user intent or allow an option to retain the source.
Would you like a script to confirm the presence of any leftover files before full deletion?
components/core/src/clp/streaming_archive/writer/Archive.cpp (3)
16-16: Confirm correct spdlog header include.
Typically the common header is `<spdlog/spdlog.h>` rather than `<spdlog.h>`. Please verify that this is intentional and that the correct symbols are available.
62-62: Initialization looks fine.
Assigning `user_config.use_single_file_archive` to `m_use_single_file_archive` supports the new single-file archive feature as intended. No issues found.
662-679: Handle edge cases for segment IDs and metadata.
- If `m_next_segment_id` is 0, `m_next_segment_id - 1` becomes negative, potentially leading to unexpected behaviour in `get_segment_ids()`.
- Consider checking for empty or invalid `segment_ids` before proceeding.
- Ensure that partial writes or exceptions during `write_single_file_archive` are either rolled back or leave a consistent state.
components/core/src/clp/streaming_archive/single_file_archive/writer.hpp (2)
16-34: Custom exception design appears consistent.
The `OperationFailed` class neatly extends `TraceableException` and provides a clear error message.
40-41: Validate last_segment_id usage.
Please ensure that calling `get_segment_ids(last_segment_id)` with zero or negative values is handled gracefully in the implementation.
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (6)
23-27: Basic definitions look acceptable.
The magic number, file extension, and file-size warning threshold appear suitable for single-file archives.
28-35: Static file handling looks straightforward.
`cStaticArchiveFileNames` is a helpful container for known archive files. No issues observed here.
37-43: Packed struct alignment caution.
`__attribute__((packed))` on `SingleFileArchiveHeader` can cause cross-platform alignment mismatches. Confirm that your usage environment supports it consistently.
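One way to catch such surprises early is to pin the expected layout with a `static_assert`. A minimal sketch follows; the fields are illustrative and not the actual `SingleFileArchiveHeader` definition:

```cpp
#include <cstdint>

// Illustrative packed header; the real SingleFileArchiveHeader has different fields.
struct __attribute__((packed)) ExampleHeader {
    uint8_t magic[4];
    uint32_t version;
    uint64_t metadata_size;
};

// Fails to compile on any platform where packing is not honoured as expected.
static_assert(sizeof(ExampleHeader) == 16, "ExampleHeader must be exactly 16 bytes");
```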
45-49: FileInfo struct.
No concerns: the usage of `MSGPACK_DEFINE_MAP` is consistent with MessagePack patterns.
51-72: MultiFileArchiveMetadata structure is appropriate.
The fields here match multi-file archiving logic. Good usage of `MSGPACK_DEFINE_MAP`.
74-79: SingleFileArchiveMetadata structure appropriateness.
Combining `archive_files`, `archive_metadata`, and `num_segments` is logical for single-file mode. Nicely integrated with msgpack.
components/core/src/clp/clp/CommandLineArguments.hpp (3)
26-26: Default boolean initialisation.
`m_single_file_archive(false)` is clear and consistent with the default behaviour of multi-file archives.
49-50: Accessor method is straightforward.
`get_use_single_file_archive()` properly reflects the new member variable. No issues found.
98-98: Member variable integration.
`m_single_file_archive` fits in seamlessly with existing arguments. No contradictions observed.
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (3)
10-10: Added include for encoding methods.
Including `encoding_methods.hpp` is logical, given usage of `ffi::cVariableEncodingMethodsVersion` further in the file.
13-14: New compression type constant.
`cCompressionTypeZstd = "ZSTD";` is a welcome addition, clarifying the default compression type used.
86-91: Accessors for new metadata fields.
Providing methods to retrieve variable encoding and schema versions, along with compression type, aligns well with single-file archive needs.
components/core/src/clp/clp/compression.cpp (3)
110-110: No issues found with the addition of the single-file-archive configuration.
This line properly forwards the command-line argument into the archive writer's configuration.
139-140: Logical check for archive splitting is correct.
The code correctly checks whether the dictionary size threshold is reached and if single-file mode is disabled. This ensures that splitting only occurs under the intended conditions.
168-169: Consistent archive splitting logic for grouped files.
These lines mirror the logic above and maintain consistent behaviour for grouped file compression.
components/core/src/clp/streaming_archive/writer/Archive.hpp (3)
51-51: New boolean flag introduced.
Adding the flag `use_single_file_archive` to `UserConfig` is clear and self-explanatory, facilitating better configurability of the archiving process.
197-197: Getter method aligns with the coding guidelines.
This accessor cleanly exposes the new flag, supporting usage in external logic.
285-289: Implementation detail for single-file archive creation.
The `create_single_file_archive()` method is a suitable extension point, though it's not fully shown here. It should be well-documented and tested to ensure it handles any corner cases (e.g. partially written archives).
components/core/src/clp/clp/CommandLineArguments.cpp (1)
376-379: New command-line option successfully introduced.
The addition of `--single-file-archive` provides enhanced flexibility for output formats. Ensure that the help text and documentation clearly describe any potential performance or storage trade-offs.
components/core/src/clp/clp/FileCompressor.cpp (3)
246-247: Single-file check prevents increments of archive splits.
This usage respects the new flag to avoid unnecessary splitting, preserving consistent behaviour with single-file archiving.
341-342: Conditional logic consistent with the single-file feature.
Retaining the archive split only if `false == …` ensures that the newly introduced flag remains orthogonal to existing compression workflows.
542-543: Proper check before splitting IR streams.
The logic parallels other file-splitting checks, keeping single-file intentions intact across different compression paths.
components/core/src/clp/clp/CMakeLists.txt (2)
111-113: LGTM! New source files follow project conventions.
The new single-file archive implementation files are logically organized and follow the project's established patterns for source organization.
Line range hint 1-150: Consider making GlobalMetadataDB optional for single-file archives.
Based on the PR objectives, GlobalMetadataDB might not be required when writing single-file archives. Consider making it conditionally included based on the archive type to avoid unnecessary dependencies.
Let's verify the GlobalMetadataDB usage in the single-file archive implementation:
✅ Verification successful
Based on the examination of the single-file archive implementation in the codebase, particularly in `single_file_archive/writer.hpp` and `writer.cpp`, I can now provide a final response:
GlobalMetadataDB is not used in single-file archive operations
The single-file archive implementation is completely independent of GlobalMetadataDB. It operates by:
- Creating a self-contained archive with its own metadata structure (`SingleFileArchiveMetadata`)
- Directly copying files from the multi-file archive
- Managing file offsets and archive structure internally
The GlobalMetadataDB dependency can be safely made optional for single-file archives as they:
- Store all necessary metadata within the single file
- Don't require external database operations
- Handle their own file management and metadata tracking
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check GlobalMetadataDB usage in single-file archive code
# Check for GlobalMetadataDB usage in new SFA files
rg -A 5 "GlobalMetadataDB" "../streaming_archive/single_file_archive/"
# Check for any conditional usage patterns
ast-grep --pattern 'if ($condition) {
$$$
GlobalMetadataDB
$$$
}'
Length of output: 224
Script:
#!/bin/bash
# Let's try searching in the entire codebase
# Check GlobalMetadataDB usage in single file archive related code
rg -A 5 "GlobalMetadataDB" --type cpp
# Check writer implementation specifically
rg -A 10 "single_file_archive.*writer" --type cpp
# Look for single file archive implementation
fd "single.*archive"
Length of output: 48285
Script:
#!/bin/bash
# Let's check the single file archive writer implementation
rg -A 10 "class.*writer" components/core/src/clp/streaming_archive/single_file_archive/writer.hpp
# Check if GlobalMetadataDB is used in single file archive implementation
rg -A 5 "GlobalMetadataDB" components/core/src/clp/streaming_archive/single_file_archive/writer.cpp
# Check Archive constructor to see how single file archive mode is handled
rg -A 10 "Archive::Archive.*Config" components/core/src/clp/streaming_archive/writer/Archive.cpp
Length of output: 298
Script:
#!/bin/bash
# Check the content of single file archive writer files
cat components/core/src/clp/streaming_archive/single_file_archive/writer.hpp
cat components/core/src/clp/streaming_archive/single_file_archive/writer.cpp
# Check Archive constructor implementation
cat components/core/src/clp/streaming_archive/writer/Archive.cpp | rg -A 20 "Archive::Archive"
Length of output: 12781
The archive version was changed to use
Actionable comments posted: 1
🧹 Nitpick comments (9)
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (3)
15-21: Use consistent naming convention for version constants
The naming "cArchiveMajorVersion", "cArchiveMinorVersion" and "cArchivePatchVersion" is informative, but ensure consistent naming style (e.g., PascalCase or snake_case) if there's a global standard.
23-27: Remove magic number duplication
The `.clp` extension and the `cUnstructuredSfaMagicNumber` might be repeated in other parts of the code. If so, consider centralizing them in a single constants file to avoid potential inconsistencies.
76-81: Validate memory usage
Storing an entire vector of `FileInfo` might risk memory issues if the number of archived files grows very large in single-file mode. Consider a streaming approach for extremely large archives.

components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (3)
19-20: Enhance documentation for cReadBlockSize
Elaborate on the rationale for using 4096. This helps maintainers decide whether to tune it for performance.
137-145: Large file warning
This warning is valuable. Consider adding a user-configurable threshold to accommodate different environment needs.
215-231: Possible parallelization
The loop writing each segment's file content could be parallelized for performance gains, if ordering doesn't matter.

components/core/src/clp/clp/compression.cpp (2)
139-141: Maintain DRY
This condition repeats in lines 169-171. Factor out an inline function or macro if duplication grows.
139-141: Prefer "false == expr" check
Per coding guidelines, replace `&& false == archive_writer.get_use_single_file_archive()` with `&& (false == archive_writer.get_use_single_file_archive())`.

```diff
-            && false == archive_writer.get_use_single_file_archive())
+            && (false == archive_writer.get_use_single_file_archive()))
```

Also applies to: 169-171
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
61-61: Boolean naming
`m_use_single_file_archive` is clear, but consider naming it `m_is_single_file_archive` or `m_single_file_mode` for a consistent convention.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- components/core/src/clp/clp/FileCompressor.cpp (3 hunks)
- components/core/src/clp/clp/compression.cpp (3 hunks)
- components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (3 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.hpp (1 hunks)
- components/core/src/clp/streaming_archive/writer/Archive.cpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- components/core/src/clp/clp/FileCompressor.cpp
- components/core/src/clp/streaming_archive/ArchiveMetadata.hpp
- components/core/src/clp/streaming_archive/single_file_archive/writer.hpp
🧰 Additional context used
📓 Path-based instructions (4)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/clp/compression.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
🔇 Additional comments (9)
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (3)
19-21: Confirm the semantic version scheme
You are using a bitwise composition of major/minor/patch. Confirm that it remains compatible with your internal versioning constraints discussed in the PR comments.
47-51: Verify no external references to field names
The `FileInfo` struct serializes fields as "n" and "o". Ensure there are no external references expecting different property names.
53-73: Serialization definitions
The fields in `MultiFileArchiveMetadata` are properly included in `MSGPACK_DEFINE_MAP`. Everything looks coherent and consistent with the usage.
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (4)
100-112: Potential concurrency considerations
`update_offset` aggregates sizes by reading the filesystem. If future concurrency is introduced, ensure atomic or thread-safe operations around offset calculations.
117-147: Good usage of sentinel
The sentinel `FileInfo{"", offset}` is a handy approach for communicating total file size.
149-177: Ensure consistent metadata
Data from `ArchiveMetadata` is mirrored here. If new metadata fields are added in the future, they must be carefully synchronized.
197-213: Handle partial read cases
In a multi-threaded or partial read scenario, validate that your loop handles truncated reads or concurrency issues from external processes.
components/core/src/clp/clp/compression.cpp (1)
110-110: Check code style
The line updates the user config with `archive_user_config.use_single_file_archive = command_line_args.get_use_single_file_archive();`. Everything looks consistent.
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
248-250: Single-file archiving
Calling `create_single_file_archive()` in the `close()` method is logical. Ensure that any exceptions raised do not leave the archive in an inconsistent state.
```cpp
void Archive::create_single_file_archive() {
    std::filesystem::path multi_file_archive_path = m_path;

    auto segment_ids
            = clp::streaming_archive::single_file_archive::get_segment_ids(m_next_segment_id - 1);

    if (false == m_local_metadata.has_value()) {
        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
    }

    auto& multi_file_archive_metadata = m_local_metadata.value();
    auto packed_metadata
            = clp::streaming_archive::single_file_archive::create_single_file_archive_metadata(
                    multi_file_archive_metadata,
                    multi_file_archive_path,
                    segment_ids
            );

    clp::streaming_archive::single_file_archive::write_single_file_archive(
            multi_file_archive_path,
            packed_metadata,
            segment_ids
    );
}
```
Consider cleanup ordering
You remove the multi-file archive only after writing the single-file archive. Double-check that partial failures mid-operation do not remove the source prematurely.
sorry closing was by accident...
Actionable comments posted: 0
🧹 Nitpick comments (4)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (4)
19-21: Consider documenting the rationale for the block size value.
Adding a comment explaining why 4096 was chosen as the block size would help future maintainers understand if this value needs adjustment.

```diff
 namespace {
-constexpr size_t cReadBlockSize = 4096;
+// 4096 bytes is a common page size and provides good balance between memory usage and I/O performance
+constexpr size_t cReadBlockSize = 4096;
```
195-211: Enhance error handling with more context in error messages.
The error handling could be improved by including more context when throwing exceptions.

```diff
 }
 if (ErrorCode_Success != error_code) {
-    throw OperationFailed(error_code, __FILENAME__, __LINE__);
+    throw OperationFailed(
+            error_code,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Failed to read from file '{}': {}",
+                    file_path.string(),
+                    get_error_message(error_code))
+    );
 }
```
265-268: Improve error messages for file operations.
The error handling should include more context about the operation that failed.

```diff
 if (std::filesystem::exists(single_file_archive_path)) {
-    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    throw OperationFailed(
+            ErrorCode_Failure,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Single-file archive '{}' already exists",
+                    single_file_archive_path.string())
+    );
 }
 // ... (other code remains the same)
 try {
     std::filesystem::remove_all(multi_file_archive_path);
 } catch (std::filesystem::filesystem_error& e) {
-    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    throw OperationFailed(
+            ErrorCode_Failure,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Failed to clean up multi-file archive '{}': {}",
+                    multi_file_archive_path.string(),
+                    e.what())
+    );
 }
```

Also applies to: 279-284
255-284: Consider architectural improvements for better operability.
Two suggestions for improving the implementation:
- Add logging for successful operations to aid in debugging and monitoring.
- Consider making the cleanup of the multi-file archive configurable, as there might be cases where keeping both versions is desired (e.g., for verification or backup purposes).

Example implementation:

```cpp
void write_single_file_archive(
        std::filesystem::path const& multi_file_archive_path,
        std::stringstream const& packed_metadata,
        std::vector<std::string> const& segment_ids,
        bool cleanup_multi_file_archive = true  // New parameter
) {
    SPDLOG_INFO("Creating single-file archive from '{}'", multi_file_archive_path.string());
    // ... (existing implementation) ...
    archive_writer.close();
    SPDLOG_INFO("Successfully created single-file archive '{}'", single_file_archive_path.string());
    if (cleanup_multi_file_archive) {
        SPDLOG_INFO("Cleaning up multi-file archive '{}'", multi_file_archive_path.string());
        try {
            std::filesystem::remove_all(multi_file_archive_path);
            SPDLOG_INFO("Successfully cleaned up multi-file archive");
        } catch (std::filesystem::filesystem_error& e) {
            // ... (existing error handling) ...
        }
    }
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
🔇 Additional comments (1)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
1-18: LGTM! Well-organized includes and namespace declaration.
The includes are properly grouped and all necessary headers are present.
Actionable comments posted: 0
🧹 Nitpick comments (5)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (5)
101-113: Enhance error handling with more specific information.
The error handling could be improved by including the file path in the error message and preserving the original error code.
Consider this enhancement:

```diff
 } catch (std::filesystem::filesystem_error const& e) {
     throw OperationFailed(
             ErrorCode_Failure,
             __FILENAME__,
             __LINE__,
-            fmt::format("Failed to get file size: {}", e.what())
+            fmt::format("Failed to get file size for '{}': {} (error code: {})",
+                    file_path.string(), e.what(), e.code().value())
     );
 }
```
138-145: Enhance warning message with size information.
The warning message could be more helpful by including the actual archive size for comparison.
Consider this enhancement:

```diff
 if (offset > cFileSizeWarningThreshold) {
     SPDLOG_WARN(
-            "Single file archive size exceeded {}. "
+            "Single file archive size ({}) exceeded threshold ({}). "
             "The single-file archive format is not intended for large archives, "
             " consider using multi-file archive format instead.",
+            offset,
             cFileSizeWarningThreshold
     );
 }
```
196-212: Consider adding progress reporting for large file operations.
For better observability during large file operations, consider adding progress reporting. Also, the error handling could be more specific.
Consider these enhancements:

```diff
 auto write_archive_file(std::filesystem::path const& file_path, FileWriter& archive_writer) -> void {
     FileReader reader(file_path.string());
+    auto total_size = std::filesystem::file_size(file_path);
+    uint64_t bytes_processed = 0;
     std::array<char, cReadBlockSize> read_buffer{};
     while (true) {
         size_t num_bytes_read{};
         ErrorCode const error_code
                 = reader.try_read(read_buffer.data(), cReadBlockSize, num_bytes_read);
         if (ErrorCode_EndOfFile == error_code) {
             break;
         }
         if (ErrorCode_Success != error_code) {
-            throw OperationFailed(error_code, __FILENAME__, __LINE__);
+            throw OperationFailed(
+                    error_code,
+                    __FILENAME__,
+                    __LINE__,
+                    fmt::format("Failed to read from file: {}", file_path.string())
+            );
         }
         archive_writer.write(read_buffer.data(), num_bytes_read);
+        bytes_processed += num_bytes_read;
+        if (total_size > cFileSizeWarningThreshold) {
+            SPDLOG_DEBUG(
+                    "Processing file {}: {:.1f}% ({}/{} bytes)",
+                    file_path.filename().string(),
+                    (bytes_processed * 100.0) / total_size,
+                    bytes_processed,
+                    total_size
+            );
+        }
     }
 }
```
266-268: Improve error handling with specific error messages.
The error handling for file existence check and archive removal could be more informative.
Consider these enhancements:

```diff
 if (std::filesystem::exists(single_file_archive_path)) {
-    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    throw OperationFailed(
+            ErrorCode_Failure,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Single-file archive already exists: {}",
+                    single_file_archive_path.string())
+    );
 }
 // ... (other code)
 try {
     std::filesystem::remove_all(multi_file_archive_path);
 } catch (std::filesystem::filesystem_error& e) {
-    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    throw OperationFailed(
+            ErrorCode_Failure,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Failed to remove multi-file archive at '{}': {}",
+                    multi_file_archive_path.string(), e.what())
+    );
 }
```

Also applies to: 280-284
233-245: Optimize vector allocation and add range validation.
Consider pre-allocating the vector and adding a reasonable range check for segment IDs.
Consider these enhancements:

```diff
 auto get_segment_ids(segment_id_t last_segment_id) -> std::vector<std::string> {
     std::vector<std::string> segment_ids;
     if (last_segment_id < 0) {
         return segment_ids;
     }
+    if (last_segment_id > cMaxSegmentId) {
+        throw OperationFailed(
+                ErrorCode_Failure,
+                __FILENAME__,
+                __LINE__,
+                fmt::format("Segment ID {} exceeds maximum allowed value {}",
+                        last_segment_id, cMaxSegmentId)
+        );
+    }
+
+    segment_ids.reserve(last_segment_id + 1);
     for (size_t i = 0; i <= last_segment_id; ++i) {
         segment_ids.emplace_back(std::to_string(i));
     }
     return segment_ids;
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/streaming_archive/single_file_archive/writer.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
🔇 Additional comments (1)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
1-17: LGTM! Well-organized includes and namespace setup.
The includes are properly organized into system headers, third-party libraries, and project-specific headers. The organization follows good C++ practices.
Actionable comments posted: 1
🧹 Nitpick comments (6)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (4)
135-135: Document the sentinel file info entry
The empty filename sentinel is used to mark the total size of all files. Consider adding a comment explaining why an empty filename is used as a sentinel and how it should be handled by the reader.
197-213: Enhance error handling in file reading
Consider adding more specific error information when throwing `OperationFailed`. The current implementation loses the original error code context.

```diff
-            throw OperationFailed(error_code, __FILENAME__, __LINE__);
+            throw OperationFailed(
+                    error_code,
+                    __FILENAME__,
+                    __LINE__,
+                    fmt::format("Failed to read from file: {}", file_path.string())
+            );
```
234-246: Improve type consistency and clarity in segment ID generation
The function mixes signed and unsigned types, and the negative check could be more explicit.

```diff
-auto get_segment_ids(segment_id_t last_segment_id) -> std::vector<std::string> {
+auto get_segment_ids(segment_id_t const last_segment_id) -> std::vector<std::string> {
     std::vector<std::string> segment_ids;
-    if (last_segment_id < 0) {
+    if (last_segment_id < 0) {
+        SPDLOG_WARN("Negative segment ID provided: {}", last_segment_id);
         return segment_ids;
     }
-    for (size_t i = 0; i <= last_segment_id; ++i) {
+    auto const last_id = static_cast<size_t>(last_segment_id);
+    for (size_t i = 0; i <= last_id; ++i) {
         segment_ids.emplace_back(std::to_string(i));
     }
```
1-287: Overall implementation is well-structured and robust
The implementation demonstrates good separation of concerns, proper error handling, and clear documentation. The code is organized logically with helper functions in an anonymous namespace and a clean public interface.
Consider adding the following improvements in future iterations:
Consider adding the following improvements in future iterations:
- Add unit tests to verify the archive format compatibility
- Consider implementing a dry-run mode to validate the archive structure before writing
- Add progress reporting for large archives
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (2)
7-7: Consider standardizing header file extensions
The codebase mixes `.h` and `.hpp` extensions. Consider standardizing to `.hpp` for consistency, as it's more specific to C++ headers.
: Use descriptive field names in FileInfoThe
FileInfo
structure uses abbreviated field names (n
,o
) which reduce code readability.Consider using more descriptive names:
struct FileInfo { - std::string n; - uint64_t o; - MSGPACK_DEFINE_MAP(n, o); + std::string name; + uint64_t offset; + MSGPACK_DEFINE_MAP(name, offset); };
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
🔇 Additional comments (5)
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
1-18: LGTM! Well-organized includes
The includes are properly organized into logical groups (standard library, third-party, and project-specific) with clear separation between groups.
components/core/src/clp/streaming_archive/single_file_archive/Defs.hpp (4)
13-21: Well-structured version encoding
The version encoding aligns with the semantic versioning approach discussed in the PR comments, using bit shifting to combine major (0), minor (1), and patch (1) versions into a single `uint32_t`.
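A sketch of the bit-shift composition being described follows. The shift widths here are an assumption for illustration; the actual constants live in Defs.hpp:

```cpp
#include <cstdint>

// Hypothetical layout: 8 bits major, 8 bits minor, 16 bits patch.
constexpr uint32_t cExampleMajorVersion = 0;
constexpr uint32_t cExampleMinorVersion = 1;
constexpr uint32_t cExamplePatchVersion = 1;

constexpr uint32_t cExampleArchiveVersion
        = (cExampleMajorVersion << 24) | (cExampleMinorVersion << 16) | cExamplePatchVersion;

// Version 0.1.1 encodes to 0x00010001 under this layout.
static_assert(cExampleArchiveVersion == 0x00010001);
```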
38-45: Document the purpose of unused fields and packed attribute
The `SingleFileArchiveHeader` structure:
- Contains 6 unused fields without explanation
- Uses `__attribute__((packed))`, which might cause alignment issues on some architectures
Please document:
- The purpose of reserving unused fields
- Why packing is necessary
- Any alignment requirements for different architectures
27-27: Verify file size warning threshold
The 100MB threshold (`cFileSizeWarningThreshold`) seems arbitrary. Please confirm if this value aligns with the private branch implementation and document the rationale.
✅ Verification successful
Let me gather more context about the usage of this threshold.
The file size warning threshold is correctly implemented and documented
The 100MB threshold is used to warn users when they're creating single-file archives that are too large. The implementation shows clear rationale in the warning message: single-file archives are not split and can consume excessive memory during decompression. Users are advised to use the multi-file archive format instead for large archives.
The warning message in `writer.cpp` provides sufficient documentation:
"Single file archive size exceeded {}. "
"The single-file archive format is not intended for large archives, "
"consider using multi-file archive format instead."
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for similar thresholds in the codebase
rg -i "size.*threshold|threshold.*size" --type cpp
Length of output: 1094
Script:
#!/bin/bash
# Check the usage context of the warning threshold
rg -B 5 -A 5 "cFileSizeWarningThreshold" --type cpp
# Look for any documentation or comments about file size limits
rg -i "file.*size.*limit|size.*limit.*file" --type cpp
Length of output: 3399
53-81: Verify metadata compatibility with private branch
The PR objectives mention differences in metadata formats between branches. Please confirm:
- All required fields from the private branch are included
- Data type changes are backward compatible
- The order of fields in `MSGPACK_DEFINE_MAP` maintains compatibility
✅ Verification successful
The previous ast-grep pattern might have failed due to syntax. Let's try different approaches to find metadata structures and their msgpack definitions in the codebase. This will help us verify the compatibility concerns.
Let's gather more information about version compatibility and metadata structures from the streaming archive implementation.
Based on the gathered information, I can now provide a final response about the metadata compatibility:
Metadata structures appear compatible with existing implementation
The metadata structures in the PR maintain compatibility with the existing codebase:
- All required fields are present and match the existing implementation in `ArchiveMetadata.hpp`
- Data types are consistent with the current usage:
  - `archive_format_version_t` for version tracking
  - `std::string` for version identifiers and compression type
  - `uint64_t` for size fields
- The order of fields in `MSGPACK_DEFINE_MAP` matches the struct field declarations, maintaining proper serialization
- The version constants (`cArchiveFormatVersion`, `cVariableEncodingMethodsVersion`, `cVariablesSchemaVersion`) are properly referenced
The changes appear to be a refactoring that preserves the existing metadata format while organizing it into more structured types (`MultiFileArchiveMetadata` and `SingleFileArchiveMetadata`).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for similar metadata structures in the codebase
ast-grep --pattern 'struct $_ {
$$$
MSGPACK_DEFINE_MAP($$$);
}'
Length of output: 69
Script:
#!/bin/bash
# Search for metadata structures using ripgrep
rg -U "struct.*Metadata.*\{[\s\S]*?MSGPACK_DEFINE" -A 20
Length of output: 5375
Script:
#!/bin/bash
# Search for any MSGPACK_DEFINE usage
rg "MSGPACK_DEFINE" -A 5
Length of output: 3013
Script:
#!/bin/bash
# Look for any files containing 'Metadata' in their name
fd "Metadata" --type f
Length of output: 1265
Script:
#!/bin/bash
# Let's check the content of ArchiveMetadata files for any additional metadata structures
rg -U "struct.*\{[\s\S]*?\}" components/core/src/clp/streaming_archive/ArchiveMetadata.hpp
Length of output: 92
Script:
#!/bin/bash
# Check for any version-related constants or definitions
rg "version" -i components/core/src/clp/streaming_archive/
Length of output: 5135
```cpp
    try {
        std::filesystem::remove_all(multi_file_archive_path);
    } catch (std::filesystem::filesystem_error& e) {
        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
    }
```
🛠️ Refactor suggestion
Improve error handling for cleanup operation
- The error message is not propagated when file removal fails
- Consider verifying if the archive was written successfully before removing the original files
archive_writer.close();
+ // Verify the archive was written successfully
+ if (!std::filesystem::exists(single_file_archive_path) ||
+ std::filesystem::file_size(single_file_archive_path) == 0) {
+ throw OperationFailed(
+ ErrorCode_Failure,
+ __FILENAME__,
+ __LINE__,
+ "Failed to write single-file archive"
+ );
+ }
try {
std::filesystem::remove_all(multi_file_archive_path);
} catch (std::filesystem::filesystem_error& e) {
- throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ throw OperationFailed(
+ ErrorCode_Failure,
+ __FILENAME__,
+ __LINE__,
+ fmt::format("Failed to remove original archive: {}", e.what())
+ );
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

```cpp
try {
    std::filesystem::remove_all(multi_file_archive_path);
} catch (std::filesystem::filesystem_error& e) {
    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
}
```

Suggested:

```cpp
archive_writer.close();

// Verify the archive was written successfully
if (!std::filesystem::exists(single_file_archive_path) ||
    std::filesystem::file_size(single_file_archive_path) == 0) {
    throw OperationFailed(
        ErrorCode_Failure,
        __FILENAME__,
        __LINE__,
        "Failed to write single-file archive"
    );
}

try {
    std::filesystem::remove_all(multi_file_archive_path);
} catch (std::filesystem::filesystem_error& e) {
    throw OperationFailed(
        ErrorCode_Failure,
        __FILENAME__,
        __LINE__,
        fmt::format("Failed to remove original archive: {}", e.what())
    );
}
```
Actionable comments posted: 1
🧹 Nitpick comments (5)
components/core/src/clp/streaming_archive/writer/Archive.cpp (3)
341-343: Consider refactoring the nested conditions for better readability.
The condition for file splitting now includes two nested checks that could be simplified.
Consider this alternative structure:

```diff
-    if (get_data_size_of_dictionaries() >= m_target_data_size_of_dicts
-        && false == m_use_single_file_archive)
-    {
+    bool should_split = get_data_size_of_dictionaries() >= m_target_data_size_of_dicts
+                        && !m_use_single_file_archive;
+    if (should_split) {
```
669-671: Add error context to the metadata validation.
The error message could be more descriptive about why the metadata is missing.

```diff
-        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+        throw OperationFailed(
+                ErrorCode_Failure,
+                __FILENAME__,
+                __LINE__,
+                "Local metadata is missing when creating single-file archive"
+        );
```
681-685: Consider adding progress logging for large archives.
For large archives, it would be helpful to log the progress of the single-file archive creation process.

```diff
+    SPDLOG_INFO("Creating single-file archive at {}", multi_file_archive_path.string());
     clp::streaming_archive::single_file_archive::write_single_file_archive(
             multi_file_archive_path,
             packed_metadata,
             segment_ids
     );
+    SPDLOG_INFO("Successfully created single-file archive");
```

components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (2)
24-24: Consider documenting the rationale for the block size value
Adding a comment explaining why 4096 was chosen as the block size would help future maintainers understand if this value needs adjustment.

```diff
-constexpr size_t cReadBlockSize = 4096;
+// 4KB block size chosen to match common page size for optimal I/O performance
+constexpr size_t cReadBlockSize = 4096;
```
199-215: Enhance error handling with descriptive messages
The error handling could be more informative by including the file path and specific error details in the exception message.

```diff
 }
 if (ErrorCode_Success != error_code) {
-    throw OperationFailed(error_code, __FILENAME__, __LINE__);
+    throw OperationFailed(
+            error_code,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Failed to read from file '{}': {}",
+                    file_path.string(),
+                    get_error_message(error_code))
+    );
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- components/core/CMakeLists.txt (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1 hunks)
- components/core/src/clp/streaming_archive/single_file_archive/writer.hpp (1 hunks)
- components/core/src/clp/streaming_archive/writer/Archive.cpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/streaming_archive/single_file_archive/writer.hpp
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (1)
Pattern `**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
🔇 Additional comments (5)
components/core/src/clp/streaming_archive/writer/Archive.cpp (2)
16-24: LGTM! Header includes and dependencies look good.
The addition of the single-file archive writer header and its dependencies is properly organized.
663-686: Verify cleanup order in single-file archive creation.
The function creates a single-file archive after closing the multi-file archive. This ordering needs careful consideration for partial failures.
Let's verify the cleanup handling:
#!/bin/bash
# Search for error handling patterns around archive creation
rg -A 5 "create_single_file_archive|write_single_file_archive" --type cpp
# Look for cleanup/rollback mechanisms
ast-grep --pattern 'try {
$$$
create_single_file_archive();
$$$
} catch ($_) {
$$$
}'
components/core/src/clp/streaming_archive/single_file_archive/writer.cpp (3)
1-21: LGTM! Well-organized includes and namespace structure.
The includes are properly grouped and contain all necessary headers for the implementation.
283-287: Improve error handling for cleanup operation
This section needs better error handling as identified in a previous review.
236-248: Add validation for segment ID range
Consider adding an upper bound check for `last_segment_id` to prevent excessive memory allocation for very large segment IDs.
auto get_segment_ids(segment_id_t last_segment_id) -> std::vector<std::string> {
std::vector<std::string> segment_ids;
if (last_segment_id < 0) {
return segment_ids;
}
+ // Prevent excessive memory allocation
+ if (last_segment_id > cMaxSegmentId) {
+ throw OperationFailed(
+ ErrorCode_InvalidArgument,
+ __FILENAME__,
+ __LINE__,
+ fmt::format("Segment ID {} exceeds maximum allowed value {}",
+ last_segment_id, cMaxSegmentId)
+ );
+ }
+
for (size_t i = 0; i <= last_segment_id; ++i) {
src/clp/streaming_archive/single_file_archive/Defs.hpp
src/clp/streaming_archive/single_file_archive/writer.cpp
src/clp/streaming_archive/single_file_archive/writer.hpp
💡 Codebase verification
Missing test coverage for the new single-file archive implementation
The new single-file archive files have been properly added to the CMake configuration, but I don't see any corresponding test files for this functionality. While there are several test files in the codebase:
- There's only one test file that mentions "Archive": `test-clp_s-end_to_end.cpp`
- None of the existing test files appear to specifically cover the new single-file archive implementation
Recommended actions:
- Create unit tests for the new single-file archive writer implementation
- Consider adding tests in these areas:
  - Basic writer functionality in `writer.cpp`
- Edge cases and error handling
- Integration with the existing archive system
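A hypothetical starting point for such a unit test, assuming a Catch2-style test and using a placeholder compression helper (the helper and file contents below are illustrative, not existing project utilities):

```cpp
#include <filesystem>
#include <fstream>

#include <catch2/catch.hpp>

namespace {
// Placeholder for the real compression entry point; an actual test would invoke
// the clp writer with UserConfig::use_single_file_archive set to true.
auto compress_with_single_file_archive_stub(std::filesystem::path const& output_dir)
        -> std::filesystem::path {
    auto const archive_path = output_dir / "example.clp";
    std::ofstream{archive_path} << "placeholder archive bytes";
    return archive_path;
}
}  // namespace

TEST_CASE("single-file archive writer produces one file", "[single_file_archive]") {
    auto const archive_path
            = compress_with_single_file_archive_stub(std::filesystem::temp_directory_path());

    // The output should be a single regular file with the expected extension,
    // rather than a directory of archive components.
    REQUIRE(std::filesystem::is_regular_file(archive_path));
    REQUIRE(archive_path.extension() == ".clp");
    REQUIRE(std::filesystem::file_size(archive_path) > 0);
}
```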
🔗 Analysis chain
LGTM! The new single-file archive source files are properly integrated.
The files are correctly placed in a dedicated directory and follow the project's naming conventions.
Let's verify the test coverage for these new files:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for corresponding test files and test coverage for the new SFA implementation.
# Look for test files related to single-file archive
rg -l "single_file_archive" tests/
# Check for test cases using the new writer
rg -A 5 "TEST.*single.*file.*archive" tests/
Length of output: 175
Script:
#!/bin/bash
# Let's first find the test directory location
fd -t d "test" .
# Then search for single file archive related files and tests
rg -l "single_file_archive" .
# Look for test cases specifically
rg -A 5 "TEST.*single.*file.*archive" .
Length of output: 1270
Script:
#!/bin/bash
# Let's check the test files in the core component
fd -t f ".*test.*\.(cpp|hpp)$" components/core/tests/
# Look for any test files that might be testing the archive functionality
rg -l "Archive" components/core/tests/
# Check if there are any test files that include "writer" in their name or content
rg -l "writer" components/core/tests/
Length of output: 1828
High-level comment: let's use `auto func() -> [return type] {}` for all methods. There are still a few methods using the old-style signature.
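For anyone unfamiliar with the convention being requested, a before/after sketch (the function itself is illustrative):

```cpp
#include <cstddef>
#include <string>

// Old-style signature:
// size_t get_dictionary_size(std::string const& dictionary_path);

// Trailing-return-type style requested for this codebase:
auto get_dictionary_size(std::string const& dictionary_path) -> size_t {
    return dictionary_path.size();  // placeholder body for illustration
}
```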
To your questions:
As a quick conclusion without thinking through the details, I don't think we should. GlobalMetadataDB is used by clp and the clp package to get the list of available archives. Without GlobalMetadataDB, we would have to check the archive directory to see what archives are available, but this is not feasible if the archive is written to S3. It remains an open question whether we need all fields in the table for a single archive, but let's keep the GlobalMetadataDB for now. Let me know if you disagree.
@LinZhihao-723 can you give some input on this?
Posting a small batch of review comments first. Still on my way to reviewing the core changes.
@@ -45,6 +46,8 @@ class CommandLineArguments : public CommandLineArgumentsBase {

    bool show_progress() const { return m_show_progress; }

    bool get_use_single_file_archive() const { return m_single_file_archive; }
nit: we could name it "use_single_file_archive" or maybe even "single_file_archive", the same as the other booleans.
@@ -243,7 +243,9 @@ void FileCompressor::parse_and_encode_with_heuristic(

    // Parse content from file
    while (m_message_parser.parse_next_message(true, reader, m_parsed_message)) {
        if (archive_writer.get_data_size_of_dictionaries() >= target_data_size_of_dicts) {
        if (archive_writer.get_data_size_of_dictionaries() >= target_data_size_of_dicts
Just thinking: would it be cleaner if we added a new method named something like "should_split" to archive_writer and embedded this if-logic in the method? A rough sketch is below.
Right now the same if statement is duplicated in multiple places, which is inefficient and error-prone, since one change requires you to update multiple places.
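A rough sketch of what such a helper on the archive writer could look like (the member names and placement are assumptions for illustration; the real Archive exposes `get_data_size_of_dictionaries()` and related state):

```cpp
#include <cstddef>

// Sketch of a member function that centralizes the split decision, so callers
// don't repeat the dictionary-size check and the single-file-archive exemption.
class ArchiveWriterSketch {
public:
    [[nodiscard]] auto should_split() const -> bool {
        return m_data_size_of_dictionaries >= m_target_data_size_of_dicts
               && false == m_use_single_file_archive;
    }

private:
    size_t m_data_size_of_dictionaries{0};
    size_t m_target_data_size_of_dicts{0};
    bool m_use_single_file_archive{false};
};
```

A call site would then simply read `if (archive_writer.should_split()) { ... }` at each of the places that currently duplicate the condition.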
        )(
                "single-file-archive",
                po::bool_switch(&m_single_file_archive),
                "Output archive as a single-file"
"Output archive as a single-file" | |
"Output archive as a single-file archive" |
Also, should we support an option that allows the user to specify the filename?
@@ -135,7 +136,9 @@ bool compress(
    );
 }
 for (auto it = files_to_compress.cbegin(); it != files_to_compress.cend(); ++it) {
     if (archive_writer.get_data_size_of_dictionaries() >= target_data_size_of_dictionaries) {
     if (archive_writer.get_data_size_of_dictionaries() >= target_data_size_of_dictionaries
ditto
        std::vector<FileInfo> const& file_infos,
        std::vector<std::string> const& segment_ids
) -> std::stringstream {
    MultiFileArchiveMetadata archive_metadata{
Why is MultiFileArchiveMetadata used inside pack_single_file_archive_metadata?
I feel the naming is a bit confusing and could perhaps be improved.
I agree the naming is confusing. I added the name MultiFileArchiveMetadata to try to make it less confusing, but clearly that didn't help lol.
The single-file archive metadata section is confusing in that it is not just the original archive's metadata file. It is a superset: it includes the original archive metadata plus two other components (the number of segments and the file-info section). I added the name MultiFileArchiveMetadata to refer to the metadata of the original archive (i.e., without the segment count and the file-info section).
Another wrinkle is that the metadata was chosen to be written in MsgPack format for the SFA (I don't know why, but I'm sure there was a reason). As a result, we don't just copy the file like we do with the other archive files; it needs to be serialized to MsgPack first.
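To restate that layering in code, here is an illustrative-only sketch; the struct and field names are assumptions, not the PR's actual definitions:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Metadata the multi-file archive already writes to its own metadata file
// (fields shown here are assumed examples).
struct MultiFileArchiveMetadata {
    uint64_t uncompressed_size;
    uint64_t compressed_size;
};

// One entry of the file-info section: where each original archive file starts
// inside the single-file archive.
struct FileInfo {
    std::string name;
    uint64_t offset;
};

// The single-file archive's metadata section is a superset: the original metadata
// plus the segment count and the file-info table.
struct SingleFileArchiveMetadata {
    MultiFileArchiveMetadata archive_metadata;
    uint64_t num_segments;
    std::vector<FileInfo> file_infos;
};
```

The point is just that MultiFileArchiveMetadata names the inner piece, while the SFA metadata section wraps it together with the segment count and file-info table.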
Serializing to MsgPack first makes sense, but we should still find better names for the variables.
I think ArchiveMetadata would be a better name than MultiFileArchiveMetadata.
Another possibility is to make a BaseArchiveMetadata class and let the SFA and the "normal archive" (which needs a better name, for sure) extend it.
Note there is already an ArchiveMetadata class. Technically it is in a different namespace, so we could reuse the name, but that might be confusing. This struct just takes the variables from the ArchiveMetadata class that are actually written to disk and wraps them in a struct with the appropriate variable names (without "m_") for MsgPack serialization.
@@ -330,7 +338,9 @@ void Archive::write_msg_using_schema(LogEventView const& log_view) {
            m_old_ts_pattern = timestamp_pattern;
        }
    }
    if (get_data_size_of_dictionaries() >= m_target_data_size_of_dicts) {
    if (get_data_size_of_dictionaries() >= m_target_data_size_of_dicts
ditto
This is interesting. I wasn't really thinking about how the SFA would interact with the clp package. Perhaps they already considered this when designing clp-s. For now we can leave it in, as it is not essential for this PR or the reader PR. I am still wary of generating two files, as the purpose of a single-file archive is that it's a single file. Perhaps Kirk will have more context when time allows.
 * @param last_segment_id ID of last written segment in archive.
 * @return Vector of segment IDs.
 */
auto get_segment_ids(segment_id_t last_segment_id) -> std::vector<std::string>;
A friendly reminder that we should add [[nodiscard]] to functions that have a non-void return type.
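For concreteness, applied to the declaration quoted above; the segment_id_t alias below is an assumption made only to keep the snippet self-contained:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed alias for illustration; clp defines its own segment_id_t.
using segment_id_t = std::uint64_t;

[[nodiscard]] auto get_segment_ids(segment_id_t last_segment_id) -> std::vector<std::string>;
```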
@@ -242,6 +246,10 @@ void Archive::close() {

    m_metadata_db.close();

    if (m_use_single_file_archive) {
        create_single_file_archive();
One concern:
Now we are generating a normal archive, writing its metadata to the global metadata database, and then creating a single-file archive by combining the files.
We haven't decided what metadata should be written to the global metadataDB for SFAs yet, but if it turns out to be different from the normal archive's, would this be an issue?
I think it would complicate the code if they were different. Ideally we can keep them the same.
namespace {
constexpr size_t cReadBlockSize = 4096;

/**
I personally feel "packer" might be a better name, since this file isn't a class but just a collection of helper methods.
In addition, I am not sure if this is the best design approach. @kirkrodrigues are we using a similar packer design in clp-s?
I can rename it to packer, but that might cause confusion with the MsgPack pack method. If that is the intention, though, I think it's okay.
Overall the flow for the writer in clp-s is quite similar, in that the files are copied over to the SFA and then deleted. @haiqi96, note Ray wrote and Devin reviewed the clp-s PR. @kirkrodrigues, if you prefer just using a class rather than a collection of helper methods, I can refactor.
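For readers following along, here is a minimal sketch of that copy-then-append step using plain std:: streams; the function name and stream types are assumptions, and the PR's actual reader/writer types and error handling differ:

```cpp
#include <array>
#include <cstddef>
#include <filesystem>
#include <fstream>

constexpr size_t cReadBlockSize = 4096;

// Stream one of the original archive's files into the single-file archive in
// fixed-size blocks, including a final partial block.
void append_file(std::filesystem::path const& file_path, std::ofstream& archive_writer) {
    std::ifstream reader{file_path, std::ios::binary};
    std::array<char, cReadBlockSize> buffer{};
    while (reader.read(buffer.data(), static_cast<std::streamsize>(buffer.size()))
           || reader.gcount() > 0) {
        archive_writer.write(buffer.data(), reader.gcount());
    }
}
```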
    MSGPACK_DEFINE_MAP(n, o);
};

struct MultiFileArchiveMetadata {
@LinZhihao-723 @kirkrodrigues
My understanding is that we should use a class instead of a struct, even if it's just for holding data. Can you guys confirm?
https://google.github.io/styleguide/cppguide.html#Structs_vs._Classes
Use a struct only for passive objects that carry data; everything else is a class.
The struct must not have invariants that imply relationships between different fields, since direct user access to those fields may break those invariants.
Do any of the fields in this struct have relationships with each other?
The relationships between the fields are weak; however, one could argue there is a relationship between the uncompressed and compressed sizes.
One could also argue that the ArchiveMetadata class, where the values come from, should be responsible for enforcing that invariant, not this struct, which is just a temporary container for MsgPack serialization.
Also note that class members are prefixed with "m_", which would complicate MsgPack serialization, since by default it just uses the variable names as keys.
Lastly, I took this from the clp-s code since I thought it was more elegant than the nlohmann/json class serialization interface.
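As a small, self-contained illustration of the key-naming point (and of the cross-library compatibility mentioned in the description below): with MSGPACK_DEFINE_MAP, msgpack-c emits the member names verbatim as map keys, so an "m_" prefix would leak into the serialized keys. The struct below is made up for the example:

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

#include <msgpack.hpp>
#include <nlohmann/json.hpp>

// Purely illustrative: the member names "n" and "o" become the map keys, so members
// named "m_n" / "m_o" would serialize as "m_n" / "m_o".
struct FileInfo {
    std::string n;  // file name
    uint64_t o;     // offset into the single-file archive
    MSGPACK_DEFINE_MAP(n, o);
};

int main() {
    std::stringstream buffer;
    msgpack::pack(buffer, FileInfo{"metadata", 0});
    auto const bytes = buffer.str();

    // nlohmann/json can parse the msgpack-c output, matching the interoperability
    // observed during testing.
    auto const parsed = nlohmann::json::from_msgpack(bytes.begin(), bytes.end());
    std::cout << parsed.dump() << '\n';  // {"n":"metadata","o":0}
    return 0;
}
```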
There are two configurations for the global metadataDB. When using a "local file" as the metadataDB, clp generates one file per execution, regardless of the number of archives; so if you compress 10000 files, clp may generate 10 SFAs but only 1 global metadataDB. When using the package, the global metadataDB is a MySQL database running in the backend. In that case, no file is generated, and CLP writes directly to the MySQL database. CLP knows it is a remote database if we pass a db-config on the command line. I am 100% sure we need to update the global metadataDB when used with the package. For the non-package case, the global metadataDB can still be useful since it contains the time range of each archive, which can be used as a top-level filter. I think we should agree on the format of the global metadata for SFAs, then decide how it should be stored.
This is good info and good points. I'm now okay keeping the global metadataDB, as I think it makes sense to keep compatibility with the clp package. Note that I think we will still have a requirement to decode archives from the private branch (the real long-term goal of this work is to open SFAs in the log viewer). The private branch will not have a global metadataDB. Maybe we treat reading a "lonely archive" (an archive without a metadataDB) as an exceptional path and write code specifically for that, but in the general case we produce the global metadataDB.
Yeah, we might need a special path for "lonely archives". It should be able to reuse most of our existing code, though. Maybe we should create a separate PR for that work. That said, we should have a concrete plan for the other PR so we don't have to revert changes made in this PR later.
Another small batch of reviews.
}

auto& multi_file_archive_metadata = m_local_metadata.value();
auto packed_metadata
nit: auto const.
packed_metadata is not used anywhere, so perhaps we can embed create_single_file_archive_metadata into write_single_file_archive instead of exposing it.
I think this will work fine
void Archive::create_single_file_archive() {
    std::filesystem::path multi_file_archive_path = m_path;

    auto segment_ids
nit: auto const
        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
    }

    auto& multi_file_archive_metadata = m_local_metadata.value();
nit: auto const&
But if you take my advice below, then this line will also be embedded into the single_file_archive writer.
 * @param last_segment_id ID of last written segment in archive.
 * @return Vector of segment IDs.
 */
auto get_segment_ids(segment_id_t last_segment_id) -> std::vector<std::string>;
Is this segment_id different from the segment_id being tracked by the archive writer?
Yes, it is m_next_segment_id - 1. The archive writer keeps a value for the next segment ID to write (i.e., when the current segment is full, it starts a new segment with m_next_segment_id). This last_segment_id is the last segment actually written.
In that case, I feel the writer can directly get the segment ID from the archive_writer? I don't feel it needs to maintain an extra member variable.
Sure, I think that will work fine.
 * represents the starting position of the next file in single-file archive.
 * @throws OperationFailed if error getting file size.
 */
void update_offset(std::filesystem::path const& file_path, uint64_t& offset);
Since the method also gets the file size, I would consider renaming it to something like "get_file_size_and_update_offset". The current function name doesn't hint that it gets the file size.
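A minimal sketch of what the renamed helper could look like, assuming std::filesystem's own error handling stands in for the PR's OperationFailed exception:

```cpp
#include <cstdint>
#include <filesystem>

// Illustrative-only: return the file's size and advance the running offset so the
// caller knows where the next file starts inside the single-file archive.
// std::filesystem::file_size throws std::filesystem::filesystem_error on failure.
auto get_file_size_and_update_offset(std::filesystem::path const& file_path, uint64_t& offset)
        -> uint64_t {
    auto const file_size = static_cast<uint64_t>(std::filesystem::file_size(file_path));
    offset += file_size;
    return file_size;
}
```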
Description
Reimplementation of the unstructured single-file-archive writer from the private branch into open source. The reader will follow after this PR is merged.
The open source SFA implementation is "backwards compatible" with the private branch (i.e., open source can read archives compressed by the private branch, and the private branch can read archives compressed by open source). "Backwards compatible" is in quotes because the archive metadata and metadataDB have diverged between open source and the private branch. However, if the metadata and metadataDB are slightly altered to conform, then the format is backwards compatible.
In addition to the review, I have one high level question and one minor question.
The remainder of the description goes into details about implementation and differences.
High Level Question
Minor Question
Code structure
I decided to implement the single-file archive writer as a set of functions rather than a class. The single-file archive writer shares common private member variables with the regular archive writer, and I felt it didn't make sense to have a new class with copied members.
Nonetheless, I moved the single-file archive writer methods into their own file/namespace to prevent further bloating of writer/Archive.cpp. To start reviewing, I would look at the main entry point, which is the create_single_file_archive() function in writer/Archive.cpp.
Implementation differences vs. private branch
I decided to use the MsgPack libraries directly instead of nlohmann/json's MsgPack integration so the code is more similar to #563. Therefore, someone familiar with the clp-s SFA implementation should also be familiar with the clp SFA implementation. As a result, there is no single-file metadata class like in the private branch, just a MsgPack struct. Nonetheless, during testing, the msgpack libraries and nlohmann/json were both able to read each other's serialized metadata.
Differences in multi-file metadata format
There was a small change (though I could be mistaken) in the private branch where the archive version (in the metadata, not the header) was changed from uint16_t to uint32_t. I decided for now to leave it as uint16_t in open source, and as a result, the metadata formats are not compatible. There are a multitude of options here. @LinZhihao-723, let us know your thoughts on this.
Differences in Metadata DB
The open source version has a new field, "BeginMessageIx", which is not in the private branch. The private branch has "UtcOffsets", which is not in open source. As a result, clp will throw an error when opening the database during decompression.
JavaScript SFA reader
Note that despite the differences in the metadata format and metadataDB, I believe a JavaScript SFA reader in the log viewer would be able to read both open source and private branch archives without code modifications. JavaScript is likely less sensitive to integer-type differences and will not throw an error if a DB field is missing, as long as it's not used.
Validation performed
I ported the reader code from the private branch "as is" (most code identical, with a few minor changes) to open source. Effectively, it is the private branch reader without the "UtcOffsets" metadataDB field. I then modified the open source writer to use uint32_t for the archive version in the metadata.
Next, I compressed Loghub/Zookeeper (one segment) and Loghub/HDFS2 (multiple segments) with the open source SFA writer and was able to decompress them with the "private branch reader". I verified the decompressed files were identical using diff.
Summary by CodeRabbit
Release Notes
New Features
--single-file-archive command-line option.
Improvements
Technical Enhancements