Feature/issue 78 impelement file shuffler in rust #134

Merged
merged 13 commits into from
Nov 12, 2024

joolaoye
Collaborator

FILE SHUFFLER IN RUST

Table of Contents

  1. Introduction
  2. Implementation Details
  3. Conclusion
  4. Demo Video

Introduction

In this project, we implemented Option 1: a File Shuffler, using Rust as our chosen programming language. This implementation aims to address Issue #78. The primary goal of the File Shuffler is to randomize the training dataset files used for training the prediction model in Avatar.

Implementation Details

Project Setup

We used Cargo, Rust's build tool and package manager, to set up the project. Cargo manages dependencies, handles builds, and provides a standard project layout. Commands such as cargo build and cargo test cover compiling and testing, which let us focus on writing Rust code.

Command-Line Arguments

We chose command-line arguments over an interactive prompt for running the File Shuffler because they are easier to automate (e.g., in CI/CD pipelines). To run the program, use the cargo run command with the required input directory argument. Optionally, specify an interval with the --interval flag (or -i for short). The interval options are: 0 for "Never", 1 for "Every Week", and 2 for "Every 30 Seconds". If an unsupported interval is entered, the program defaults to 0 (Never).
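The argument handling described above could be sketched roughly as follows. This is a minimal illustration that uses only the standard library; the names Interval, from_arg, and parse_args are hypothetical and not taken from main.rs, and the actual implementation may use an argument-parsing crate instead.

```rust
use std::path::PathBuf;

/// How often the shuffle repeats; the numeric codes follow the description above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Interval {
    Never,              // 0
    EveryWeek,          // 1
    EveryThirtySeconds, // 2
}

impl Interval {
    /// Maps a raw flag value to an interval; anything unsupported falls back to `Never`.
    fn from_arg(raw: &str) -> Interval {
        match raw.trim() {
            "1" => Interval::EveryWeek,
            "2" => Interval::EveryThirtySeconds,
            _ => Interval::Never,
        }
    }
}

/// Hypothetical parser: expects one positional input directory and an
/// optional `--interval N` / `-i N` pair, in any order.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<(PathBuf, Interval), String> {
    let mut input_dir: Option<PathBuf> = None;
    let mut interval = Interval::Never;
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--interval" | "-i" => {
                let value = iter.next().ok_or("missing value for --interval")?;
                interval = Interval::from_arg(&value);
            }
            // The first non-flag argument is taken as the input directory.
            _ if input_dir.is_none() => input_dir = Some(PathBuf::from(&arg)),
            other => return Err(format!("unexpected argument: {}", other)),
        }
    }
    input_dir
        .map(|dir| (dir, interval))
        .ok_or_else(|| "missing input directory".to_string())
}
```

In a real binary this would be called as parse_args(std::env::args().skip(1)), so that the program name is dropped before parsing.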

Directory Structure

Due to time constraints, we kept the code in a single file rather than a modular setup, acknowledging this as technical debt for future refactoring. We used the basic directory structure created by cargo on project initialization:

/shuffler-in-rust
│
├── src
│   └── main.rs
├── Cargo.toml
└── Cargo.lock

Cargo.toml: Configuration file where dependencies are defined.
Cargo.lock: Ensures consistency of dependency versions across builds.
src/: Contains all implementation code.
main.rs: Primary implementation file for the File Shuffler logic.

Shuffling Logic

Before processing, the program verifies that the specified directory exists and is at least 2 levels deep. This depth check aligns with the requirement to start with a parent directory containing subdirectories that represent data labels (e.g., backward, forward, etc.).
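A pre-flight check of this kind might look like the sketch below. The function names directory_depth and validate_input_dir are hypothetical, and the sketch assumes that "2 levels deep" means the root contains at least one subdirectory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns how many directory levels exist at or below `dir`
/// (a directory with no subdirectories counts as 1 level).
fn directory_depth(dir: &Path) -> io::Result<usize> {
    let mut deepest_child = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            deepest_child = deepest_child.max(directory_depth(&path)?);
        }
    }
    Ok(1 + deepest_child)
}

/// Hypothetical pre-flight check: the root must exist and contain at least
/// one subdirectory (the label level), i.e. be at least 2 levels deep.
fn validate_input_dir(dir: &Path) -> Result<(), String> {
    if !dir.is_dir() {
        return Err(format!("{} is not an existing directory", dir.display()));
    }
    let depth = directory_depth(dir).map_err(|e| e.to_string())?;
    if depth < 2 {
        return Err(format!("{} must be at least 2 levels deep", dir.display()));
    }
    Ok(())
}
```

Returning a Result rather than panicking lets the caller print a readable message and exit with a non-zero status, which suits automated runs.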

The program models the directory structure as an n-ary tree, where the input directory is the root and its immediate child directories are the data labels. Using a recursive algorithm, we navigate to the deepest level of each branch. Upon reaching a leaf node (a subdirectory without further subdirectories), we move its files into the parent directory and then delete the now-empty leaf node. This is a bottom-up (post-order) traversal: leaf nodes are processed before their parent directories. As files are moved up, an incrementing number is appended to each filename so that files with the same name do not overwrite one another.

For a structure with n layers, the recursion descends to a depth of n - 2 and stops at the label directories directly under the root. Once every label directory has been flattened, its files are renamed.
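The post-order flattening could be sketched as below. The names unique_destination, flatten, and flatten_labels are hypothetical. As a simplifying assumption, this sketch appends a number only when a name collision actually occurs, which may differ from how main.rs numbers the files.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Picks a free path in `dir` for `file`, appending an incrementing number
/// to the stem on collision (`a.csv`, `a_1.csv`, `a_2.csv`, ...).
fn unique_destination(dir: &Path, file: &Path) -> PathBuf {
    let stem = file.file_stem().map(|s| s.to_string_lossy().into_owned()).unwrap_or_default();
    let ext = file.extension().map(|e| format!(".{}", e.to_string_lossy())).unwrap_or_default();
    let mut candidate = dir.join(format!("{}{}", stem, ext));
    let mut counter = 1;
    while candidate.exists() {
        candidate = dir.join(format!("{}_{}{}", stem, counter, ext));
        counter += 1;
    }
    candidate
}

/// Lists the entries of `dir` up front so the directory is not modified while iterated.
fn list_entries(dir: &Path) -> io::Result<Vec<PathBuf>> {
    fs::read_dir(dir)?.map(|entry| entry.map(|e| e.path())).collect()
}

/// Post-order flatten: each subdirectory is fully collapsed first, then its
/// files are moved up into `dir` and the empty subdirectory is removed.
fn flatten(dir: &Path) -> io::Result<()> {
    for child in list_entries(dir)?.into_iter().filter(|p| p.is_dir()) {
        flatten(&child)?; // children before parent
        for file in list_entries(&child)? {
            fs::rename(&file, unique_destination(dir, &file))?;
        }
        fs::remove_dir(&child)?;
    }
    Ok(())
}

/// Flattens each label directory under `root` while keeping the labels themselves.
fn flatten_labels(root: &Path) -> io::Result<()> {
    for label in list_entries(root)?.into_iter().filter(|p| p.is_dir()) {
        flatten(&label)?;
    }
    Ok(())
}
```

Calling flatten on each label directory rather than on the root is what keeps the label level intact: only the levels below the labels are collapsed.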

Renaming Files: We start by counting the files in the current directory; call this count n. A HashSet tracks the numbers already assigned, ensuring no two files receive the same name. For each file, a random number from 1 to n (inclusive) is generated and checked against the HashSet; if the number is already taken, a new one is drawn, which prevents overwrites. The renamed files are copied into a temporary directory. Once renaming is complete, all files in the temporary directory are moved back into the original directory, and the temporary directory is deleted.
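The renaming step can be illustrated with the sketch below. Several parts are assumptions made to keep it dependency-free: the small xorshift generator stands in for whatever random source main.rs uses, the temporary directory name .shuffle_tmp and the function name shuffle_rename are hypothetical, and files are moved with fs::rename rather than copied.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Tiny xorshift64 generator, used only to keep this sketch free of external crates.
struct XorShift(u64);

impl XorShift {
    fn seeded_from_clock() -> Self {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(1);
        XorShift(nanos | 1) // the state must never be zero
    }

    /// Returns a number in 1..=n; `n` must be greater than zero.
    fn next_in_range(&mut self, n: u64) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0 % n + 1
    }
}

/// Renames every file in `dir` to a unique random number in 1..=n,
/// keeping the extension and staging the files in a temporary directory.
fn shuffle_rename(dir: &Path, rng: &mut XorShift) -> io::Result<()> {
    let entries: Vec<PathBuf> = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<_>>()?;
    let files: Vec<PathBuf> = entries.into_iter().filter(|p| p.is_file()).collect();
    let n = files.len() as u64;

    let tmp = dir.join(".shuffle_tmp"); // hypothetical temporary directory name
    fs::create_dir(&tmp)?;
    let mut used = HashSet::new();
    let mut staged = Vec::new();
    for file in &files {
        // Redraw until the number has not been handed out yet.
        let mut number = rng.next_in_range(n);
        while !used.insert(number) {
            number = rng.next_in_range(n);
        }
        let ext = file.extension().map(|e| format!(".{}", e.to_string_lossy())).unwrap_or_default();
        let name = format!("{}{}", number, ext);
        fs::rename(file, tmp.join(&name))?;
        staged.push(name);
    }
    // Move the renamed files back and remove the now-empty temporary directory.
    for name in staged {
        fs::rename(tmp.join(&name), dir.join(&name))?;
    }
    fs::remove_dir(&tmp)
}
```

Staging in a temporary directory means a new name such as 3.csv can never overwrite an original file that has not been processed yet. Redrawing on collision is simple but slows down as the set fills up; shuffling the list 1..=n once would avoid the retries.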

Conclusion

For more information on the internal logic, functions, and documentation, you can explore the auto-generated documentation. To do so, simply run:

cargo doc --open

This command builds the documentation for the implementation and opens it in your browser.

Demo Video

https://drive.google.com/file/d/1oRh9sY2Q5wGD0NrA2Ngd9VrgOjqT3_DC/view?usp=drive_link

Owner

@3C-SCSU 3C-SCSU left a comment

Good implementation. No conflicts. Approved.

@3C-SCSU 3C-SCSU merged commit aae1f2d into main Nov 12, 2024

2 participants