This Ruby application scans a specified directory to count and identify files with the same content. It compares file contents by SHA-256 hash and reads each file in chunks, so large files and large directories can be scanned without loading whole files into memory.
git clone https://github.com/andregit1/duplicate_files_counter.git
cd duplicate_files_counter
Use the command line to run the script:
ruby duplicate_file_counter.rb /path/to/directory
Replace `/path/to/directory` with the path of the directory you want to scan.
- Purpose: Computes a SHA-256 hash of a file’s content.
- Function: Reads the file in chunks to handle large files efficiently.
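The chunked hashing described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the script's own code: the method name `file_sha256` and the 64 KiB chunk size are assumptions.

```ruby
require 'digest'

# Assumed chunk size; the actual script may use a different value.
CHUNK_SIZE = 64 * 1024

# Hypothetical helper: returns the SHA-256 hex digest of a file's content,
# reading it piece by piece so the whole file is never held in memory.
def file_sha256(path, chunk_size = CHUNK_SIZE)
  digest = Digest::SHA256.new
  File.open(path, 'rb') do |file|
    # File#read(n) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end
```

Two files with identical bytes produce the same digest regardless of their names or locations, which is what makes the hash usable as a key for grouping duplicates.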
- Purpose: Traverses the given directory recursively to find duplicate files.
- Function: Uses `Find.find` to traverse the directory and calculates the hash for each file, storing the count of each hash in a hash map.
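As a rough sketch of this traversal, assuming a hypothetical method name `count_content_hashes`, the counting step might look like the following. It uses `Digest::SHA256.file` from the standard library for brevity instead of a hand-written chunked reader.

```ruby
require 'digest'
require 'find'

# Hypothetical helper: walks `directory` recursively and returns a Hash
# mapping each content hash to the number of files that share it.
def count_content_hashes(directory)
  counts = Hash.new(0) # missing keys start at 0
  Find.find(directory) do |path|
    next unless File.file?(path) # skip directories and other non-files
    counts[Digest::SHA256.file(path).hexdigest] += 1
  end
  counts
end
```

Any hash whose count is greater than 1 corresponds to a group of files with identical content.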
- Purpose: Finds the file content hash with the highest count.
- Function: Prints the hash and the number of files that share this hash.
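Picking the most frequent hash from such a count map could be sketched like this. The method name `most_common_hash` and the exact output wording are assumptions for illustration.

```ruby
# Hypothetical helper: returns [hash, count] for the entry with the
# highest count, or nil when the map is empty.
def most_common_hash(hash_counts)
  hash_counts.max_by { |_hash, count| count }
end

# Hypothetical helper: prints the result in an assumed format.
def print_most_common(hash_counts)
  hash, count = most_common_hash(hash_counts)
  if hash
    puts "Most common content hash: #{hash} (#{count} files)"
  else
    puts 'No files found.'
  end
end
```

If several hashes tie for the highest count, `max_by` returns only one of them.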
- Purpose: Makes the script flexible and reusable.
- Function: Takes the directory path as a command-line argument, allowing it to scan any directory without changing the code.
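One possible way to read and validate the directory argument is sketched below. The method name `parse_directory` and the error messages are assumptions, not the script's actual behavior.

```ruby
# Hypothetical helper: returns the directory given as the first
# command-line argument, or raises ArgumentError with a usage hint.
def parse_directory(argv)
  directory = argv.first
  if directory.nil?
    raise ArgumentError, 'Usage: ruby duplicate_file_counter.rb /path/to/directory'
  end
  raise ArgumentError, "Not a directory: #{directory}" unless File.directory?(directory)

  directory
end

# Typical call site in a script:
#   directory = parse_directory(ARGV)
```

Validating the argument up front lets the script fail with a clear message before any scanning starts.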
- Purpose: Provides feedback on the scanning progress.
- Function: Tracks and prints the progress of file scanning using `total_files` and `processed_files`.
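A progress message built from these two counters might look like the sketch below. The method name `progress_line` and the message format are assumptions; only `total_files` and `processed_files` come from the description above.

```ruby
# Hypothetical helper: formats a one-line progress message.
def progress_line(processed_files, total_files)
  # Guard against division by zero for an empty directory.
  percent = total_files.zero? ? 100.0 : processed_files * 100.0 / total_files
  format('Progress: %d/%d files (%.1f%%)', processed_files, total_files, percent)
end

# Inside a scan loop, the line can be redrawn in place:
#   print "\r#{progress_line(processed_files, total_files)}"
```

Computing `total_files` requires counting the files before hashing starts, so the directory is walked twice: once to count and once to hash.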
- Purpose: Saves detailed scan results for later reference.
- Function:
  - Stores file paths for each hash in `file_hash_details`.
  - Writes detailed results to a `.txt` file with a timestamp.
  - Informs the user when the results are saved.
After running the script, a file named `duplicate_files_report_<timestamp>.txt` will be created in the current directory, containing:
- A list of file hashes and their counts.
- The file paths of the duplicated files.
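A report like this could be produced along the following lines. This is only a sketch: the method name `write_report`, the timestamp format, and the line layout inside the file are assumptions. It expects `file_hash_details` to map each hash to an array of file paths.

```ruby
# Hypothetical helper: writes each hash, its count, and (for duplicates)
# the matching paths to a timestamped report; returns the report path.
def write_report(file_hash_details, output_dir = Dir.pwd, now = Time.now)
  timestamp = now.strftime('%Y%m%d_%H%M%S') # assumed timestamp format
  report_path = File.join(output_dir, "duplicate_files_report_#{timestamp}.txt")
  File.open(report_path, 'w') do |report|
    file_hash_details.each do |hash, paths|
      report.puts "#{hash}: #{paths.size} file(s)"
      next if paths.size < 2 # list paths only for duplicated content

      paths.each { |path| report.puts "  #{path}" }
    end
  end
  report_path
end
```

Returning the path makes it easy to tell the user where the results were saved, for example with `puts "Results saved to #{write_report(file_hash_details)}"`.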