A command-line utility that recursively scans a given directory and all of its sub-directories for exact duplicate files. The duplicates found can be listed in an output file. If required, the duplicates can also be removed, leaving a single copy of each file.
Features:
- Recursive scan of the set directory.
- Generate list of duplicate files.
- Scan all files or filter based on file extension.
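The recursive scan and the extension filter listed above could look roughly like the following. This is a minimal sketch, not the project's code: it uses std::filesystem (C++17) instead of the Boost.Filesystem dependency, the names matches_extension and collect_files are hypothetical, and it assumes extensions are given without the leading dot and compared case-sensitively.

```cpp
#include <filesystem>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: true when `path` passes the extension filter.
// An empty set is treated as "scan all files" (an assumption of this sketch).
bool matches_extension(const fs::path& path,
                       const std::set<std::string>& extensions) {
    if (extensions.empty()) return true;
    std::string ext = path.extension().string();  // e.g. ".png", or "" if none
    if (!ext.empty() && ext.front() == '.') ext.erase(0, 1);
    return extensions.count(ext) > 0;
}

// Hypothetical helper: collect every regular file below `root` that passes
// the filter, descending into all sub-directories.
std::vector<fs::path> collect_files(const fs::path& root,
                                    const std::set<std::string>& extensions) {
    std::vector<fs::path> files;
    const auto options = fs::directory_options::skip_permission_denied;
    for (const auto& entry : fs::recursive_directory_iterator(root, options)) {
        if (entry.is_regular_file() &&
            matches_extension(entry.path(), extensions)) {
            files.push_back(entry.path());
        }
    }
    return files;
}
```

Calling collect_files("./", {"png", "jpg"}) would then correspond to scanning the current directory for png and jpg files only, under the assumptions stated above.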
Future Plans:
- Set a recursive scan depth for the set directory.
- A way to exclude certain directories.
- A way to include only some directories.
- Robust error handling for synchronization issues.
- Make the program interactive.
Caution: As of now, there is no way to select which of the duplicate files is preserved. The selection depends on the order in which the files are loaded into std::map. The first file loaded is the one that is preserved.
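To illustrate why load order decides which copy survives, here is a minimal sketch. It is not the project's actual code: it assumes every file has already been reduced to some fingerprint string (for example a content hash; this README does not say which one is used), and select_duplicates and the pair layout are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: takes (fingerprint, path) pairs in the order the files
// were loaded and returns the paths that would count as duplicates.
// The first path seen for each fingerprint is kept and never returned.
std::vector<std::string> select_duplicates(
    const std::vector<std::pair<std::string, std::string>>& loaded) {
    std::map<std::string, std::string> kept;  // fingerprint -> first path loaded
    std::vector<std::string> duplicates;
    for (const auto& item : loaded) {
        // emplace never overwrites an existing key, so the first path wins.
        const bool inserted = kept.emplace(item.first, item.second).second;
        if (!inserted) duplicates.push_back(item.second);
    }
    return duplicates;
}
```

With this shape, loading a.txt before c.txt keeps a.txt, and loading them the other way round keeps c.txt. Directory iteration order is generally unspecified, so it is safest to generate the list with -o and review it before using -r.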
Dependencies:
- For the main program:
  sudo apt-get install libssl-dev libboost-filesystem-dev libboost-system-dev
- For the tests, in addition to the dependencies for the main program:
  sudo apt-get install libcppunit-dev
Build:
- Clone the project:
  git clone https://github.com/vishal-wadhwa/Duplicate-File-Remover.git
- Change directory to src:
  cd Duplicate-File-Remover/src
- Build project using Make utility (assuming you've downloaded the dependencies):
  make main
- Run it (See Usage):
  ./main ...
Tests:
- From the directory containing the cloned project, go to the tests directory:
  cd Duplicate-File-Remover/tests
- Build the tests using the Make utility (assuming you've downloaded the dependencies):
  make test
- Run the tests:
  ./test
You should see OK if all the tests pass. You can then go on to use the program.
Usage:
- Use -d switch to set the directory to be scanned.
- Use -e switch to provide a list of extensions to filter the files scanned.
- Use -o switch to generate an output (log) file. If this switch is not followed by a name/path, then a default file dupl_file.txt is generated in the current directory.
- Use -r switch to remove the duplicates and keep only one copy.
- Use -h switch to display this help:
Usage: ./main -d [DIRECTORY]
or: ./main -d [DIRECTORY] -e [EXTENSIONS]...
or: ./main -d [DIRECTORY] -o [OUTFILE]
Scan the provided directory and its sub-directories recursively and find duplicates.
Not using either of -o or -r switch is pointless as no action is performed.
-d switch is necessary to set the search directory.
Other switches:
-d provided argument is the directory to be scanned.
-e following arguments treated as extensions.
-o generate file list (default file: "dupl_file.txt").
-h prints this help.
-r remove the duplicates so found.
Note: Use sudo if required.
Examples:
./main -d ./ -o -r
./main -e png jpg jpeg -d ./../ -o log.out
./main -d ./ -r
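The switch behaviour described in the help text, including the default output file when -o has no argument, can be sketched as a small parser. This is an illustrative sketch only and not the program's real argument handling: the Options struct, parse_args, and the rule that any two-character argument starting with '-' counts as a switch are all assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the parsed switches.
struct Options {
    std::string directory;                   // -d
    std::vector<std::string> extensions;     // -e
    bool write_list = false;                 // -o
    std::string out_file = "dupl_file.txt";  // default name from the help text
    bool remove = false;                     // -r
    bool help = false;                       // -h
};

// Hypothetical parser: `args` holds the command line without the program name.
Options parse_args(const std::vector<std::string>& args) {
    Options opts;
    // Assumption: a switch is exactly two characters and starts with '-'.
    auto is_switch = [](const std::string& s) {
        return s.size() == 2 && s[0] == '-';
    };
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-d" && i + 1 < args.size()) {
            opts.directory = args[++i];
        } else if (arg == "-e") {
            // Every following non-switch argument is treated as an extension.
            while (i + 1 < args.size() && !is_switch(args[i + 1])) {
                opts.extensions.push_back(args[++i]);
            }
        } else if (arg == "-o") {
            opts.write_list = true;
            // Optional file name; otherwise the default is kept.
            if (i + 1 < args.size() && !is_switch(args[i + 1])) {
                opts.out_file = args[++i];
            }
        } else if (arg == "-r") {
            opts.remove = true;
        } else if (arg == "-h") {
            opts.help = true;
        }
    }
    return opts;
}
```

Under these assumptions, the first example above yields the default output file dupl_file.txt with removal enabled, and the second yields three extensions with log.out as the output file.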