DUPLICATE FILE REMOVER

GitHub | Docs

A command-line utility that recursively scans a given directory and all of its sub-directories for exact duplicate files. The duplicates found can be listed in an output file and, if required, removed so that a single copy of each file is preserved.

Features:

Recursive scan of the given directory.
Generate a list of duplicate files.
Scan all files or filter based on file extension.
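
As a rough illustration of the features above, the sketch below walks a directory tree, optionally filters by extension, and groups files with identical contents. It is not the project's code: it assumes C++17 std::filesystem instead of Boost.Filesystem, compares whole file contents where the libssl-dev dependency suggests the project uses a hash, and the names read_all, wanted, and group_by_content are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Read a whole file into memory. Acceptable for a sketch; hashing the file
// in chunks would avoid keeping every file's bytes around.
std::string read_all(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    assert(in.is_open());
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Assumed filter rule: extensions are given without the leading dot
// (e.g. "png"), and an empty set means "scan all files".
bool wanted(const fs::path& p, const std::set<std::string>& exts) {
    if (exts.empty()) return true;
    std::string ext = p.extension().string();  // ".png", or "" if none
    if (!ext.empty() && ext.front() == '.') ext.erase(0, 1);
    return exts.count(ext) > 0;
}

// Group every regular file under `root` by its exact contents.
std::map<std::string, std::vector<fs::path>>
group_by_content(const fs::path& root, const std::set<std::string>& exts) {
    std::map<std::string, std::vector<fs::path>> groups;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file() && wanted(entry.path(), exts)) {
            groups[read_all(entry.path())].push_back(entry.path());
        }
    }
    return groups;
}
```

Any group that ends up holding more than one path is a set of duplicates. The order of paths inside a group follows the directory iteration order, which the standard leaves unspecified.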

Future Plans:

Set a maximum recursion depth for the scan.
A way to exclude certain directories.
A way to include only some directories.
Robust error handling for synchronization issues.
Make the program interactive.

Caution: There is currently no way to choose which of the duplicate files is preserved. The choice depends on the order in which the files are loaded into a std::map: the first file loaded is the one that is kept.
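
To make the caution above concrete, here is a minimal sketch of "keep the first, remove the rest". The Groups layout (a content digest mapped to paths in load order) and the function paths_to_remove are assumptions for illustration, not the project's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: content digest -> paths in the order they were loaded.
using Groups = std::map<std::string, std::vector<std::string>>;

// Return the paths that would be deleted: every path except the first
// one in each group. The first path is the copy that survives.
std::vector<std::string> paths_to_remove(const Groups& groups) {
    std::vector<std::string> doomed;
    for (const auto& group : groups) {
        const std::vector<std::string>& paths = group.second;
        for (std::size_t i = 1; i < paths.size(); ++i) {
            doomed.push_back(paths[i]);
        }
    }
    return doomed;
}
```

Under this model the surviving copy is decided purely by load order, so if a particular copy matters, it is safer to generate the list with -o first and review it before running with -r.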

Dependencies:

For the main program:

sudo apt-get install libssl-dev libboost-filesystem-dev libboost-system-dev
For the tests, in addition to the dependencies for the main program:

sudo apt-get install libcppunit-dev

Downloading and Building

Clone the project:

git clone https://github.com/vishal-wadhwa/Duplicate-File-Remover.git
Change directory to src:

cd Duplicate-File-Remover/src
Build the project with make (assuming you have installed the dependencies):

make main
Run it (see Usage):

./main ...

Testing

Go to the project's tests directory (the path below is relative to the directory you cloned into):

cd Duplicate-File-Remover/tests
Build the tests with make (assuming you have installed the dependencies):

make test
Run the tests:

./test

You should see OK if all the tests pass; you can then go on to use the program. ;)

Usage

Use the -d switch to set the directory to be scanned.
Use the -e switch to provide a list of extensions used to filter the files scanned.
Use the -o switch to generate an output (log) file. If this switch is not followed by a name or path, a default file dupl_file.txt is generated in the current directory.
Use the -r switch to remove the duplicates and keep only one copy.
Use the -h switch to display this help:

Usage: ./main -d [DIRECTORY]
or: ./main -d [DIRECTORY] -e [EXTENSIONS]...
or: ./main -d [DIRECTORY] -o [OUTFILE]

Scan the provided directory and its sub-directories recursively and find duplicates.

Not using either of -o or -r switch is pointless as no action is performed.

-d switch is necessary to set the search directory.

Other switches:
    -d		provided argument is the directory to be scanned.
    -e		following arguments treated as extensions.
    -o		generate file list (default file: "dupl_file.txt").
    -h		prints this help.
    -r		remove the duplicates so found.

Note: Use sudo if required.

Examples

./main -d ./ -o -r
./main -e png jpg jpeg -d ./../ -o log.out
./main -d ./ -r
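
The switch rules described under Usage, in particular the optional file name after -o, can be sketched as below. This is an illustrative parser only: the Options struct, is_switch, and parse_args are hypothetical names, and the real program's handling of edge cases (for example a directory whose name looks like a switch) may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Options {
    std::string directory;                // value given to -d
    std::vector<std::string> extensions;  // values given to -e
    std::string out_file;                 // empty means -o was not given
    bool remove = false;                  // -r
    bool help = false;                    // -h
};

// Assumption: a switch is exactly a dash followed by one character.
static bool is_switch(const std::string& s) {
    return s.size() == 2 && s[0] == '-';
}

// `args` holds the command-line arguments without the program name.
Options parse_args(const std::vector<std::string>& args) {
    Options opt;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        const bool has_value = i + 1 < args.size() && !is_switch(args[i + 1]);
        if (a == "-d" && has_value) {
            opt.directory = args[++i];
        } else if (a == "-e") {
            // Consume every following argument up to the next switch.
            while (i + 1 < args.size() && !is_switch(args[i + 1])) {
                opt.extensions.push_back(args[++i]);
            }
        } else if (a == "-o") {
            // Fall back to the documented default when no name follows.
            opt.out_file = has_value ? args[++i] : "dupl_file.txt";
        } else if (a == "-r") {
            opt.remove = true;
        } else if (a == "-h") {
            opt.help = true;
        }
    }
    return opt;
}
```

Applied to the first example above, -o is followed directly by -r, so the output goes to dupl_file.txt and removal is enabled. In the second example the three extensions are collected until -d is reached, and the log is written to log.out.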
