This program compresses text files, it works the best for files larger than 5kb as it writes instructions of size 1kb for encoding/decoding a file. Size reduction of a file is about 25% - 35%.
The process begins with the leaf nodes containing the probabilities of the symbol they represent. Then, the process takes the two nodes with smallest probability, and creates a new internal node having these two nodes as children. The weight of the new node is set to the sum of the weight of the children. We then apply the process again, on the new internal node and on the remaining nodes (i.e., we exclude the two leaf nodes), we repeat this process until only one node remains, which is the root of the Huffman tree.
The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority:
- Create a leaf node for each symbol and add it to the priority queue.
- While there is more than one node in the queue:
- Remove the two nodes of highest priority (lowest probability) from the queue
- Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
- Add the new node to the queue.
- The remaining node is the root node and the tree is complete.
make
while in program's directory.
Only ASCII support, no UTF-8
-c
: Program will compress file given as a second argument. Compressed file's name is going to be a third argument, example:
./huffman -c test.txt test
will compress file test.txt
to output file test
.
-d
: Program will decompress file given as a second argument. Decompressed file's name is going to be a third argument, example:
./huffman -d test test.txt
will decompress file test
to output file test.txt
.