File Compression Using Huffman's Algorithm

About

Huffman's algorithm is an efficient method for lossless file compression and decompression. This program follows the algorithm closely: it counts how often each character occurs in the input file and replaces the more frequent characters with shorter binary codewords. The original file can be reproduced without losing a single bit.

Usage

Compression:

	./encode <file to compress>

An output file named .spd will be produced.

Decompression:

	./decode <file to uncompress>

File Structure

N = total number of unique characters (1 byte)
Character [1 byte]	Binary codeword, string form [MAX bytes]
Character [1 byte]	Binary codeword, string form [MAX bytes]
... (N entries in total, one per unique character)
p (1 byte)	p zero bits of padding (p bits)
DATA

p = number of padding bits added so that the file occupies a whole number of bytes. E.g., a file of 4 bytes + 3 bits must be padded with 5 bits to make it 5 bytes.
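The padding rule can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function name `pad_and_pack` is hypothetical, the bits are held as a string of '0'/'1' characters for readability, and it assumes that data already aligned to a byte boundary gets p = 0, which the layout above does not state explicitly.

```python
def pad_and_pack(bits):
    """Return (p, packed): p zero bits are placed in front of `bits`, then packed into bytes."""
    # Number of bits needed to reach the next byte boundary
    # (assumed to be 0 when the data is already aligned).
    p = (8 - len(bits) % 8) % 8
    padded = "0" * p + bits
    # Convert each group of 8 '0'/'1' characters into one byte.
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return p, packed


# 4 bytes + 3 bits = 35 bits of data needs 5 padding bits to fill 5 bytes.
p, packed = pad_and_pack("1" * 35)
print(p, len(packed))  # 5 5
```

Placing the padding in front of the data, as the layout does, means a decoder can skip exactly p bits and then read codewords until the end of the file.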

Example

Text: aabcbaab

Content	Comment
3	N=3 (a,b,c)
a "1"	character and corresponding code "1"
b "01"	character and corresponding code "01"
c "00"	character and corresponding code "00"
4	Padding count
[0000]	Padding 4 zeroes
[1] [1] [01] [00] [01] [1] [1] [01]	Actual data, code in place of char
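To check the example by hand, here is a minimal Python sketch of the decoding direction. It is not the project's decoder: the `decode` function and the `CODES` name are hypothetical, and it assumes the header has already been parsed, so the mapping table and the padding count p are passed in directly. The 16 bits of padding plus data from the table above pack into the two bytes 00001101 and 00011101.

```python
# Mapping table from the example above (character -> codeword).
CODES = {"a": "1", "b": "01", "c": "00"}


def decode(p, data, codes):
    """Rebuild the text from p padding bits followed by codewords in `data`."""
    inverse = {code: ch for ch, code in codes.items()}
    # Expand every byte into 8 bits, then drop the p leading padding bits.
    bits = "".join(f"{byte:08b}" for byte in data)[p:]
    out, current = [], ""
    for bit in bits:
        current += bit
        # Huffman codes are prefix-free, so the first match is the only possible one.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


print(decode(4, bytes([0b00001101, 0b00011101]), CODES))  # aabcbaab
```

The eight input characters (8 bytes) are represented by 12 data bits, which together with the 4 padding bits fit in 2 bytes, not counting the header.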

Algorithm

(Pass 1) Read input file

Create sorted linked list of characters from file, as per character frequency

	for each character ch from file
	{
		if (ch available in linked list at node p) then
		{
			p.freq++;
			sort linked list as per node's freq;
		}
		else
			add new node at beginning of linked list with frequency = 1;
	}

Construct Huffman tree from linked list
	i. Create new node q, join the two least-frequent nodes to its left and right
	ii. Insert created node q into the ascending list
	iii. Repeat i & ii till only one node remains, i.e., the ROOT of the h-tree
	iv. Traverse the tree in preorder and mark each node with its codeword, simultaneously recreating the linked list of leaf nodes
Write mapping table (character to codeword) to output file.
(Pass 2) Read input file.
Write the codeword in place of each character from the input file to the output file:

	for each character ch from input file
		write corresponding codeword into output file (look up in mapping table OR linked list)

End
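The steps above can be sketched compactly in Python. This is an illustration under stated assumptions rather than the project's implementation: it swaps the sorted linked list for `collections.Counter` and a `heapq` priority queue, which also yield the two least-frequent nodes at each step, and the names `build_codes` and `encode` are hypothetical. Tie-breaking may differ from the program's, so the exact codewords can differ from the Example section even though the codeword lengths agree.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(text):
    """Return a character -> codeword table for `text`."""
    freq = Counter(text)  # pass 1: character frequencies
    if not freq:
        return {}
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    # A heap entry is (frequency, tie, node); a node is either a character
    # (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Join the two least-frequent nodes under a new node and reinsert it.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        # Preorder traversal: extend the codeword by one bit per branch.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # a single-symbol input still needs one bit

    walk(root, "")
    return codes


def encode(text, codes):
    """Pass 2: replace each character with its codeword (as a '0'/'1' string)."""
    return "".join(codes[ch] for ch in text)


codes = build_codes("aabcbaab")
print(len(encode("aabcbaab", codes)))  # 12
```

For the text aabcbaab this gives a 1-bit codeword for a and 2-bit codewords for b and c, matching the lengths in the Example section.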

Contributing

Please feel free to submit issues and pull requests. I appreciate bug reports. Testing on different platforms is especially appreciated. I only tested on Linux.

License

MIT

Development

To do:

Support binary files such as JPEG and MP3
Scan for and group repeating bit patterns, not single bits
Unicode support
Move the entire codebase to Python and use a neural network to compress files
