-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathcodechallenge_115.py
264 lines (203 loc) · 7.46 KB
/
codechallenge_115.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
'''
Date: 05/04/2022
Problem description:
=====================
This problem was asked by Amazon.
Huffman coding is a method of encoding characters based on their frequency.
Each letter is assigned a variable-length binary string, such as 0101 or 111110,
where shorter lengths correspond to more common letters. To accomplish this, a
binary tree is built such that the path from the root to any leaf uniquely maps
to a character. When traversing the path, descending to a left child corresponds
to a 0 in the prefix, while descending right corresponds to 1.
Here is an example tree (note that only the leaf nodes have letters):
*
/ \
* *
/ \ / \
* a t *
/ \
c s
With this encoding, cats would be represented as 0000110111.
i.e. Traversing starting from the root node
c = left.left.left --> 000
a = left.left.right --> 01
t = right.left --> 10
s = right.right.right --> 111
frequencies = {'c':'000','a':'01','t':'10','s':'111'}
Given a dictionary of character frequencies, build a Huffman tree, and use it to
determine a mapping between characters and their encoded binary strings.
Algorithm:
===========
There are mainly two major parts in Huffman Coding
1. Build a Huffman Tree from input characters.
2. Traverse the Huffman Tree and assign codes to characters.
Steps to build Huffman Tree:
=============================
Input is an array of unique characters along with their frequency of occurrences
and output is Huffman Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf
nodes (Min Heap is used as a priority queue. The value of frequency field is used
to compare two nodes in min heap. Initially, the least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes
frequencies. Make the first extracted node as its left child and the other extracted
node as its right child. Add this node to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node. The remaining node is
the root node and the tree is complete.
Notes:
======
Applications of Huffman Coding:
- They are used for transmitting fax and text.
- They are used by conventional compression formats like PKZIP, GZIP, etc.
- Multimedia codecs like JPEG, PNG, and MP3 use Huffman encoding(to be more precise the prefix codes).
Note to self: Works-in-progress to find an optimal solution.
'''
from binarytree import Node
# A Huffman Tree Node
class node:
def __init__(self, freq, symbol='', left=None, right=None):
# frequency of symbol
self.freq = freq
# symbol name (character)
self.symbol = symbol
# node left of current node
self.left = left
# node right of current node
self.right = right
# tree direction (0/1)
self.huff = ''
#
# build Huffman tree using binarytree library
#
def build_huffman_code(Hdict):
root = Node('*')
next = root
#frequencies = {'c':'000','a':'01','t':'10','s':'111'}
for value in Hdict.values():
for x in value:
if x == '0':
next.left = Node('*')
elif x == '1':
next.right = Node('*')
def insert_huffman_tree(root, key, value):
list(key)
if root is None:
root = Node(value)
else:
if key == '0':
root.left = Node(value)
# utility function to print huffman
# codes for all symbols in the newly
# created Huffman tree
def printNodes(node, val=''):
# huffman code for current node
newVal = val + str(node.huff)
# if node is not an edge node
# then traverse inside it
if(node.left):
printNodes(node.left, newVal)
if(node.right):
printNodes(node.right, newVal)
# if node is edge node then
# display its huffman code
if(not node.left and not node.right):
print(f"{node.symbol} -> {newVal}")
def build_huffman_tree(Hdict):
hNodes = []
for key, value in Hdict.items():
for x in list(value):
if x == '0':
hNodes.append(node(value, key, None, x))
elif x == '1':
hNodes.append(node(1, '*', None, node.right))
while len(hNodes) > 1:
# sort all the nodes in ascending order
# based on their frequency
hNodes = sorted(hNodes, key=lambda x: x.freq)
# pick 2 smallest nodes
left = hNodes[0]
right = hNodes[1]
# assign directional value to these nodes
left.huff = 0
right.huff = 1
# combine the 2 smallest nodes to create
# new node as their parent
newNode = node(left.freq+right.freq, left.symbol+right.symbol, left, right)
print(newNode.freq)
# remove the 2 nodes and add their
# parent as new node among others
hNodes.remove(left)
hNodes.remove(right)
hNodes.append(newNode)
# Huffman Tree is ready!
printNodes(hNodes[0])
def insert(root, val, key):
if root is None:
return Node(key)
else:
if root.val == '1':
root.right = insert(root.right, key)
elif root.val == '0':
root.left = insert(root.left, key)
return root
#
# build Huffman tree using binarytree library
#
def test_cats():
# chars = ['c','a','t','s']
# freq = ['000','01','10','111']
# frequencies = {'c':'000','a':'01','t':'10','s':'111'}
# val = '*'
# root = Node(val)
# for key, value in frequencies.items():
# [insert(root, x, val) for x in value, if x == '0': val = '*', else x == '1': val = '*']
# # replase last insertion with key
# insert(root, x, key)
root = Node('*')
root.left = Node('*')
root.left.right = Node('a')
root.left.left = Node('*')
root.left.left.left = Node('c')
root.right = Node('*')
root.right.left = Node('t')
root.right.right = Node('*')
root.right.right.right = Node('s')
print(root)
def main():
# characters for huffman tree
chars = ['a', 'b', 'c', 'd', 'e', 'f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
# frequency of characters
freq = [5,9,12,13,16,45,48,51,55,62,66,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190]
# list containing unused nodes
nodes = []
# converting characters and frequencies
# into huffman tree nodes
for x in range(len(chars)):
nodes.append(node(freq[x], chars[x]))
while len(nodes) > 1:
# sort all the nodes in ascending order
# based on their frequency
nodes = sorted(nodes, key=lambda x: x.freq)
# pick 2 smallest nodes
left = nodes[0]
right = nodes[1]
# assign directional value to these nodes
left.huff = 0
right.huff = 1
# combine the 2 smallest nodes to create
# new node as their parent
newNode = node(left.freq+right.freq, left.symbol+right.symbol, left, right)
# remove the 2 nodes and add their
# parent as new node among others
nodes.remove(left)
nodes.remove(right)
nodes.append(newNode)
# Huffman Tree is ready!
printNodes(nodes[0])
print()
if __name__ == '__main__':
print('Huffman Tree by (geeksforgeeks.org):\n')
# main()
test_cats()
# frequencies = {'c':'000','a':'01','t':'10','s':'111'}
# build_huffman_tree(frequencies)