Install from PyPI with:

    pip install hyperloglog
    import hyperloglog

    hll = hyperloglog.HyperLogLog(0.01)  # accept 1% counting error
    hll.add("hello")
    print(len(hll))  # 1
    hll.add("hello")
    print(len(hll))  # 1, as items aren't added more than once
    hll.add("hello again")
    print(len(hll))  # 2
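To illustrate why adding "hello" twice leaves the count at 1, here is a minimal sketch of the HyperLogLog idea in pure Python 3. It is not the package's implementation: the class name `TinyHLL`, the use of truncated SHA-1 as the hash, and the default `p=14` are all assumptions made for illustration, and no bias correction beyond the classic small-range (linear counting) step is included.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog sketch; NOT the hyperloglog package's code."""

    def __init__(self, p=14):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash of the value (SHA-1 truncated; an arbitrary choice here)
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        index = x >> (64 - self.p)                 # top p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # rank = position of the leftmost 1-bit in `rest`, counting from 1
        rank = (64 - self.p) - rest.bit_length() + 1
        # keeping the maximum makes a repeated add of the same value a no-op
        self.registers[index] = max(self.registers[index], rank)

    def __len__(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # constant valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # small-range correction (linear counting)
            return int(round(m * math.log(m / zeros)))
        return int(round(raw))


tiny = TinyHLL()
tiny.add("hello")
tiny.add("hello")    # same hash, same register, same rank: nothing changes
print(len(tiny))     # 1
```

Because each register only ever stores a maximum, the structure has no way to tell one occurrence of a value from many, which is what lets it count distinct items in fixed memory.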
If we add a further 1000 random strings (giving a total of 1002 strings), the count should be roughly within 1% of the true value. In this run it reports 1007, which is within +/- 10.02 (1% of 1002) of the true value:
    # add 1000 random 30-character strings to hll
    import random
    import string

    for _ in range(1000):
        hll.add("".join(random.choice(string.ascii_letters) for _ in range(30)))
    print(len(hll))  # e.g. 1007 (the exact value varies between runs)
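The arithmetic behind the 1% figure can be checked directly. The short sketch below uses only the numbers quoted above (1002 true items, 1007 reported). The last part applies the textbook HyperLogLog relation, standard error of about 1.04 / sqrt(m) for m registers, to estimate how many registers a 1% error rate implies. Whether the package sizes its registers exactly this way is an assumption, not something stated here.

```python
import math

error_rate = 0.01      # value passed to HyperLogLog in the example above
true_count = 1002      # 2 distinct strings + 1000 random ones
estimate = 1007        # count reported in the example above

relative_error = abs(estimate - true_count) / true_count
tolerance = error_rate * true_count
print(round(relative_error, 4))   # 0.005
print(round(tolerance, 2))        # 10.02

# Textbook relation (assumed to apply): standard error ~ 1.04 / sqrt(m)
p = int(math.ceil(math.log2((1.04 / error_rate) ** 2)))
print(p, 1 << p)                  # 14 16384
```

Note that the error rate is a typical (standard) error rather than a hard bound, so an individual run can occasionally land slightly outside it.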
- Added Sliding window HLL version
- Added bias correction from HLL++