Hi,
I've been doing some testing with nearpy, and I think it would be a good idea to replace pickle storage with PyTables for the hash matrices, so that it can work with data in large dimensions.
Real-life problems (collaborative filtering (CF) recommendation, etc.) can have billions of dimensions, and the more dimensions we have, the more we need hashing methods as well.
Pickle starts to struggle when it comes to storing those large matrices (while NumPy can handle matrices with billions of rows and thousands of columns without problems, provided you have the memory for them).
Pickle also has performance and memory issues compared with PyTables:
http://www.shocksolution.com/2010/01/storing-large-numpy-arrays-on-disk-python-pickle-vs-hdf5adsf/
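To make the suggestion concrete, here is a minimal sketch of what storing a hash matrix in HDF5 through PyTables could look like. It is not nearpy's actual storage code: the helper names `save_matrix` and `load_rows`, the node name `normals`, the file name `hashes.h5` and the matrix shape are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables


def save_matrix(path, name, matrix, chunk_rows=1024):
    """Write a 2-D NumPy matrix to an HDF5 file, a block of rows at a time."""
    with tables.open_file(path, mode="w") as h5:
        # A CArray is a fixed-shape, chunked on-disk array.
        carray = h5.create_carray(
            h5.root,
            name,
            atom=tables.Atom.from_dtype(matrix.dtype),
            shape=matrix.shape,
        )
        for start in range(0, matrix.shape[0], chunk_rows):
            stop = min(start + chunk_rows, matrix.shape[0])
            carray[start:stop, :] = matrix[start:stop, :]


def load_rows(path, name, start, stop):
    """Read only rows [start, stop) back from disk as a NumPy array."""
    with tables.open_file(path, mode="r") as h5:
        return h5.get_node(h5.root, name)[start:stop, :]


# Hypothetical projection matrix: 10 hash functions over 5000 dimensions.
rng = np.random.default_rng(0)
normals = rng.standard_normal((10, 5000))

path = os.path.join(tempfile.mkdtemp(), "hashes.h5")
save_matrix(path, "normals", normals)

# Only the requested rows are read from disk.
subset = load_rows(path, "normals", 2, 4)
print(subset.shape)
```

The point of the design is that slices can be written and read without serializing or deserializing the whole matrix in one go, which is what pickle has to do.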
I guess the more scalable solutions are probably Hadoop-based, for example:
https://github.com/takahi-i/likelike
https://github.com/mrsqueeze/spark-hash
(still a very handy lib though!)
Thanks!