Use python table storage instead of pickle for hashes #41

Open
lboudard opened this issue Nov 27, 2015 · 1 comment
Comments

@lboudard

Hi,

I've been doing some testing with nearpy, and I think it would be a good idea to replace pickle storage with PyTables (HDF5) for the hash matrices, so that the library can work with high-dimensional data.
Real-life problems (CF recommendation, etc.) can have billions of dimensions, and the more dimensions we have, the more we need hashing methods.
Pickle starts to struggle when storing such large matrices, whereas numpy handles matrices with billions of rows and thousands of columns without problems, provided you have the memory for them.
Pickle also has performance and memory issues compared with PyTables:
http://www.shocksolution.com/2010/01/storing-large-numpy-arrays-on-disk-python-pickle-vs-hdf5adsf/
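
A rough sketch of the idea, assuming PyTables (the `tables` package) and numpy are installed. The function names, the `hash_matrix` node name, and the random matrix are made up for illustration and are not part of nearpy:

```python
import os
import pickle
import tempfile

import numpy as np
import tables  # PyTables; assumed to be installed


def save_with_pickle(matrix, path):
    """Serialize the whole matrix in one shot; loading reads all of it back."""
    with open(path, "wb") as f:
        pickle.dump(matrix, f, protocol=pickle.HIGHEST_PROTOCOL)


def save_with_pytables(matrix, path, node_name="hash_matrix"):
    """Write the matrix as a chunked, zlib-compressed HDF5 array."""
    filters = tables.Filters(complevel=5, complib="zlib")
    with tables.open_file(path, mode="w") as h5:
        h5.create_carray(h5.root, node_name, obj=matrix, filters=filters)


def load_rows_with_pytables(path, start, stop, node_name="hash_matrix"):
    """Read only rows [start, stop) from disk instead of the full matrix."""
    with tables.open_file(path, mode="r") as h5:
        return h5.get_node("/" + node_name)[start:stop]


# Hypothetical stand-in for a hash's projection matrix (10 projections, 1000 dims).
rng = np.random.default_rng(0)
matrix = rng.standard_normal((10, 1000))

tmp_dir = tempfile.mkdtemp()
pickle_path = os.path.join(tmp_dir, "hash.pkl")
h5_path = os.path.join(tmp_dir, "hash.h5")

save_with_pickle(matrix, pickle_path)
save_with_pytables(matrix, h5_path)

# Only three rows are pulled from the HDF5 file here.
rows = load_rows_with_pytables(h5_path, 2, 5)
```

The point of the HDF5 version is that a slice can be read without loading the whole matrix, which pickle cannot do. How much it helps in practice would need to be measured on real hash matrices.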

I guess more scalable solutions are probably Hadoop-based:
https://github.com/takahi-i/likelike
https://github.com/mrsqueeze/spark-hash

(still a very handy lib though!)

Thanks!

@pixelogik
Owner

Sounds cool, but I do not have time for this currently. If you want to participate, let me know.
