
Calling store_vector with MemoryStorage on scipy.sparse.csr_matrix allocates memory when it shouldn't. #93

Open
Apkar029 opened this issue Feb 14, 2021 · 0 comments

Comments

@Apkar029

Apkar029 commented Feb 14, 2021

I have input samples as a sparse matrix of shape (531990 samples, 85765 features).

The size of this matrix in memory is 56KB. The matrix as a numpy array is approximately 340GB.

When I use the MemoryStorage option, I run out of memory. This is caused by the vec = vec.tocsr() line in the unitvec function. The input vectors passed to store_vector are scipy.sparse.csr.csr_matrix objects of shape (85765, 1), because storing vectors as scipy.sparse.csr.csr_matrix of shape (1, 85765) gives:

File "nearpy/engine.py", line 96, in store_vector
  for bucket_key in lshash.hash_vector(v):
File "nearpy/hashes/randombinaryprojections.py", line 74, in hash_vector
  projection = self.normals_csr.dot(v)
File "scipy/sparse/base.py", line 359, in dot
  return self * other
File "scipy/sparse/base.py", line 479, in __mul__
  raise ValueError('dimension mismatch')
ValueError: dimension mismatch
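
The shape requirement behind this traceback can be reproduced in isolation. The following is a minimal sketch, not nearpy code: `normals_csr` here is a stand-in random projection matrix, and `dim` and `n_projections` are small hypothetical sizes chosen so the example runs quickly. It shows that a (k, dim) sparse matrix can only be multiplied with a (dim, 1) column vector, while a (1, dim) row vector raises the same ValueError.

```python
import numpy as np
import scipy.sparse as sp

dim = 1000          # stand-in for the 85765 features in the report
n_projections = 8   # hypothetical number of projection normals

# Stand-in for the projection matrix held by the hash (shape: k x dim).
rng = np.random.default_rng(0)
normals_csr = sp.csr_matrix(rng.standard_normal((n_projections, dim)))

# A sparse sample stored as a row vector of shape (1, dim).
cols = np.array([3, 42, 777])
rows = np.zeros_like(cols)
data = np.array([1.0, 2.0, 3.0])
row_vec = sp.csr_matrix((data, (rows, cols)), shape=(1, dim))

# The same sample as a column vector of shape (dim, 1).
col_vec = row_vec.T.tocsr()

# (k, dim) . (1, dim) has incompatible inner dimensions.
try:
    normals_csr.dot(row_vec)
    row_failed = False
except ValueError:
    row_failed = True

# (k, dim) . (dim, 1) is valid and stays sparse.
projection = normals_csr.dot(col_vec)
```

If the data is held as one row per sample, transposing each row before storing it (as `col_vec` is built above) is one way to satisfy the expected shape.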

Removing the vec = vec.tocsr() line solves the problem for matrices of shape (85765, 1) and no extra memory is allocated. This is strange behavior and it might be a scipy bug, but what is the point of the .tocsr() conversion anyway?
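
For reference, scaling a sparse vector to unit length does not require any format conversion or dense intermediate. The sketch below is an assumption about what a sparse-aware normalization could look like, not nearpy's actual unitvec; `sparse_unitvec` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp


def sparse_unitvec(vec):
    """Hypothetical helper: scale a scipy.sparse vector to unit L2 length.

    Only the stored (non-zero) entries are touched, so memory use is
    proportional to vec.nnz rather than to the full dimensionality.
    """
    # Element-wise square of the stored entries, then sum to a scalar.
    norm = np.sqrt(vec.multiply(vec).sum())
    if norm == 0.0:
        # The zero vector is returned unchanged.
        return vec
    # Dividing a sparse matrix by a scalar keeps the result sparse.
    return vec / norm


# Usage with a column vector of the shape described in the report.
dim = 85765
rows = np.array([10, 500, 80000])
cols = np.zeros_like(rows)
data = np.array([3.0, 4.0, 12.0])
vec = sp.csr_matrix((data, (rows, cols)), shape=(dim, 1))

unit = sparse_unitvec(vec)
unit_norm = np.sqrt(unit.multiply(unit).sum())
```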
