I have input samples as a sparse matrix of shape (531990 samples, 85765 features).
The size of this matrix in memory is 56 KB. The same matrix as a dense NumPy array would be approximately 340 GB.
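For reference, the dense figure can be checked with a few lines of arithmetic. This is only a sketch and assumes 8-byte float64 entries, which the report does not state; with that assumption the result comes out close to 340 when measured in GiB.

```python
import numpy as np

# Shape taken from the report above.
n_samples, n_features = 531990, 85765

# Assumption: the dense array would hold float64 values (8 bytes each).
itemsize = np.dtype(np.float64).itemsize

dense_bytes = n_samples * n_features * itemsize
dense_gib = dense_bytes / 2**30

print(f"dense size: {dense_gib:.1f} GiB")
```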
When I use the MemoryStorage option, I run out of memory. This is caused by the line vec = vec.tocsr() in the unitvec function. The input vectors passed to store_vector are scipy.sparse.csr.csr_matrix objects of shape (85765, 1), because storing them as scipy.sparse.csr.csr_matrix objects of shape (1, 85765) gives:
File "nearpy/engine.py", line 96, in store_vector
for bucket_key in lshash.hash_vector(v):
File "nearpy/hashes/randombinaryprojections.py", line 74, in hash_vector
projection = self.normals_csr.dot(v)
File "scipy/sparse/base.py", line 359, in dot
return self * other
File "scipy/sparse/base.py", line 479, in __mul__
raise ValueError('dimension mismatch')
ValueError: dimension mismatch
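The mismatch itself can be reproduced with scipy alone. The sketch below assumes the hash keeps a sparse projection matrix of shape (projection_count, dim); the name normals_csr comes from the traceback, while dim, projection_count and the vector contents are made-up stand-ins.

```python
import numpy as np
from scipy import sparse

dim = 1000             # stand-in for the 85765 features
projection_count = 10  # hypothetical number of projections

rng = np.random.default_rng(0)
# Assumed layout: one projection normal per row.
normals_csr = sparse.csr_matrix(rng.standard_normal((projection_count, dim)))

# The same sparse vector as a (1, dim) row and as a (dim, 1) column.
row = sparse.csr_matrix(([1.0, 2.0], ([0, 0], [3, 42])), shape=(1, dim))
col = row.T.tocsr()

# (projection_count, dim) times (dim, 1) is a valid product.
projection = normals_csr.dot(col)
print(projection.shape)

# (projection_count, dim) times (1, dim) is not: inner dimensions differ.
try:
    normals_csr.dot(row)
    row_error = None
except ValueError as exc:
    row_error = exc
print(repr(row_error))
```

So with a projection matrix laid out this way, only column vectors of shape (dim, 1) can be hashed, which matches the behaviour described above.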
Removing the vec = vec.tocsr() line solves the problem for matrices of shape (85765, 1) and no extra memory is allocated. This is strange behavior and it might be a scipy bug, but what is the point of the .tocsr() conversion anyway?
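One thing that may help narrow this down is how much index storage a column vector needs in each format. In CSR the indptr array has one entry per row plus one, so a (85765, 1) vector carries 85766 index entries however few nonzeros it has, while the same vector in CSC needs only two. The sketch below just measures that. Whether it accounts for the memory growth reported here is an assumption, since it depends on which format the vectors are actually in when unitvec receives them.

```python
from scipy import sparse

dim = 85765  # feature count from the report

# A column vector with two nonzeros, built directly in CSC format.
col_csc = sparse.csc_matrix(([1.0, 2.0], ([3, 42], [0, 0])), shape=(dim, 1))
# The same vector converted to CSR.
col_csr = col_csc.tocsr()

def storage_bytes(m):
    """Bytes held by the three arrays backing a CSR/CSC matrix."""
    return m.indptr.nbytes + m.indices.nbytes + m.data.nbytes

csc_bytes = storage_bytes(col_csc)
csr_bytes = storage_bytes(col_csr)

print("indptr length (CSC, CSR):", len(col_csc.indptr), len(col_csr.indptr))
print("bytes (CSC, CSR):", csc_bytes, csr_bytes)
```

Multiplied over 531990 stored vectors, a per-vector indptr of that length adds up quickly, so it is worth checking the type and format of each vector right before it is passed to store_vector.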
Apkar029 changed the title on Feb 14, 2021, from "Calling store_vector on scipy.sparse.csr_matrix allocates memory when it shouldn't." to "Calling store_vector withe MemoryStorage on scipy.sparse.csr_matrix allocates memory when it shouldn't."
Apkar029 changed the title on Feb 14, 2021, from "Calling store_vector withe MemoryStorage on scipy.sparse.csr_matrix allocates memory when it shouldn't." to "Calling store_vector with MemoryStorage on scipy.sparse.csr_matrix allocates memory when it shouldn't."