Updated consume() in CrDirReader [Polars -> Pandas] #134
The current `consume()` function uses `polars` when reading `matrix.mtx`. This causes problems when a very large file (more than 1.5 M cells) is compressed. This PR replaces the IO operations with `pandas` for better support.

The notebook `consume.ipynb` here shows the memory footprint of the `_get_valid_barcodes()` and `consume()` functions for the updated `CrDirReader` class on a 783K cell dataset. With `pandas`, the memory footprint is significantly lower and stays bounded, compared to `polars`.
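
For reviewers who want to see the general idea, here is a minimal sketch of reading a (possibly gzip-compressed) `matrix.mtx` in fixed-size chunks with `pandas`. It is not the code from this PR: the helper names `read_mtx_header` and `iter_mtx_chunks`, the column names, the integer dtypes, and the assumption that the bounded memory footprint comes from chunked reading are all illustrative.

```python
import gzip

import pandas as pd


def _open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_mtx_header(path):
    """Return ((n_rows, n_cols, n_entries), n_lines_to_skip).

    A Matrix Market coordinate file starts with '%' comment lines,
    followed by one size line; the entries come after that.
    """
    n_skip = 0
    with _open_text(path) as fh:
        for line in fh:
            n_skip += 1
            if not line.startswith("%"):
                n_rows, n_cols, n_entries = (int(x) for x in line.split())
                return (n_rows, n_cols, n_entries), n_skip
    raise ValueError(f"No size line found in {path}")


def iter_mtx_chunks(path, chunksize=100_000):
    """Yield DataFrames of at most `chunksize` matrix entries.

    Only one chunk is held in memory at a time. Column names and
    integer dtypes are assumptions made for this sketch.
    """
    _, n_skip = read_mtx_header(path)
    reader = pd.read_csv(
        path,
        sep=r"\s+",
        header=None,
        names=["feature_idx", "barcode_idx", "count"],
        dtype="int64",
        skiprows=n_skip,      # skip comments and the size line
        chunksize=chunksize,  # return an iterator instead of one frame
    )
    yield from reader
```

A caller would loop over `iter_mtx_chunks("matrix.mtx.gz")` and process or write out each chunk before requesting the next one, so peak memory depends on `chunksize` rather than on the total number of entries.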