Collating results is slow for large datasets (>1500 genomes) #14

widdowquinn · 2015-11-09T10:55:12Z

Currently, the code writes out each result individually and defers processing of that output (calculation of ANI etc.) until the end. This leaves a long, uninformative lag before the results are presented to the user.

It may be possible to collate/summarise intermediate results in a file as we go. The total analysis time would be no shorter, but this might avoid the 'dead time' after the alignments are done.
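As a rough illustration of collating as we go, the sketch below updates a matrix each time a pairwise comparison finishes, so the summary is ready as soon as the last alignment completes. It is not pyani code: `RunningCollator`, `add_result`, and `write` are hypothetical names, and the single "identity" value per comparison is an assumption made to keep the example small.

```python
import csv


class RunningCollator:
    """Hypothetical helper: fill a pairwise matrix as each result arrives."""

    def __init__(self, genome_ids):
        # Map each genome ID to its row/column position in the matrix.
        self.index = {gid: i for i, gid in enumerate(genome_ids)}
        n = len(genome_ids)
        # None marks comparisons that have not completed yet.
        self.identity = [[None] * n for _ in range(n)]
        self.done = 0  # number of distinct cells filled so far

    def add_result(self, query, subject, identity):
        """Record one finished comparison; call this from the job callback."""
        i, j = self.index[query], self.index[subject]
        if self.identity[i][j] is None:
            self.done += 1
        self.identity[i][j] = identity

    def write(self, path):
        """Write the matrix collated so far as a tab-separated table."""
        ids = list(self.index)
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow([""] + ids)
            for gid, row in zip(ids, self.identity):
                writer.writerow([gid] + ["" if v is None else v for v in row])
```

Calling `write()` periodically would also give the user a partial table to look at while alignments are still running, and `done` could drive a progress message.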

widdowquinn · 2018-11-19T19:00:50Z

This could be implemented as cached matrix and/or dataframe results in the pyani database, with one table/matrix type for each run. Then, when pulling down the complete dataset for a run, we need only make one SQL request, rather than one for each result.
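A minimal sketch of the caching idea, using the standard-library `sqlite3` module. The schema here is an assumption for illustration only: the `comparisons` and `run_matrices` tables, their columns, and the JSON payload are hypothetical and do not reflect the actual pyani database layout.

```python
import json
import sqlite3


def create_schema(conn):
    """Create the hypothetical per-result table and the matrix cache table."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS comparisons (
            run_id INTEGER, query TEXT, subject TEXT, identity REAL
        );
        CREATE TABLE IF NOT EXISTS run_matrices (
            run_id INTEGER, matrix_type TEXT, payload TEXT,
            PRIMARY KEY (run_id, matrix_type)
        );
        """
    )


def cache_identity_matrix(conn, run_id):
    """Collate one run's comparisons and store them as a single cached row."""
    rows = conn.execute(
        "SELECT query, subject, identity FROM comparisons WHERE run_id = ?",
        (run_id,),
    ).fetchall()
    matrix = {}
    for query, subject, identity in rows:
        matrix.setdefault(query, {})[subject] = identity
    conn.execute(
        "INSERT OR REPLACE INTO run_matrices VALUES (?, ?, ?)",
        (run_id, "identity", json.dumps(matrix)),
    )
    conn.commit()


def load_matrix(conn, run_id, matrix_type="identity"):
    """Fetch a cached matrix with one SQL request; None if not cached."""
    row = conn.execute(
        "SELECT payload FROM run_matrices WHERE run_id = ? AND matrix_type = ?",
        (run_id, matrix_type),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

With something like this, reporting on a run would call `load_matrix()` once per matrix type, and the cache would be rebuilt only when a run's comparisons change.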

widdowquinn added the enhancement label (something we'd like pyani to do that it doesn't already) Nov 9, 2015

widdowquinn self-assigned this Nov 9, 2015

widdowquinn mentioned this issue Aug 30, 2018

Add cache for matrices output #106

Closed

widdowquinn added this to the 0.3.1 milestone May 29, 2020

widdowquinn added the performance label (the issue relates to making pyani more efficient) May 29, 2020
