Speed up kputll. #1805

The kputuw function is considerably faster as it encodes 2 digits at a time and also utilises __builtin_clz. This changes kputll to use the same 2 digits at a time trick. I have a __builtin_clzll variant too, but with longer numbers it's not the main bottleneck and we fall back to kputuw for small numbers. This avoids complicating the code with builtin checks and alternate versions. An alternative, purely for sam_format1_append would be something like: static inline int kputll_fast(long long c, kstring_t *s) { return c <= INT_MAX && c >= INT_MIN ? kputw(c, s) : kputll(c, s); } #define kputll kputll_fast This works as BAM/CRAM only support 32-bit numbers for POS, PNEXT and TLEN anyway, so ll vs w is an irrelevant distinction. However I chose to modify the header file so it fixes other callers. Overall compressed BAM to uncompressed SAM conversion is about 5% quicker (tested on 10 million short-read seqs; it'll be minimal on long seqs). This includes decode time and other functions too. The sam_format1_append only component of that is about 15-25% quicker depending on compiler and version.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up kputll. #1805

Speed up kputll. #1805

Commits on Jul 11, 2024