Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up kputll. #1805

Merged
merged 1 commit into from
Jul 16, 2024
Merged

Speed up kputll. #1805

merged 1 commit into from
Jul 16, 2024

Conversation

jkbonfield
Copy link
Contributor

The kputuw function is considerably faster as it encodes 2 digits at a time and also utilises __builtin_clz. This changes kputll to use the same 2 digits at a time trick. I have a __builtin_clzll variant too, but with longer numbers it's not the main bottleneck and we fall back to kputuw for small numbers. This avoids complicating the code with builtin checks and alternate versions.

An alternative, purely for sam_format1_append would be something like:

static inline int kputll_fast(long long c, kstring_t *s) {
    return c <= INT_MAX && c >= INT_MIN ? kputw(c, s) : kputll(c, s);
}
#define kputll kputll_fast

This works as BAM/CRAM only support 32-bit numbers for POS, PNEXT and TLEN anyway, so ll vs w is an irrelevant distinction. However I chose to modify the header file so it fixes other callers.

Overall compressed BAM to uncompressed SAM conversion is about 5% quicker (tested on 10 million short-read seqs; it'll be minimal on long seqs). This includes decode time and other functions too. The sam_format1_append only component of that is about 15-25% quicker depending on compiler and version.

The kputuw function is considerably faster as it encodes 2 digits at a
time and also utilises __builtin_clz.  This changes kputll to use the
same 2 digits at a time trick.  I have a __builtin_clzll variant too,
but with longer numbers it's not the main bottleneck and we fall back
to kputuw for small numbers.  This avoids complicating the code with
builtin checks and alternate versions.

An alternative, purely for sam_format1_append would be something like:

    static inline int kputll_fast(long long c, kstring_t *s) {
        return c <= INT_MAX && c >= INT_MIN ? kputw(c, s) : kputll(c, s);
    }
    #define kputll kputll_fast

This works as BAM/CRAM only support 32-bit numbers for POS, PNEXT and
TLEN anyway, so ll vs w is an irrelevant distinction.  However I chose
to modify the header file so it fixes other callers.

Overall compressed BAM to uncompressed SAM conversion is about 5%
quicker (tested on 10 million short-read seqs; it'll be minimal on
long seqs).  This includes decode time and other functions too.  The
sam_format1_append only component of that is about 15-25% quicker
depending on compiler and version.
@whitwham whitwham merged commit 19a27e9 into samtools:develop Jul 16, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants