[kernel] Rewrite printk for speed, add __divmod fast 32/16-bit divide #2008

Merged
ghaerr merged 1 commit on Sep 13, 2024


ghaerr (Owner) commented Sep 13, 2024

Rewrites kernel printk for speed, as originally discussed in a comment on Mellvik/TLVC#71.

This is the first in a series of enhancements aimed at vastly reducing the CPU time spent calculating 32-bit by 32-bit divides, plus separate 32-bit modulo calls, solely to convert unsigned longs to ASCII strings for output.

Both the kernel and the C library have code that causes the compiler to call the __udivsi3 and __umodsi3 routines (or the even slower __divsi3 and __modsi3 routines for signed values), which can take a very long time to execute, for number-to-ASCII string conversion and display. The kernel printk has been the worst offender, calling each of the two routines ten full times for every decimal or hex number output, even when the value being converted fits in 16 bits (or is zero!!!). This was due to the algorithm printk used, which divided by the highest powers of 10 first rather than simply dividing by 10 and reversing the order of the resulting digits. The latter approach allows a much quicker new routine, __divmod, to be used.
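A minimal sketch of the two digit-extraction orders, assuming that plain C `/` and `%` on `unsigned long` stand in for the __udivsi3/__umodsi3 calls the compiler would emit on an 8086. The functions `old_ultoa` and `new_ultoa` and the counters `old_calls` and `new_calls` are hypothetical illustrations, not the kernel's actual printk code.

```c
/* Hypothetical counters: divide+modulo pairs (old) vs. divmod steps (new). */
static int old_calls, new_calls;

/* Descending powers of 10 that fit in a 32-bit unsigned long. */
static const unsigned long pow10_tab[10] = {
    1000000000UL, 100000000UL, 10000000UL, 1000000UL, 100000UL,
    10000UL, 1000UL, 100UL, 10UL, 1UL
};

/* Old order (sketch): divide by the highest power of 10 first, always
 * visiting all ten positions, even for small values or zero. */
static void old_ultoa(unsigned long v, char *buf)
{
    int started = 0;
    for (int i = 0; i < 10; i++) {
        unsigned long d = v / pow10_tab[i]; /* __udivsi3 on 8086 */
        v %= pow10_tab[i];                  /* __umodsi3 on 8086 */
        old_calls++;
        if (d != 0 || started || i == 9) {  /* skip leading zeros */
            *buf++ = (char)('0' + d);
            started = 1;
        }
    }
    *buf = '\0';
}

/* New order (sketch): divide by 10 until zero, collecting digits
 * least-significant first, then reverse them into the output. */
static void new_ultoa(unsigned long v, char *buf)
{
    char tmp[10];                /* at most 10 decimal digits in 32 bits */
    int n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10); /* remainder and quotient of the */
        v /= 10;                         /* same divide: one divmod step  */
        new_calls++;
    } while (v != 0);
    while (n > 0)                /* reverse into most-significant-first order */
        *buf++ = tmp[--n];
    *buf = '\0';
}
```

For example, converting 1234 costs ten divide/modulo pairs in `old_ultoa` but only four steps in `new_ultoa`, and converting 0 costs ten pairs versus one step. Because each step in the new order needs both the quotient and the remainder of the same divide by 10, a single routine that returns both, such as __divmod, can replace the separate divide and modulo calls.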

The fast __divmod routine executes a single DIV instruction when the quotient of the 32-bit dividend and the divisor fits in 16 bits, so that the DIV cannot overflow; otherwise it executes two DIVs. Because the divisor is limited to 16 bits, which is sufficient for all printf/printk numeric-to-string conversions, the routine never falls back to the very slow software bit-shifting technique that __udivsi3 uses for certain divisors larger than 16 bits. In addition, both the quotient and the modulo (remainder) are returned in a single call from the DIV instruction(s), eliminating the separate modulo call and thereby doubling the conversion speed.
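The one-or-two DIV strategy can be sketched in portable C as follows. This is a hypothetical illustration, not the actual __divmod source (which targets the 8086 and whose calling convention is not shown here): `div16` models a single 16-bit `DIV` on `DX:AX`, and `divmod32_16`, its `ndivs` out-parameter, and the `<stdint.h>` types are all assumptions made for the example.

```c
#include <stdint.h>

/* Models one 8086 "DIV r/m16": divides the 32-bit value DX:AX by a
 * 16-bit divisor. The real instruction traps if the quotient does not
 * fit in 16 bits; the caller below guarantees that it always does. */
static uint16_t div16(uint16_t dx, uint16_t ax, uint16_t divisor, uint16_t *rem)
{
    uint32_t dividend = ((uint32_t)dx << 16) | ax;
    *rem = (uint16_t)(dividend % divisor);
    return (uint16_t)(dividend / divisor);
}

/* Hypothetical divmod: 32-bit dividend, nonzero 16-bit divisor.
 * Returns the 32-bit quotient, stores the remainder in *rem, and
 * reports how many DIVs were needed in *ndivs. */
static uint32_t divmod32_16(uint32_t val, uint16_t divisor,
                            uint16_t *rem, int *ndivs)
{
    uint16_t hi = (uint16_t)(val >> 16);
    uint16_t lo = (uint16_t)val;
    uint16_t r;
    uint16_t qhi, qlo;

    if (hi < divisor) {
        /* val < divisor * 65536, so the quotient fits in 16 bits:
         * a single DIV on hi:lo cannot overflow. */
        *ndivs = 1;
        return div16(hi, lo, divisor, rem);
    }

    /* First DIV: 0:hi / divisor gives the high quotient word and a
     * partial remainder r, with r < divisor. */
    qhi = div16(0, hi, divisor, &r);
    /* Second DIV: r:lo / divisor; r < divisor keeps this quotient
     * within 16 bits, so it cannot overflow either. */
    qlo = div16(r, lo, divisor, rem);
    *ndivs = 2;
    return ((uint32_t)qhi << 16) | qlo;
}
```

The two-DIV path is correct because the first partial remainder `r` is always smaller than the divisor, so `r:lo` divided by the divisor always yields a quotient below 65536. For instance, 1234 divided by 10 takes one DIV, while 4294967295 divided by 10 takes two, and both calls return the remainder alongside the quotient.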

After this speed enhancement, the kernel printk.o size went down from 1728 to 1487 bytes, saving 241 bytes, and printk is very much faster! While most users are unlikely to notice the difference, these changes will have a noticeable effect on 8086/8088 systems, and will also help printk debugging during packet traces, where slow printk calls throw timing way off.

ghaerr merged commit ac0b9f6 into master on Sep 13, 2024
2 checks passed
ghaerr deleted the divmod branch on September 13, 2024 at 21:40