[kernel] Rewrite printk for speed, add __divmod fast 32/16-bit divide #2008
Rewrites kernel printk for speed as originally discussed in Mellvik/TLVC#71 (comment).
This is the first in a series of enhancements aimed at vastly reducing the amount of CPU time spent on 32-bit by 32-bit divides and separate 32-bit modulo calls, made only for the purpose of converting unsigned longs to ASCII strings for output.
Both the kernel and C library have code that causes the compiler to call the sometimes very-long-to-execute __udivsi3 and __umodsi3 routines (or the even longer __divsi3 and __modsi3 routines if not unsigned) for number-to-ASCII string conversion and display. The kernel printk has been the worst offender, calling the two routines ten full times each for every decimal or hex number output, even when the value being converted fits in 16 bits (or is zero!!!). This was due to the algorithm printk used, which tried to divide by the highest powers of 10 first, rather than just dividing by 10 and reversing the order of the digits, which then allows a much quicker new routine, __divmod, to be used.
The fast __divmod routine will execute a single DIV instruction if the quotient can be calculated from 32 bits without overflow; otherwise two DIVs will be executed. The routine never uses the very slow software bit-shifting technique that __udivsi3 uses for certain divisors > 16 bits, so the divisor is limited to 16 bits, which works for all printf/printk numeric-to-string conversions. In addition, both the quotient and modulo (remainder) are returned in a single call from the DIV instruction(s), thereby doubling the conversion speed.
After this speed enhancement, the net result is that the kernel printk.o size went down from 1728 to 1487 bytes: 241 bytes saved, and very much faster! While most users likely won't see the difference, these changes should matter most on 8086/8088 systems, and for printk debugging during packet traces, where slow output throws timing way off.
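
For readers who want to see the idea in portable C, here is a minimal sketch of the two techniques described above: a 32-bit by 16-bit divide that returns quotient and remainder together, and a divide-by-base conversion that fills the buffer from the end so the digits come out in the right order. The names divmod32_16 and ultoa_rev are hypothetical and are not the code in this PR; the real __divmod presumably uses the 8086 DIV instruction directly, and its calling convention is not shown here. Each `/` and `%` pair below stands in for what one DIV would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __divmod: divide a 32-bit value by a 16-bit
 * divisor, return the quotient and store the remainder in *rem. */
static uint32_t divmod32_16(uint32_t n, uint16_t d, uint16_t *rem)
{
    uint16_t hi = (uint16_t)(n >> 16);
    uint16_t lo = (uint16_t)n;

    assert(d != 0);

    if (hi < d) {
        /* Quotient fits in 16 bits: one 32/16 divide is enough. */
        *rem = (uint16_t)(n % d);
        return n / d;
    }

    /* Otherwise two divides: high word first, then the low word with
     * the high-word remainder carried in above it. */
    uint16_t qhi = (uint16_t)(hi / d);
    uint16_t rhi = (uint16_t)(hi % d);
    uint32_t part = ((uint32_t)rhi << 16) | lo;   /* rhi < d, so no overflow */
    uint16_t qlo = (uint16_t)(part / d);
    *rem = (uint16_t)(part % d);
    return ((uint32_t)qhi << 16) | qlo;
}

/* Convert val to a string in the given base (2..16) by repeatedly dividing
 * by the base. Digits are produced least-significant first, so they are
 * written backward from the end of buf. Returns a pointer to the first
 * digit. buf must hold the digits plus a NUL (11 bytes for base 10). */
static char *ultoa_rev(uint32_t val, uint16_t base, char *buf, size_t size)
{
    char *p = buf + size;
    uint16_t r;

    *--p = '\0';
    do {
        val = divmod32_16(val, base, &r);
        *--p = "0123456789abcdef"[r];
    } while (val != 0);
    return p;
}
```

With this shape, a value of zero costs one divide call and a 16-bit value costs at most five for decimal, rather than a fixed number of calls per conversion.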