Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the Kotlin implementation to double performance #254

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cbeyls
Copy link

@cbeyls cbeyls commented Jul 8, 2024

BlurHashDecoder optimizations

  • Precompute cosines tables before composing the image so each cosine value is only computed once.
  • Cosines are stored as Float instead of stored as Double and transformed to Float after multiplication.
  • Compute cosines tables once if both are identical (for square images with the same number of colors in both dimensions).
  • Store colors in a one-dimension array instead of a two-dimension array to reduce memory allocations.
  • Use a simple String.indexOf() to find the index of a Base83 char, which is both faster and needs less memory than a HashMap thanks to better locality and no boxing of chars.
  • Replace value.pow(2f) with faster value * value
  • No cache is used, so computations may be performed in parallel on background threads without the need for synchronization which limits throughput.

Problems with the old cache

The old implementation used a cache that had 3 issues that evaded the tests:

  • Access to the cache was not synchronized, thus two concurrent calls on different threads could result in cache corruption or reading incorrect values from a half-filled array. Since Blurhash is an expensive computation, the main use case is to call it from a pooled background thread.
  • The cache key was the image dimension (width or height) multiplied by the number of colors in that dimension. This means the cache will hit the same result for an image width of 32 pixels with 8 colors (256) or 64 pixels with 4 colors (also 256) but this is incorrect because even if the array size for these configurations is the same and contains the same values, these values appear in a different order. The cache will thus return incorrect values for one of the two configurations. The correct cache key to avoid collisions would have been something like (width * 10) + [number of colors], where 10 is the max colors number.
  • Even when the cache was disabled (useCache is false), the code still allocated memory but didn't write to it. This had two bad effects:
    1. Memory was wasted.
    2. Worse: when the blurhash method was called two times in a row for a compatible configuration, first with the cache disabled then with the cache enabled, the second call will hit the cache but read empty values, resulting in an incorrect computation.

Fixing and keeping the cosines cache would not improve the single-threading performance much, but would make the throughput lower for multi-threading performance because of synchronization. Thus it makes more sense to just remove the cache completely, now that the algorithm is faster without cache than it was before with the cache.

Benchmarks

Simple: 4x4 colors, 32x32 pixels output.
Complex: 9x9 colors, 256x256 pixels output.

Pixel 7 (Android 14)

      517 662   ns          27 allocs    Trace    BlurHashDecoderBenchmark.originalSimpleNoCache
      225 079   ns          27 allocs    Trace    BlurHashDecoderBenchmark.originalSimpleWithCache
      108 153   ns           8 allocs    Trace    BlurHashDecoderBenchmark.newSimple

  162 140 992   ns          92 allocs    Trace    BlurHashDecoderBenchmark.originalComplexNoCache
   46 720 337   ns          92 allocs    Trace    BlurHashDecoderBenchmark.originalComplexWithCache
   13 124 883   ns           8 allocs    Trace    BlurHashDecoderBenchmark.newComplex

Nexus 5 (Android 6)

    6 389 579   ns          26 allocs    Trace    BlurHashDecoderBenchmark.originalSimpleNoCache
    2 358 534   ns          26 allocs    Trace    BlurHashDecoderBenchmark.originalSimpleWithCache
    1 546 684   ns           7 allocs    Trace    BlurHashDecoderBenchmark.newSimple

2 258 036 875   ns          91 allocs    Trace    BlurHashDecoderBenchmark.originalComplexNoCache
  404 196 250   ns          91 allocs    Trace    BlurHashDecoderBenchmark.originalComplexWithCache
  185 823 959   ns           7 allocs    Trace    BlurHashDecoderBenchmark.newComplex

The source code of the benchmark can be found here.

Other code changes

  • Replace custom function timed() with standard measureTime() in demo app
  • Replace junit.framework.Assert.assertTrue with org.junit.Assert.assertArrayEquals.

Update configuration to modern standards

(So the project can build on modern versions of Android Studio)

  • Upgrade dependencies: Kotlin 2.0.0, AppCompat 1.7.0, Androidx Test Runner 1.6.1, JUnit 4.13.2, Android Gradle Plugin 8.5.0, Gradle 8.7.0
  • Bump minSdkVersion to 21 and targetSdkVersion to 34
  • Replace abandoned jcenter repository with mavenCentral
  • Add namespace to replace package in AndroidManifest.xml
  • Target Java 11 bytecode
  • Remove obsolete Kotlin Android Extensions plugin
  • Disable Jetifier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant