chore: optimize lazy val with power of two. #22428

Open
wants to merge 1 commit into
base: main

@He-Pin He-Pin commented Jan 21, 2025

Motivation:
Use & with a power-of-two mask instead of %, since the bitwise AND is faster than the remainder operation.

@@ -27,17 +27,18 @@ object LazyVals {

   private val base: Int = {
     val processors = java.lang.Runtime.getRuntime.nn.availableProcessors()
-    8 * processors * processors
+    val rawSize = 8 * processors * processors

@He-Pin He-Pin Jan 21, 2025

Not sure why the factor is 8 here. Some CPUs have 384 cores now, so maybe 2 * 2 would be better than 8.

The current implementation may use a little more memory than before, but many servers have just 64 cores, and for those the size is the same.
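
To make the memory remark concrete, here is a minimal sketch of the size calculation. It assumes the PR rounds `rawSize` up with the same expression that appears in the benchmark's `setup()` further down; the object and method names are hypothetical.

```scala
object MonitorTableSize:
  // Size used before the change: 8 * processors^2.
  def rawSize(processors: Int): Int = 8 * processors * processors

  // Round up to the next power of two (expression taken from the benchmark setup).
  // Assumes 1 <= n <= 2^30 so that the shift does not overflow.
  def roundUpToPowerOfTwo(n: Int): Int =
    1 << (32 - Integer.numberOfLeadingZeros(n - 1))

  def main(args: Array[String]): Unit =
    for p <- List(16, 64, 96, 128, 384) do
      val raw = rawSize(p)
      println(s"processors=$p raw=$raw rounded=${roundUpToPowerOfTwo(raw)}")
```

For 16, 64 and 128 processors the raw size is already a power of two, so nothing changes. For 96 processors it grows from 73728 to 131072, and for 384 processors from 1179648 to 2097152.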


-    if (id < 0) id += base
-    monitors(id)
+    monitors((java.lang.System.identityHashCode(obj) + fieldId) & mask)

Use & instead of %
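
A minimal sketch of why the two forms agree when the table size is a power of two. It assumes the line just above the hunk computes `id` with `% base`, as the PR description implies; `oldIndex` and `newIndex` are hypothetical names, not functions in `LazyVals`.

```scala
object MonitorIndex:
  // Old approach: signed remainder, then shift a negative result back into range.
  def oldIndex(hash: Int, fieldId: Int, base: Int): Int =
    var id = (hash + fieldId) % base
    if id < 0 then id += base
    id

  // New approach: only valid when the size is a power of two and mask = size - 1.
  // The AND keeps the low bits, so the result is never negative.
  def newIndex(hash: Int, fieldId: Int, mask: Int): Int =
    (hash + fieldId) & mask

  def main(args: Array[String]): Unit =
    val size = 2048 // power of two
    for h <- List(-5, 0, 7, 123456789, Int.MinValue, Int.MaxValue) do
      println(s"$h -> old=${oldIndex(h, 3, size)} new=${newIndex(h, 3, size - 1)}")
```

The mask also removes the need for the `id < 0` correction, because masking a negative value with a non-negative mask yields a non-negative index.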

He-Pin commented Jan 22, 2025

jmh-result.json

with https://jmh.morethan.io

[info] Benchmark                          (processors)   Mode  Cnt           Score          Error  Units
[info] ModuloBenchmark.newPowerOf2Method            16  thrpt   10  1169017875.314 ± 25717906.652  ops/s
[info] ModuloBenchmark.newPowerOf2Method            32  thrpt   10  1169948109.834 ± 13735667.926  ops/s
[info] ModuloBenchmark.newPowerOf2Method            64  thrpt   10  1169506488.098 ± 14091486.582  ops/s
[info] ModuloBenchmark.newPowerOf2Method            96  thrpt   10  1171280212.547 ±  7645883.638  ops/s
[info] ModuloBenchmark.newPowerOf2Method           128  thrpt   10  1169116163.009 ± 15487127.019  ops/s
[info] ModuloBenchmark.oldModuloMethod              16  thrpt   10  1020610301.567 ± 14270157.632  ops/s
[info] ModuloBenchmark.oldModuloMethod              32  thrpt   10  1016082948.598 ± 11667961.439  ops/s
[info] ModuloBenchmark.oldModuloMethod              64  thrpt   10  1024484355.834 ± 11580908.271  ops/s
[info] ModuloBenchmark.oldModuloMethod              96  thrpt   10  1016482302.606 ± 12229045.684  ops/s
[info] ModuloBenchmark.oldModuloMethod             128  thrpt   10  1016325470.534 ± 10125272.051  ops/s
[info] ModuloBenchmark.newPowerOf2Method            16   avgt   10           0.856 ±        0.016  ns/op
[info] ModuloBenchmark.newPowerOf2Method            32   avgt   10           0.844 ±        0.011  ns/op
[info] ModuloBenchmark.newPowerOf2Method            64   avgt   10           0.856 ±        0.011  ns/op
[info] ModuloBenchmark.newPowerOf2Method            96   avgt   10           0.846 ±        0.005  ns/op
[info] ModuloBenchmark.newPowerOf2Method           128   avgt   10           0.855 ±        0.012  ns/op
[info] ModuloBenchmark.oldModuloMethod              16   avgt   10           0.984 ±        0.016  ns/op
[info] ModuloBenchmark.oldModuloMethod              32   avgt   10           0.982 ±        0.009  ns/op
[info] ModuloBenchmark.oldModuloMethod              64   avgt   10           0.984 ±        0.011  ns/op
[info] ModuloBenchmark.oldModuloMethod              96   avgt   10           0.977 ±        0.009  ns/op
[info] ModuloBenchmark.oldModuloMethod             128   avgt   10           0.982 ±        0.010  ns/op

produced with this benchmark:

package benchmark

import org.openjdk.jmh.annotations.*
import java.util.concurrent.TimeUnit
import scala.util.Random

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.Throughput))  // throughput mode
@OutputTimeUnit(TimeUnit.SECONDS)        // per second
@Warmup(iterations = 10, time = 1)
@Measurement(iterations = 10, time = 1)
@Fork(1)
class ModuloBenchmark:
  @Param(Array("16", "32", "64", "96", "128"))
  var processors: Int = _

  private var hashCodes: Array[Int] = _
  private var fieldIds: Array[Int] = _
  private var oldBase: Int = _
  private var newBase: Int = _
  private var mask: Int = _

  private val testSize = 1_000_000 // 1 million samples

  @Setup(Level.Trial)
  def setup(): Unit =
    oldBase = 8 * processors * processors

    val rawSize = 8 * processors * processors
    newBase = 1 << (32 - Integer.numberOfLeadingZeros(rawSize - 1))
    mask = newBase - 1

    val rng = new Random(42)

    hashCodes = new Array[Int](testSize)
    fieldIds = new Array[Int](testSize)

    var i = 0
    while i < testSize do
      hashCodes(i) = rng.nextInt() | (rng.nextInt() << 16)
      fieldIds(i) = rng.nextInt(10000)
      i += 1

    println(s"""
               |Setting up test with:
               |  Processors: $processors
               |  Old base: $oldBase
               |  New base: $newBase
               |  Mask: $mask
               |""".stripMargin)

  @Benchmark
  def oldModuloMethod(): Unit =
    var i = 0
    while i < testSize do
      val hashCode = hashCodes(i)
      val fieldId = fieldIds(i)
      (hashCode + fieldId) % oldBase
      i += 1

  @Benchmark
  def newPowerOf2Method(): Unit =
    var i = 0
    while i < testSize do
      val hashCode = hashCodes(i)
      val fieldId = fieldIds(i)
      (hashCode + fieldId) & mask
      i += 1
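
Both loops above compute an index and then discard it. JMH's documentation warns that the JIT compiler may eliminate work whose result is never used, so a variant that returns an accumulated value (or passes each result to a JMH `Blackhole`) is a safer thing to measure. A minimal sketch of the accumulating form, with hypothetical names; in the benchmark class, the `@Benchmark` methods would return these values:

```scala
object ConsumedResults:
  // Sums every index so the per-element remainder feeds an observable result.
  def moduloChecksum(hashCodes: Array[Int], fieldIds: Array[Int], base: Int): Int =
    var sum = 0
    var i = 0
    while i < hashCodes.length do
      sum += (hashCodes(i) + fieldIds(i)) % base
      i += 1
    sum

  // Same loop with the power-of-two mask instead of the remainder.
  def maskChecksum(hashCodes: Array[Int], fieldIds: Array[Int], mask: Int): Int =
    var sum = 0
    var i = 0
    while i < hashCodes.length do
      sum += (hashCodes(i) + fieldIds(i)) & mask
      i += 1
    sum
```

Returning the sum adds one integer addition per element to both variants equally, so the comparison between `%` and `&` stays like for like.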
