chore: optimize lazy val with power of two. #22428

He-Pin · 2025-01-21T19:18:47Z

Motivation:
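Use & instead of % when selecting a monitor, which is faster. A bitwise AND only works as a modulo when the table size is a power of two, so the size is rounded up to the next power of two.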
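A bitwise AND can stand in for % only when the divisor is a power of two. The sketch below is plain Scala 3. The object MaskVsModulo and its helpers are hypothetical and are not part of the PR. It compares the pre-PR style of index (signed remainder, then add the size if negative) with a single AND. For a power-of-two size n, the two agree for every Int, negatives included, because x & (n - 1) keeps the low log2(n) bits of x, which in two's complement is exactly Math.floorMod(x, n).

```scala
// Hypothetical helpers (not from the PR) comparing the two ways of
// mapping an arbitrary Int into a slot index in [0, n).
object MaskVsModulo:
  def isPowerOfTwo(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

  // Pre-PR style: signed remainder, then shift negatives into [0, n).
  def moduloIndex(x: Int, n: Int): Int =
    val r = x % n
    if r < 0 then r + n else r

  // Post-PR style: one bitwise AND; only valid when n is a power of two.
  def maskIndex(x: Int, n: Int): Int =
    require(isPowerOfTwo(n), s"$n is not a power of two")
    x & (n - 1)

  def main(args: Array[String]): Unit =
    val n = 2048
    val samples = Seq(0, 1, 2047, 2048, 123456789, -1, -2048, Int.MinValue, Int.MaxValue)
    for x <- samples do
      println(s"x=$x modulo=${moduloIndex(x, n)} mask=${maskIndex(x, n)}")
```

For a size that is not a power of two (for example 73728, which is 8 * 96 * 96), maskIndex refuses the input, which is why the table size has to be rounded up first.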

He-Pin · 2025-01-21T19:23:38Z

library/src/scala/runtime/LazyVals.scala

@@ -27,17 +27,18 @@ object LazyVals {

  private val base: Int = {
    val processors = java.lang.Runtime.getRuntime.nn.availableProcessors()
-    8 * processors * processors
+    val rawSize = 8 * processors * processors


Not sure why the factor is 8 here. Some CPUs have 384 cores now; maybe 2 * 2 would be better than 8.

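The new implementation may use a little more memory than before, because the table size is rounded up to the next power of two. However, many servers have just 64 cores, where the size is unchanged (8 * 64 * 64 = 32768 is already a power of two).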
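To make the memory trade-off concrete, the sketch below tabulates 8 * p * p and its rounded-up size, using the same rounding expression as the benchmark's setup(). The object TableSizes and its parameters are hypothetical; the factor parameter only exists to compare the "2 * 2" idea. Working it through: for power-of-two core counts (16, 32, 64, 128) 8 * p * p is already a power of two, so nothing changes; 96 cores grows from 73,728 to 131,072 slots, and 384 cores from 1,179,648 to 2,097,152. With a factor of 4, 384 cores would need 589,824 slots, rounded up to 1,048,576.

```scala
// Hypothetical sketch of how the monitor-table size changes when
// factor * p * p is rounded up to the next power of two.
object TableSizes:
  // Smallest power of two >= rawSize; valid for 1 <= rawSize <= 2^30
  // (beyond that, 1 << 31 would overflow to a negative Int).
  def nextPowerOfTwo(rawSize: Int): Int =
    require(rawSize >= 1 && rawSize <= (1 << 30), s"out of range: $rawSize")
    1 << (32 - Integer.numberOfLeadingZeros(rawSize - 1))

  // 8 * p * p as in LazyVals; factor is a parameter only for comparison.
  def rawSize(processors: Int, factor: Int = 8): Int =
    factor * processors * processors

  def main(args: Array[String]): Unit =
    for p <- Seq(16, 32, 64, 96, 128, 384) do
      val raw = rawSize(p)
      println(s"processors=$p raw=$raw rounded=${nextPowerOfTwo(raw)}")
    println(s"factor 4, 384 processors: rounded=${nextPowerOfTwo(rawSize(384, factor = 4))}")
```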

He-Pin · 2025-01-22T03:08:49Z

library/src/scala/runtime/LazyVals.scala

-
-    if (id < 0) id += base
-    monitors(id)
+    monitors((java.lang.System.identityHashCode(obj) + fieldId) & mask)


Use & instead of %
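The lookup in the diff can be pictured with the simplified stand-in below. It is a hypothetical class (MonitorTable, indexFor and monitorFor are illustrative names, not the actual scala.runtime.LazyVals code): a fixed array of lock objects, sized to the next power of two at or above 8 * p * p, indexed with the same (identityHashCode(obj) + fieldId) & mask expression as the new line. Because the mask clears the sign bit, the index stays in range even when the sum overflows to a negative Int, so the old "if (id < 0) id += base" fix-up is no longer needed.

```scala
// Hypothetical, simplified stand-in for the LazyVals monitor table.
final class MonitorTable(processors: Int):
  // Next power of two >= 8 * p * p, using the benchmark's rounding expression.
  private val size: Int =
    val rawSize = 8 * processors * processors
    1 << (32 - Integer.numberOfLeadingZeros(rawSize - 1))
  private val mask: Int = size - 1
  private val monitors: Array[Object] = Array.fill(size)(new Object)

  def length: Int = size

  // Always in [0, size), even if the sum overflows to a negative Int.
  def indexFor(obj: Object, fieldId: Int): Int =
    (java.lang.System.identityHashCode(obj) + fieldId) & mask

  // The same (obj, fieldId) pair always maps to the same lock object.
  def monitorFor(obj: Object, fieldId: Int): Object =
    monitors(indexFor(obj, fieldId))
```

For example, new MonitorTable(4) has 8 * 4 * 4 = 128 slots, and monitorFor(owner, fieldId).synchronized { ... } would guard initialization of one lazy field of owner.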


He-Pin · 2025-01-22T19:43:48Z

jmh-result.json (visualized with https://jmh.morethan.io)

[info] Benchmark                          (processors)   Mode  Cnt           Score          Error  Units
[info] ModuloBenchmark.newPowerOf2Method            16  thrpt   10  1169017875.314 ± 25717906.652  ops/s
[info] ModuloBenchmark.newPowerOf2Method            32  thrpt   10  1169948109.834 ± 13735667.926  ops/s
[info] ModuloBenchmark.newPowerOf2Method            64  thrpt   10  1169506488.098 ± 14091486.582  ops/s
[info] ModuloBenchmark.newPowerOf2Method            96  thrpt   10  1171280212.547 ±  7645883.638  ops/s
[info] ModuloBenchmark.newPowerOf2Method           128  thrpt   10  1169116163.009 ± 15487127.019  ops/s
[info] ModuloBenchmark.oldModuloMethod              16  thrpt   10  1020610301.567 ± 14270157.632  ops/s
[info] ModuloBenchmark.oldModuloMethod              32  thrpt   10  1016082948.598 ± 11667961.439  ops/s
[info] ModuloBenchmark.oldModuloMethod              64  thrpt   10  1024484355.834 ± 11580908.271  ops/s
[info] ModuloBenchmark.oldModuloMethod              96  thrpt   10  1016482302.606 ± 12229045.684  ops/s
[info] ModuloBenchmark.oldModuloMethod             128  thrpt   10  1016325470.534 ± 10125272.051  ops/s
[info] ModuloBenchmark.newPowerOf2Method            16   avgt   10           0.856 ±        0.016  ns/op
[info] ModuloBenchmark.newPowerOf2Method            32   avgt   10           0.844 ±        0.011  ns/op
[info] ModuloBenchmark.newPowerOf2Method            64   avgt   10           0.856 ±        0.011  ns/op
[info] ModuloBenchmark.newPowerOf2Method            96   avgt   10           0.846 ±        0.005  ns/op
[info] ModuloBenchmark.newPowerOf2Method           128   avgt   10           0.855 ±        0.012  ns/op
[info] ModuloBenchmark.oldModuloMethod              16   avgt   10           0.984 ±        0.016  ns/op
[info] ModuloBenchmark.oldModuloMethod              32   avgt   10           0.982 ±        0.009  ns/op
[info] ModuloBenchmark.oldModuloMethod              64   avgt   10           0.984 ±        0.011  ns/op
[info] ModuloBenchmark.oldModuloMethod              96   avgt   10           0.977 ±        0.009  ns/op
[info] ModuloBenchmark.oldModuloMethod             128   avgt   10           0.982 ±        0.010  ns/op

with this benchmark:

package benchmark

import org.openjdk.jmh.annotations.*
import java.util.concurrent.TimeUnit
import scala.util.Random

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.Throughput))  // throughput mode
@OutputTimeUnit(TimeUnit.SECONDS)        // per second
@Warmup(iterations = 10, time = 1)
@Measurement(iterations = 10, time = 1)
@Fork(1)
class ModuloBenchmark:
  @Param(Array("16", "32", "64", "96", "128"))
  var processors: Int = _

  private var hashCodes: Array[Int] = _
  private var fieldIds: Array[Int] = _
  private var oldBase: Int = _
  private var newBase: Int = _
  private var mask: Int = _

  private val testSize = 1_000_000 // 1 million samples

  @Setup(Level.Trial)
  def setup(): Unit =
    oldBase = 8 * processors * processors

    val rawSize = 8 * processors * processors
    newBase = 1 << (32 - Integer.numberOfLeadingZeros(rawSize - 1))
    mask = newBase - 1

    val rng = new Random(42)

    hashCodes = new Array[Int](testSize)
    fieldIds = new Array[Int](testSize)

    var i = 0
    while i < testSize do
      hashCodes(i) = rng.nextInt() | (rng.nextInt() << 16)
      fieldIds(i) = rng.nextInt(10000)
      i += 1

    println(s"""
               |Setting up test with:
               |  Processors: $processors
               |  Old base: $oldBase
               |  New base: $newBase
               |  Mask: $mask
               |""".stripMargin)

  @Benchmark
  def oldModuloMethod(): Unit =
    var i = 0
    while i < testSize do
      val hashCode = hashCodes(i)
      val fieldId = fieldIds(i)
      (hashCode + fieldId) % oldBase
      i += 1

  @Benchmark
  def newPowerOf2Method(): Unit =
    var i = 0
    while i < testSize do
      val hashCode = hashCodes(i)
      val fieldId = fieldIds(i)
      (hashCode + fieldId) & mask
      i += 1
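Both benchmark methods above compute each index and then discard it, and the JMH samples warn that the JIT may remove such unused work as dead code. The reported times of under 1 ns per call, for a loop over 1,000,000 samples, suggest that is what happened here, so the numbers may mostly reflect call overhead rather than % versus &. One way to keep the work live, sketched below with hypothetical helpers that are not taken from the PR, is to fold every index into a sum and return it; JMH passes a @Benchmark method's return value to an implicit Blackhole. The modulo helper also restores the "if (id < 0) id += base" step that the pre-PR code had and that oldModuloMethod omits.

```scala
// Hypothetical kernels (not from the PR) whose results are observable,
// so the per-element index computation cannot be discarded as dead code.
object LiveKernels:
  // Mirrors the pre-PR index: signed remainder, then add base if negative.
  def sumModulo(hashCodes: Array[Int], fieldIds: Array[Int], base: Int): Long =
    var sum = 0L
    var i = 0
    while i < hashCodes.length do
      var id = (hashCodes(i) + fieldIds(i)) % base
      if id < 0 then id += base
      sum += id
      i += 1
    sum

  // Mirrors the post-PR index: one AND with mask = size - 1.
  def sumMasked(hashCodes: Array[Int], fieldIds: Array[Int], mask: Int): Long =
    var sum = 0L
    var i = 0
    while i < hashCodes.length do
      sum += (hashCodes(i) + fieldIds(i)) & mask
      i += 1
    sum
```

Inside ModuloBenchmark, oldModuloMethod could then be declared to return Long with the body LiveKernels.sumModulo(hashCodes, fieldIds, oldBase), and newPowerOf2Method likewise with LiveKernels.sumMasked(hashCodes, fieldIds, mask). When the base is a power of two, both kernels return the same sum, which doubles as a correctness check.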

chore: optimize lazy val with power of two.

0f8f2c2


counter2015 reviewed Jan 22, 2025


library/src/scala/runtime/LazyVals.scala (resolved)
