Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JNI for Pose3d exp and log #5444

Merged
merged 21 commits into from
Jul 18, 2023

Conversation

KangarooKoala
Copy link
Contributor

Fixes #5311

Just adds the JNI methods and tests which check that they work the same as the pure Java implementations, as requested in the issue. (I don't know how to compare the performance, though)

@KangarooKoala KangarooKoala requested a review from a team as a code owner July 14, 2023 01:01
Copy link
Member

@calcmogul calcmogul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the JNI took some doubles instead of complex objects, we could remove a lot of the added code and avoid the JVM class and method lookups, which are slow.

A Pose3d is three doubles for translation and four doubles for the quaternion. A Twist3d is three doubles for the translation and three doubles for the rotation.

@KangarooKoala
Copy link
Contributor Author

Ah, so it's faster to unpack the values in Java than in JNI?

@calcmogul
Copy link
Member

Since the object originated in Java, yes.

Copy link
Member

@calcmogul calcmogul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pose3d needs to be modified to use the JNI functions.

@KangarooKoala
Copy link
Contributor Author

Pose3d needs to be modified to use the JNI functions.

So will the performance be compared between versions of the code? (something like git checkout KangarooKoala/pose3d-exp-log-jni; <run test>; git checkout main; <run test>)

@calcmogul
Copy link
Member

calcmogul commented Jul 14, 2023

The Pose3d and Twist3d tests don't have enough code to make this difference consistently measurable. You could make a separate test case that runs in a loop, then measure the runtime for it with one impl vs the other. The time for one call will be the total time divided by the number of loops.

You could copy the test between the main and PR branches to test each.

@KangarooKoala
Copy link
Contributor Author

The ideal solution would've been putting the timing code into DevMain, but ./gradlew :wpimath:run had the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/ejml/data/Matrix
        at edu.wpi.first.math.DevMain.main(DevMain.java:14)
Caused by: java.lang.ClassNotFoundException: org.ejml.data.Matrix
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 1 more

This also occurs when just constructing an empty Rotation3d().

Because of this, I put the timing code into WPIMathJNITest. The tests seem to suppress both System.out and System.err output, so I have to resort to throwing an exception to get actual output, which makes the tests fail.

@calcmogul
Copy link
Member

We don't plan on putting this test in the actual test suite anyway, so it failing doesn't matter. It's just for performance metrics.

@KangarooKoala
Copy link
Contributor Author

KangarooKoala commented Jul 14, 2023

Results from running locally:

Output using JNI:

WPIMathJNITest > timePose3dExp() FAILED
    java.lang.RuntimeException: report:
    Pose3d.exp():
      start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
      twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
      average milliseconds: 58.180481
        at edu.wpi.first.math.WPIMathJNITest.timePose3dExp(WPIMathJNITest.java:49)

WPIMathJNITest > timePose3dLog() FAILED
    java.lang.RuntimeException: report:
    Pose3d.log():
      start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
      end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
      average milliseconds: 39.324258
        at edu.wpi.first.math.WPIMathJNITest.timePose3dLog(WPIMathJNITest.java:61)

Output on main (aaea85f) after git restore -s pose3d-exp-log-jni -- wpimath/src/test/java/edu/wpi/first/math/WPIMathJNITest.java:

WPIMathJNITest > timePose3dExp() FAILED
    java.lang.RuntimeException: report:
    Pose3d.exp():
      start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
      twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
      average milliseconds: 98.860128
        at edu.wpi.first.math.WPIMathJNITest.timePose3dExp(WPIMathJNITest.java:49)

WPIMathJNITest > timePose3dLog() FAILED
    java.lang.RuntimeException: report:
    Pose3d.log():
      start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
      end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
      average milliseconds: 45.053423
        at edu.wpi.first.math.WPIMathJNITest.timePose3dLog(WPIMathJNITest.java:61)

Update: After running a two more times for each one, JNI had results of 58.18 ms to 86.27 ms for exp() and 39.32 ms to 41.33 ms for log(), and main had results of 97.90 ms to 108.89 ms for exp() and 39.18 ms to 45.05 ms for log().

@calcmogul
Copy link
Member

calcmogul commented Jul 14, 2023

How many iterations was it averaged over? 45 ms is extremely high for one iteration. Please post your benchmark code.

@KangarooKoala
Copy link
Contributor Author

The benchmark code should be pushed in WPIMathJNITest.java, and it was averaged over only 1000 iterations. I could benchmark over more iterations tonight.

@calcmogul
Copy link
Member

It's still almost a 2x improvement on average for exp(), which is pretty good. Less so for log(), but at least it didn't get worse.

Use correct time unit
Print number of iterations
Show multiple runs
wpimath/src/main/native/cpp/jni/WPIMathJNI.cpp Outdated Show resolved Hide resolved
wpimath/src/main/native/cpp/jni/WPIMathJNI.cpp Outdated Show resolved Hide resolved
wpimath/src/main/native/cpp/jni/WPIMathJNI.cpp Outdated Show resolved Hide resolved
wpimath/src/main/native/cpp/jni/WPIMathJNI.cpp Outdated Show resolved Hide resolved
wpimath/src/main/native/cpp/jni/WPIMathJNI.cpp Outdated Show resolved Hide resolved
wpimath/src/main/native/cpp/jni/WPIMathJNI.cpp Outdated Show resolved Hide resolved
@KangarooKoala
Copy link
Contributor Author

All tests were run with the same values:

Exp:
      start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
      twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
Log:
      start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
      end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))

I ran wpimath:testDesktopJava 3 times for each set of code. Within each test, there are five runs, each with 1000 calls. (There were 351000=15000 total calls for each set of code)

Results:

Latest:
  Exp:
      run 1: average 53.12992μs
      run 2: average 41.043313999999995μs
      run 3: average 40.958409μs
      run 4: average 38.293073μs
      run 5: average 38.210373μs
  Log:
      run 1: average 53.12992μs
      run 2: average 41.043313999999995μs
      run 3: average 40.958409μs
      run 4: average 38.293073μs
      run 5: average 38.210373μs

  Exp:
      run 1: average 58.48583μs
      run 2: average 46.917997μs
      run 3: average 43.899507μs
      run 4: average 39.625592μs
      run 5: average 38.706832999999996μs
  Log:
      run 1: average 38.971568μs
      run 2: average 25.762291μs
      run 3: average 24.550192μs
      run 4: average 24.747035μs
      run 5: average 25.248191μs

  Exp:
      run 1: average 52.164429μs
      run 2: average 42.980228000000004μs
      run 3: average 42.866025μs
      run 4: average 38.756681μs
      run 5: average 38.217113μs
  Log:
      run 1: average 33.230883999999996μs
      run 2: average 25.851509999999998μs
      run 3: average 25.402437000000003μs
      run 4: average 25.483408999999998μs
      run 5: average 24.813485μs


Old JNI:
  git checkout --detach ad63760
  git restore -s 546800c wpimath/src/test/java/edu/wpi/first/math/WPIMathJNITest.java

  Exp:
      run 1: average 58.933543μs
      run 2: average 42.629724μs
      run 3: average 42.189028μs
      run 4: average 39.545277999999996μs
      run 5: average 38.456942000000005μs
  Log:
      run 1: average 36.334248μs
      run 2: average 26.53427μs
      run 3: average 26.443434μs
      run 4: average 26.325218μs
      run 5: average 26.264263μs

  Exp:
      run 1: average 52.068613μs
      run 2: average 42.62615μs
      run 3: average 40.949432μs
      run 4: average 37.607365μs
      run 5: average 36.620735μs
  Log:
      run 1: average 37.152859μs
      run 2: average 25.729931μs
      run 3: average 25.013565999999997μs
      run 4: average 24.090074μs
      run 5: average 23.688123μs

  Exp:
      run 1: average 57.152034μs
      run 2: average 46.953031μs
      run 3: average 45.577352μs
      run 4: average 40.703798μs
      run 5: average 39.896879999999996μs
  Log:
      run 1: average 38.035892μs
      run 2: average 26.311195μs
      run 3: average 24.257792000000002μs
      run 4: average 23.859375μs
      run 5: average 24.02324μs

Pure Java:
  git checkout --detach main
    (main is aaea85ff1)
  git restore -s 546800c wpimath/src/test/java/edu/wpi/first/math/WPIMathJNITest.java

  Exp:
      run 1: average 102.58741099999999μs
      run 2: average 28.160149μs
      run 3: average 23.419479μs
      run 4: average 22.723291μs
      run 5: average 28.013217μs
  Log:
      run 1: average 14.990053μs
      run 2: average 16.649438999999997μs
      run 3: average 8.933627μs
      run 4: average 8.524281μs
      run 5: average 8.165511μs

  Exp:
      run 1: average 91.738444μs
      run 2: average 23.169584μs
      run 3: average 22.698014999999998μs
      run 4: average 20.937452μs
      run 5: average 27.092988000000002μs
  Log:
      run 1: average 19.753909μs
      run 2: average 16.282923μs
      run 3: average 8.458788μs
      run 4: average 7.556838μs
      run 5: average 7.417304μs

  Exp:
      run 1: average 95.524235μs
      run 2: average 26.733477999999998μs
      run 3: average 25.613077μs
      run 4: average 21.046727μs
      run 5: average 25.756777999999997μs
  Log:
      run 1: average 23.657487μs
      run 2: average 16.035242μs
      run 3: average 12.160426μs
      run 4: average 10.780788000000001μs
      run 5: average 9.910645μs

I'm very confused about these results, since now Java's performing very significantly better. exp() seems to get a lot of help for JIT compiling, but I'm still confused how log() is doing better than I ever recorded it doing previously.
I recorded every git command I ran (minus some git status and failed git checkout commands before I ran git restore . to get a clean working tree), so maybe someone can see where I messed up somewhere?

@KangarooKoala
Copy link
Contributor Author

Also I saw your comments calcmogul, I'll clean things up tomorrow.

@calcmogul
Copy link
Member

calcmogul commented Jul 15, 2023

You may need to run the test for a while before starting the timing to ensure any JIT that's going to occur has happened, and that what happened during that time doesn't affect the measurements.

@PeterJohnson
Copy link
Member

We should also run the benchmark on a Rio (easiest way to do this is by adding the benchmark to the myRobot project).

@KangarooKoala
Copy link
Contributor Author

Who would run it on a Rio? Also would I make the benchmark run within the timed robot framework (e.g., put it in robotInit()) or elsewhere?

@rzblue
Copy link
Member

rzblue commented Jul 17, 2023

I can benchmark on a RIO tonight. It's just the code in DevMain?

@KangarooKoala
Copy link
Contributor Author

It seems that only the shared/jni/setupBuild.gradle flag needs to be changed to release, but I also changed the other one since it's just temporary anyways.

@KangarooKoala
Copy link
Contributor Author

KangarooKoala commented Jul 17, 2023

I can benchmark on a RIO tonight. It's just the code in DevMain?

Yes, it's just the code in DevMain.java. I could copy it into the myrobot project so you could just deploy it to the Rio from your , or you could run gradlew wpimath:run in the allwpilib repository. (I'm not sure how feasible having the whole repo on there would be, hence the copying to myrobot option.) If I do copy it to the myrobot project, would it make more sense for the code to be in Main.java or in something like robotInit()?

@calcmogul
Copy link
Member

calcmogul commented Jul 17, 2023

Here's what I got after setting release flags for main and this PR:

main:

Pose3d.exp():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
  warm up (100000 iterations): mean 5.528899460000049µs, std dev 57.328942225854085µs
  iterations per run: 1000
  run 1: mean 2.512490000000004µs, std dev 0.747721088307666µs
  run 2: mean 2.6502350000000003µs, std dev 0.9252607609614698µs
  run 3: mean 3.6962960000000034µs, std dev 40.593735001948055µs
  run 4: mean 2.230944999999999µs, std dev 0.4888850846313474µs
  run 5: mean 2.495122000000002µs, std dev 1.0619146364543612µs
Pose3d.log():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
  warm up (100000 iterations): mean 2.795758189999995µs, std dev 9.868246027349374µs
  iterations per run: 1000
  run 1: mean 2.2690169999999994µs, std dev 0.9942003564226888µs
  run 2: mean 2.7280809999999995µs, std dev 1.1159565755167173µs
  run 3: mean 2.655426999999999µs, std dev 0.9600135044211623µs
  run 4: mean 2.3964479999999995µs, std dev 0.5835580633458851µs
  run 5: mean 2.4754989999999983µs, std dev 0.8358185317393956µs

PR:

Pose3d.exp():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
  warm up (100000 iterations): mean 1.9433407899999922µs, std dev 4.6180453602143015µs
  iterations per run: 1000
  run 1: mean 1.8053149999999998µs, std dev 1.0700685350831498µs
  run 2: mean 1.4002869999999983µs, std dev 0.6134915196080557µs
  run 3: mean 1.352037000000001µs, std dev 0.3882399717069332µs
  run 4: mean 1.3030939999999984µs, std dev 0.3651649205003131µs
  run 5: mean 1.30147µs, std dev 0.30565680280340546µs
Pose3d.log():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
  warm up (100000 iterations): mean 0.8114945399999931µs, std dev 0.5868834705204059µs
  iterations per run: 1000
  run 1: mean 0.7150760000000002µs, std dev 0.32411485653082894µs
  run 2: mean 0.7361459999999994µs, std dev 0.2793493738743651µs
  run 3: mean 0.7249370000000006µs, std dev 0.23535258025141786µs
  run 4: mean 0.6907119999999995µs, std dev 0.310498465464806µs
  run 5: mean 0.7328019999999985µs, std dev 0.2933702316118669µs

So (1.8-2.5)/2.5 = -0.28 -> 28% runtime drop for exp() and (0.7-2.5)/2.5 = -0.72 -> 72% runtime drop for log().

@KangarooKoala
Copy link
Contributor Author

Running with release mode:

% git checkout --detach 30dd17f
% ./gradlew :wpimath:run

Pose3d.exp():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
  warm up (100000 iterations): mean 4.251241269999979µs, std dev 554.8476005607279µs
  iterations per run: 1000
  run 1: mean 1.942649999999999µs, std dev 1.3693239147477112µs
  run 2: mean 2.024640999999999µs, std dev 0.9576497752931409µs
  run 3: mean 1.867678µs, std dev 1.1714196303272402µs
  run 4: mean 2.2670899999999996µs, std dev 1.676397146233553µs
  run 5: mean 2.1090609999999987µs, std dev 2.0448450731727816µs
Pose3d.log():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
  warm up (100000 iterations): mean 0.8292191699999989µs, std dev 1.4493320305418453µs
  iterations per run: 1000
  run 1: mean 0.8288259999999983µs, std dev 1.0287960846173527µs
  run 2: mean 0.6772859999999988µs, std dev 0.9312526060119244µs
  run 3: mean 0.7854479999999991µs, std dev 0.726760199306484µs
  run 4: mean 0.6661859999999975µs, std dev 0.46842144635360145µs
  run 5: mean 0.653403000000001µs, std dev 0.3325097992405616µs
DataLog: Logging to 'wpilog_66e2f3e898164425.wpilog'

% git checkout --detach ad63760
% git restore -s 30dd17f -- shared/javacpp/setupBuild.gradle shared/jni/setupBuild.gradle wpimath/build.gradle wpimath/src/dev/java/edu/wpi/first/math/DevMain.java
% ./gradlew :wpimath:run

Pose3d.exp():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
  warm up (100000 iterations): mean 3.6400589500000606µs, std dev 554.9297374498834µs
  iterations per run: 1000
  run 1: mean 1.4383980000000005µs, std dev 1.6037437555906484µs
  run 2: mean 1.2687020000000007µs, std dev 0.6653663661442465µs
  run 3: mean 1.210418µs, std dev 0.3533561507544477µs
  run 4: mean 1.406861000000002µs, std dev 1.6276872720762419µs
  run 5: mean 1.1370079999999994µs, std dev 0.16602267295764134µs
Pose3d.log():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
  warm up (100000 iterations): mean 0.9010020600000058µs, std dev 6.513513000912943µs
  iterations per run: 1000
  run 1: mean 0.8076879999999986µs, std dev 0.6775072196338583µs
  run 2: mean 0.778073999999999µs, std dev 0.17188769741898433µs
  run 3: mean 0.7728239999999985µs, std dev 0.12278206311998505µs
  run 4: mean 0.7610439999999978µs, std dev 0.056181776974389086µs
  run 5: mean 0.779593999999999µs, std dev 0.29793885809675824µs
DataLog: Logging to 'wpilog_10bfe2bddfdae7d2.wpilog'

% git restore .
% git checkout --detach main
% git restore -s 30dd17f -- shared/javacpp/setupBuild.gradle shared/jni/setupBuild.gradle wpimath/build.gradle wpimath/src/dev/java/edu/wpi/first/math/DevMain.java
% ./gradlew :wpimath:run

Pose3d.exp():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
  warm up (100000 iterations): mean 7.282167149999999µs, std dev 56.44988215712018µs
  iterations per run: 1000
  run 1: mean 4.026312000000001µs, std dev 1.029352955334563µs
  run 2: mean 3.561899999999999µs, std dev 2.434353728199746µs
  run 3: mean 2.9520659999999994µs, std dev 0.3846252379186784µs
  run 4: mean 3.2383269999999995µs, std dev 1.8256434810967321µs
  run 5: mean 3.2774700000000005µs, std dev 1.9367427699877957µs
Pose3d.log():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
  warm up (100000 iterations): mean 3.224196169999905µs, std dev 12.45865067999394µs
  iterations per run: 1000
  run 1: mean 2.6062020000000006µs, std dev 1.0240281837898801µs
  run 2: mean 2.5404400000000025µs, std dev 1.1281393293383577µs
  run 3: mean 2.5044089999999986µs, std dev 0.760519864118617µs
  run 4: mean 3.3548979999999973µs, std dev 21.615909702753584µs
  run 5: mean 2.5780140000000067µs, std dev 1.2574968023036874µs
DataLog: Logging to 'wpilog_731511bf49974060.wpilog'

JNI's definitely beating pure Java now, but I'm still a bit uncertain about which JNI version is fastest. My machine does not copy arrays in env->GetDoubleArrayElements, which may differ from the RIO and which would impact the relative performance between the two versions. (I could see it being cheaper to allocate 2 double arrays and pass two references to them than to copy 13 doubles, but I'll run some tests to make sure.)

@KangarooKoala
Copy link
Contributor Author

Didn't see your comment calcmogul when I posted, but good to see that our results match pretty well (except for some scaling on exp, which isn't surprising given the different machines).

@PeterJohnson
Copy link
Member

Java allocations are very cheap, but have the downside of increasing GC pressure, so while you may not see it in the benchmark hot path, there will be an increased system cost. In general, allocation will be a LOT more expensive than copying ~100 bytes.

@KangarooKoala
Copy link
Contributor Author

This was unexpected (at least by me): Changing the inputs to double[] slightly slowed things down, but changing the return type to double[] sped things up exp() specifically by 50%.

Latest:
Pose3d.exp():
  <same inputs as always>
  warm up (100000 iterations): mean 4.350842790000205µs, std dev 544.8021229155931µs
  iterations per run: 1000
  run 1: mean 1.7763310000000005µs, std dev 1.0699783499861106µs
  run 2: mean 2.2023280000000023µs, std dev 4.008273016202364µs
  run 3: mean 1.6187679999999987µs, std dev 0.4915373232787109µs
  run 4: mean 1.6210809999999984µs, std dev 0.7460877726105694µs
  run 5: mean 2.0371650000000003µs, std dev 1.3445008366583495µs
Pose3d.log():
  <same inputs as always>
  warm up (100000 iterations): mean 0.9510567099999891µs, std dev 2.032192809699567µs
  iterations per run: 1000
  run 1: mean 0.8205949999999999µs, std dev 0.368394119083082µs
  run 2: mean 0.8307860000000001µs, std dev 0.25129714324679436µs
  run 3: mean 0.8750630000000023µs, std dev 1.2421738183648026µs
  run 4: mean 0.8314029999999996µs, std dev 0.0553652290792699µs
  run 5: mean 0.9394079999999996µs, std dev 1.018634448433786µs
  
Input double[]:
Pose3d.exp():
  <same inputs as always>
  warm up (100000 iterations): mean 5.31269083999997µs, std dev 675.2458766975202µs
  iterations per run: 1000
  run 1: mean 2.1584280000000007µs, std dev 1.3506486955592856µs
  run 2: mean 2.084517000000002µs, std dev 1.008303831050441µs
  run 3: mean 2.2500059999999986µs, std dev 1.3786326885592093µs
  run 4: mean 2.160159999999996µs, std dev 1.3620441558187457µs
  run 5: mean 2.032093000000001µs, std dev 0.6752498310632894µs
Pose3d.log():
  <same inputs as always>
  warm up (100000 iterations): mean 0.7574985499999924µs, std dev 1.1257231425434533µs
  iterations per run: 1000
  run 1: mean 0.6616070000000005µs, std dev 0.30078951536082427µs
  run 2: mean 0.654243999999999µs, std dev 0.25297929651258017µs
  run 3: mean 0.7127629999999988µs, std dev 0.9648084560320761µs
  run 4: mean 0.6635609999999993µs, std dev 0.39537831791715605µs
  run 5: mean 0.6636090000000003µs, std dev 0.16081369381678856µs

Output double[]:
Pose3d.exp():
  <same inputs as always>
  warm up (100000 iterations): mean 3.291479960000161µs, std dev 613.2600022762634µs
  iterations per run: 1000
  run 1: mean 0.8271829999999982µs, std dev 0.4466510332586292µs
  run 2: mean 0.791813999999999µs, std dev 0.21772146748540971µs
  run 3: mean 0.8117519999999988µs, std dev 0.3422799008063428µs
  run 4: mean 0.7778320000000005µs, std dev 0.11070152562634358µs
  run 5: mean 0.7951899999999985µs, std dev 0.19937191853418101µs
Pose3d.log():
  <same inputs as always>
  warm up (100000 iterations): mean 0.6494966099999887µs, std dev 4.3423922720980626µs
  iterations per run: 1000
  run 1: mean 0.5575490000000002µs, std dev 0.3426027518847427µs
  run 2: mean 0.5505729999999993µs, std dev 0.09255938996665851µs
  run 3: mean 0.5647819999999985µs, std dev 0.43655600153473995µs
  run 4: mean 0.5453359999999988µs, std dev 0.04641197155907096µs
  run 5: mean 0.6358459999999989µs, std dev 1.2055209132503673µs

I'm not too familiar with GCs, but how significant is the memory pressure with temporary objects? (Also, I guess that for some reason it's significantly more expensive to construct objects in JNI than in Java)

@ThadHouse
Copy link
Member

Constructing Objects in JNI, or calling methods on them will always be incredibly slow. Its basically a reflection lookup and reflection call each time.

As for the GC, lots of temporary objects cause a lot more memory pressure. They might not show up in a test, because a lot could be allocated without triggering a gc, and its possible whatever test framework used could cause a GC to happen in a non timed period.

@KangarooKoala
Copy link
Contributor Author

So should I change the return value to a double array or keep it as Java objects? Are there changes to the test framework that would make it more similar to robot code for the GC?

@calcmogul
Copy link
Member

calcmogul commented Jul 17, 2023

It would be most performant to have the JNI consume and return a double array then construct the Pose3d/Twist3d on the Java side. You need to make a Java object at some point anyway, and doing it in Java avoids the JNI reflection.

@calcmogul
Copy link
Member

Impl looks good to me. Now we just need RIO performance metrics.

@PeterJohnson
Copy link
Member

Note we'll want to revert the MyRobot changes before merging.

@calcmogul
Copy link
Member

We'll also want to revert the DevMain.java changes.

@rzblue
Copy link
Member

rzblue commented Jul 18, 2023

Here's the results from benchmarking on a RIO 1:

Dev computer:

git checkout --detach e5452e3
git restore -s c7927ce -- shared/javacpp/setupBuild.gradle shared/jni/setupBuild.gradle wpimath/build.gradle wpimath/src/dev/java/edu/wpi/first/math/DevMain.java myRobot/src/main/java/frc/robot/MyRobot.java
./gradlew myRobot:build
./gradlew myRobot:deployJava

On RIO:

frcRunRobot.sh

Test took ~3m58s

Results for main:

Pose3d.exp():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
  warm up (100000 iterations): mean 166.1110329500002µs, std dev 485.29685841349266µs
  iterations per run: 1000
  run 1: mean 161.23461600000022µs, std dev 274.6104896466348µs
  run 2: mean 159.6266909999999µs, std dev 263.02388086702973µs
  run 3: mean 157.28079600000012µs, std dev 254.5092768872805µs
  run 4: mean 156.92961799999998µs, std dev 253.0252875361789µs
  run 5: mean 155.25996399999994µs, std dev 243.958751189398µs
Pose3d.log():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
  warm up (100000 iterations): mean 150.3632419599998µs, std dev 250.07857422847374µs
  iterations per run: 1000
  run 1: mean 151.74660099999994µs, std dev 331.3735628252378µs
  run 2: mean 142.814113µs, std dev 208.38662174360962µs
  run 3: mean 124.622044µs, std dev 99.54465196276517µs
  run 4: mean 134.3820319999999µs, std dev 158.68405115168µs
  run 5: mean 142.24838100000005µs, std dev 204.1584684101542µs

Dev computer:

git reset --hard
git checkout --detach c7927ce
./gradlew myRobot:build
./gradlew myRobot:deployJava

On RIO:

frcRunRobot.sh

Test took ~3m41s

Results for JNI:

Pose3d.exp():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  twist = Twist3d(dX: 0.00, dY: 0.00, dZ: 0.00, rX: 3.14, rY: 0.00, rZ: 0.00)
  warm up (100000 iterations): mean 115.85929505999947µs, std dev 133.42732113633303µs
  iterations per run: 1000
  run 1: mean 110.7377240000002µs, std dev 86.24598447636751µs
  run 2: mean 112.16343500000012µs, std dev 107.32020420056877µs
  run 3: mean 109.40926900000004µs, std dev 74.59456928932988µs
  run 4: mean 108.45511299999994µs, std dev 16.868705236627697µs
  run 5: mean 109.98942099999995µs, std dev 69.06237274992624µs
Pose3d.log():
  start = Pose3d(Translation3d(X: 1.00, Y: 0.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.0, -0.7071067811865475, 0.0)))
  end = Pose3d(Translation3d(X: 0.00, Y: 1.00, Z: 0.00), Rotation3d(Quaternion(0.7071067811865476, 0.7071067811865475, 0.0, 0.0)))
  warm up (100000 iterations): mean 77.9198372499996µs, std dev 15.771255351731364µs
  iterations per run: 1000
  run 1: mean 75.33272500000011µs, std dev 12.962974618712137µs
  run 2: mean 74.676907µs, std dev 8.902687743954129µs
  run 3: mean 74.93459300000004µs, std dev 10.245133802413267µs
  run 4: mean 75.01325100000003µs, std dev 10.274193155961157µs
  run 5: mean 74.97768699999996µs, std dev 10.975726238706525µs

@KangarooKoala
Copy link
Contributor Author

Thanks! Seems like there's good improvement, though it's hard to tell with how high the RIO standard deviations are. Should we run more tests, or should I remove the testing code?

@PeterJohnson
Copy link
Member

I think we’re good to remove the testing code and prepare to merge. It’s a clear improvement.

calcmogul
calcmogul previously approved these changes Jul 18, 2023
@PeterJohnson PeterJohnson merged commit 6f7cdd4 into wpilibsuite:main Jul 18, 2023
21 checks passed
@KangarooKoala KangarooKoala deleted the pose3d-exp-log-jni branch July 19, 2023 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make a JNI for Pose3d.exp() and Pose3d.log()
5 participants