Add JNI for Pose3d exp and log #5444
Conversation
If the JNI took some doubles instead of complex objects, we could remove a lot of the added code and avoid the JVM class and method lookups, which are slow.
A Pose3d is three doubles for translation and four doubles for the quaternion. A Twist3d is three doubles for the translation and three doubles for the rotation.
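That flat layout can be sketched in plain Java. The class and method names below are hypothetical illustrations, not WPILib's actual JNI signatures: the point is just that a Pose3d crosses the boundary as seven doubles and a Twist3d as six, so only primitives ever pass through JNI.

```java
// Sketch of the flat primitive layout discussed above. Names are
// illustrative, not WPILib's real API.
public class PoseJniLayout {
  /** Pack a pose's translation (x, y, z) and quaternion (w, x, y, z)
      into the 7-double layout a JNI method could consume directly. */
  public static double[] packPose(double tx, double ty, double tz,
                                  double qw, double qx, double qy, double qz) {
    return new double[] {tx, ty, tz, qw, qx, qy, qz};
  }

  /** Pack a twist's translation (dx, dy, dz) and rotation (rx, ry, rz)
      vectors into the 6-double layout. */
  public static double[] packTwist(double dx, double dy, double dz,
                                   double rx, double ry, double rz) {
    return new double[] {dx, dy, dz, rx, ry, rz};
  }
}
```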
Ah, so it's faster to unpack the values in Java than in JNI?
Since the object originated in Java, yes.
Pose3d needs to be modified to use the JNI functions.
So will the performance be compared between versions of the code?
The Pose3d and Twist3d tests don't have enough code to make this difference consistently measurable. You could make a separate test case that runs in a loop, then measure its runtime with one implementation versus the other; the time for one call is the total time divided by the number of loop iterations. You could copy the test between the main and PR branches to test each.
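The loop-and-divide approach described above can be sketched like this. The workload here is a placeholder standing in for the real exp()/log() calls, not the actual benchmark code from this PR:

```java
// Minimal loop benchmark sketch: total elapsed time divided by the
// iteration count gives the average time per call. The workload is a
// placeholder, not the actual Pose3d.exp()/log() under test.
public class LoopBenchmark {
  public static double workload(double x) {
    return Math.exp(Math.log(x + 1.0));
  }

  /** Returns the average time per call in nanoseconds. */
  public static double averageNanos(int iterations) {
    double sink = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      sink += workload(i);
    }
    long elapsed = System.nanoTime() - start;
    // Consume `sink` so the JIT cannot dead-code-eliminate the loop.
    if (Double.isNaN(sink)) {
      System.out.println(sink);
    }
    return (double) elapsed / iterations;
  }
}
```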
The ideal solution would've been putting the timing code into
This also occurs when just constructing an empty
Because of this, I put the timing code into
We don't plan on putting this test in the actual test suite anyway, so it failing doesn't matter. It's just for performance metrics.
Results from running locally: Output using JNI:
Output on main (aaea85f) after
Update: After running it two more times each, JNI had results of 58.18 ms to 86.27 ms for exp() and 39.32 ms to 41.33 ms for log().
How many iterations was it averaged over? 45 ms is extremely high for one iteration. Please post your benchmark code.
The benchmark code should be pushed in
It's still almost a 2x improvement on average for exp(), which is pretty good. Less so for log(), but at least it didn't get worse.
- Use correct time unit
- Print number of iterations
- Show multiple runs
All tests were run with the same values:
I ran
Results:
I'm very confused about these results, since now Java is performing significantly better.
Also, I saw your comments, calcmogul; I'll clean things up tomorrow.
You may need to run the test for a while before starting the timing, to ensure any JIT compilation that's going to occur has already happened and that what happened during that warmup doesn't affect the measurements.
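That warmup idea can be sketched as a separate untimed phase before the measured loop. As before, the workload is a stand-in, not this PR's benchmark code:

```java
// Sketch of warming up the JIT before timing: run the hot path untimed
// first so compilation happens outside the measured window. The workload
// is a placeholder for the real Pose3d.exp()/log() calls.
public class WarmupBenchmark {
  public static double work(double x) {
    return Math.sqrt(x) + Math.log(x + 1.0);
  }

  public static double timedAverageNanos(int warmupIters, int measuredIters) {
    double sink = 0;
    // Untimed warmup phase: let the JIT compile `work` before measuring.
    for (int i = 0; i < warmupIters; i++) {
      sink += work(i);
    }
    long start = System.nanoTime();
    for (int i = 0; i < measuredIters; i++) {
      sink += work(i);
    }
    long elapsed = System.nanoTime() - start;
    if (Double.isNaN(sink)) {
      System.out.println(sink); // keep `sink` observable
    }
    return (double) elapsed / measuredIters;
  }
}
```

In practice a harness like JMH handles warmup and dead-code elimination for you; this just shows the shape of the fix.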
We should also run the benchmark on a Rio (easiest way to do this is by adding the benchmark to the myRobot project). |
Who would run it on a Rio? Also, would I make the benchmark run within the timed robot framework (e.g., put it in
I can benchmark on a RIO tonight. It's just the code in DevMain?
It seems that only the
Yes, it's just the code in
Here's what I got after setting release flags for main and this PR: main:
PR:
So
Running with release mode:
JNI's definitely beating pure Java now, but I'm still a bit uncertain about which JNI version is fastest. My machine does not copy arrays in
Didn't see your comment, calcmogul, when I posted, but it's good to see that our results match pretty well (except for some scaling on exp, which isn't surprising given the different machines).
Java allocations are very cheap, but they have the downside of increasing GC pressure, so while you may not see it in the benchmark hot path, there will be an increased system cost. In general, an allocation will be a LOT more expensive than copying ~100 bytes.
This was unexpected (at least by me): changing the inputs to double[] slightly slowed things down, but changing the return type to double[] sped up exp() specifically by 50%.
I'm not too familiar with GCs, but how significant is the memory pressure from temporary objects? (Also, I guess that for some reason it's significantly more expensive to construct objects in JNI than in Java.)
Constructing objects in JNI, or calling methods on them, will always be incredibly slow. It's basically a reflection lookup and reflection call each time. As for the GC, lots of temporary objects cause a lot more memory pressure. They might not show up in a test, because a lot could be allocated without triggering a GC, and it's possible whatever test framework is used could cause a GC to happen in a non-timed period.
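The allocation-pressure point can be illustrated with two equivalent helpers (hypothetical names, purely illustrative): one allocates a fresh array per call and thus generates garbage even though each individual allocation is cheap, while the other reuses a caller-provided buffer and generates none.

```java
// Illustration of GC pressure from temporary objects. Both methods produce
// the same values; the difference is only in how much garbage they create.
// Names are hypothetical, not from this PR.
public class AllocationPressure {
  /** Allocates a fresh array every call: cheap individually, but every
      call adds garbage the collector must eventually reclaim. */
  public static double[] allocating(double a, double b, double c) {
    return new double[] {a, b, c};
  }

  /** Writes into a caller-provided buffer: zero garbage per call. */
  public static void reusing(double a, double b, double c, double[] out) {
    out[0] = a;
    out[1] = b;
    out[2] = c;
  }
}
```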
So should I change the return value to a double array or keep it as Java objects? Are there changes to the test framework that would make it more similar to robot code for the GC?
It would be most performant to have the JNI consume and return a double array, then construct the Pose3d/Twist3d on the Java side. You need to make a Java object at some point anyway, and doing it in Java avoids the JNI reflection.
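That pattern looks roughly like the sketch below. The native call is replaced with a pure-Java stand-in so the example is self-contained; in the real code the method would be declared `native` and implemented in C++. The result object is constructed on the Java side, so no reflective class or method lookups happen inside JNI.

```java
// Sketch of the recommended boundary: the JNI layer consumes and returns
// only double arrays, and the pose object is built in Java. `expNative`
// is a pure-Java stand-in for the real native method, and SimplePose is
// an illustrative record, not WPILib's Pose3d.
public class Pose3dJniSketch {
  public record SimplePose(double x, double y, double z,
                           double qw, double qx, double qy, double qz) {}

  /** Stand-in for `private static native double[] expNative(double[] pose, double[] twist)`. */
  public static double[] expNative(double[] pose, double[] twist) {
    return pose.clone(); // the real math would happen in C++
  }

  public static SimplePose exp(double[] pose, double[] twist) {
    // Only primitive arrays cross the JNI boundary...
    double[] r = expNative(pose, twist);
    // ...and the object is constructed in Java, avoiding JNI reflection.
    return new SimplePose(r[0], r[1], r[2], r[3], r[4], r[5], r[6]);
  }
}
```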
Impl looks good to me. Now we just need RIO performance metrics.
Note we'll want to revert the MyRobot changes before merging. |
We'll also want to revert the DevMain.java changes. |
Here are the results from benchmarking on a RIO 1:
Dev computer:
On RIO:
Test took ~3m58s.
Results for
Dev computer:
On RIO:
Test took ~3m41s.
Results for JNI:
Thanks! Seems like there's good improvement, though it's hard to tell with how high the RIO standard deviations are. Should we run more tests, or should I remove the testing code?
I think we’re good to remove the testing code and prepare to merge. It’s a clear improvement. |
Fixes #5311
This just adds the JNI methods, plus tests that check they behave the same as the pure Java implementations, as requested in the issue. (I don't know how to compare the performance, though.)