
The octree tests get stuck in the Jetson Xavier and RTX 20xx GPUs #105

Open

AlejoDiaz49 opened this issue Nov 8, 2019 · 9 comments · May be fixed by #133

Comments

@AlejoDiaz49

I tried to use the library on the Jetson Xavier, but the test test_gpu_voxels_core gets stuck right at the beginning, when it prints this:

insertTest()
create octree....

My setup is exactly this one:

  • Kernel: 4.9.140-tegra
  • Eigen: 3.3.7
  • PCL: 1.9.1
  • TinyXML: 2.6.2

I tried changing that line to 31, as described there, but it made no difference. In both cases, with 63 and 31, everything compiles without problems, but the test still hangs when I launch it.

Any clue about this?

@cjue
Contributor

cjue commented Nov 8, 2019

The octree code has a couple of issues on new CUDA hardware. I will try to test on our own Jetson Xavier device.

Since there is no error message, the hang is probably caused by incorrect octree load balancing. If you don't need the octrees, my recommendation is to ignore those tests for the moment.

One way to disable specific tests:

# the backslash stops bash from interpreting the exclamation mark
./bin/test_gpu_voxels_core --run_test=\!octree_selftest:\!octree_collisions

Do the remaining tests succeed?

@AlejoDiaz49
Author

Now it finishes, but it returns an error: TestOutput.txt

I also see this issue in my own tests with the method insertBoxIntoMap. That works with octrees as well, right? And by new CUDA hardware, do you also mean the GeForce 20 series?

@cjue
Contributor

cjue commented Nov 8, 2019

Thanks for the error log.

<2019-11-08 15:51:58.058> VoxellistLog(Error) TemplateVoxelList::writeToDisk: Write to file temp_list.lst failed!

Do you have write access to your current working directory when you run the tests? The test tries to write a temporary file there (and also a GPUVoxelsBenchmarkProtocol_*.txt report file).

I also have this issue in my tests with the method insertBoxIntoMap

Which type of map are you calling insertBoxIntoMap on? It works with all map types, by creating a "point cloud" from the box and inserting it.
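
For reference, a typical call looks roughly like this (a minimal sketch along the lines of the library's example programs; the map name, dimensions and box coordinates are only illustrative):

#include <gpu_voxels/GpuVoxels.h>
using namespace gpu_voxels;

// one voxel map of 200x200x100 voxels with 1 cm side length
GpuVoxelsSharedPtr gvl = GpuVoxels::getInstance();
gvl->initialize(200, 200, 100, 0.01f);
gvl->addMap(MT_PROBAB_VOXELMAP, "myMap");

// fills the axis-aligned box between the two corners with occupied voxels
gvl->insertBoxIntoMap(Vector3f(0.5f, 0.5f, 0.1f),
                      Vector3f(0.8f, 0.8f, 0.4f),
                      "myMap", eBVM_OCCUPIED, 2);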

@cjue
Contributor

cjue commented Nov 8, 2019

and with new CUDA hardware do you also mean the GeForce 20 series?

I have seen inflated octree runtimes going back to compute capability 5.x GPUs (e.g. GTX 980) as well as on 6.x ones. The newer 2080 and similar 7.x GPUs are probably also affected.

We have an ongoing bug hunt for these random runtime variations in octrees. Let me know if you also encounter this outside of the Xavier platform.

@AlejoDiaz49
Author

AlejoDiaz49 commented Nov 8, 2019

Do you have write access to your current working directory when you run the tests? The test tries to write a temporary file there (and also a GPUVoxelsBenchmarkProtocol_*.txt report file).

Ah yes, thanks. Now there is no error in the test.

Which type of map are you calling insertBoxIntoMap on? It works with all map types, by creating a "point cloud" from the box and inserting it.

The map is MT_BITVECTOR_OCTREE. I tried with a probabilistic VoxelMap and everything works, thanks a lot.

We have an ongoing bug hunt for these random runtime variations in octrees. Let me know if you also encounter this outside of the Xavier platform.

OK, so far I can tell you that it works on a:

  • GeForce MX130 (CUDA Version 10.2 / Capability 5.0)
  • GeForce GTX 1060 (CUDA Version 10.1 / Capability 6.1)
  • GeForce GTX 1050 (CUDA Version 9.1 / Capability 6.1)

@cjue
Contributor

cjue commented Nov 13, 2019

The Jetson Xavier has compute capability 7.2, and should not require changes to maxrregcount.

I can confirm the issue for all the octree tests on our Jetson Xavier board (Capability 7.2) and on a GeForce RTX 2080 Super (Capability 7.5).

On these two platforms octrees appear to be unusable at the moment, much worse than on 5.x and 6.x devices. I will try to find a solution as quickly as possible.

@cjue cjue changed the title The tests get stuck in the Jetson Xavier The octree tests get stuck in the Jetson Xavier Nov 13, 2019
@cjue cjue changed the title The octree tests get stuck in the Jetson Xavier The octree tests get stuck in the Jetson Xavier and RTX 20xx GPUs Nov 13, 2019
@kire93

kire93 commented May 14, 2020

I also tried to use the library on a Jetson Xavier with CUDA Version 10.0 and Ubuntu 18.04.4 LTS. When I ran test_gpu_voxels_core, it also got stuck at:

insertTest()
create octree....

If I exclude the octree tests with

./bin/test_gpu_voxels_core --run_test=\!octree_selftest:\!octree_collisions

it finishes successfully.

My actual intention is to load a point cloud from a camera into an environment map. This works great when I use a MT_PROBAB_VOXELMAP with the insertPointCloud() or insertSensorData() commands. But as soon as I use an octree (MT_PROBAB_OCTREE or MT_BITVECTOR_OCTREE) the same effect occurs: the program gets stuck on the calls insertPointCloud() or insertPointCloudWithFreespaceCalculation(). It doesn't matter whether the point cloud comes from a real camera or is inserted as a box. I don't get an error message when I run the program, it just hangs. It was mentioned that the octree is currently unusable on the Jetson Xavier. Has a solution been found in the meantime, or does it still not work on a Jetson Xavier?
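
For reference, the voxel-map path that works for me looks roughly like this (a minimal sketch assuming the insertPointCloud() call mentioned above; the map name, dimensions and points are only illustrative):

#include <gpu_voxels/GpuVoxels.h>
#include <vector>
using namespace gpu_voxels;

GpuVoxelsSharedPtr gvl = GpuVoxels::getInstance();
gvl->initialize(256, 256, 128, 0.02f);

// MT_PROBAB_VOXELMAP works; with MT_PROBAB_OCTREE or MT_BITVECTOR_OCTREE
// the same insert call hangs on the Xavier
gvl->addMap(MT_PROBAB_VOXELMAP, "envMap");

std::vector<Vector3f> cloud;  // filled from the camera driver in the real program
cloud.push_back(Vector3f(1.0f, 1.0f, 0.5f));

gvl->getMap("envMap")->insertPointCloud(cloud, eBVM_OCCUPIED);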

Thank you very much in advance for your answer.

@cjue
Contributor

cjue commented May 14, 2020

It was mentioned that the octree is currently unusable on the Jetson Xavier. Has a solution been found in the meantime, or does it still not work on a Jetson Xavier?

Hi @kire93, unfortunately I was not yet able to fix this. The problem is caused by octree-related code written for much older CUDA architectures, with assumptions about warp scheduling and synchronization that are not valid any more on new hardware.

Therefore fixing the buggy behavior you see involves rewriting at least part of the octree load balancing internals. As a stopgap measure I will replace the faulty tests and functions with a log message on affected hardware, which should save everyone a lot of time.

On the positive side, the octree code base will be quite a bit smaller and run faster when the rewrite is done...

@kire93

kire93 commented May 15, 2020

Hi @cjue, thank you very much for your quick and detailed response. I am glad to hear that you are still working on the problem and I will follow your solution with great interest.

@Olli1080 Olli1080 linked a pull request Aug 5, 2023 that will close this issue