Tensor RT #285
base: master
Conversation
…forms Caffe inference.
…n laptop. DOES NOT compile, convenience commit.
… segfault, need precise step logs or debug.
…ymore, still not working.
```makefile
ifeq ($(DEEP_NET), tensorrt)
COMMON_FLAGS += -DUSE_TENSORRT
endif
endif
```
If DEEP_NET is tensorrt, then the else clause is never reached, hence lines 73-76 are not needed. It looks like the libraries and dirs that are part of the else clause will still be needed for tensorrt, however, so they should be moved up.
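A hypothetical sketch of the suggested restructuring (the variable names and library list below are illustrative assumptions, not the actual Makefile contents): the libraries and include dirs that both back ends need move above the conditional, while only the TensorRT-specific flags stay behind the DEEP_NET check.

```makefile
# Illustrative only: hoist what both back ends need above the conditional,
# so the tensorrt branch no longer depends on the else clause.
LIBRARIES    += caffe
INCLUDE_DIRS += $(CAFFE_DIR)/include

ifeq ($(DEEP_NET), tensorrt)
    COMMON_FLAGS += -DUSE_TENSORRT
    LIBRARIES    += nvinfer nvcaffe_parser
endif
```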
For now, TensorRT is used for the main inference in the middle of a pipeline that still uses Caffe; for example, I use Caffe blobs for input and output. I think it's lines 64-65 that should be removed.
Sorry for the wait, I got caught up in some other work; I'll now try to polish this ASAP.
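To illustrate the "Caffe blobs for input and output" idea, here is a minimal sketch (not the code in this PR; the binding names and the old TensorRT 2/3-era API are assumptions) of handing the GPU pointers of Caffe blobs directly to a TensorRT execution context:

```cpp
#include <caffe/blob.hpp>
#include <NvInfer.h>

// Sketch: run a prebuilt TensorRT engine on data that already lives in
// Caffe blobs, so the rest of the pipeline keeps seeing blobs.
void forwardTensorRT(nvinfer1::ICudaEngine& engine,
                     nvinfer1::IExecutionContext& context,
                     caffe::Blob<float>& inputBlob,
                     caffe::Blob<float>& outputBlob)
{
    // Binding names must match the layer names in the prototxt (assumed here).
    const int inputIndex  = engine.getBindingIndex("image");
    const int outputIndex = engine.getBindingIndex("net_output");

    // Reuse the blobs' device memory directly as TensorRT bindings,
    // so no extra host<->device copies are needed.
    void* bindings[2];
    bindings[inputIndex]  = inputBlob.mutable_gpu_data();
    bindings[outputIndex] = outputBlob.mutable_gpu_data();

    // Synchronous execution with batch size 1 (old TensorRT API).
    context.execute(1, bindings);
}
```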
Thanks @bushibushi, super excited to try this out.
```cpp
class OP_API PoseExtractorTensorRT : public PoseExtractor
{
public:
    PoseExtractorTensorRT(const Point<int>& netInputSize, const Point<int>& netOutputSize, const Point<int>& outputSize, const int scaleNumber,
```
Is the definition and implementation of `std::vector<int> getHeatMapSize() const;` needed here, since we are extending PoseExtractor?
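A minimal sketch of the point being raised (the exact base-class layout is an assumption): if PoseExtractor already stores the heat-map size and exposes it through a getter, the TensorRT subclass inherits that accessor and does not need to redeclare it.

```cpp
#include <vector>

// Assumed shape of the base class: it owns the heat-map size and exposes it.
class PoseExtractor
{
public:
    std::vector<int> getHeatMapSize() const { return mHeatMapSize; }    // inherited as-is
protected:
    std::vector<int> mHeatMapSize;
};

// The TensorRT specialization gets getHeatMapSize() from the base class;
// redeclaring it in the derived header would only duplicate the interface.
class PoseExtractorTensorRT : public PoseExtractor
{
public:
    void forwardPass() { /* TensorRT inference would go here */ }
};
```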
Hi @gineshidalgo99, I had a hard time applying the new PIMPL architecture to my TensorRT versions of netCaffe and poseExtractorCaffe. In the process it seems I broke the pipeline: my TensorRT network forwards GPU data to its output caffe::blob, but nothing is finally displayed. Any clues?
I am also having trouble with the runtime-only knowledge of net I/O dimensions, as TensorRT networks are not really geared for this.
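For readers unfamiliar with the change being discussed, here is a minimal sketch of the PIMPL (pointer-to-implementation) pattern as it would apply to a hypothetical NetTensorRT class (the member names and the ImplNetTensorRT struct are illustrative, not the actual OpenPose code):

```cpp
#include <memory>
#include <string>

// netTensorRT.hpp - the public header stays free of TensorRT/Caffe includes.
class NetTensorRT
{
public:
    NetTensorRT(const std::string& prototxtPath, const std::string& modelPath);
    ~NetTensorRT();                  // must be defined where ImplNetTensorRT is complete
    void forwardPass(const float* inputGpuData) const;
private:
    struct ImplNetTensorRT;          // forward declaration only
    std::unique_ptr<ImplNetTensorRT> upImpl;
};

// netTensorRT.cpp - the heavy dependencies live only here, e.g.:
// struct NetTensorRT::ImplNetTensorRT
// {
//     nvinfer1::ICudaEngine* engine;
//     nvinfer1::IExecutionContext* context;
// };
```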
Sorry for that huge internal change with the PIMPL paradigm and the now flexible output size; unfortunately it was completely required for future modularity and scalability, hence unavoidable :(

For the runtime-only knowledge of net I/O dimensions, I guess it's fine since the purpose of RT is real-time applications (i.e. video/webcam), so fixed I/O dimensions should be fine there. I guess implementing your own …

About forwarding output and getting nothing, that's definitely weird. Do you mean the GPU data is noise? Or its size is 0? Or something else? Maybe forget about GPU rendering (which requires params from the poseExtractor) and instead use CPU rendering and/or no display but JSON saving?

If I did not really answer your question, please ask again!
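As a concrete way to answer the "noise or size 0?" question, here is a small sketch (the blob variable name and expected shape are assumptions) that prints the output blob's shape and a few values; calling cpu_data() after the GPU forward makes Caffe sync the device memory back to the host:

```cpp
#include <caffe/blob.hpp>
#include <algorithm>
#include <iostream>

// Sketch: inspect what the TensorRT forward actually wrote into the output blob.
void inspectOutputBlob(const caffe::Blob<float>& outputBlob)
{
    // Print the shape, e.g. 1 x 57 x h x w for the COCO heat maps + PAFs.
    std::cout << "output shape: " << outputBlob.shape_string()
              << " (count = " << outputBlob.count() << ")" << std::endl;

    // cpu_data() triggers a device-to-host sync of the last GPU write,
    // so these values reflect what the network produced.
    const float* data = outputBlob.cpu_data();
    for (int i = 0; i < std::min(10, outputBlob.count()); ++i)
        std::cout << data[i] << " ";
    std::cout << std::endl;
}
```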
Hello @gineshidalgo99,
It's ok, it makes sense, it just made me regret not having completed my PR sooner!
Well, I'll go for the fixed-size version, with sh launch scripts to create and select the correct prototxt.
This is exactly what I did, but while converting it to PIMPL something went wrong.
I'll check again tonight.
Only thing: where is the info on the input size at runtime now?
It is not directly output, but you can print the size of each element of …
Hi @bushibushi, it looks like this is a pretty tough setup. Do you have any recommended steps to pick up where you left off? I'm trying to run TensorRT with what you have done so far, with a custom net resolution of 512x288.
Hi @gineshidalgo99 @bushibushi, is this issue resolved? I'm really looking forward to Tensor RT inference on the Jetson Nano. Thanks in advance!
Opening this for those wanting to test Tensor RT inference in advance. Still using FP32 in Caffe blobs for now to keep things simple, but still getting a 2x shorter inference time on the Jetson TX2. Did not test on bigger cards.
Documentation is not done yet and usage is still by hand: you have to duplicate the prototxt file, call it pose_deploy_linevec.prototxt_<desired_input_height>x<desired_input_width>, and make these values match the net_resolution used and the values in the new prototxt file.
The first run will take a bit of time to create the serialized TensorRT engine (see the caching sketch below); subsequent runs will be faster.
Branch compilation separation messed things up; if you want to test, check out commit b3ae8ec or the one before the merge.
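To make the "first run builds the engine, later runs are faster" behaviour concrete, here is a minimal sketch of serializing a TensorRT engine to disk and reloading it on later runs (the plan-file path, the buildEngineFromCaffe helper, and the old TensorRT 2/3-era API calls are assumptions, not the code in this branch):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper that parses the fixed-size prototxt/caffemodel and
// builds the engine; its implementation is omitted here.
nvinfer1::ICudaEngine* buildEngineFromCaffe(nvinfer1::ILogger& logger);

// Sketch: cache a built engine on disk so only the first run pays the
// (slow) optimization cost; later runs just deserialize the plan file.
nvinfer1::ICudaEngine* loadOrBuildEngine(nvinfer1::ILogger& logger,
                                         const std::string& planPath)
{
    // 1. Try to reload a previously serialized engine.
    std::ifstream planFile(planPath, std::ios::binary);
    if (planFile.good())
    {
        std::vector<char> planData((std::istreambuf_iterator<char>(planFile)),
                                   std::istreambuf_iterator<char>());
        auto* runtime = nvinfer1::createInferRuntime(logger);
        return runtime->deserializeCudaEngine(planData.data(), planData.size(), nullptr);
    }

    // 2. Otherwise build it, serialize it, and save it for the next run.
    nvinfer1::ICudaEngine* engine = buildEngineFromCaffe(logger);
    nvinfer1::IHostMemory* plan = engine->serialize();
    std::ofstream out(planPath, std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());
    return engine;
}
```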