Tensor RT #285

Open
wants to merge 55 commits into base: master

Conversation

@bushibushi (Contributor) commented Oct 17, 2017

Opening this for those wanting to test TensorRT inference in advance. Still using FP32 in Caffe blobs for now to keep things simple, but I still get roughly a 2x shorter inference time on a Jetson TX2. I did not test on bigger cards.

Documentation is not done yet and usage is still manual: you have to duplicate the prototxt file, name the copy pose_deploy_linevec.prototxt_<desired_input_height>x<desired_input_width>, and make these values match both the net_resolution you use and the values inside the new prototxt file.

The first run will take a bit of time to create the serialized TensorRT engine; subsequent runs will be faster.
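
For reference, here is a minimal sketch of how such an engine cache typically works with the TensorRT C++ API of that era (nvinfer1 / nvcaffeparser1). The file paths, output blob name, and logger are assumptions for illustration, not the actual code in this PR:

```cpp
// Sketch of a serialized-engine cache (assumed names, not this PR's code).
// First run: parse the Caffe prototxt/caffemodel, build the engine, write it to disk.
// Later runs: deserialize the cached engine, which skips the slow build step.
#include <NvInfer.h>
#include <NvCaffeParser.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

class SimpleLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cerr << msg << std::endl;
    }
};

nvinfer1::ICudaEngine* getEngine(const std::string& prototxtPath, const std::string& caffemodelPath,
                                 const std::string& cachePath, nvinfer1::ILogger& logger)
{
    // 1. Try the cached, serialized engine first
    std::ifstream cache(cachePath, std::ios::binary);
    if (cache.good())
    {
        const std::vector<char> blob((std::istreambuf_iterator<char>(cache)),
                                     std::istreambuf_iterator<char>());
        auto runtime = nvinfer1::createInferRuntime(logger);
        return runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    }

    // 2. Otherwise build it from the Caffe files (slow, done once per resolution)
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetwork();
    auto parser = nvcaffeparser1::createCaffeParser();
    const auto blobNameToTensor = parser->parse(prototxtPath.c_str(), caffemodelPath.c_str(),
                                                *network, nvinfer1::DataType::kFLOAT);
    network->markOutput(*blobNameToTensor->find("net_output")); // output blob name is an assumption
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 25);
    auto engine = builder->buildCudaEngine(*network);

    // 3. Serialize so the next run can start directly from the cache
    auto serialized = engine->serialize();
    std::ofstream out(cachePath, std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return engine;
}
```

The cache file could be keyed by the same <height>x<width> suffix as the prototxt, so that each net_resolution gets its own serialized engine (this keying is a suggestion, not something the PR states).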

The branch compilation separation messed things up; if you want to test, check out commit b3ae8ec or the one before the merge.

…n laptop. DOES NOT compile, convenience commit.

ifeq ($(DEEP_NET), tensorrt)
COMMON_FLAGS += -DUSE_TENSORRT
endif
endif


If DEEP_NET is tensorrt, the else clause is never reached, hence lines 73-76 are not needed. It looks like the libraries and dirs that are part of the else clause will still be needed for tensorrt, however, so those should be moved up.

Contributor Author


For now TensorRT is included for the main inference in the middle of a pipeline that uses Caffe; for example, I use Caffe blobs for input and output. I think it's lines 64-65 that should be removed.

@bushibushi
Contributor Author

Sorry for the wait, I got caught up in some other work; I will try to polish this ASAP.

@megashigger

Thanks @bushibushi, super excited to try this out.

class OP_API PoseExtractorTensorRT : public PoseExtractor
{
public:
    PoseExtractorTensorRT(const Point<int>& netInputSize, const Point<int>& netOutputSize, const Point<int>& outputSize, const int scaleNumber,


Is the definition and implementation for std::vector<int> getHeatMapSize() const; needed here, since we are extending PoseExtractor?
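
For context, whether it is needed depends on how the base class declares the method; a minimal sketch of the general C++ rule (the base-class signature below is an assumption, not OpenPose's actual header):

```cpp
// Sketch of the inheritance rule under discussion (signatures are assumed, not OpenPose's).
#include <vector>

struct PoseExtractorBase
{
    virtual ~PoseExtractorBase() {}
    // If the base declares it pure virtual, every concrete subclass must implement it.
    // If the base already provides a body, a subclass only re-declares it to override the behaviour.
    virtual std::vector<int> getHeatMapSize() const = 0;
};

struct PoseExtractorTensorRTSketch : PoseExtractorBase
{
    std::vector<int> getHeatMapSize() const override
    {
        return {1, 57, 46, 46}; // placeholder dimensions, for illustration only
    }
};
```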

@bushibushi
Contributor Author

Hi @gineshidalgo99, I had a hard time applying the new PIMPL architecture to my TensorRT versions of netCaffe and poseExtractorCaffe. In the process it seems I broke the pipeline: my TensorRT network forwards GPU data into its output caffe::blob, but nothing is displayed in the end. Any clues?

@bushibushi
Contributor Author

I am also having trouble with the runtime-only knowledge of net I/O dimensions, as TensorRT networks are not really geared for this.

@gineshidalgo99
Member

gineshidalgo99 commented Nov 16, 2017

@bushibushi

Sorry for that huge internal change with the PIMPL paradigm and the now-flexible output size; unfortunately it was completely required for future modularity and scalability, hence unavoidable :(

For the runtime-only knowledge of net I/O dimensions, I guess it's fine since the purpose of RT is real-time applications (i.e. video/webcam), so fixed I/O dimensions should be fine there.

I guess implementing your own poseExtractorRT and netRT is the way to go, so even if I changed e.g. poseExtractorCaffe, you'd have no problems at all in the future (I guess you might start from the old poseExtractorCaffe format when your code was working and just adapting it to the new one?)

About forwarding output and getting nothing, that's definitely weird. Do you mean the GPU data is noise? Or its size is 0? Or something else? Maybe forget about GPU rendering (which requires params from the poseExtractor) and instead use CPU rendering and/or no display but JSON saving?
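
As a concrete way to check that, a minimal sketch of pulling the output blob back to the host and sanity-checking it (assuming the standard caffe::Blob<float> API; the function and variable names are hypothetical):

```cpp
// Sketch: copy the network's output blob to the host and sanity-check it
// (assumes the standard caffe::Blob<float> API; names are hypothetical).
#include <caffe/blob.hpp>
#include <algorithm>
#include <iostream>

void inspectOutputBlob(const caffe::Blob<float>& outputBlob)
{
    std::cout << "count: " << outputBlob.count() << std::endl; // 0 would mean an empty/unset blob
    if (outputBlob.count() == 0)
        return;

    // cpu_data() synchronizes the GPU buffer back to the host before reading
    const float* data = outputBlob.cpu_data();
    float minValue = data[0];
    float maxValue = data[0];
    for (int i = 1; i < outputBlob.count(); i++)
    {
        minValue = std::min(minValue, data[i]);
        maxValue = std::max(maxValue, data[i]);
    }
    // All zeros (or wild values) would suggest the TensorRT forward pass
    // never wrote into this blob's GPU memory
    std::cout << "min: " << minValue << ", max: " << maxValue << std::endl;
}
```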

If I did not really answer your question, please, ask again!

@bushibushi
Contributor Author

Hello @gineshidalgo99,

> Sorry for that huge internal change with the PIMPL paradigm and the now-flexible output size; unfortunately it was completely required for future modularity and scalability, hence unavoidable :(

It's OK, it makes sense; it just made me regret not having completed my PR sooner!

> For the runtime-only knowledge of net I/O dimensions, I guess it's fine since the purpose of RT is real-time applications (i.e. video/webcam), so fixed I/O dimensions should be fine there.

Well, I'll go for the fixed-size version, with sh launch scripts to create and select the correct prototxt, as sketched below.
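
As a small illustration of that plan, a sketch of composing the fixed-size prototxt name from the net resolution, following the naming convention from the PR description (the helper name and folder are hypothetical):

```cpp
// Hypothetical helper: pick the fixed-size prototxt that matches the runtime net
// resolution, per the pose_deploy_linevec.prototxt_<height>x<width> convention above.
#include <string>

std::string prototxtForResolution(const std::string& modelFolder, const int netInputHeight, const int netInputWidth)
{
    return modelFolder + "pose_deploy_linevec.prototxt_"
        + std::to_string(netInputHeight) + "x" + std::to_string(netInputWidth);
}

// e.g. prototxtForResolution("models/pose/coco/", 368, 656)
//      -> "models/pose/coco/pose_deploy_linevec.prototxt_368x656"
```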

> I guess implementing your own poseExtractorRT and netRT is the way to go, so even if I changed e.g. poseExtractorCaffe, you'd have no problems at all in the future (I guess you might start from the old poseExtractorCaffe format when your code was working and just adapting it to the new one?)

This is exactly what I did, but something went wrong while converting it to PIMPL.

> About forwarding output and getting nothing, that's definitely weird. Do you mean the GPU data is noise? Or its size is 0? Or something else? Maybe forget about GPU rendering (which requires params from the poseExtractor) and instead use CPU rendering and/or no display but JSON saving?

I'll check again tonight.

> If I did not really answer your question, please, ask again!

Only thing: where is the info on the input size at runtime now?

@gineshidalgo99
Member

gineshidalgo99 commented Nov 16, 2017

> Only thing: where is the info on the input size at runtime now?

It is not directly output, but you can print the size of each element of inputNetData as you receive it as an input argument. PS: inputNetData changed from a single array with all the scales included to a vector with one element per scale (this gave a 40% speed-up in multi-scale). So I'd recommend first making it work for a single scale and thinking about the multi-scale problem later (doing everything at once might be harder).
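
A minimal sketch of that suggestion, assuming inputNetData arrives as a std::vector of per-scale op::Array<float> elements that expose getSize() (the exact forwardPass signature may differ):

```cpp
// Sketch: print the dimensions of each per-scale element of inputNetData
// (assumes op::Array<float> exposes getSize(); the surrounding signature is illustrative).
#include <openpose/core/array.hpp>
#include <iostream>
#include <vector>

void printInputNetDataSizes(const std::vector<op::Array<float>>& inputNetData)
{
    for (auto scale = 0u; scale < inputNetData.size(); scale++)
    {
        const auto sizes = inputNetData[scale].getSize(); // e.g. {1, 3, netInputHeight, netInputWidth}
        std::cout << "scale " << scale << ":";
        for (const auto dimension : sizes)
            std::cout << " " << dimension;
        std::cout << std::endl;
    }
}
```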

@makeitraina

Hi @bushibushi, it looks like this is a pretty tough setup. Do you have any recommended steps to pick up where you left off? I'm trying to run TensorRT with what you have done so far, with a custom net resolution of 512x288.

@djramakrishna

Hi @gineshidalgo99 @bushibushi, is this issue resolved? I'm really looking forward to TensorRT inference on the Jetson Nano. Thanks in advance!

Labels: enhancement (New feature or request)

5 participants