Is the model capable of detecting open-vocabulary objects, as Grounding DINO does? #2
Comments
@aixiaodewugege
Thanks for your reply! Have you considered whether HOI could improve accuracy on action recognition benchmarks such as Kinetics400, given that you claim your model has superior zero-shot performance?
@aixiaodewugege
Thanks! Do you have any suggestions on how I could integrate a video-based action recognition model with an image-based HOI model? Should both use the same image encoder, such as mPLUG?
@aixiaodewugege
Thanks! I'm new to HOI (Human-Object Interaction) and action recognition. The two tasks use different datasets, and I'm curious why there haven't been attempts to combine them, given that doing so seems logically beneficial for both.
@aixiaodewugege
Thanks for your brilliant work!
I'm wondering whether the model can detect arbitrary objects, as Grounding DINO does?