Questions to understand this repo well :D #2
Hi, what I used for the captions file was raw_caption_superclean.json, which I believe you can download as part of the raw caption zip. I did run the other repo you mentioned, but I couldn't get any visual results; I only got the quantitative results the author reported. Also, a tip: increase the model size as much as possible and run it on a GPU for the best results. Thanks for your interest!
@ammesatyajit Thank you for your reply!!!
@FormerAutumn Sure! I'm happy to answer any questions you have.
@ammesatyajit Thanks for your kindness! Do you know where to get 'data/newest-data-max-len-20.npy' in https://github.com/MDSKUL/MasterProject/blob/master/stap5/globals.py ? Could I get your email or some other way to contact you, or is it fine to just raise issues here? If you'd like to share a contact, you can email me at cnzsy98@163.com. Thank you so much :D
Hi, sorry to disturb you. |
So for the text next-token prediction, there is no video involved; I am just using the model for next-word prediction in a sentence (similar to GPT). This was useful as a sanity check to see whether the model gained useful information, which I later built on when I tested it on video. (Note that I haven't added more inference functionality yet, and it is relatively simple to do so.) Hope that answers your question.
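To illustrate the kind of sanity check described above, here is a minimal sketch of greedy next-token prediction. The bigram counting "model" below is a stand-in assumption for illustration only, not the repo's actual VideoBERT model; the idea is just that any model which scores candidate next tokens can be checked by asking it to extend a sentence one word at a time.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Count, for each token, how often each other token follows it.
    counts = defaultdict(Counter)
    for sent in corpus:
        toks = sent.split()
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    return counts

def predict_next(counts, token):
    # Most frequent follower of `token`, or None if the token is unseen.
    return counts[token].most_common(1)[0][0] if counts[token] else None

# Toy corpus (illustrative, not from the repo)
corpus = ["add the salt", "add the pepper", "stir the pot"]
model = train_bigram(corpus)
print(predict_next(model, "add"))  # -> "the"
```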
@FormerAutumn Sorry for not replying earlier. I believe the author links the google drive file that you ask for. You can definitely contact me by email, my email is ammesatyajit@gmail.com. |
Thank you for your consistent replies :D I adapted this method to implement an idea that occurred to me (it doesn't work yet; I might put it on GitHub in the future if possible).
So video_next_tok_pred takes in the tokens from the validation set. It doesn't take in video clips. Hope that answers your question. |
hey, great work! I am also trying to understand your code better. In VideoBERT, and in the parameters used here, you use 4 HIERARCHIES and 12 clusters. The paper says that yields 12**4 = 20736 clusters, but in this code's README you mention concatenating the centroids, and then label_data labels features by the closest centroid. Wouldn't that yield 12*4 = 48 clusters, effectively 48 video tokens? How does it become 20736 clusters?
Hi, sorry if the README was slightly confusing. The 20736 centroids were stored in separate files due to the hierarchical k-means; the only purpose of concatenating them was so I could access all of the centroids from one file. The label_data step takes in the video feature vectors and finds the closest of these 20736 centroids to effectively tokenize each video. Hope that clears up any confusion.
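The labeling step described above can be sketched as a nearest-centroid assignment. This is a minimal illustration, not the repo's actual code; the function and variable names are assumptions, and the toy 2-D data stands in for the real feature vectors and the 20736 concatenated centroids.

```python
import numpy as np

def label_features(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each feature vector the index of its closest centroid.

    features:  (n, d) array of video feature vectors
    centroids: (k, d) array, e.g. k = 12**4 = 20736 after concatenation
    """
    # Squared Euclidean distance from every feature to every centroid
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # One token id (centroid index) per feature vector
    return dists.argmin(axis=1)

# Toy usage: 3 centroids in 2-D, 2 feature vectors
centroids = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
features = np.array([[1.0, 1.0], [9.0, 9.5]])
tokens = label_features(features, centroids)
print(tokens)  # -> [0 1]
```

Each video thus becomes a sequence of integer token ids drawn from the full set of leaf centroids, which is why the vocabulary is 20736 rather than 48.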
That makes sense, thank you! Another question: I am able to run the clustering with this command: `python3 -m hkmeans_minibatch -r features -p ft_hp -b 60 -s vecs_dir2r -c centroid_dir2 -hr 3 -k 15 -e 1` Should the -hr and -k be in some relation to the batch size?
@ammesatyajit sorry for my late reply. Thank you for your kindness. I re-read the VideoBERT paper and it seems the ViT model is closer to what I want to implement, so I am turning to ViT. :D
@joaanna Sorry for not replying earlier. I am not going to be able to provide a detailed response because I am a little busy at the moment due to personal reasons, but if you want, you can read the code/docs for my hkmeans code: https://github.com/ammesatyajit/hierarchical-minibatch-kmeans. I will try to reproduce your error as soon as possible and get back to you on what the problem is. Also, could you tell me the dimensions of your input data files? The batch size should ideally be larger than the number of vectors in each input file. For example, I used a batch size of 500 when I did hkmeans on files with 20 vectors each.
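The batch-size rule of thumb above (the -b value should exceed the vector count of each input .npy file) can be checked quickly before running the clustering. This is a hedged sketch under the assumption that the input files are plain .npy arrays with one vector per row; the glob pattern and function name are illustrative, not part of the hkmeans_minibatch CLI.

```python
import glob

import numpy as np

def check_batch_size(file_pattern: str, batch_size: int) -> bool:
    """Return True if every matching .npy file has at most `batch_size` rows."""
    ok = True
    for path in sorted(glob.glob(file_pattern)):
        n_vectors = np.load(path).shape[0]
        if n_vectors > batch_size:
            print(f"{path}: {n_vectors} vectors > batch size {batch_size}")
            ok = False
    return ok

# Example: check_batch_size("features/*.npy", 500)
```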
@FormerAutumn no problem. Vision transformer is really interesting, hope you find what you are looking for :) |
@joaanna can you share the data you downloaded? The site is down or something, and I am unable to download the cooking videos data.
First, thank you for your great work :D
My question is what the title asks.
In short, which file should I download if I want to get the 'Captions' (mentioned in the last sentence of the quote)? I have downloaded howto100m_captions.zip (2.3 GB); is that correct? I see the files in it are all .csv files ;_;
Also, has anyone run the repo the author said he/she was inspired by? The repo is https://github.com/MDSKUL/MasterProject