
Pre-compute automatic labelling models #200

Open
KDeser opened this issue Dec 7, 2024 · 3 comments
KDeser commented Dec 7, 2024

Feature request: ability to run annotation models ahead of time for an entire dataset and then load these from storage while performing interactive annotation.

Justification: In a data annotation project, the users who perform annotation often lack powerful GPUs. Even when they have them, getting the models to run can be difficult due to dependency issues, especially in a corporate environment with tight security restrictions. However, there is always at least one person (usually a data scientist) with a powerful, working GPU who can spare the time to run the model(s) ahead of time before distributing work to the annotation team. A secondary benefit is eliminating the wait for the models to run after advancing to a new image.

Proposed solution: an option to run the auto-annotation models in batch mode and cache the results to disk for use in future sessions. This would particularly benefit SAM.

I am happy to contribute as a beginner!
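The batch-and-cache workflow described above could be sketched roughly like this. To be clear, this is an illustrative sketch only: the function names, the pickle-on-disk format, and the hash-based file naming are my own assumptions, not AnyLabeling's actual API, and `encode_fn` stands in for whatever GPU-heavy encoder (e.g. SAM's image encoder) produces the embeddings:

```python
import hashlib
import pickle
from pathlib import Path


def cache_path(cache_dir: Path, image_path: str) -> Path:
    # Key each cache file by a hash of the image path so the cache
    # directory can be copied and shared between machines as-is.
    digest = hashlib.sha256(image_path.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.pkl"


def precompute_embeddings(image_paths, encode_fn, cache_dir):
    # encode_fn is a placeholder for the GPU-heavy model: it maps an
    # image path to its embedding. Run once on the powerful machine.
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    for p in image_paths:
        out = cache_path(cache_dir, str(p))
        if not out.exists():  # skip images already processed
            with out.open("wb") as f:
                pickle.dump(encode_fn(p), f)


def load_embedding(image_path, cache_dir):
    # Annotators' machines only need this cheap lookup, not the model.
    out = cache_path(Path(cache_dir), str(image_path))
    if not out.exists():
        return None  # cache miss: fall back to running the model live
    with out.open("rb") as f:
        return pickle.load(f)
```

The data scientist would run `precompute_embeddings` over the whole dataset once, then distribute the cache directory alongside the images.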

KDeser changed the title from "Pre-compute" to "Pre-compute automatic labelling models" Dec 7, 2024
vietanhdev (Owner) commented

Hi @KDeser,
Thank you for the suggestion. It would be a very good feature! :D
Please go ahead and implement it.
I think we should store the computed embeddings in a format that people can copy and share easily. What do you think?
Does @mnmnk43434's idea and implementation help in some way?


KDeser commented Dec 8, 2024

Hi @vietanhdev

Sounds good, I will get started!

As the main developer (or are you a bot?), do you have any suggestions on where to begin? We should try to avoid instantiating the ONNX models and their related dependencies on the annotators' machines. Would modifying the LRUCache class so that it contains an entry for every filename be enough?
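One possible shape for that idea, sketched here as a generic in-memory LRU with a disk fallback rather than the project's actual LRUCache class (the class name, the filename-stem file layout, and the `.pkl` extension are all assumptions for illustration):

```python
import pickle
from collections import OrderedDict
from pathlib import Path


class DiskBackedLRUCache:
    # Sketch: an in-memory LRU keyed by filename that falls back to
    # precomputed results stored on disk, so the ONNX encoder never
    # has to be instantiated on the annotator's machine.
    def __init__(self, capacity, cache_dir):
        self.capacity = capacity
        self.cache_dir = Path(cache_dir)
        self.mem = OrderedDict()

    def get(self, filename):
        if filename in self.mem:
            self.mem.move_to_end(filename)  # mark most recently used
            return self.mem[filename]
        on_disk = self.cache_dir / (Path(filename).stem + ".pkl")
        if on_disk.exists():
            with on_disk.open("rb") as f:
                value = pickle.load(f)
            self.put(filename, value)  # promote into memory
            return value
        return None  # full miss: caller must run the model live

    def put(self, filename, value):
        self.mem[filename] = value
        self.mem.move_to_end(filename)
        if len(self.mem) > self.capacity:
            self.mem.popitem(last=False)  # evict least recently used
```

With this shape, the in-memory cache stays small while the shared on-disk cache covers the whole dataset, and a missing entry degrades gracefully to the current live-inference behaviour.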
