KeyError #1
Comments
@khronimo |
Fantastic. Yeah, it would be great to have more options for captioning. I did see your comment about exposing an API; this would be fantastic and extremely useful, as there are a host of options like CogVLM, Phi-3 Vision, and LLaVA, and some of them are super easy to script with an OpenAI-compatible server like Ollama or LM Studio. Perhaps captions could be stored in a JSON file with a title, subtitle, and maybe the prompt used. You could use something like CogVLM a few times with a different focus, e.g. clothing, background. Then everything could be displayed alongside the lists below the image that have "CAPTION .TXT FILE", "AUTO TAGS (FYI)". e.g.: "CAPTION .TXT FILE" |
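To make the JSON idea concrete, here is a minimal sketch of what a per-image caption file could look like. Everything in it is an assumption for illustration: `CaptionRecord`, `save_captions`, `load_captions`, and the field names are hypothetical and are not part of Candy Machine.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class CaptionRecord:
    """One caption produced by one model run (hypothetical schema)."""
    title: str     # what the caption focuses on, e.g. "Clothing"
    subtitle: str  # where it came from, e.g. the model name
    prompt: str    # the prompt that produced the caption
    caption: str   # the generated text


def save_captions(path: Path, records: List[CaptionRecord]) -> None:
    # Store all captions for one image as a JSON list of objects.
    payload = [asdict(record) for record in records]
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_captions(path: Path) -> List[CaptionRecord]:
    # Read the JSON list back into CaptionRecord objects.
    items = json.loads(path.read_text(encoding="utf-8"))
    return [CaptionRecord(**item) for item in items]
```

Running the same model several times with a different focus would simply append more records to the list, and the UI could render each record's title and subtitle as a heading above its caption.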
Oh interesting! I'm learning a lot from you - thanks - I hadn't even considered going into this much detail with the autotags! Out of interest, what size datasets are you working with, and how many datasets are you creating on average per month? Is this something you'd want to do for all of them, or a subset? Also, do you use this workflow because you don't want to caption the images yourself (i.e. you want an automated solution), or because the LLMs are coming up with better descriptions than you could have thought of yourself? (Context: I originally wasn't going to add auto tagging to Candy Machine, but only did so because I wanted an "overview" of the tag vocabulary of a dataset so that I could create the initial set of tags. I'm still trying to figure out the "optimal" workflow.) |
I'm playing around with CogVLM2 now. Can you share what prompt(s) you would use to caption an image? |
I have millions of untagged images; so far I've been manually going through them and building datasets from selections. At this stage I've only been doing a couple a month. I've got some good ideas on how to manage catalogues at scale but haven't built anything around that yet.
A lot of people use automated captioning (I believe SD3 was largely captioned by CogVLM). I feel the models are almost there but not quite: they miss a lot of detail and they hallucinate too much, leading to lower-quality captions. But things are improving fast as the models scale and the quality of the datasets used to train them improves. At this stage I like the idea of taking a semi-supervised approach, and either manually adding suggested tags or going over them to remove erroneous ones. By using a combination of models and prompts to try and drag out the most detail, and manually checking for errors, I think it's possible to produce far better datasets in less time.
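A rough sketch of that semi-supervised idea, assuming each model or prompt yields a plain list of suggested tags: merge the suggestions, rank them by how many sources agree, then drop whatever the reviewer rejects. The function names and the agreement-count ranking are assumptions made for illustration, not an existing workflow.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def merge_suggestions(suggestions: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Combine tag suggestions from several sources into (tag, votes) pairs."""
    votes: Counter = Counter()
    for tags in suggestions.values():
        # Normalise and de-duplicate so each source votes at most once per tag.
        votes.update({tag.strip().lower() for tag in tags if tag.strip()})
    # Most agreed-upon tags first, ties broken alphabetically.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


def apply_review(merged: List[Tuple[str, int]], rejected: Set[str]) -> List[str]:
    """Keep every suggested tag the human reviewer did not reject."""
    return [tag for tag, _ in merged if tag not in rejected]
```

Tags suggested by only one source are the likeliest hallucinations, so sorting by votes puts the ones most worth a manual check at the bottom of the list.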
I'm still figuring things out there too.
Honestly, I'm pretty much still experimenting with those things. I did a lot more playing around previously with the LLaVA 1.6 Mistral and Vicuna models when they were SOTA (state of the art). Here is one example using an old LLaVA 1.6 Mistral 7B model. With a bit of work and playing around, I think it would be possible to create a nice workflow using several models and prompts to get structured JSON results with some nice suggested tags, a lot better than just running it through something like WD alone. One thing's for sure: it's got a hell of a lot of potential. Prompt:
Response: Keywords describing the lighting of the scene: dim, subdued, moody, artificial, greenish tint The image depicts a scene with two individuals standing in a hallway or corridor. The lighting is dim and subdued, creating a moody and artificial atmosphere with a greenish tint. The framing of the image is close-up, focusing on the two people side by side. The background features an urban, futuristic, dystopian setting with modern, sleek, minimalist, metallic, concrete, glass, mirrors, lights, shadows, and reflections. The scene location appears to be indoors, possibly within a building or facility. The architecture is characterized by doors, windows, and other architectural elements that contribute to the overall futuristic and dystopian aesthetic of the image. In terms of objects in the scene, there are two people wearing clothing, accessories, and sunglasses. They are also carrying weapons and technology, which further emphasizes the futuristic and dystopian theme. The presence of these items suggests that the individuals may be involved in some sort of conflict or confrontation. The overall mood and atmosphere of the image convey a sense of tension and unease, with the two people standing in the hallway looking serious and focused. Their body language and facial expressions suggest that they are prepared for whatever comes their way. The image does not provide enough information to determine the exact nature of their relationship or the context of the scene. Prompt:
Response: The image features two people standing in a hallway or corridor. The exposure is dim, creating a moody and artificial atmosphere. The location appears to be indoors, possibly within a building or facility. The framing of the image is close-up on body parts, specifically from the waist up. The individuals are wearing leather jackets and sunglasses, which further emphasizes the futuristic and dystopian theme of the scene. |
"one-piece swimsuit" seems to trigger a KeyError.
On a side note, it would be nice to have WD auto tagging as optional as it doesn't suit every dataset.
[2024-06-12 15:16:28,877] ERROR in app: Exception on /project/test/get [POST]
Traceback (most recent call last):
  File "I:\Tools\candy-machine\candy-machine-main\venv\lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "I:\Tools\candy-machine\candy-machine-main\venv\lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "I:\Tools\candy-machine\candy-machine-main\venv\lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "I:\Tools\candy-machine\candy-machine-main\venv\lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "I:\Tools\candy-machine\candy-machine-main\venv\lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
  File "I:\Tools\candy-machine\candy-machine-main\server\main.py", line 85, in get_project
    project = Project(project_name)
  File "I:\Tools\candy-machine\candy-machine-main\server\project.py", line 68, in __init__
    self._load()
  File "I:\Tools\candy-machine\candy-machine-main\server\project.py", line 132, in _load
    self.auto_tags = self._get_filtered_auto_tags(self.project_layout)
  File "I:\Tools\candy-machine\candy-machine-main\server\project.py", line 450, in _get_filtered_auto_tags
    tag_info.to_dict() for tag_info in self._analyze_auto_tags()
  File "I:\Tools\candy-machine\candy-machine-main\server\project.py", line 500, in _analyze_auto_tags
    del tags[tag]
KeyError: 'one-piece swimsuit'
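The traceback ends at `del tags[tag]`, which raises `KeyError` whenever the key is not in the dict at that moment, for example if the same tag is deleted twice. The actual cause inside `_analyze_auto_tags` is not visible from the trace, so the following is only a sketch of a tolerant removal, assuming `tags` is a dict keyed by tag name. The helper `remove_tags` is hypothetical and is not code from `project.py`.

```python
from typing import Dict, Iterable


def remove_tags(tags: Dict[str, int], to_remove: Iterable[str]) -> Dict[str, int]:
    """Return a copy of `tags` without the keys in `to_remove`.

    Keys that are missing (or listed more than once) are silently skipped
    instead of raising KeyError, which is what `del tags[tag]` would do.
    """
    remaining = dict(tags)  # copy so the caller's dict is left untouched
    for tag in to_remove:
        remaining.pop(tag, None)  # pop with a default never raises
    return remaining
```

Swallowing the missing key hides the error rather than explaining it, so it may still be worth logging which tag was absent to find out why the two collections disagree.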