Allow using files/images from HTTP request/webscraper/other nodes as the context/input of an LLM node that supports vision #4323
pjamessteven started this conversation in Suggestion · Replies: 2 comments
This would be a really powerful feature for workflow mode. The way Dify handles files and images at the moment feels quite restrictive: you can't do much with them. It looks like a lot of the groundwork is already in place, at least in the front-end code base. If I find the time I might try to work on this, but otherwise I'm just putting the idea out there.
One thing I would love to do with Dify is use a webscraper or web-screengrab component to capture an image of a full webpage, then pass it to a vision-capable LLM node and ask it questions about the page. I find ChatGPT's vision capabilities are far more capable than the current webscraper tool (as long as you tell it not to fall back to the Python OCR package).
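For context, here's a rough sketch of what the node wiring described above would do under the hood: take screenshot bytes from a capture step and package them, base64-encoded as a data URL, into an OpenAI-style vision chat message. The `build_vision_message` helper and the placeholder screenshot bytes are illustrative assumptions, not part of Dify's actual node implementation.

```python
import base64


def build_vision_message(question: str, image_bytes: bytes,
                         mime: str = "image/png") -> list:
    """Pair a text question with an image in the OpenAI chat message format.

    The image is embedded as a base64 data URL, which is how vision-capable
    chat models commonly accept inline images.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime};base64,{b64}"},
                },
            ],
        }
    ]


# Pretend these bytes came from a screengrab tool (e.g. a headless browser);
# a real workflow would pass actual PNG screenshot data here.
fake_screenshot = b"not-a-real-png"
messages = build_vision_message(
    "What is the main headline on this page?", fake_screenshot
)
```

The resulting `messages` list could then be sent to any vision-capable chat endpoint; the point is that a workflow node only needs to thread the file bytes through to this payload-building step.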