Allow using files/images from HTTP request/webscraper/other nodes as the context/input of an LLM node that supports vision #4323
pjamessteven started this conversation in Suggestion · Replies: 2 comments
This would be a really powerful feature for workflow mode. The way Dify handles files and images at the moment feels quite restrictive: you can't do much with them. It looks like a lot of the groundwork is already in place, at least in the front-end code base. If I find the time I might try to work on this, but otherwise I'm just putting the idea out there.
One thing I would love to do with Dify is use a webscraper or web-screengrab component to capture an image of a full webpage, then pass it to a vision-capable LLM node and ask it questions about the page. I find ChatGPT's vision capabilities are far more capable than the current webscraper tool (as long as you tell it not to fall back to the Python OCR package).
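For context, here's a rough sketch of what the node wiring described above would do under the hood: take screenshot bytes from a capture step and package them, base64-encoded as a data URL, into an OpenAI-style vision chat message. The `build_vision_message` helper and the placeholder screenshot bytes are illustrative assumptions, not part of Dify's actual node implementation.

```python
import base64


def build_vision_message(question: str, image_bytes: bytes,
                         mime: str = "image/png") -> list:
    """Pair a text question with an image in the OpenAI chat message format.

    The image is embedded as a base64 data URL, which is how vision-capable
    chat models commonly accept inline images.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime};base64,{b64}"},
                },
            ],
        }
    ]


# Pretend these bytes came from a screengrab tool (e.g. a headless browser);
# a real workflow would pass actual PNG screenshot data here.
fake_screenshot = b"not-a-real-png"
messages = build_vision_message(
    "What is the main headline on this page?", fake_screenshot
)
```

The resulting `messages` list could then be sent to any vision-capable chat endpoint; the point is that a workflow node only needs to thread the file bytes through to this payload-building step.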