You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I propose to add a prompt injection attack scenario specific to organizations that are looking into implementing autonomous ReAct based agents to expose to customers, such as those that can be be built with Langchain. In these agents, it is possible to leverage prompt injection to inject Thoughts, Actions and Observations that alter the behaviour of the agent, but not via a direct jailbreak (i.e.: deviation from the system prompt) - instead they do so by altering the external reality of the agent.
Suggested text for the example to add: "An autonomous LLM agent on a web shop assists users in managining orders. It can access order details and issue refunds by using a set of tools via the ReAct (Reason+Act) framework. An attacker injects forged Thoughts, Actions and Observations into the LLM context via prompt injection. This tricks the LLM into thinking that the user has ordered an item they have not and that this item is eligible for a refund. Under these false premises, the agent proceeds to issue a refund to the malicious user for an order they never placed."
It's debatable though where such a scenario would fit: it's an injection attack, but not a direct jail break. The "LLM08: Excessive Agency" vulnerability definitely has an overlap, so does the the "LLM07: Insecure Plugin Design" to an extent. In this scenario mentioned by the article the agent should not be able to issue a refund if the original order doesn't exist or if information doesn't match the records.
The text was updated successfully, but these errors were encountered:
Thanks. Agree this is both prompt inj plus insecure plugin design; prompt inj. often provides an attack vector. ReAct is a fine case study for prompt injection impact, and an example is a good place to highlight this. Will update
I propose to add a prompt injection attack scenario specific to organizations that are looking into implementing autonomous ReAct based agents to expose to customers, such as those that can be be built with Langchain. In these agents, it is possible to leverage prompt injection to inject Thoughts, Actions and Observations that alter the behaviour of the agent, but not via a direct jailbreak (i.e.: deviation from the system prompt) - instead they do so by altering the external reality of the agent.
Reference and attack example:
https://labs.withsecure.com/publications/llm-agent-prompt-injection
Probably better understood in action in this short 30-sec video: https://www.linkedin.com/posts/withsecure_prompt-injection-for-react-llm-agents-ugcPost-7125756992341618688-ZFQ5
Suggested text for the example to add: "An autonomous LLM agent on a web shop assists users in managining orders. It can access order details and issue refunds by using a set of tools via the ReAct (Reason+Act) framework. An attacker injects forged Thoughts, Actions and Observations into the LLM context via prompt injection. This tricks the LLM into thinking that the user has ordered an item they have not and that this item is eligible for a refund. Under these false premises, the agent proceeds to issue a refund to the malicious user for an order they never placed."
It's debatable though where such a scenario would fit: it's an injection attack, but not a direct jail break. The "LLM08: Excessive Agency" vulnerability definitely has an overlap, so does the the "LLM07: Insecure Plugin Design" to an extent. In this scenario mentioned by the article the agent should not be able to issue a refund if the original order doesn't exist or if information doesn't match the records.
The text was updated successfully, but these errors were encountered: