I believe LLM07 could benefit from including some or all of the following mitigation methods in the vulnerability entry:
Human in the Loop: by default, a plugin should not be able to invoke another plugin, especially plugins performing high-stakes operations; a human should also confirm that generated content meets quality and ethical standards (see the first sketch after this list).
It should be transparent to the user which plugin will be invoked and what data is sent to it, possibly even allowing the user to modify the data before it's sent.
A security contract and threat model for plugins should be created, so we can have a secure and open infrastructure where all parties know what their security responsibilities are.
An LLM application must assume plugins cannot be trusted (e.g. direct or indirect prompt injection), and similarly plugins cannot blindly trust LLM application invocations (example: a confused deputy attack; see the second sketch below).
Regularly perform red teaming and model serialization attacks, with thorough benchmarking and reporting of inputs and outputs.
Plugins that handle PII and/or impersonate the user are high stakes.
Isolation: as discussed previously, an architecture that follows a Kernel LLM vs. Sandbox LLMs pattern could help.
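As a minimal sketch of the human-in-the-loop and transparency points above (the `PluginCall` type and the `confirm_invocation`/`dispatch` names are hypothetical, not from any real plugin framework), the application could surface every plugin invocation for approval, editing, or denial before any data leaves the application:

```python
# Hypothetical human-in-the-loop gate for plugin invocations.
# PluginCall, confirm_invocation, and dispatch are illustrative names only.
from dataclasses import dataclass
import json


@dataclass
class PluginCall:
    plugin: str    # which plugin the model wants to invoke
    payload: dict  # exactly what data would be sent to it


def confirm_invocation(call: PluginCall) -> PluginCall | None:
    """Show the user the plugin and payload; allow approval, editing, or denial."""
    print(f"The model wants to invoke plugin: {call.plugin}")
    print(f"Data to be sent:\n{json.dumps(call.payload, indent=2)}")
    choice = input("[a]pprove / [e]dit / [d]eny? ").strip().lower()
    if choice == "a":
        return call
    if choice == "e":
        edited = input("Enter replacement JSON payload: ")
        return PluginCall(call.plugin, json.loads(edited))
    return None  # denied: the plugin is never invoked


def dispatch(call: PluginCall) -> None:
    # By default no call (including plugin-to-plugin chains) goes out without
    # a fresh approval; plugins that handle PII or impersonate the user are
    # high stakes and must never bypass this gate.
    approved = confirm_invocation(call)
    if approved is None:
        print("Invocation denied by user.")
        return
    print(f"Invoking {approved.plugin} with {approved.payload}")  # real call here
```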
These mitigation techniques are primarily focused on combating indirect prompt injection, but they should be a de facto standard. I also think there should be some sort of statement or wording such as "Plugins should never be inherently trusted".
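To illustrate the mutual-distrust point, here is a sketch of the plugin side refusing to blindly trust its caller by verifying a signed, action-scoped capability token before acting. The token format and the `sign_request`/`verify_request` helpers are assumptions for the example, not an established protocol:

```python
# Illustrative confused-deputy mitigation: the plugin only acts on requests
# carrying a token explicitly scoped to the requested action. The token
# format and helper names are assumptions, not a real standard.
import hashlib
import hmac
import json

SECRET = b"shared-secret-provisioned-out-of-band"  # assumed key distribution


def sign_request(user_id: str, action: str, scope: str) -> str:
    """The LLM application mints a capability token scoped to one action."""
    claims = json.dumps({"user": user_id, "action": action, "scope": scope})
    tag = hmac.new(SECRET, claims.encode(), hashlib.sha256).hexdigest()
    return f"{claims}.{tag}"


def verify_request(token: str, action: str) -> dict | None:
    """Plugin side: verify the signature and that the token covers this action."""
    claims_json, _, tag = token.rpartition(".")
    expected = hmac.new(SECRET, claims_json.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or tampered token: refuse to act as a deputy
    claims = json.loads(claims_json)
    if claims["action"] != action:
        return None  # token was minted for a different operation
    return claims


# Usage: a token minted for "delete_file" cannot be replayed to send email.
token = sign_request("alice", "delete_file", "/tmp/report.txt")
assert verify_request(token, "delete_file") is not None
assert verify_request(token, "send_email") is None
```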
Resources and inspiration: kudos to embracethered.