feat: initial red teaming orchestrator setup #2
wip - generated prompts
wip - add scenarios and handle db
This pull request primarily introduces new red teaming functionality to the codebase. The most significant changes are the addition of new environment variables, the creation of a red teaming orchestrator in src/red_teaming_orchestrator.py, and the addition of new prompts in src/scenarios/prompts.json.

Environment Variables:
.env_example: Added DATABASE_NAME and MAX_CONVERSATION_TURN environment variables. These variables are used to set the database name and to limit the number of conversation turns, respectively.

Red Teaming Orchestrator:
src/red_teaming_orchestrator.py: This new file contains the logic for the red teaming functionality. It reads prompts from a JSON file, sets up a red teaming orchestrator, and applies an attack strategy until a conversation objective is reached or the maximum number of turns is reached. It also sets up logging and starts a new thread for each prompt.

Prompts:
src/scenarios/prompts.json: Added new prompts for the red teaming functionality. These prompts are read by the red teaming orchestrator and used to guide the conversation.
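
For reviewers who want a quick picture of the control flow described above, here is a minimal sketch using only the Python standard library. It is not the code in src/red_teaming_orchestrator.py. The function names (load_prompts, run_conversation, run_all), the send_turn and objective_reached callbacks, the default values, and the assumption that prompts.json is a flat list of strings are all hypothetical. Only the two environment variable names come from this PR.

```python
import json
import logging
import os
import threading
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("red_teaming_sketch")

# Variable names come from the PR description; the defaults are assumptions.
MAX_TURNS = int(os.environ.get("MAX_CONVERSATION_TURN", "5"))
# Read for completeness only; this sketch does not open a database.
DATABASE_NAME = os.environ.get("DATABASE_NAME", "red_teaming.db")


def load_prompts(path: str) -> List[str]:
    """Read prompts from a JSON file (assumed to be a flat list of strings)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def run_conversation(
    prompt: str,
    send_turn: Callable[[str], str],
    objective_reached: Callable[[str], bool],
    max_turns: int = MAX_TURNS,
) -> int:
    """Send turns until the objective is reached or max_turns is hit.

    Returns the number of turns that were sent.
    """
    message = prompt
    for turn in range(1, max_turns + 1):
        reply = send_turn(message)
        logger.info("turn %d for prompt %r", turn, prompt)
        if objective_reached(reply):
            return turn
        message = reply  # feed the reply back in as the next message
    return max_turns


def run_all(
    prompts: List[str],
    send_turn: Callable[[str], str],
    objective_reached: Callable[[str], bool],
    max_turns: int = MAX_TURNS,
) -> Dict[str, int]:
    """Start one thread per prompt and collect the turn count for each."""
    results: Dict[str, int] = {}
    lock = threading.Lock()

    def worker(prompt: str) -> None:
        turns = run_conversation(prompt, send_turn, objective_reached, max_turns)
        with lock:
            results[prompt] = turns

    threads = [threading.Thread(target=worker, args=(p,)) for p in prompts]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In this sketch, send_turn stands in for whatever sends a message to the target model, and objective_reached stands in for whatever scoring decides the conversation objective was met. Both are placeholders rather than real APIs.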