Constitutional AI: Harmlessness from AI Feedback

Status: Pending

Author: Jared Kaplan, Yuntao Bai

Topic: Generative, Large-Language-Models, Training Method

Category: Instruction-Finetuning, Reinforcement-Learning, Unsupervised

Conference: arXiv

Year: 2022

Link: https://arxiv.org/pdf/2212.08073.pdf

Summary: The paper introduces Constitutional AI, a method for training a safe AI assistant without human-labeled data on harmful outputs. It combines supervised learning and reinforcement learning phases, enabling the AI to engage with harmful queries by explaining its objections, thus improving control, transparency, and human-judged performance with minimal human oversight.

Questions

What did authors try to accomplish?

What were the key elements of the approach?

What can you use yourself from this paper?

What other references to follow?
