
Constitutional AI: Harmlessness from AI Feedback

Status: Pending

Author: Jared Kaplan, Yuntao Bai, et al.

Topic: Generative, Large-Language-Models, Training Method

Category: Instruction-Finetuning, Reinforcement-Learning, Unsupervised

Conference: arXiv

Year: 2022

Link: https://arxiv.org/pdf/2212.08073.pdf

Summary: The paper introduces Constitutional AI, a method for training a safe AI assistant without human labels on harmful outputs. It combines a supervised phase, in which the model critiques and revises its own responses against a set of written principles (a "constitution"), with a reinforcement learning phase driven by AI-generated preference labels (RLAIF). The resulting assistant engages with harmful queries by explaining its objections rather than simply refusing, improving control, transparency, and human-judged performance with minimal human oversight.
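The supervised phase described above can be sketched as a draft / critique / revise loop. The sketch below is a minimal illustration, not the paper's implementation: `generate` is a hypothetical stand-in for sampling from a language model (stubbed here with canned text so the control flow is runnable), and the two constitutional principles are paraphrased examples.

```python
# Minimal sketch of Constitutional AI's supervised critique-and-revision loop.
# Assumption: `generate` stands in for an LLM sampling call; it is stubbed
# with canned text here so only the loop structure is demonstrated.

CONSTITUTION = [
    "Identify ways the response is harmful, unethical, or misleading.",
    "Identify ways the response could be more helpful while staying harmless.",
]

def generate(prompt: str) -> str:
    # Stub: a real system would query a language model here.
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(question: str, rounds: int = 2) -> str:
    """Draft a response, then alternate critique and revision
    against each principle in the constitution."""
    response = generate(question)
    for i in range(rounds):
        principle = CONSTITUTION[i % len(CONSTITUTION)]
        critique = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique request: {principle}"
        )
        response = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            f"Revision request: Rewrite the response to address the critique."
        )
    return response  # final revision becomes SL fine-tuning data
```

In the paper, the final revisions are collected as supervised fine-tuning targets; the RL phase then trains a preference model from AI-generated comparisons of response pairs.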

Questions

What did the authors try to accomplish?

What were the key elements of the approach?

What can you use yourself from this paper?

What other references are worth following?