Status: Pending
Author: Jared Kaplan, Yuntao Bai
Topic: Generative, Large-Language-Models, Training Method
Category: Instruction-Finetuning, Reinforcement-Learning, Unsupervised
Conference: arXiv
Year: 2022
Link: https://arxiv.org/pdf/2212.08073.pdf
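Summary: The paper introduces Constitutional AI, a method for training a harmless AI assistant without human labels identifying harmful outputs; the only human oversight is a short list of written principles (the "constitution"). Training combines a supervised learning phase, in which the model critiques and revises its own responses, with a reinforcement learning phase that uses AI-generated preference feedback (RLAIF). The resulting assistant engages with harmful queries by explaining its objections rather than evading them, which improves control, transparency, and human-judged performance.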