Commit ade3c15: update readme
wj-Mcat committed Aug 16, 2024 (1 parent: b3fb042)
Showing 1 changed file: README.md, 14 additions and 4 deletions

## Sharing Out of Passion

I have always believed in open source, and I love sharing every valuable and interesting piece of knowledge about Agents that I pick up through work and study. I regularly write these up as blog posts so that we can discuss them, learn from each other, and improve together.

Contributions via PR are also very welcome, to keep refining this blog until it becomes a true Agent Handbook.

## Topics

### Agent Introduction

[![What's next for AI agentic workflows ft. Andrew Ng of AI Fund](https://img.youtube.com/vi/sal78ACtGTc/0.jpg)](https://www.youtube.com/watch?v=sal78ACtGTc)

### Agent Workflow

## Paper Reading

### ORPO: Monolithic Preference Optimization without Reference Model

ORPO proposes a genuinely novel idea: merge the model-alignment stage and the SFT stage into a single training procedure.

During SFT, preference (alignment) data is added directly to training, so the model acquires its alignment capability within the SFT stage itself. A minimal sketch of this idea follows.
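
A minimal PyTorch sketch of the ORPO objective as described in the paper: the SFT loss plus a weighted odds-ratio term. The function name, the `lam` weight, and the use of average per-token log-probabilities are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # logp_*: average per-token log-probability of the chosen / rejected
    # response under the current model; nll_chosen: the standard SFT loss
    # on the chosen response. (Hypothetical sketch, not the reference code.)

    # log-odds of a response: log(p / (1 - p)), computed in log space
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))

    # odds-ratio term: push the odds of the chosen response above the rejected one
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()

    # a single objective: SFT loss plus the weighted alignment term
    return nll_chosen.mean() + lam * l_or
```

Because alignment enters as an extra loss term, no separate reference model or second training phase is needed.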

### Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

`Problem addressed`: the paper proposes a method for building high-quality instruction-following datasets automatically, thereby improving the instruction-following capability of large language models. A sketch of the core loop follows.
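
A minimal sketch of the execution-feedback filtering loop this describes: the model writes a verification function for each instruction, and only responses that pass their own checker are kept. The `llm` helper, the function names, and the candidate count are hypothetical assumptions, not the paper's released code.

```python
def passes_verification(verifier_code: str, response: str) -> bool:
    """Run a model-written verification function against a response.

    `verifier_code` is expected to define `check(response) -> bool`.
    Executing model-generated code is unsafe; a real pipeline would sandbox it.
    """
    namespace = {}
    try:
        exec(verifier_code, namespace)           # load the generated checker
        return bool(namespace["check"](response))
    except Exception:
        return False                             # broken checkers reject the sample

def build_dataset(instructions, llm, n_candidates=4):
    """Keep only (instruction, response) pairs that pass their own checker."""
    dataset = []
    for instruction in instructions:
        verifier_code = llm(
            f"Write a Python function check(response) -> bool "
            f"that verifies: {instruction}"
        )
        for _ in range(n_candidates):
            response = llm(instruction)
            if passes_verification(verifier_code, response):
                dataset.append((instruction, response))
    return dataset
```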

