Direct preference optimization - Your language model is secretly a reward model

ai
Author

Seil Kang

Published

April 16, 2024

This post is a summary of my reading of Direct preference optimization: Your language model is secretly a reward model, by Rafailov, Rafael, et al. (Stanford University), presented at NeurIPS'23 (oral).

TO BE UPDATED
