Direct preference optimization - Your language model is secretly a reward model

Author

Seil Kang

Published

April 16, 2024

This post summarizes my reading of "Direct preference optimization: Your language model is secretly a reward model" by Rafailov, Rafael, et al. (Stanford University), published at NeurIPS '23 (oral).

TO BE UPDATED

Reuse

CC BY-NC-SA 4.0

Copyright

Copyright 2024. Seil Kang. All rights reserved. All content and materials on this website and articles are the property of Seil Kang. No part of this website and articles may be reproduced, distributed, transmitted, reused, or modified without prior written permission. Unauthorized use of this website and articles may violate copyright laws and international treaties.