Direct preference optimization - Your language model is secretly a reward model
ai
This post summarizes my reading of "Direct preference optimization: Your language model is secretly a reward model" by Rafailov, Rafael, et al. (Stanford University), published at NeurIPS '23 (oral).
TO BE UPDATED
Reuse
Copyright
Copyright 2024. Seil Kang. All rights reserved. All content and materials on this website and articles are the property of Seil Kang. No part of this website and articles may be reproduced, distributed, transmitted, reused, or modified without prior written permission. Unauthorized use of this website and articles may violate copyright laws and international treaties.