Xiaobo's Blog
Archive
2025 (1)
March (1)
Gradient Estimation of KL Divergence in Large Language Model Reinforcement Learning
March 14, 2025 · 10 min · 2021 words · Xiaobo Yang