Xiaobo's Blog


field notes on reinforcement learning, language models, and the math underneath.

Posts — 1

  1. Gradient Estimation of KL Divergence in Large Language Model Reinforcement Learning

    Three estimators, one loss, and the subtle gap between ∇E[·] and E[∇·].

    Reinforcement Learning, KL divergence, Theory