Xiaobo's Blog

field notes on reinforcement learning, language models, and the math underneath.

Posts — 1

Gradient Estimation of KL Divergence in Large Language Model Reinforcement Learning
Mar 14, 2025
Three estimators, one loss, and the subtle gap between ∇E[·] and E[∇·].

Reinforcement Learning · KL divergence · Theory