This commit introduces a new blog post detailing the Proximal Policy Optimization (PPO) algorithm as used in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs). The post covers:

- The mapping of RL concepts to text generation.
- The roles of the Actor, Critic, and Reward Model.
- The use of Generalized Advantage Estimation (GAE) for stable credit assignment.
- The PPO clipped surrogate objective for safe policy updates (a short sketch follows this list).
- The importance of pretraining loss to prevent catastrophic forgetting.
- The full iterative training loop.
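
For context on the clipped surrogate objective mentioned above, here is a minimal sketch of how it is commonly written for token-level RLHF. The function and tensor names (`ppo_clipped_loss`, `logprobs`, `old_logprobs`, `advantages`) and the 0.2 clip range are illustrative assumptions, not code taken from the post itself.

```python
import torch

def ppo_clipped_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Minimal PPO clipped surrogate loss over a batch of token log-probs.

    Names and the default clip range are illustrative, not from the post.
    """
    # Probability ratio between the current (actor) policy and the
    # policy that generated the rollout.
    ratio = torch.exp(logprobs - old_logprobs)

    # Unclipped and clipped surrogate terms; PPO takes the pessimistic minimum.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Negative sign because optimizers minimize; mean over tokens in the batch.
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    # Tiny smoke test with random token-level quantities.
    logprobs = torch.randn(8)
    old_logprobs = logprobs.detach() + 0.05 * torch.randn(8)
    advantages = torch.randn(8)
    print(ppo_clipped_loss(logprobs, old_logprobs, advantages))
```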