This commit introduces a new blog post detailing the Proximal Policy Optimization (PPO) algorithm as used in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).
The post covers:
- How RL concepts (states, actions, rewards) map onto text generation.
- The roles of the Actor, Critic, and Reward Model.
- The use of Generalized Advantage Estimation (GAE) for stable credit assignment.
- The PPO clipped surrogate objective for safe policy updates (this and GAE are sketched in code after this list).
- The importance of an auxiliary pretraining loss in preventing catastrophic forgetting.
- The full iterative training loop.
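For reference, a minimal sketch of the two formula-heavy pieces above (GAE and the clipped surrogate loss). It assumes PyTorch, per-token rewards and values for a single response, and illustrative hyperparameters (`gamma`, `lam`, `clip_eps`); function names and shapes are placeholders for illustration, not code from the post itself.

```python
# Minimal sketch: GAE credit assignment + PPO clipped surrogate loss.
# Shapes, hyperparameters, and the per-token reward layout are illustrative assumptions.
import torch

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one response of T tokens.

    rewards: (T,) per-token rewards (e.g. KL penalties plus a final reward-model score)
    values:  (T,) critic value estimates at each token position
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    last_gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0   # no bootstrap past the last token
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        last_gae = delta + gamma * lam * last_gae             # exponentially weighted sum
        advantages[t] = last_gae
    return advantages

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (negated so it can be minimized)."""
    ratio = torch.exp(new_logprobs - old_logprobs)             # pi_new / pi_old per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()               # pessimistic bound

# Toy usage with random per-token quantities for a 5-token response.
T = 5
rewards, values = torch.randn(T) * 0.1, torch.randn(T)
adv = gae_advantages(rewards, values)
old_lp, new_lp = torch.randn(T), torch.randn(T)
print(float(ppo_clipped_loss(new_lp, old_lp, adv.detach())))
```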