Commit Graph

10 Commits

Author SHA1 Message Date
Automated Publisher
73f53ff6b9 📚 Auto-publish: Add/update 2 blog posts
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 3m10s
Generated on: Sun Aug  3 02:37:55 UTC 2025
Source: md-personal repository
2025-08-03 02:37:56 +00:00
38bbe8cbae 🗑️ (posts): remove unused image and its reference in markdown file
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 42s
2025-08-02 19:24:41 -07:00
Automated Publisher
b6192ca3ca 📚 Auto-publish: Add/update 2 blog posts
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 3m31s
Generated on: Sun Aug  3 01:47:39 UTC 2025
Source: md-personal repository
2025-08-03 01:47:39 +00:00
Automated Publisher
0b377b2189 📚 Auto-publish: Add/update 2 blog posts
Some checks failed
Hugo Publish CI / build-and-deploy (push) Failing after 11m2s
Generated on: Sat Aug  2 18:07:06 PDT 2025
Source: md-personal repository
2025-08-02 18:07:06 -07:00
a3ccac4cd2 (content): add new image file to posts directory
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 16s
2025-08-02 15:49:50 -07:00
88cbb7efd5 (posts): add deep dive into PPO for language models post
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 14s
This commit introduces a new blog post detailing the Proximal Policy Optimization (PPO) algorithm as used in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).

The post covers:
- The mapping of RL concepts to text generation.
- The roles of the Actor, Critic, and Reward Model.
- The use of Generalized Advantage Estimation (GAE) for stable credit assignment.
- The PPO clipped surrogate objective for safe policy updates.
- The importance of pretraining loss to prevent catastrophic forgetting.
- The full iterative training loop.
2025-08-02 15:46:24 -07:00
291f598d8c Delete content/posts/credit_card.html
All checks were successful
continuous-integration/drone Build is passing
2023-09-24 05:25:39 +00:00
Eric Liu
0794ee0bce Add rootCA
2020-10-26 04:47:36 +00:00
Eric Liu
74b5002bff Add credit card spending
2020-06-16 23:30:17 -07:00
Eric Liu
2f0990f161 initial commit
2019-02-05 05:18:26 +00:00