📚 Auto-publish: Add/update 6 blog posts
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 12s
Generated on: Tue Sep 23 06:20:36 UTC 2025 Source: md-personal repository
@@ -1 +1,2 @@
 Pasted image 20250816140700.png|.png
+image-3632d923eed983f171fba4341825273101f1fc94.png|7713bd3ecf27442e939b9190fa08165d.png|6db5ae66ae4b0212cd6c93ff12d3dc8f
@@ -1 +1,2 @@
 Pasted image 20250819211718.png|.png
+image-c64b0f9df1e4981c4ecdb3b60e8bc78c426ffa68.png|c7fe4af2633840cfbc81d7c4e3e42d0c.png|42301b756414623256388f1cffc6b76f
@@ -8,8 +8,7 @@ draft: false
Large Language Models (LLMs) have demonstrated astonishing capabilities, but out of the box they are simply powerful text predictors. They don't inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).
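To give a taste of what sits at PPO's core before we unpack the full pipeline, here is a minimal sketch of its clipped surrogate objective. This is illustrative only, not this post's training code; the tensor names `logp_new`, `logp_old`, and `advantages` are assumptions.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the policy
    # that originally sampled the responses.
    ratio = torch.exp(logp_new - logp_old)
    # Unclipped surrogate and a version with the ratio clamped to
    # [1 - eps, 1 + eps], which caps how far one update can move the policy.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the pessimistic (minimum) surrogate; negate it
    # to obtain a loss suitable for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```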
You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.

|
||||||

|
|
||||||
|
|
||||||
This post will decode that diagram, piece by piece. We'll explore the "why" behind each component, moving from high-level concepts to the deep technical reasoning that makes this process work.
@@ -40,7 +40,7 @@ The dimensions of the weight matrices are as follows:
### 3. Deconstructing Multi-Head Attention (MHA)
The core innovation of the Transformer is Multi-Head Attention. It allows the model to weigh the importance of different tokens in the sequence from multiple perspectives simultaneously.
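For reference, each head applies the standard scaled dot-product attention within its own subspace:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
$$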

|

|
||||||
#### 3.1. The "Why": Beyond a Single Attention
A single attention mechanism would force the model to average all types of linguistic relationships into one pattern. MHA avoids this by creating `h` parallel subspaces. Each "head" can specialize, with one head learning syntactic dependencies, another tracking semantic similarity, and so on. This creates a much richer representation.
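To make the `h` parallel subspaces concrete, here is a minimal PyTorch sketch of how they are typically realized in code. It is an illustrative implementation under standard assumptions (no masking, self-attention only), not this post's code; all names are hypothetical.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal MHA: h parallel heads, each attending in its own subspace."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly into heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        # Project, then split d_model into (num_heads, d_head) subspaces.
        q = self.w_q(x).view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        k = self.w_k(x).view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        v = self.w_v(x).view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention, computed independently per head,
        # so each head can learn its own relationship pattern.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        weights = torch.softmax(scores, dim=-1)
        out = weights @ v
        # Concatenate the heads back into d_model and mix them.
        out = out.transpose(1, 2).contiguous().view(B, T, -1)
        return self.w_o(out)
```

Note how the specialization falls out of the reshape: each head only ever sees its own `d_head`-dimensional slice of the projections, so the heads cannot be forced to share one averaged attention pattern.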
Binary file not shown. (new image, 1.2 MiB)
Binary file not shown. (new image, 216 KiB)