📚 Auto-publish: Add/update 6 blog posts
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 12s
Generated on: Tue Sep 23 06:20:36 UTC 2025
Source: md-personal repository
@@ -40,7 +40,7 @@ The dimensions of the weight matrices are as follows:
### 3. Deconstructing Multi-Head Attention (MHA)
The core innovation of the Transformer is Multi-Head Attention. It allows the model to weigh the importance of different tokens in the sequence from multiple perspectives simultaneously.

#### 3.1. The "Why": Beyond a Single Attention Head
A single attention mechanism would force the model to average all types of linguistic relationships into one pattern. MHA avoids this by creating `h` parallel subspaces. Each "head" can specialize, with one head learning syntactic dependencies, another tracking semantic similarity, and so on. This creates a much richer representation.
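
To make the split into heads concrete, here is a minimal NumPy sketch (illustrative only, not the post's implementation). It assumes `d_model` is divisible by `h`, and the projection matrices `W_q`, `W_k`, `W_v`, `W_o` are placeholders matching the weight dimensions discussed above. Each head attends over its own `d_k = d_model / h` slice of the projections before the results are concatenated and mixed by the output projection.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """X: (seq_len, d_model); W_q, W_k, W_v, W_o: (d_model, d_model); h: number of heads."""
    seq_len, d_model = X.shape
    d_k = d_model // h  # assumes d_model is divisible by h

    # Project once, then split the last dimension into h parallel heads.
    Q = (X @ W_q).reshape(seq_len, h, d_k).transpose(1, 0, 2)  # (h, seq, d_k)
    K = (X @ W_k).reshape(seq_len, h, d_k).transpose(1, 0, 2)
    V = (X @ W_v).reshape(seq_len, h, d_k).transpose(1, 0, 2)

    # Scaled dot-product attention, computed independently in each head's subspace.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (h, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                  # (h, seq, d_k)

    # Concatenate the heads back to d_model and mix them with the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o
```

Because each head works in its own low-dimensional subspace, the total cost is comparable to a single full-width attention, yet the `h` heads are free to learn different relationship patterns.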