📚 Auto-publish: Add/update 6 blog posts
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 14s

Generated on: Thu Oct  2 08:42:39 UTC 2025
Source: md-personal repository
Automated Publisher
2025-10-02 08:42:39 +00:00
parent ca873828aa
commit 7ef6ce1987
6 changed files with 6 additions and 5 deletions


@@ -40,7 +40,7 @@ The dimensions of the weight matrices are as follows:
### 3. Deconstructing Multi-Head Attention (MHA)
The core innovation of the Transformer is Multi-Head Attention. It allows the model to weigh the importance of different tokens in the sequence from multiple perspectives simultaneously.
-![S3 File](http://localhost:4998/attachments/image-c64b0f9df1e4981c4ecdb3b60e8bc78c426ffa68.png?client=default&bucket=obsidian)
+![S3 File](/images/transformer-s-core-mechanics/c7fe4af2633840cfbc81d7c4e3e42d0c.png)
#### 3.1. The "Why": Beyond a Single Attention
A single attention mechanism would force the model to average all types of linguistic relationships into one pattern. MHA avoids this by creating `h` parallel subspaces. Each "head" can specialize, with one head learning syntactic dependencies, another tracking semantic similarity, and so on. This creates a much richer representation.
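To make the head-splitting concrete, here is a minimal NumPy sketch (not taken from the post itself). It assumes the projection matrices `W_q`, `W_k`, `W_v`, `W_o` are all square of size `d_model x d_model`, as suggested by the earlier section on weight-matrix dimensions, and that `d_model` is divisible by `h`:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """X: (seq_len, d_model); W_q/W_k/W_v/W_o: (d_model, d_model); h: number of heads."""
    seq_len, d_model = X.shape
    d_head = d_model // h  # each head attends in a d_model/h-dimensional subspace

    # Project once, then split each projection into h parallel heads.
    Q = (X @ W_q).reshape(seq_len, h, d_head).transpose(1, 0, 2)  # (h, seq, d_head)
    K = (X @ W_k).reshape(seq_len, h, d_head).transpose(1, 0, 2)
    V = (X @ W_v).reshape(seq_len, h, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention, computed independently per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)            # (h, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                             # (h, seq, d_head)

    # Concatenate the heads and mix them with the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                             # (seq, d_model)

# Usage with illustrative sizes (hypothetical, not from the post):
rng = np.random.default_rng(0)
d_model, h, seq_len = 512, 8, 10
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (0.02 * rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, h)  # shape (10, 512)
```

Because each head only sees a `d_head = d_model / h` slice of the projections, the per-head attention patterns are free to specialize in the way described above, and the final `W_o` projection mixes their outputs back into a single representation.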