diff --git a/404.html b/404.html index 43db4d1..a52d79f 100644 --- a/404.html +++ b/404.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/about/index.html b/about/index.html index 5d5e13a..29f7360 100644 --- a/about/index.html +++ b/about/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/categories/index.html b/categories/index.html index 39026ab..3a4a75a 100644 --- a/categories/index.html +++ b/categories/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/images/ppo-for-language-models/.png b/images/ppo-for-language-models/.png new file mode 100644 index 0000000..2d74573 Binary files /dev/null and b/images/ppo-for-language-models/.png differ diff --git a/images/transformer-s-core-mechanics/.png b/images/transformer-s-core-mechanics/.png new file mode 100644 index 0000000..a7d5e5d Binary files /dev/null and b/images/transformer-s-core-mechanics/.png differ diff --git a/index.html b/index.html index b39061a..52ddc34 100644 --- a/index.html +++ b/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/index.xml b/index.xml index 111d1bf..9939ea3 100644 --- a/index.xml +++ b/index.xml @@ -1,4 +1,4 @@ -Eric X. Liu's Personal Page/Recent content on Eric X. Liu's Personal PageHugoenWed, 20 Aug 2025 06:02:35 +0000A Technical Deep Dive into the Transformer's Core Mechanics/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/Tue, 19 Aug 2025 00:00:00 +0000/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/<p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). 
While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p> +Eric X. Liu's Personal Page/Recent content on Eric X. Liu's Personal PageHugoenWed, 20 Aug 2025 06:04:36 +0000A Technical Deep Dive into the Transformer's Core Mechanics/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/Tue, 19 Aug 2025 00:00:00 +0000/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/<p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p> <h3 id="1-the-channel-a-foundational-view-of-d_model"> 1. The &ldquo;Channel&rdquo;: A Foundational View of <code>d_model</code> <a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model"> @@ -6,7 +6,23 @@ <span class="sr-only">Link to heading</span> </a> </h3> -<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. 
While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p>Quantization in LLMs/posts/quantization-in-llms/Tue, 19 Aug 2025 00:00:00 +0000/posts/quantization-in-llms/<p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.</p>A Comprehensive Guide to Breville Barista Pro Maintenance/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/Sat, 16 Aug 2025 00:00:00 +0000/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/<p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p> +<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. 
While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p>Quantization in LLMs/posts/quantization-in-llms/Tue, 19 Aug 2025 00:00:00 +0000/posts/quantization-in-llms/<p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.</p>Transformer's Core Mechanics/posts/transformer-s-core-mechanics/Tue, 19 Aug 2025 00:00:00 +0000/posts/transformer-s-core-mechanics/<p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p> +<h3 id="1-the-channel-a-foundational-view-of-d_model"> + 1. The &ldquo;Channel&rdquo;: A Foundational View of <code>d_model</code> + <a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model"> + <i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i> + <span class="sr-only">Link to heading</span> + </a> +</h3> +<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. 
While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p>A Comprehensive Guide to Breville Barista Pro Maintenance/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/Sat, 16 Aug 2025 00:00:00 +0000/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/<p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p> +<h4 id="understanding-the-two-primary-maintenance-cycles"> + <strong>Understanding the Two Primary Maintenance Cycles</strong> + <a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles"> + <i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i> + <span class="sr-only">Link to heading</span> + </a> +</h4> +<p>The Breville Barista Pro has two distinct, automated maintenance procedures: the <strong>Cleaning (Flush) Cycle</strong> and the <strong>Descale Cycle</strong>. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.</p>Breville Barista Pro Maintenance/posts/breville-barista-pro-maintenance/Sat, 16 Aug 2025 00:00:00 +0000/posts/breville-barista-pro-maintenance/<p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. 
This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p> <h4 id="understanding-the-two-primary-maintenance-cycles"> <strong>Understanding the Two Primary Maintenance Cycles</strong> <a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles"> @@ -22,6 +38,7 @@ <p>That message is the tell: Secure Boot is enabled and the kernel refuses to load modules not signed by a trusted key.</p>Beyond Words: How RVQ Teaches LLMs to See and Hear/posts/how-rvq-teaches-llms-to-see-and-hear/Thu, 07 Aug 2025 00:00:00 +0000/posts/how-rvq-teaches-llms-to-see-and-hear/<p>Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?</p> <p>The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is <strong>Residual Vector Quantization (RVQ)</strong>.</p>Supabase Deep Dive: It's Not Magic, It's Just Postgres/posts/supabase-deep-dive/Sun, 03 Aug 2025 00:00:00 +0000/posts/supabase-deep-dive/<p>In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what&rsquo;s really going on.</p> <p>Supabase enters this space with a radically different philosophy: <strong>transparency</strong>. It provides the convenience of a BaaS, but it’s built on the world&rsquo;s most trusted relational database: PostgreSQL. 
The &ldquo;magic&rdquo; isn&rsquo;t a proprietary black box; it&rsquo;s a carefully assembled suite of open-source tools that enhance Postgres, not hide it.</p>A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sat, 02 Aug 2025 00:00:00 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> +<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.</p>A Deep Dive into PPO for Language Models/posts/ppo-for-language-models/Sat, 02 Aug 2025 00:00:00 +0000/posts/ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> <p>You may have seen diagrams like the one below, which outlines the RLHF training process. 
It can look intimidating, with a web of interconnected models, losses, and data flows.</p>Mixture-of-Experts (MoE) Models Challenges & Solutions in Practice/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/Wed, 02 Jul 2025 00:00:00 +0000/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/<p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &ldquo;experts&rdquo;) to specialize in different types of inputs. A &ldquo;gating network&rdquo; or &ldquo;router&rdquo; learns to dispatch each input (or &ldquo;token&rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.</p> <h3 id="1-challenge-non-differentiability-of-routing-functions"> 1. Challenge: Non-Differentiability of Routing Functions diff --git a/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/index.html b/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/index.html index 39671da..1d67f42 100644 --- a/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/index.html +++ b/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/index.html @@ -25,4 +25,4 @@ Understanding the Two Primary Maintenance Cycles Link to heading The Breville Ba 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/a-deep-dive-into-ppo-for-language-models/index.html b/posts/a-deep-dive-into-ppo-for-language-models/index.html index e31a38e..47df78d 100644 --- a/posts/a-deep-dive-into-ppo-for-language-models/index.html +++ b/posts/a-deep-dive-into-ppo-for-language-models/index.html @@ -23,4 +23,4 @@ where δ_t = r_t + γV(s_{t+1}) - V(s_t)

  • γ (gam 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/index.html b/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/index.html index 4a34494..0063340 100644 --- a/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/index.html +++ b/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/index.html @@ -36,4 +36,4 @@ In deep learning, a “channel” can be thought of as a feature dimensi 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/breville-barista-pro-maintenance/index.html b/posts/breville-barista-pro-maintenance/index.html new file mode 100644 index 0000000..24c269f --- /dev/null +++ b/posts/breville-barista-pro-maintenance/index.html @@ -0,0 +1,28 @@ +Breville Barista Pro Maintenance · Eric X. Liu's Personal Page

    Breville Barista Pro Maintenance

    Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.

    Understanding the Two Primary Maintenance Cycles

    The Breville Barista Pro has two distinct, automated maintenance procedures: the Cleaning (Flush) Cycle and the Descale Cycle. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.

    • Cleaning Cycle (Flush): This process is designed to remove coffee oils and granulated residue from the group head, shower screen, and portafilter system.
    • Descale Cycle: This process targets the internal components of the machine, such as the thermocoil and water lines, to remove mineral and limescale deposits left by water.

    Procedure 1: The Cleaning (Flush) Cycle

    The machine will indicate when a cleaning cycle is needed by displaying a “FLUSH” alert on the LCD screen. This typically occurs after approximately 200 extractions.

    Required Materials:

    • 1-Cup filter basket
    • Grey silicone cleaning disc (provided with the machine)
    • One cleaning tablet

    Step-by-Step Instructions:

    1. Insert the 1-cup filter basket into the portafilter.
    2. Place the grey silicone cleaning disc inside the basket.
    3. Position one cleaning tablet in the center of the disc.
    4. Lock the portafilter firmly into the group head.
    5. Ensure the drip tray is empty and the water tank is filled.
    6. Press the ‘MENU’ button and use the ‘Grind Amount’ dial to navigate to the ‘FLUSH’ option. Press the dial to select it.
    7. The ‘1 CUP’ button will illuminate. Press it to initiate the cycle.
    8. The cleaning process will last approximately five minutes, with the machine backflushing water under pressure. The remaining time will be displayed on the screen.
    9. Upon completion, the machine will beep and return to its ready state.
    10. Remove the portafilter and discard the water and dissolved tablet residue. Thoroughly rinse the portafilter, cleaning disc, and filter basket.
    11. Re-insert the portafilter (without the disc or tablet) and run a shot of hot water through the group head to rinse any remaining cleaning solution.

    Procedure 2: The Descale Cycle

    The machine will alert you when descaling is required. The frequency depends on water hardness and usage but is generally recommended every 2-3 months.

    Required Materials:

    • Breville-recommended descaling solution
    • A large container (minimum 2-liter capacity)

    Step-by-Step Instructions:

    Part A: Preparation

    1. Empty the drip tray and re-insert it.
    2. Remove the water filter from the water tank.
    3. Pour the descaling solution into the empty water tank and add fresh water up to the indicated “DESCALE” line.
    4. Place a large container under the group head, hot water outlet, and steam wand.

    Part B: The Descaling Process

    1. Turn the machine on and press the ‘MENU’ button. Navigate to the ‘DESCALE’ option and select it by pressing the dial.
    2. Press the illuminated ‘1 CUP’ button to begin.
    3. The cycle proceeds in three stages. You must manually advance through them using the steam dial based on the LCD prompts:
      • Group Head (d3): The machine descales the coffee brewing components.
      • Hot Water (d2): After a beep, the LCD shows “d2”. Turn the steam dial to the hot water position.
      • Steam (d1): After another beep, the display reads “d1”. Turn the dial to the steam position.

    Part C: The Rinse Cycle

    1. Once the descaling solution is expended, the machine will beep and prompt for a rinse cycle (“r”).
    2. Empty the large container and rinse the water tank thoroughly.
    3. Fill the water tank with fresh, cold water to the MAX line and re-insert it.
    4. Place the empty container back under the outlets and press the ‘1 CUP’ button.
    5. The rinse cycle will mirror the descaling process, prompting you to engage the group head (“r3”), hot water (“r2”), and steam wand (“r1”) in sequence.
    6. After the rinse is complete, the machine will exit the maintenance mode and return to its ready state.

    Routine and Preventative Maintenance Schedule

    In addition to the automated cycles, regular manual cleaning is essential for machine health.

    Daily Tasks:

    • Purge Group Head: After the final use of the day, run hot water through the group head (without the portafilter) to clear grounds.
    • Clean Portafilter & Baskets: Do not let used coffee grounds sit in the portafilter. Rinse with hot water after every use.
    • Clean Steam Wand: Immediately after texturing milk, wipe the wand with a damp cloth and purge steam for 2-3 seconds to clear internal passages.
    • Empty Drip Tray: Empty and rinse the drip tray regularly.

    Weekly Tasks:

    • Soak Components: Remove the filter basket from the portafilter. Soak both components in a solution of hot water and a cleaning tablet (or specific espresso cleaner) for 20-30 minutes to dissolve accumulated coffee oils. Rinse thoroughly.
    • Clean Grinder: Empty the bean hopper. Run the grinder to clear any remaining beans, then use a brush and/or vacuum to clean out fines and oil residue from the burrs and chute.

    Periodic Tasks (Every 2-3 Months):

    • Replace Water Filter: The water filter located inside the water tank should be replaced every 3 months. This reduces the rate of scale buildup.
    • Inspect Shower Screen: Use a brush to gently scrub the shower screen inside the group head to remove any stubborn coffee grounds.

    By adhering to this comprehensive maintenance schedule, you can ensure your Breville Barista Pro operates at peak performance and consistently produces high-quality espresso.


    Reference:

    • Breville Barista Pro Instruction Manual and official manufacturer guidelines.
    \ No newline at end of file diff --git a/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html b/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html index eced718..9e06fd2 100644 --- a/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html +++ b/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html @@ -20,4 +20,4 @@ Our overarching philosophy is simple: isolate and change only one variable at a 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/how-rvq-teaches-llms-to-see-and-hear/index.html b/posts/how-rvq-teaches-llms-to-see-and-hear/index.html index ef5b34b..1b649d9 100644 --- a/posts/how-rvq-teaches-llms-to-see-and-hear/index.html +++ b/posts/how-rvq-teaches-llms-to-see-and-hear/index.html @@ -18,4 +18,4 @@ The answer lies in creating a universal language—a bridge between the continuo 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/index.html b/posts/index.html index 6b32107..82336d7 100644 --- a/posts/index.html +++ b/posts/index.html @@ -2,16 +2,16 @@
  • August 2, 2025 +A Deep Dive into PPO for Language Models
\ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/index.xml b/posts/index.xml index 19e4d70..83e757c 100644 --- a/posts/index.xml +++ b/posts/index.xml @@ -1,4 +1,4 @@ -Posts on Eric X. Liu's Personal Page/posts/Recent content in Posts on Eric X. Liu's Personal PageHugoenWed, 20 Aug 2025 06:02:35 +0000A Technical Deep Dive into the Transformer's Core Mechanics/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/Tue, 19 Aug 2025 00:00:00 +0000/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/<p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p> +Posts on Eric X. Liu's Personal Page/posts/Recent content in Posts on Eric X. Liu's Personal PageHugoenWed, 20 Aug 2025 06:04:36 +0000A Technical Deep Dive into the Transformer's Core Mechanics/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/Tue, 19 Aug 2025 00:00:00 +0000/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/<p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p> <h3 id="1-the-channel-a-foundational-view-of-d_model"> 1. 
The &ldquo;Channel&rdquo;: A Foundational View of <code>d_model</code> <a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model"> @@ -6,7 +6,23 @@ <span class="sr-only">Link to heading</span> </a> </h3> -<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p>Quantization in LLMs/posts/quantization-in-llms/Tue, 19 Aug 2025 00:00:00 +0000/posts/quantization-in-llms/<p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.</p>A Comprehensive Guide to Breville Barista Pro Maintenance/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/Sat, 16 Aug 2025 00:00:00 +0000/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/<p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p> +<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. 
While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p>Quantization in LLMs/posts/quantization-in-llms/Tue, 19 Aug 2025 00:00:00 +0000/posts/quantization-in-llms/<p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.</p>Transformer's Core Mechanics/posts/transformer-s-core-mechanics/Tue, 19 Aug 2025 00:00:00 +0000/posts/transformer-s-core-mechanics/<p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p> +<h3 id="1-the-channel-a-foundational-view-of-d_model"> + 1. The &ldquo;Channel&rdquo;: A Foundational View of <code>d_model</code> + <a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model"> + <i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i> + <span class="sr-only">Link to heading</span> + </a> +</h3> +<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. 
While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p>A Comprehensive Guide to Breville Barista Pro Maintenance/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/Sat, 16 Aug 2025 00:00:00 +0000/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/<p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p> +<h4 id="understanding-the-two-primary-maintenance-cycles"> + <strong>Understanding the Two Primary Maintenance Cycles</strong> + <a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles"> + <i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i> + <span class="sr-only">Link to heading</span> + </a> +</h4> +<p>The Breville Barista Pro has two distinct, automated maintenance procedures: the <strong>Cleaning (Flush) Cycle</strong> and the <strong>Descale Cycle</strong>. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.</p>Breville Barista Pro Maintenance/posts/breville-barista-pro-maintenance/Sat, 16 Aug 2025 00:00:00 +0000/posts/breville-barista-pro-maintenance/<p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. 
This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p> <h4 id="understanding-the-two-primary-maintenance-cycles"> <strong>Understanding the Two Primary Maintenance Cycles</strong> <a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles"> @@ -22,6 +38,7 @@ <p>That message is the tell: Secure Boot is enabled and the kernel refuses to load modules not signed by a trusted key.</p>Beyond Words: How RVQ Teaches LLMs to See and Hear/posts/how-rvq-teaches-llms-to-see-and-hear/Thu, 07 Aug 2025 00:00:00 +0000/posts/how-rvq-teaches-llms-to-see-and-hear/<p>Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?</p> <p>The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is <strong>Residual Vector Quantization (RVQ)</strong>.</p>Supabase Deep Dive: It's Not Magic, It's Just Postgres/posts/supabase-deep-dive/Sun, 03 Aug 2025 00:00:00 +0000/posts/supabase-deep-dive/<p>In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what&rsquo;s really going on.</p> <p>Supabase enters this space with a radically different philosophy: <strong>transparency</strong>. It provides the convenience of a BaaS, but it’s built on the world&rsquo;s most trusted relational database: PostgreSQL. 
The &ldquo;magic&rdquo; isn&rsquo;t a proprietary black box; it&rsquo;s a carefully assembled suite of open-source tools that enhance Postgres, not hide it.</p>A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sat, 02 Aug 2025 00:00:00 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> +<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.</p>A Deep Dive into PPO for Language Models/posts/ppo-for-language-models/Sat, 02 Aug 2025 00:00:00 +0000/posts/ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> <p>You may have seen diagrams like the one below, which outlines the RLHF training process. 
It can look intimidating, with a web of interconnected models, losses, and data flows.</p>Mixture-of-Experts (MoE) Models Challenges & Solutions in Practice/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/Wed, 02 Jul 2025 00:00:00 +0000/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/<p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &ldquo;experts&rdquo;) to specialize in different types of inputs. A &ldquo;gating network&rdquo; or &ldquo;router&rdquo; learns to dispatch each input (or &ldquo;token&rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.</p> <h3 id="1-challenge-non-differentiability-of-routing-functions"> 1. Challenge: Non-Differentiability of Routing Functions diff --git a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html index b4e5a7e..aee9e37 100644 --- a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html +++ b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html @@ -44,4 +44,4 @@ The Top-K routing mechanism, as illustrated in the provided ima 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/page/2/index.html b/posts/page/2/index.html index 73c9206..5b1d63a 100644 --- a/posts/page/2/index.html +++ b/posts/page/2/index.html @@ -1,8 +1,11 @@ Posts · Eric X. Liu's Personal Page
\ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/ppo-for-language-models/index.html b/posts/ppo-for-language-models/index.html new file mode 100644 index 0000000..711837d --- /dev/null +++ b/posts/ppo-for-language-models/index.html @@ -0,0 +1,26 @@ +A Deep Dive into PPO for Language Models · Eric X. Liu's Personal Page

A Deep Dive into PPO for Language Models

Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don’t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).

You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.

This post will decode that diagram, piece by piece. We’ll explore the “why” behind each component, moving from high-level concepts to the deep technical reasoning that makes this process work.

Translating RL to a Conversation

The first step is to understand how the traditional language of reinforcement learning maps to the world of text generation.

  • State (s_t): In a chat setting, the “state” is the context of the conversation so far. It’s the initial prompt (x) plus all the text the model has generated up to the current moment (y₁, ..., y_{t-1}).
  • Action (a_t): The “action” is the model’s decision at each step. For an LLM, this means generating the very next token (y_t). A full response is a sequence of these actions.
  • Reward (r): The “reward” is a numeric score that tells the model how good its full response (y) was. This score comes from a separate Reward Model, which has been trained on a large dataset of human preference comparisons (e.g., humans rating which of two responses is better). This reward is often only awarded at the end of the entire generated sequence.

Let’s make this concrete. If a user provides the prompt (x): “The best thing about AI is”, and the model generates the response (y): “its potential to solve problems.”, here is how it’s broken down for training:

  • State 1: “The best thing about AI is”
    • Action 1: “its”
  • State 2: “The best thing about AI is its”
    • Action 2: " potential"
  • State 3: “The best thing about AI is its potential”
    • Action 3: " to"
  • …and so on for every generated token.

This breakdown transforms a single prompt-response pair into a rich trajectory of state-action pairs, which becomes the raw data for our learning algorithm.
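As a rough illustration, the breakdown above can be sketched in a few lines of Python. The whitespace tokenizer and the `build_trajectory` helper are purely illustrative; real RLHF pipelines operate on subword token IDs:

```python
# Sketch: turning one prompt/response pair into (state, action) training pairs.
# Toy whitespace tokenization; real systems use subword tokenizers.
def build_trajectory(prompt: str, response: str):
    pairs = []
    state = prompt
    for token in response.split():
        pairs.append((state, token))   # (state s_t, action a_t)
        state = state + " " + token    # the next state includes the action taken
    return pairs

pairs = build_trajectory("The best thing about AI is",
                         "its potential to solve problems.")
```

Each element of `pairs` is one (state, action) example, so a five-token response yields five training steps.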

The Cast of Models: An Actor-Critic Ensemble

The PPO process doesn’t rely on a single model but an ensemble where each member has a distinct role.

  1. The Actor (Policy LM): This is the star of the show—the LLM we are actively fine-tuning. Its role is to take a state (the current text) and decide on an action (the next token). We refer to its decision-making process as its “policy” (π).
  2. The Critic (Value Model): This is the Actor’s coach. The Critic doesn’t generate text. Instead, it observes a state and estimates the potential future reward the Actor is likely to receive from that point onward. This estimate is called the “value” (V(s_t)). The Critic’s feedback helps the Actor understand whether it’s in a promising or a dead-end situation, which is a much more immediate learning signal than waiting for the final reward.
  3. The Reward Model: This is the ultimate judge. As mentioned, it’s a separate model trained on human preference data that provides the final score for a complete generation. Its judgment is treated as the ground truth for training both the Actor and the Critic.

The Challenge of Credit Assignment: Generalized Advantage Estimation (GAE)

A key problem in RL is assigning credit. If a 20-token response gets a high reward, was it because of the first token, the last one, or all of them? The Critic helps solve this. By comparing the reward at each step with the Critic’s value estimate, we can calculate the Advantage (Â).

A simple advantage calculation might be: Advantage = reward + Value_of_next_state - Value_of_current_state.

However, this can be noisy. PPO uses a more sophisticated technique called Generalized Advantage Estimation (GAE). The formula looks complex, but the idea is intuitive:

Â(s_t, a_t) = Σ_{l≥0} (γλ)^l * δ_{t+l},  where  δ_t = r_t + γV(s_{t+1}) - V(s_t)

  • γ (gamma) is a discount factor (e.g., 0.99), which values immediate rewards slightly more than distant ones.
  • λ (lambda) is a smoothing parameter that balances the trade-off between bias and variance. It creates a weighted average of advantages over multiple future time steps.

In essence, GAE provides a more stable and accurate estimate of how much better a specific action was compared to the policy’s average behavior in that state.
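A minimal sketch of GAE over one trajectory, computed with the usual backward recursion Â_t = δ_t + γλ Â_{t+1}. The reward and value numbers are illustrative; `values` carries one extra bootstrap entry for the state after the final token:

```python
# Sketch of Generalized Advantage Estimation (GAE) for one trajectory.
# rewards[t] is the per-step reward; values[t] is the Critic's estimate V(s_t),
# with values[-1] the bootstrap value for the state after the last action.
def gae(rewards, values, gamma=0.99, lam=0.95):
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error δ_t
        running = delta + gamma * lam * running                  # Â_t = δ_t + γλ Â_{t+1}
        advantages[t] = running
    return advantages

# With lam=0 (and gamma=1 for readability), GAE collapses to the simple
# one-step advantage r_t + V(s_{t+1}) - V(s_t) described above.
adv = gae([0.0, 0.0, 1.0], [0.5, 0.25, 0.75, 0.0], gamma=1.0, lam=0.0)
```

Raising λ toward 1 mixes in longer-horizon estimates, trading a little bias for lower variance.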

The Heart of PPO: The Quest for Stable Updates

Now we arrive at the core innovation of PPO. We want to update our Actor model to take actions with higher advantages. The naive way to do this is to re-weight our training objective by an importance sampling ratio: (π_new / π_old). This corrects for the fact that the data we are learning from was generated by a slightly older version of our policy.

However, this ratio is incredibly dangerous. If the new policy becomes very different from the old one, the ratio can explode, leading to massive, unstable gradient updates that destroy the model.

PPO solves this with its signature Clipped Surrogate Objective. The PPO loss function is:

L_CLIP(θ) = Ê_t [ min( r_t(θ)Â_t, clip(r_t(θ), 1 - ε, 1 + ε)Â_t ) ]

Let’s translate this from math to English:

  • r_t(θ) is the probability ratio π_new(a_t|s_t) / π_old(a_t|s_t).
  • The goal is to increase the objective by an amount proportional to the advantage Â_t.
  • The clip function is the crucial safeguard. It forbids the probability ratio from moving outside a small window (e.g., [0.8, 1.2]).

This means the algorithm says: “Let’s update our policy to favor this good action. But if the required update would change the policy too drastically from the old one, we’ll ‘clip’ the update to a more modest size.” This creates a “trust region,” ensuring stable, incremental improvements.
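A minimal sketch of the clipped objective for a single token. The log-probabilities and advantage here are illustrative inputs, and ε = 0.2 is a common default rather than a universal constant:

```python
import math

# Sketch of PPO's clipped surrogate objective for one (state, action) sample.
def ppo_clip_objective(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    ratio = math.exp(log_prob_new - log_prob_old)        # r_t(θ)
    clipped = max(min(ratio, 1 + epsilon), 1 - epsilon)  # clip(r_t, 1-ε, 1+ε)
    return min(ratio * advantage, clipped * advantage)   # pessimistic of the two

# A large policy shift (ratio ≈ 2) with a positive advantage is capped at
# (1 + ε) * Â, so the gradient cannot push the policy arbitrarily far.
obj = ppo_clip_objective(math.log(2.0), 0.0, advantage=1.0)
```

The outer `min` is what makes the objective a lower bound: clipping only ever reduces the incentive, never inflates it.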

Avoiding Amnesia: The Pretraining Loss

There’s one final problem. If we only optimize for the PPO loss, the model might learn to “hack” the reward model by generating repetitive or nonsensical text that gets a high score. In doing so, it could suffer from catastrophic forgetting, losing its fundamental grasp of grammar and facts.

To prevent this, we introduce a second loss term. As seen in the diagram, we mix in data from the original Pretraining Data (or the dataset used for Supervised Fine-Tuning). We calculate a standard next-token prediction loss (LM Loss) on this high-quality data.

The final loss for the Actor is a combination of both objectives:

Total Loss = Loss_PPO + λ_ptx * Loss_LM

This brilliantly balances two goals:

  1. The Loss_PPO pushes the model towards behaviors that align with human preferences.
  2. The Loss_LM acts as a regularizer, pulling the model back towards its core language capabilities and preventing it from drifting into gibberish.
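As a toy sketch of how the two objectives combine (the loss values and the λ_ptx weight below are illustrative numbers, not from any real training run):

```python
# Sketch of the combined Actor loss: Total = Loss_PPO + λ_ptx * Loss_LM.
# lambda_ptx is a tunable hyperparameter weighting the pretraining-data loss.
def total_actor_loss(loss_ppo, loss_lm, lambda_ptx=0.5):
    return loss_ppo + lambda_ptx * loss_lm

loss = total_actor_loss(loss_ppo=0.8, loss_lm=2.0, lambda_ptx=0.5)
```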

The Full Training Loop

Now, we can assemble the entire process into a clear, iterative loop:

  1. Collect: The current Actor policy π_k generates responses to a batch of prompts. These experiences—(state, action, probability, reward, value)—are stored in an Experience Buffer.
  2. Calculate: Once the buffer is full, we use the collected data to compute the advantage estimates Â_t for every single token-generation step.
  3. Optimize: For a few epochs, we repeatedly sample mini-batches from the buffer and update the Actor and Critic models. The Actor is updated using the combined PPO-clip Loss and LM Loss. The Critic is updated to improve its value predictions.
  4. Flush and Repeat: After the optimization phase, the entire experience buffer is discarded. The data is now “stale” because our policy has changed. The newly updated policy π_{k+1} becomes the new Actor, and we return to step 1 to collect fresh data.

This cycle of collection and optimization allows the language model to gradually and safely steer its behavior towards human-defined goals, creating the helpful and aligned AI assistants we interact with today.
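The four steps above can be sketched structurally as follows. Every model call here is a stand-in stub (`collect`, `advantages_for`, and the inner update counter are hypothetical placeholders, not real APIs); the point is the collect → calculate → optimize → flush shape, not the math:

```python
# Runnable toy sketch of one PPO iteration. All "models" are stubs.
def collect(prompts):
    # Stub rollout: each experience is (prompt, reward, value_estimate).
    return [(p, 1.0, 0.5) for p in prompts]

def advantages_for(buffer):
    # Stub advantage: reward minus the Critic's value estimate.
    return [r - v for (_, r, v) in buffer]

def ppo_iteration(prompts, n_epochs=4):
    buffer = collect(prompts)              # 1. Collect experiences
    advs = advantages_for(buffer)          # 2. Calculate advantages
    updates = 0
    for _ in range(n_epochs):              # 3. Optimize for a few epochs
        for _exp, _adv in zip(buffer, advs):
            updates += 1                   # stand-in for Actor/Critic gradient steps
    buffer.clear()                         # 4. Flush: the data is now stale
    return updates

n_updates = ppo_iteration(["p1", "p2", "p3"])
```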


References:

  1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
  2. Schulman, J., Moritz, P., Levine, S., Jordan, M., & Abbeel, P. (2015). High-Dimensional Continuous Control Using Generalized Advantage Estimation. arXiv preprint arXiv:1506.02438.
  3. Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35.
\ No newline at end of file diff --git a/posts/quantization-in-llms/index.html b/posts/quantization-in-llms/index.html index 6dd8bdb..b822f0b 100644 --- a/posts/quantization-in-llms/index.html +++ b/posts/quantization-in-llms/index.html @@ -7,4 +7,4 @@ 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html b/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html index 22090fe..e3054af 100644 --- a/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html +++ b/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html @@ -59,4 +59,4 @@ nvidia-smi failed to communicate with the NVIDIA driver modprobe nvidia → “K 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/supabase-deep-dive/index.html b/posts/supabase-deep-dive/index.html index 4d3ab8a..d7a9992 100644 --- a/posts/supabase-deep-dive/index.html +++ b/posts/supabase-deep-dive/index.html @@ -90,4 +90,4 @@ Supabase enters this space with a radically different philosophy: transparency. 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html index f10df58..64d239b 100644 --- a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html +++ b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html @@ -30,4 +30,4 @@ But to truly understand the field, we must look at the pivotal models that explo 2016 - 2025 Eric X. 
Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/posts/transformer-s-core-mechanics/index.html b/posts/transformer-s-core-mechanics/index.html new file mode 100644 index 0000000..dbd3b08 --- /dev/null +++ b/posts/transformer-s-core-mechanics/index.html @@ -0,0 +1,39 @@ +Transformer's Core Mechanics · Eric X. Liu's Personal Page

Transformer's Core Mechanics

The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of “channels” to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.

1. The “Channel”: A Foundational View of d_model

In deep learning, a “channel” can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model’s primary embedding dimension, commonly referred to as d_model.

An input text is first tokenized, and each token is mapped to a vector of size d_model (e.g., 4096). Each of the 4096 dimensions in this vector can be considered a “channel,” representing a different semantic or syntactic feature of the token.

As this data, represented by a tensor of shape [batch_size, sequence_length, d_model], progresses through the layers of the Transformer, these channels are continuously transformed. However, a critical design choice is that the output dimension of every main sub-layer (like the attention block or the FFN block) is also d_model. This consistency is essential for enabling residual connections, where the input to a block is added to its output (output = input + SubLayer(input)). This technique is vital for training the extremely deep networks common today.
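A shape-level sketch of the residual pattern, with illustrative toy dimensions; the single linear map here stands in for a full attention or FFN sub-layer:

```python
import numpy as np

# Sketch: output = input + SubLayer(input) only type-checks because every
# sub-layer maps d_model back to d_model. Toy sizes, random data.
batch_size, seq_len, d_model = 2, 8, 16
x = np.random.randn(batch_size, seq_len, d_model)

W = np.random.randn(d_model, d_model)  # stand-in for a sub-layer's weights
def sublayer(h):
    return h @ W                        # [..., d_model] -> [..., d_model]

out = x + sublayer(x)                   # residual connection preserves the shape
```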

2. The Building Blocks: Dimensions of Key Layers

A Transformer layer is primarily composed of two sub-layers: a Multi-Head Attention block and a position-wise Feed-Forward Network (FFN). The parameters for these are stored in several key weight matrices. Understanding their dimensions is crucial.

Let’s define our variables:

  • d_model: The core embedding dimension.
  • d_ff: The inner dimension of the FFN, typically 4 * d_model.
  • h: The number of attention heads.
  • d_head: The dimension of each attention head, where d_model = h * d_head.

The dimensions of the weight matrices are as follows:

Layer                        Weight Matrix   Input Vector Shape   Output Vector Shape   Weight Matrix Dimension
Attention Projections
  Query                      W_Q             d_model              d_model               [d_model, d_model]
  Key                        W_K             d_model              d_model               [d_model, d_model]
  Value                      W_V             d_model              d_model               [d_model, d_model]
  Output                     W_O             d_model              d_model               [d_model, d_model]
Feed-Forward Network
  Layer 1 (Up-projection)    W_ff1           d_model              d_ff                  [d_model, d_ff]
  Layer 2 (Down-projection)  W_ff2           d_ff                 d_model               [d_ff, d_model]
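A quick sketch of the FFN rows of the table, with small illustrative sizes; ReLU stands in for the activation (real models vary, e.g. GELU or SwiGLU):

```python
import numpy as np

# Sketch of the position-wise FFN: up-project to d_ff, activate, down-project.
d_model, d_ff, seq_len = 16, 64, 8   # toy sizes; d_ff = 4 * d_model
W_ff1 = np.random.randn(d_model, d_ff)   # up-projection
W_ff2 = np.random.randn(d_ff, d_model)   # down-projection

x = np.random.randn(seq_len, d_model)
hidden = np.maximum(0.0, x @ W_ff1)      # [seq_len, d_ff], ReLU stand-in
y = hidden @ W_ff2                        # back to [seq_len, d_model]
```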

3. Deconstructing Multi-Head Attention (MHA)

The core innovation of the Transformer is Multi-Head Attention. It allows the model to weigh the importance of different tokens in the sequence from multiple perspectives simultaneously.

3.1. The “Why”: Beyond a Single Attention

A single attention mechanism would force the model to average all types of linguistic relationships into one pattern. MHA avoids this by creating h parallel subspaces. Each “head” can specialize, with one head learning syntactic dependencies, another tracking semantic similarity, and so on. This creates a much richer representation.

3.2. An Encoding/Decoding Analogy

A powerful way to conceptualize the attention calculation is as a two-stage process:

  1. Encoding Relationships: The first part of the calculation, softmax(Q @ K.T), can be seen as an encoding step. It does not use the actual “content” of the tokens (the V vectors). Instead, it uses the Queries and Keys to build a dynamic “relationship map” between tokens in the sequence. This map, a matrix of attention scores, answers the question: “For each token, how important is every other token right now?”
  2. Decoding via Information Retrieval: The second part, scores @ V, acts as a decoding step. It uses the relationship map to retrieve and synthesize information. For each token, it creates a new vector by taking a weighted sum of all the V vectors in the sequence, using the scores as the precise mixing recipe. It decodes the relational structure into a new, context-aware representation.
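This two-stage view can be written down directly (single head, toy sizes, random data):

```python
import numpy as np

# Sketch of the two-stage view of attention for a single head.
def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_head = 4, 8
Q = np.random.randn(seq_len, d_head)
K = np.random.randn(seq_len, d_head)
V = np.random.randn(seq_len, d_head)

# Stage 1, "encoding": the relationship map, built without touching V.
scores = softmax(Q @ K.T / np.sqrt(d_head))   # [seq_len, seq_len]

# Stage 2, "decoding": retrieve content as a weighted sum of V rows.
out = scores @ V                               # [seq_len, d_head]
```

Each row of `scores` sums to 1, so each output row is a convex mixture of the Value vectors, with the mixing weights given by the relationship map.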

3.3. The “How”: A Step-by-Step Flow

The MHA process is designed for maximum computational efficiency.

  1. Initial Projections: The input vectors (shape [seq_len, d_model]) are multiplied by W_Q, W_K, and W_V. These matrices are all [d_model, d_model], not to create one large query, but to efficiently compute the vectors for all h heads at once. The single large output vector is then reshaped into h separate vectors, each of size d_head.
  2. Attention Score Calculation: For each head i, a score matrix is calculated: scores_i = softmax( (Q_i @ K_i.T) / sqrt(d_head) ). Note that Q_i and K_i have dimensions [seq_len, d_head], so the resulting scores_i matrix has a dimension of [seq_len, seq_len].
  3. Weighted Value Calculation: The scores are used to create a weighted sum of the Value vectors for each head: output_i = scores_i @ V_i. Since scores_i is [seq_len, seq_len] and V_i is [seq_len, d_head], the resulting output_i has a dimension of [seq_len, d_head]. This is the final output of a single head.
  4. Concatenation and Final Projection: The outputs of all h heads are concatenated along the last dimension. This produces a single large matrix of shape [seq_len, h * d_head], which is equivalent to [seq_len, d_model]. This matrix is then passed through the final output projection layer, W_O (shape [d_model, d_model]), to produce the attention block’s final output. The W_O matrix learns the optimal way to mix the information from all the specialized heads into a single, unified representation.
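Putting steps 1 through 4 together in a shape-level sketch (single sequence, toy sizes; random matrices stand in for learned weights):

```python
import numpy as np

# Sketch of the full MHA flow: project, split into heads, attend, concat, W_O.
def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_model, h = 4, 16, 4
d_head = d_model // h
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))
W_Q, W_K, W_V, W_O = (rng.standard_normal((d_model, d_model)) for _ in range(4))

# Step 1: one [d_model, d_model] projection per role, reshaped into h heads.
def split_heads(m):
    return m.reshape(seq_len, h, d_head).transpose(1, 0, 2)  # [h, seq, d_head]

Q, K, V = split_heads(x @ W_Q), split_heads(x @ W_K), split_heads(x @ W_V)

# Steps 2-3: per-head scores and weighted values, batched over the head axis.
scores = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))  # [h, seq, seq]
heads = scores @ V                                             # [h, seq, d_head]

# Step 4: concatenate heads back to [seq_len, d_model], then project with W_O.
concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
out = concat @ W_O                                             # [seq_len, d_model]
```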

4. Optimizing Attention: GQA and MQA

During inference, storing the Key and Value vectors for all previous tokens (the KV Cache) is a major memory bottleneck. Grouped-Query Attention (GQA) and Multi-Query Attention (MQA) are architectural modifications that address this by allowing multiple Query heads to share the same Key and Value heads.

Let’s use a concrete example, similar to Llama 2 7B:

  • d_model = 4096
  • h = 32 Q heads
  • d_head = 128
  • g = 8 KV head groups for GQA

The key insight is that only the dimensions of the W_K and W_V matrices change, which in turn reduces the size of the KV cache. The W_Q and W_O matrices remain [4096, 4096].

Attention Type      No. of Q Heads   No. of KV Heads   W_K & W_V Dimension             Relative KV Cache Size
MHA (Multi-Head)    32               32                [4096, 32*128] = [4096, 4096]   1x (Baseline)
GQA (Grouped)       32               8                 [4096, 8*128]  = [4096, 1024]   1/4x
MQA (Multi-Query)   32               1                 [4096, 1*128]  = [4096, 128]    1/32x

GQA provides a robust balance, significantly reducing the memory and bandwidth requirements for the KV cache with negligible impact on model performance, making it a popular choice in modern LLMs.
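A back-of-the-envelope sketch of the cache-size comparison (per token, per layer; `dtype_bytes=2` assumes fp16/bf16 cache entries, an assumption rather than a fixed rule):

```python
# Sketch: KV-cache bytes per token scale linearly with the number of KV heads.
def kv_cache_bytes_per_token(n_kv_heads, d_head=128, dtype_bytes=2):
    # K and V each store n_kv_heads * d_head values per token, per layer.
    return 2 * n_kv_heads * d_head * dtype_bytes

mha = kv_cache_bytes_per_token(32)  # MHA baseline from the table above
gqa = kv_cache_bytes_per_token(8)   # GQA with 8 KV-head groups
mqa = kv_cache_bytes_per_token(1)   # MQA with a single shared KV head
```

Since W_Q and W_O are unchanged, the compute per token is nearly identical; only the cached K/V state shrinks.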

5. MHA vs. Mixture of Experts (MoE): A Clarification

While both MHA and MoE use the concept of “experts,” they are functionally and architecturally distinct.

  • MHA: The “experts” are the attention heads. All heads are active for every token to build a rich representation within the attention layer. This is akin to a board meeting where every member analyzes and contributes to every decision.
  • MoE: The “experts” are full Feed-Forward Networks. A routing network selects a small subset of these FFNs for each token. This is a scaling strategy to increase a model’s parameter count for greater capacity while keeping the computational cost fixed. It replaces the standard FFN block, whereas MHA is the attention block.

By understanding these technical details, from the basic concept of a channel to the sophisticated interplay of heads and experts, one can build a more complete and accurate mental model of how LLMs truly operate.


References

  1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
  2. Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538.
  3. Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebrón, F., & Sanghai, S. (2023). GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. arXiv preprint arXiv:2305.13245.
\ No newline at end of file diff --git a/posts/useful/index.html b/posts/useful/index.html index 2d4da63..e42e5c9 100644 --- a/posts/useful/index.html +++ b/posts/useful/index.html @@ -9,4 +9,4 @@ One-minute read
  • [ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index af9ccf6..51d27e8 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -1 +1 @@ -/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/2025-08-20T04:48:53+00:00weekly0.5/2025-08-20T06:02:35+00:00weekly0.5/posts/2025-08-20T06:02:35+00:00weekly0.5/posts/quantization-in-llms/2025-08-20T06:02:35+00:00weekly0.5/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/2025-08-20T04:48:53+00:00weekly0.5/posts/secure-boot-dkms-and-mok-on-proxmox-debian/2025-08-14T06:50:22+00:00weekly0.5/posts/how-rvq-teaches-llms-to-see-and-hear/2025-08-08T17:36:52+00:00weekly0.5/posts/supabase-deep-dive/2025-08-04T03:59:37+00:00weekly0.5/posts/a-deep-dive-into-ppo-for-language-models/2025-08-16T21:13:18+00:00weekly0.5/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/2025-08-03T06:02:48+00:00weekly0.5/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T03:41:10+00:00weekly0.5/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/2025-08-03T04:20:20+00:00weekly0.5/posts/useful/2025-08-03T08:37:28-07:00weekly0.5/about/2020-06-16T23:30:17-07:00weekly0.5/categories/weekly0.5/tags/weekly0.5 \ No newline at end of file 
+/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/2025-08-20T04:48:53+00:00weekly0.5/2025-08-20T06:04:36+00:00weekly0.5/posts/2025-08-20T06:04:36+00:00weekly0.5/posts/quantization-in-llms/2025-08-20T06:02:35+00:00weekly0.5/posts/transformer-s-core-mechanics/2025-08-20T06:04:36+00:00weekly0.5/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/2025-08-20T04:48:53+00:00weekly0.5/posts/breville-barista-pro-maintenance/2025-08-20T06:04:36+00:00weekly0.5/posts/secure-boot-dkms-and-mok-on-proxmox-debian/2025-08-14T06:50:22+00:00weekly0.5/posts/how-rvq-teaches-llms-to-see-and-hear/2025-08-08T17:36:52+00:00weekly0.5/posts/supabase-deep-dive/2025-08-04T03:59:37+00:00weekly0.5/posts/a-deep-dive-into-ppo-for-language-models/2025-08-16T21:13:18+00:00weekly0.5/posts/ppo-for-language-models/2025-08-20T06:04:36+00:00weekly0.5/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/2025-08-03T06:02:48+00:00weekly0.5/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T03:41:10+00:00weekly0.5/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/2025-08-03T04:20:20+00:00weekly0.5/posts/useful/2025-08-03T08:37:28-07:00weekly0.5/about/2020-06-16T23:30:17-07:00weekly0.5/categories/weekly0.5/tags/weekly0.5 \ No newline at end of file diff --git a/tags/index.html b/tags/index.html index 10de90e..b65b4d5 100644 --- a/tags/index.html +++ b/tags/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[ba596e7] \ No newline at end of file +[3ee20f1] \ No newline at end of file