This commit is contained in:
eric
2025-08-20 06:24:28 +00:00
parent ea9c28dce4
commit 2aadf95801
24 changed files with 30 additions and 160 deletions


@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eric X. Liu's Personal Page</title><link>/</link><description>Recent content on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 20 Aug 2025 06:04:36 +0000</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>A Technical Deep Dive into the Transformer's Core Mechanics</title><link>/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/a-technical-deep-dive-into-the-transformer-s-core-mechanics/</guid><description>&lt;p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &amp;ldquo;channels&amp;rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.&lt;/p>
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eric X. Liu's Personal Page</title><link>/</link><description>Recent content on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 20 Aug 2025 06:04:36 +0000</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>Quantization in LLMs</title><link>/posts/quantization-in-llms/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/quantization-in-llms/</guid><description>&lt;p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.&lt;/p></description></item><item><title>Transformer's Core Mechanics</title><link>/posts/transformer-s-core-mechanics/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/transformer-s-core-mechanics/</guid><description>&lt;p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &amp;ldquo;channels&amp;rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.&lt;/p>
&lt;h3 id="1-the-channel-a-foundational-view-of-d_model">
1. The &amp;ldquo;Channel&amp;rdquo;: A Foundational View of &lt;code>d_model&lt;/code>
&lt;a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model">
@@ -6,23 +6,7 @@
&lt;span class="sr-only">Link to heading&lt;/span>
&lt;/a>
&lt;/h3>
&lt;p>In deep learning, a &amp;ldquo;channel&amp;rdquo; can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&amp;rsquo;s primary embedding dimension, commonly referred to as &lt;code>d_model&lt;/code>.&lt;/p></description></item><item><title>Quantization in LLMs</title><link>/posts/quantization-in-llms/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/quantization-in-llms/</guid><description>&lt;p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.&lt;/p></description></item><item><title>Transformer's Core Mechanics</title><link>/posts/transformer-s-core-mechanics/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/transformer-s-core-mechanics/</guid><description>&lt;p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &amp;ldquo;channels&amp;rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.&lt;/p>
&lt;h3 id="1-the-channel-a-foundational-view-of-d_model">
1. The &amp;ldquo;Channel&amp;rdquo;: A Foundational View of &lt;code>d_model&lt;/code>
&lt;a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model">
&lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading">&lt;/i>
&lt;span class="sr-only">Link to heading&lt;/span>
&lt;/a>
&lt;/h3>
&lt;p>In deep learning, a &amp;ldquo;channel&amp;rdquo; can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&amp;rsquo;s primary embedding dimension, commonly referred to as &lt;code>d_model&lt;/code>.&lt;/p></description></item><item><title>A Comprehensive Guide to Breville Barista Pro Maintenance</title><link>/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/</link><pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/a-comprehensive-guide-to-breville-barista-pro-maintenance/</guid><description>&lt;p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.&lt;/p>
&lt;h4 id="understanding-the-two-primary-maintenance-cycles">
&lt;strong>Understanding the Two Primary Maintenance Cycles&lt;/strong>
&lt;a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles">
&lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading">&lt;/i>
&lt;span class="sr-only">Link to heading&lt;/span>
&lt;/a>
&lt;/h4>
&lt;p>The Breville Barista Pro has two distinct, automated maintenance procedures: the &lt;strong>Cleaning (Flush) Cycle&lt;/strong> and the &lt;strong>Descale Cycle&lt;/strong>. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.&lt;/p></description></item><item><title>Breville Barista Pro Maintenance</title><link>/posts/breville-barista-pro-maintenance/</link><pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/breville-barista-pro-maintenance/</guid><description>&lt;p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.&lt;/p>
&lt;p>In deep learning, a &amp;ldquo;channel&amp;rdquo; can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&amp;rsquo;s primary embedding dimension, commonly referred to as &lt;code>d_model&lt;/code>.&lt;/p></description></item><item><title>Breville Barista Pro Maintenance</title><link>/posts/breville-barista-pro-maintenance/</link><pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/breville-barista-pro-maintenance/</guid><description>&lt;p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.&lt;/p>
&lt;h4 id="understanding-the-two-primary-maintenance-cycles">
&lt;strong>Understanding the Two Primary Maintenance Cycles&lt;/strong>
&lt;a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles">
@@ -37,8 +21,7 @@
&lt;/ul>
&lt;p>That message is the tell: Secure Boot is enabled and the kernel refuses to load modules not signed by a trusted key.&lt;/p></description></item><item><title>Beyond Words: How RVQ Teaches LLMs to See and Hear</title><link>/posts/how-rvq-teaches-llms-to-see-and-hear/</link><pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/how-rvq-teaches-llms-to-see-and-hear/</guid><description>&lt;p>Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It&amp;rsquo;s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?&lt;/p>
&lt;p>The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is &lt;strong>Residual Vector Quantization (RVQ)&lt;/strong>.&lt;/p></description></item><item><title>Supabase Deep Dive: It's Not Magic, It's Just Postgres</title><link>/posts/supabase-deep-dive/</link><pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/supabase-deep-dive/</guid><description>&lt;p>In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what&amp;rsquo;s really going on.&lt;/p>
&lt;p>Supabase enters this space with a radically different philosophy: &lt;strong>transparency&lt;/strong>. It provides the convenience of a BaaS, but it&amp;rsquo;s built on the world&amp;rsquo;s most trusted relational database: PostgreSQL. The &amp;ldquo;magic&amp;rdquo; isn&amp;rsquo;t a proprietary black box; it&amp;rsquo;s a carefully assembled suite of open-source tools that enhance Postgres, not hide it.&lt;/p></description></item><item><title>A Deep Dive into PPO for Language Models</title><link>/posts/a-deep-dive-into-ppo-for-language-models/</link><pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/a-deep-dive-into-ppo-for-language-models/</guid><description>&lt;p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&amp;rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).&lt;/p>
&lt;p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.&lt;/p></description></item><item><title>A Deep Dive into PPO for Language Models</title><link>/posts/ppo-for-language-models/</link><pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/ppo-for-language-models/</guid><description>&lt;p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&amp;rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).&lt;/p>
&lt;p>Supabase enters this space with a radically different philosophy: &lt;strong>transparency&lt;/strong>. It provides the convenience of a BaaS, but it&amp;rsquo;s built on the world&amp;rsquo;s most trusted relational database: PostgreSQL. The &amp;ldquo;magic&amp;rdquo; isn&amp;rsquo;t a proprietary black box; it&amp;rsquo;s a carefully assembled suite of open-source tools that enhance Postgres, not hide it.&lt;/p></description></item><item><title>A Deep Dive into PPO for Language Models</title><link>/posts/ppo-for-language-models/</link><pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate><guid>/posts/ppo-for-language-models/</guid><description>&lt;p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&amp;rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).&lt;/p>
&lt;p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.&lt;/p></description></item><item><title>Mixture-of-Experts (MoE) Models: Challenges &amp; Solutions in Practice</title><link>/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/</link><pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate><guid>/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/</guid><description>&lt;p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &amp;ldquo;experts&amp;rdquo;) to specialize in different types of inputs. A &amp;ldquo;gating network&amp;rdquo; or &amp;ldquo;router&amp;rdquo; learns to dispatch each input (or &amp;ldquo;token&amp;rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.&lt;/p>
&lt;h3 id="1-challenge-non-differentiability-of-routing-functions">
1. Challenge: Non-Differentiability of Routing Functions