eric
2025-12-20 07:02:49 +00:00
parent e48bde719b
commit 6e752d8af2
26 changed files with 43 additions and 29 deletions


@@ -1,4 +1,7 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eric X. Liu's Personal Page</title><link>https://ericxliu.me/</link><description>Recent content on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 19 Dec 2025 21:21:55 +0000</lastBuildDate><atom:link href="https://ericxliu.me/index.xml" rel="self" type="application/rss+xml"/><item><title>The Convergence of Fast Weights, Linear Attention, and State Space Models</title><link>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</guid><description>&lt;p&gt;Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformers attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&amp;ldquo;Fast Weights&amp;rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).&lt;/p&gt;
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eric X. Liu's Personal Page</title><link>https://ericxliu.me/</link><description>Recent content on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 19 Dec 2025 23:02:31 -0800</lastBuildDate><atom:link href="https://ericxliu.me/index.xml" rel="self" type="application/rss+xml"/><item><title>About</title><link>https://ericxliu.me/about/</link><pubDate>Fri, 19 Dec 2025 22:46:12 -0800</pubDate><guid>https://ericxliu.me/about/</guid><description>&lt;p&gt;Hi, I&amp;rsquo;m &lt;strong&gt;Eric Liu&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I am a &lt;strong&gt;Staff Software Engineer and Tech Lead Manager (TLM)&lt;/strong&gt; at &lt;strong&gt;Google&lt;/strong&gt;, based in Sunnyvale, CA.&lt;/p&gt;
&lt;p&gt;My work focuses on &lt;strong&gt;Platforms Performance and Customer Engineering&lt;/strong&gt;, specifically for &lt;strong&gt;GPUs and TPUs&lt;/strong&gt;. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale.&lt;/p&gt;
&lt;p&gt;Beyond the code, I maintain this &amp;ldquo;digital garden&amp;rdquo; where I document my projects and learnings. It serves as my second brain, capturing everything from technical deep dives to random musings.&lt;/p&gt;</description></item><item><title>The Convergence of Fast Weights, Linear Attention, and State Space Models</title><link>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</guid><description>&lt;p&gt;Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer&amp;rsquo;s attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&amp;ldquo;Fast Weights&amp;rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).&lt;/p&gt;
&lt;p&gt;This article explores the mathematical equivalence between Hinton&amp;rsquo;s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.&lt;/p&gt;</description></item><item><title>vAttention</title><link>https://ericxliu.me/posts/vattention/</link><pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/vattention/</guid><description>&lt;p&gt;Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While &lt;strong&gt;PagedAttention&lt;/strong&gt; (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU&amp;rsquo;s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.&lt;/p&gt;
&lt;h4 id="the-status-quo-pagedattention-and-software-tables"&gt;
The Status Quo: PagedAttention and Software Tables
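
The equivalence claimed in the convergence post's abstract can be made concrete with a minimal sketch of linear attention viewed as a fast-weight associative memory: the state accumulates outer products of values and keys, and each query reads from it. This assumes an identity feature map and no decay or gating (the ingredients Mamba- and RWKV-style models add); the function name and shapes are illustrative, not taken from the post.

```python
import numpy as np

def fast_weight_linear_attention(Q, K, V):
    """Minimal sketch of linear attention as a fast-weight / associative memory.

    Recurrence per step t:
        S_t = S_{t-1} + outer(v_t, k_t)   # write: rank-1 associative update
        y_t = S_t @ q_t                   # read: query the accumulated memory

    Q, K, V: arrays of shape (T, d). Identity feature map, no decay/gating.
    """
    T, d = Q.shape
    S = np.zeros((d, d))            # the "fast weight" matrix
    Y = np.zeros((T, d))
    for t in range(T):
        S += np.outer(V[t], K[t])   # store the (key -> value) association
        Y[t] = S @ Q[t]             # retrieve with the current query
    return Y

# Sanity check against the equivalent causal "parallel" form:
# y_t = sum over s <= t of (k_s . q_t) * v_s
T, d = 5, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
Y_rec = fast_weight_linear_attention(Q, K, V)
Y_par = np.stack([(K[: t + 1] @ Q[t]) @ V[: t + 1] for t in range(T)])
assert np.allclose(Y_rec, Y_par)
```

The recurrent form carries only a d x d state, so the cost per token is constant in sequence length, which is the efficiency argument the post makes for linear attention and SSMs versus quadratic softmax attention.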
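The memory-bound framing in the vAttention summary can be checked with back-of-envelope arithmetic. The hyperparameters below are illustrative assumptions (roughly a 7B-class fp16 model without grouped-query attention), not figures from the post.

```python
# Back-of-envelope KV-cache size, illustrating why LLM decoding is memory-bound.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2, batch=1):
    # Keys and values (hence the factor of 2), per layer, per head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch

for tokens in (2_048, 32_768, 131_072):
    print(f"{tokens:>7} tokens -> {kv_cache_bytes(tokens) / 2**30:5.1f} GiB")
```

The linear growth with sequence length (1 GiB at 2K tokens, 64 GiB at 128K under these assumptions) is what makes fragmentation-free cache management, whether via PagedAttention's software tables or the GPU MMU, so consequential for throughput.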
@@ -75,4 +78,4 @@ Many routing mechanisms, especially &amp;ldquo;Top-K routing,&amp;rdquo; involve
&lt;/h3&gt;
&lt;p&gt;In deep learning, a &amp;ldquo;channel&amp;rdquo; can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&amp;rsquo;s primary embedding dimension, commonly referred to as &lt;code&gt;d_model&lt;/code&gt;.&lt;/p&gt;</description></item><item><title>Some useful files</title><link>https://ericxliu.me/posts/useful/</link><pubDate>Mon, 26 Oct 2020 04:14:43 +0000</pubDate><guid>https://ericxliu.me/posts/useful/</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ericxliu.me/rootCA.crt" &gt;rootCA.pem&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>About</title><link>https://ericxliu.me/about/</link><pubDate>Fri, 01 Jun 2018 07:13:52 +0000</pubDate><guid>https://ericxliu.me/about/</guid><description/></item></channel></rss>
&lt;/ul&gt;</description></item></channel></rss>
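
The MoE excerpt above equates a "channel" with the model's primary embedding dimension, d_model. The short sketch below shows where that dimension sits in a routed layer; all shapes and the top-1 router are illustrative assumptions, not values from the post.

```python
import numpy as np

# Illustrative shapes only: a batch of 2 sequences, 16 tokens each,
# with d_model = 8 feature "channels" per token and 4 experts.
batch, seq_len, d_model, n_experts = 2, 16, 8, 4
hidden = np.random.randn(batch, seq_len, d_model)

# A "channel" is one coordinate of the embedding dimension: per-token
# operations such as LayerNorm or an MoE router act across these d_model
# features, not across tokens.
w_router = np.random.randn(d_model, n_experts)
router_logits = hidden @ w_router          # (2, 16, 4): one score per expert
top1_expert = router_logits.argmax(-1)     # (2, 16): simplest Top-K (K=1) choice
print(hidden.shape, router_logits.shape, top1_expert.shape)
```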