eric
2025-12-20 01:50:43 +00:00
parent 19d2678a16
commit 40a88799ee
27 changed files with 141 additions and 58 deletions

@@ -1,4 +1,13 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eric X. Liu's Personal Page</title><link>/</link><description>Recent content on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 04 Oct 2025 20:41:50 +0000</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)</title><link>/posts/benchmarking-llms-on-jetson-orin-nano/</link><pubDate>Sat, 04 Oct 2025 00:00:00 +0000</pubDate><guid>/posts/benchmarking-llms-on-jetson-orin-nano/</guid><description>&lt;h2 id="introduction"&gt;
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eric X. Liu's Personal Page</title><link>/</link><description>Recent content on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 19 Dec 2025 21:21:55 +0000</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>The Convergence of Fast Weights, Linear Attention, and State Space Models</title><link>/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</guid><description>&lt;p&gt;Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer's attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&amp;ldquo;Fast Weights&amp;rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).&lt;/p&gt;
&lt;p&gt;This article explores the mathematical equivalence between Hinton's concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.&lt;/p&gt;</description></item><item><title>vAttention</title><link>/posts/vattention/</link><pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate><guid>/posts/vattention/</guid><description>&lt;p&gt;Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While &lt;strong&gt;PagedAttention&lt;/strong&gt; (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU's native hardware Memory Management Unit (MMU) offers a more performant and portable solution.&lt;/p&gt;
&lt;h4 id="the-status-quo-pagedattention-and-software-tables"&gt;
The Status Quo: PagedAttention and Software Tables
&lt;a class="heading-link" href="#the-status-quo-pagedattention-and-software-tables"&gt;
&lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
&lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
&lt;/a&gt;
&lt;/h4&gt;
&lt;p&gt;Prior to PagedAttention, systems allocated contiguous memory for the maximum possible context length, leading to severe fragmentation and wasted memory. PagedAttention addressed this by chunking the KV cache into non-contiguous blocks, managed by a software-defined &amp;ldquo;page table&amp;rdquo; (the Block Table) [1].&lt;/p&gt;</description></item><item><title>Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)</title><link>/posts/benchmarking-llms-on-jetson-orin-nano/</link><pubDate>Sat, 04 Oct 2025 00:00:00 +0000</pubDate><guid>/posts/benchmarking-llms-on-jetson-orin-nano/</guid><description>&lt;h2 id="introduction"&gt;
Introduction
&lt;a class="heading-link" href="#introduction"&gt;
&lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;