diff --git a/404.html b/404.html index fe7f300..7560813 100644 --- a/404.html +++ b/404.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/about/index.html b/about/index.html index f14f8ec..3ec98a7 100644 --- a/about/index.html +++ b/about/index.html @@ -13,4 +13,4 @@ My work focuses on Infrastructure Performance and Customer Engineering, specific 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/authors/index.html b/authors/index.html index dc6b047..e61c29c 100644 --- a/authors/index.html +++ b/authors/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/categories/index.html b/categories/index.html index 5446176..24f611d 100644 --- a/categories/index.html +++ b/categories/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/images/technical-deep-dive-llm-categorization/eedb3be8259a4a70aa7029b78a029364.png b/images/technical-deep-dive-llm-categorization/eedb3be8259a4a70aa7029b78a029364.png new file mode 100644 index 0000000..bbef16a Binary files /dev/null and b/images/technical-deep-dive-llm-categorization/eedb3be8259a4a70aa7029b78a029364.png differ diff --git a/index.html b/index.html index 972ebfe..446d99c 100644 --- a/index.html +++ b/index.html @@ -1,8 +1,8 @@ -Eric X. Liu's Personal Page
avatar

Eric X. Liu

  • +
\ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/index.xml b/index.xml index d1317ea..8d79e8f 100644 --- a/index.xml +++ b/index.xml @@ -1,8 +1,9 @@ -Eric X. Liu's Personal Pagehttps://ericxliu.me/Recent content on Eric X. Liu's Personal PageHugoenSat, 20 Dec 2025 09:52:07 -0800Abouthttps://ericxliu.me/about/Fri, 19 Dec 2025 22:46:12 -0800https://ericxliu.me/about/<img src="https://ericxliu.me/images/about.jpeg" alt="Eric Liu" width="300" style="float: left; margin-right: 1.5rem; margin-bottom: 1rem; border-radius: 8px;"/> +Eric X. Liu's Personal Pagehttps://ericxliu.me/Recent content on Eric X. Liu's Personal PageHugoenSat, 27 Dec 2025 21:18:10 +0000Abouthttps://ericxliu.me/about/Fri, 19 Dec 2025 22:46:12 -0800https://ericxliu.me/about/<img src="https://ericxliu.me/images/about.jpeg" alt="Eric Liu" width="300" style="float: left; margin-right: 1.5rem; margin-bottom: 1rem; border-radius: 8px;"/> <p>Hi, I&rsquo;m <strong>Eric Liu</strong>.</p> <p>I am a <strong>Staff Software Engineer and Tech Lead Manager (TLM)</strong> at <strong>Google</strong>, based in Sunnyvale, CA.</p> <p>My work focuses on <strong>Infrastructure Performance and Customer Engineering</strong>, specifically for <strong>GPUs and TPUs</strong>. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it&rsquo;s debugging race conditions across thousands of chips or designing API surfaces for next-gen models.</p>The Convergence of Fast Weights, Linear Attention, and State Space Modelshttps://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/Fri, 19 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/<p>Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer’s attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&ldquo;Fast Weights&rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).</p> -<p>This article explores the mathematical equivalence between Hinton’s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.</p>vAttentionhttps://ericxliu.me/posts/vattention/Mon, 08 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/vattention/<p>Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. 
While <strong>PagedAttention</strong> (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU’s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.</p> +<p>This article explores the mathematical equivalence between Hinton’s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.</p>From Gemini-3-Flash to T5-Gemma-2 A Journey in Distilling a Family Finance LLMhttps://ericxliu.me/posts/technical-deep-dive-llm-categorization/Mon, 08 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/technical-deep-dive-llm-categorization/<p>Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and &ldquo;wait, was this dinner or <em>vacation</em> dinner?&rdquo; questions.</p> +<p>For years, I relied on a rule-based system to categorize our credit card transactions. It worked&hellip; mostly. But maintaining <code>if &quot;UBER&quot; in description and amount &gt; 50</code> style rules is a never-ending battle against the entropy of merchant names and changing habits.</p>vAttentionhttps://ericxliu.me/posts/vattention/Mon, 08 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/vattention/<p>Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While <strong>PagedAttention</strong> (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU’s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.</p> <h4 id="the-status-quo-pagedattention-and-software-tables"> The Status Quo: PagedAttention and Software Tables <a class="heading-link" href="#the-status-quo-pagedattention-and-software-tables"> diff --git a/posts/benchmarking-llms-on-jetson-orin-nano/index.html b/posts/benchmarking-llms-on-jetson-orin-nano/index.html index 5d71fd1..3468e93 100644 --- a/posts/benchmarking-llms-on-jetson-orin-nano/index.html +++ b/posts/benchmarking-llms-on-jetson-orin-nano/index.html @@ -62,4 +62,4 @@ After running 66 inference tests across seven different language models ranging 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/breville-barista-pro-maintenance/index.html b/posts/breville-barista-pro-maintenance/index.html index d587bf8..e97bc94 100644 --- a/posts/breville-barista-pro-maintenance/index.html +++ b/posts/breville-barista-pro-maintenance/index.html @@ -25,4 +25,4 @@ Understanding the Two Primary Maintenance Cycles Link to heading The Breville Ba 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html b/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html index 21693ce..e0a54c9 100644 --- a/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html +++ b/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html @@ -20,4 +20,4 @@ Our overarching philosophy is simple: isolate and change only one variable at a 2016 - 2025 Eric X. 
Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html b/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html index 0f5791c..d3cc223 100644 --- a/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html +++ b/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html @@ -168,4 +168,4 @@ Flashing NVIDIA Jetson devices remotely presents unique challenges when the host 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/how-rvq-teaches-llms-to-see-and-hear/index.html b/posts/how-rvq-teaches-llms-to-see-and-hear/index.html index 1986df5..1008eae 100644 --- a/posts/how-rvq-teaches-llms-to-see-and-hear/index.html +++ b/posts/how-rvq-teaches-llms-to-see-and-hear/index.html @@ -18,4 +18,4 @@ The answer lies in creating a universal language—a bridge between the continuo 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/index.html b/posts/index.html index 4a8c222..40bd9c6 100644 --- a/posts/index.html +++ b/posts/index.html @@ -2,6 +2,7 @@ \ No newline at end of file diff --git a/posts/index.xml b/posts/index.xml index 3803bf2..f4b9681 100644 --- a/posts/index.xml +++ b/posts/index.xml @@ -1,5 +1,6 @@ -Posts on Eric X. Liu's Personal Pagehttps://ericxliu.me/posts/Recent content in Posts on Eric X. Liu's Personal PageHugoenFri, 19 Dec 2025 21:21:55 +0000The Convergence of Fast Weights, Linear Attention, and State Space Modelshttps://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/Fri, 19 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/<p>Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer’s attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&ldquo;Fast Weights&rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).</p> -<p>This article explores the mathematical equivalence between Hinton’s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.</p>vAttentionhttps://ericxliu.me/posts/vattention/Mon, 08 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/vattention/<p>Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While <strong>PagedAttention</strong> (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU’s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.</p> +Posts on Eric X. Liu's Personal Pagehttps://ericxliu.me/posts/Recent content in Posts on Eric X. 
Liu's Personal PageHugoenSat, 27 Dec 2025 21:18:10 +0000The Convergence of Fast Weights, Linear Attention, and State Space Modelshttps://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/Fri, 19 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/<p>Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer’s attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&ldquo;Fast Weights&rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).</p> +<p>This article explores the mathematical equivalence between Hinton’s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.</p>From Gemini-3-Flash to T5-Gemma-2 A Journey in Distilling a Family Finance LLMhttps://ericxliu.me/posts/technical-deep-dive-llm-categorization/Mon, 08 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/technical-deep-dive-llm-categorization/<p>Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and &ldquo;wait, was this dinner or <em>vacation</em> dinner?&rdquo; questions.</p> +<p>For years, I relied on a rule-based system to categorize our credit card transactions. It worked&hellip; mostly. But maintaining <code>if &quot;UBER&quot; in description and amount &gt; 50</code> style rules is a never-ending battle against the entropy of merchant names and changing habits.</p>vAttentionhttps://ericxliu.me/posts/vattention/Mon, 08 Dec 2025 00:00:00 +0000https://ericxliu.me/posts/vattention/<p>Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While <strong>PagedAttention</strong> (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU’s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.</p> <h4 id="the-status-quo-pagedattention-and-software-tables"> The Status Quo: PagedAttention and Software Tables <a class="heading-link" href="#the-status-quo-pagedattention-and-software-tables"> diff --git a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html index b1f9218..35ec354 100644 --- a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html +++ b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html @@ -44,4 +44,4 @@ The Top-K routing mechanism, as illustrated in the provided ima 2016 - 2025 Eric X. 
Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html b/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html index 7810f6c..950bb4d 100644 --- a/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html +++ b/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html @@ -98,4 +98,4 @@ When using WireGuard together with MWAN3 on OpenWrt, the tunnel can fail to esta 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/page/2/index.html b/posts/page/2/index.html index 994dbd7..2787eef 100644 --- a/posts/page/2/index.html +++ b/posts/page/2/index.html @@ -1,6 +1,7 @@ Posts · Eric X. Liu's Personal Page
\ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/ppo-for-language-models/index.html b/posts/ppo-for-language-models/index.html index 6f2c58a..770d873 100644 --- a/posts/ppo-for-language-models/index.html +++ b/posts/ppo-for-language-models/index.html @@ -25,4 +25,4 @@ where δ_t = r_t + γV(s_{t+1}) - V(s_t)

  • γ (gam 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/quantization-in-llms/index.html b/posts/quantization-in-llms/index.html index 1c7a390..34450e0 100644 --- a/posts/quantization-in-llms/index.html +++ b/posts/quantization-in-llms/index.html @@ -7,4 +7,4 @@ 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html b/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html index 03755d5..ab98506 100644 --- a/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html +++ b/posts/secure-boot-dkms-and-mok-on-proxmox-debian/index.html @@ -59,4 +59,4 @@ nvidia-smi failed to communicate with the NVIDIA driver modprobe nvidia → “K 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/supabase-deep-dive/index.html b/posts/supabase-deep-dive/index.html index 13bc4ca..611b20b 100644 --- a/posts/supabase-deep-dive/index.html +++ b/posts/supabase-deep-dive/index.html @@ -90,4 +90,4 @@ Supabase enters this space with a radically different philosophy: transparency. 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html index c44b388..d4f408d 100644 --- a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html +++ b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html @@ -30,4 +30,4 @@ But to truly understand the field, we must look at the pivotal models that explo 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/technical-deep-dive-llm-categorization/index.html b/posts/technical-deep-dive-llm-categorization/index.html new file mode 100644 index 0000000..72ff7b8 --- /dev/null +++ b/posts/technical-deep-dive-llm-categorization/index.html @@ -0,0 +1,76 @@ +From Gemini-3-Flash to T5-Gemma-2 A Journey in Distilling a Family Finance LLM · Eric X. Liu's Personal Page

    From Gemini-3-Flash to T5-Gemma-2 A Journey in Distilling a Family Finance LLM

    Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and “wait, was this dinner or vacation dinner?” questions.

    For years, I relied on a rule-based system to categorize our credit card transactions. It worked… mostly. But maintaining if "UBER" in description and amount > 50 style rules is a never-ending battle against the entropy of merchant names and changing habits.

    Recently, I decided to modernize this stack using Large Language Models (LLMs). This post details the technical journey from using an off-the-shelf commercial model to distilling that knowledge into a small, efficient local model (google/t5gemma-2-270m) that runs on my own hardware while maintaining high accuracy.

Phase 1: The Proof of Concept with Commercial LLMs

    My first step was to replace the spaghetti code of regex rules with a prompt. I used Gemini-3-Flash (via litellm) as my categorization engine.

    The core challenge was context. A transaction like MCDONALDS could be:

    • Dining: A quick lunch during work.
    • Travel-Dining: A meal while on a road trip.

    To solve this, I integrated my private Google Calendar (via .ics export). The prompt doesn’t just see the transaction; it sees where I was and what I was doing on that day.
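A minimal sketch of that lookup, assuming the calendar is exported to a local calendar.ics and parsed with the icalendar package; the function name and prompt wiring are illustrative, not the exact pipeline code:

    from datetime import date, datetime
    from icalendar import Calendar

    def events_on(day: date, ics_path: str = "calendar.ics") -> list[str]:
        """Return summaries of events starting on `day` (single-day check for brevity)."""
        with open(ics_path, "rb") as f:
            cal = Calendar.from_ical(f.read())
        summaries = []
        for event in cal.walk("VEVENT"):
            start = event.get("DTSTART").dt
            # DTSTART may be a datetime (timed event) or a date (all-day event).
            start_day = start.date() if isinstance(start, datetime) else start
            if start_day == day:
                summaries.append(str(event.get("SUMMARY")))
        return summaries

    # The matching summaries are appended to the prompt, e.g.
    # "Calendar context: Trip: 34TH ARCH CANYON 2025"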

The “God Prompt”

    The system prompt was designed to return strict JSON, adhering to a schema of Categories (e.g., Dining, Travel, Bills) and Sub-Categories (e.g., Travel -> Accommodation).

    {
      "Category": "Travel",
      "Travel Category": "Dining",
      "Reasoning": "User is on 'Trip: 34TH ARCH CANYON 2025', distinguishing this from regular dining."
    }
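Invoking the teacher was then a thin wrapper around litellm. A hedged sketch, assuming a litellm-style model alias and a SYSTEM_PROMPT string holding the schema above; the production call may differ:

    import json
    import litellm

    SYSTEM_PROMPT = "..."  # the full "God Prompt" with the category schema, elided here

    def categorize(transaction: str, calendar_context: str) -> dict:
        response = litellm.completion(
            model="gemini/gemini-3-flash",  # assumption: the exact model alias may differ
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"{transaction}\nCalendar context: {calendar_context}"},
            ],
            response_format={"type": "json_object"},  # request strict JSON output
        )
        return json.loads(response.choices[0].message.content)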

    This worked well. The “Reasoning” field even gave me explanations for why it flagged something as Entertainment vs Shopping. But relying on an external API for every single transaction felt like overkill for a personal project, and I wanted to own the stack.

Phase 2: Distilling Knowledge

    I wanted to train a smaller model to mimic Gemini’s performance. But I didn’t want to manually label thousands of transactions.

Consistency Filtering

    I had a massive CSV of historical transactions (years of data). However, that data was “noisy”—some manual labels were outdated or inconsistent.

    I built a Distillation Pipeline (distill_reasoning.py) that uses the Teacher Model (Gemini) to re-label the historical data. But here’s the twist: I only added a data point to my training set if the Teacher’s prediction matched the Historical Ground Truth.

    # Pseudo-code for consistency filtering
    for row in historical_transactions:
        teacher_pred = gemini.categorize(row)
        historical_label = row["Category"]

        if teacher_pred.category == historical_label:
            # High-confidence sample: human and teacher agree.
            training_data.append({
                "input": format_transaction(row),
                "output": teacher_pred.to_json(),
            })
        else:
            # Discard: either the history is wrong or the teacher hallucinated.
            log_fail(row)

    This filtered out the noise, leaving me with ~2,000 high-quality, “verified” examples where both the human (me, years ago) and the AI agreed.

Phase 3: Training the Little Guy

    For the local model, I chose google/t5gemma-2-270m. This is a Seq2Seq model, which fits the “Text-to-JSON” task perfectly, and it’s tiny (270M parameters), meaning it can run on almost anything.

The Stack

    • Library: transformers, peft, bitsandbytes
• Technique: LoRA (Low-Rank Adaptation). I targeted all linear layers (q_proj, k_proj, v_proj, etc.) with r=16 (see the config sketch after this list).
    • Optimization: AdamW with linear decay.
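A minimal sketch of that adapter setup, assuming the model loads through the standard Seq2Seq auto classes; lora_alpha and the exact target-module list are assumptions, not values recorded from the original run:

    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForSeq2SeqLM

    base = AutoModelForSeq2SeqLM.from_pretrained("google/t5gemma-2-270m")

    config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        r=16,             # rank used in the post
        lora_alpha=32,    # assumption: alpha is not stated in the post
        lora_dropout=0.1, # matches the regularization described under Pitfall #3
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # "all linear layers"; exact list assumed
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the adapters train; the 270M base stays frozen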

Pitfall #1: The “Loss is 0” Initial Panic

My first training run showed a loss of exactly 0.000 almost immediately. In deep learning, if it looks too good to be true, it’s a bug. It turned out to be a syntax error in the arguments I passed to the Trainer (or rather, my custom loop). Once fixed, the loss looked “healthy”—starting high and decaying noisily.

Pitfall #2: Stability vs. Noise

The loss curve was initially extremely erratic, because the batch size my GPU could handle was small (physical batch size = 4). The fix: I implemented gradient accumulation (accumulating over 8 steps) to simulate an effective batch size of 32. This smoothed the loss curve significantly.
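A minimal sketch of that accumulation loop, assuming a custom PyTorch loop; model, optimizer, and train_loader are illustrative names:

    import torch

    ACCUM_STEPS = 8  # physical batch of 4 x 8 accumulation steps = effective batch of 32

    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        loss = model(**batch).loss
        (loss / ACCUM_STEPS).backward()  # scale so accumulated gradients average rather than sum
        if (step + 1) % ACCUM_STEPS == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # the 1.0 cap from Pitfall #3
            optimizer.step()
            optimizer.zero_grad()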

Pitfall #3: Overfitting

    With a small dataset (~2k samples), overfitting is a real risk. I employed a multi-layered defense strategy:

    1. Data Quality First: The “Consistency Filtering” phase was the most critical step. By discarding ambiguous samples where the teacher model disagreed with history, I prevented the model from memorizing noise.
    2. Model Regularization:
      • LoRA Dropout: I set lora_dropout=0.1, randomly dropping 10% of the trainable adapter connections during training to force robust feature learning.
  • Gradient Clipping: I capped the gradient norm at 1.0. This prevents the “exploding gradient” problem and keeps weight updates stable.
      • AdamW: Using the AdamW optimizer adds decoupled weight decay, implicitly penalizing overly complex weights.

    I also set up a rigorous evaluation loop (10% validation split, eval every 50 steps) to monitor the Train Loss vs Eval Loss in real-time. The final curves showed them tracking downwards together, confirming generalization.
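A sketch of that monitoring setup under the same assumptions as above (custom loop, illustrative names; dataset is a Hugging Face datasets.Dataset):

    split = dataset.train_test_split(test_size=0.1, seed=42)  # 10% validation split
    train_ds, val_ds = split["train"], split["test"]

    # Inside the training loop, every 50 optimizer steps:
    if global_step % 50 == 0:
        model.eval()
        with torch.no_grad():
            # val_loader is a DataLoader wrapping val_ds
            eval_loss = sum(model(**b).loss.item() for b in val_loader) / len(val_loader)
        model.train()
        print(f"step {global_step}: eval_loss = {eval_loss:.4f}")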

Phase 4: Results and The “Travel” Edge Case

    The distilled model is surprisingly capable. It learned the JSON schema very well. Although I included a regex fallback in the inference script as a safety net, the model generates valid JSON the vast majority of the time.
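The safety net itself can be tiny. A sketch of what such a fallback might look like (the actual parser in the inference script may differ):

    import json
    import re

    def parse_output(text: str) -> dict:
        """Parse the model's JSON output, falling back to a regex grab on failure."""
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            match = re.search(r"\{.*\}", text, re.DOTALL)  # extract the outermost {...} span
            if match is None:
                raise
            return json.loads(match.group(0))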

Head-to-Head: Local Model vs Gemini-Flash

    I ran a blind evaluation on 20 random unseen transactions.

    • Gemini-3-Flash Accuracy: 90% (18/20)
    • Local T5-Gemma-2 Accuracy: 85% (17/20)

    The gap is surprisingly small. In fact, the local model sometimes outperformed the API because it was fine-tuned on my specific data distribution.

    Win for Local Model:

Transaction: XX RANCH #1702
Local Prediction: Groceries (Correct)
API Prediction: Gas (Incorrect)
Local Reasoning: “XX RANCH refers to a well-known supermarket chain.”
API Reasoning: “XX RANCH is a known convenience store and gas station chain.”
Analysis: The local model “knows” (from training data) that XX Ranch is an Asian grocery store I frequent, whereas the general-purpose API assumed it was a gas station based on the name pattern.

    Win for API (World Knowledge):

Transaction: LOVE'S #0792
Local Prediction: Dining (Hallucination)
API Prediction: Travel-Gas (Correct)
Local Reasoning: “Love’s is a well-known restaurant chain, which falls under the Dining category.”
API Reasoning: “Love’s is a well-known gas station chain, and the transaction occurred during a trip to Moab, categorizing it as travel-related fuel.”
Analysis: The API knows “Love’s” is a major gas station chain. The small local model lacks this world knowledge and hallucinates it as a restaurant, highlighting the pure “Knowledge Gap” between a 270M and a 70B+ model. Additionally, Gemini Flash has Google Search grounding enabled, allowing it to verify real-world entities in real-time—a capability the isolated local model intrinsically lacks.

Surprise Win: JSON Stability

One pleasant surprise was the format adherence. I initially feared I’d need constrained generation tools like outlines or a simplified schema for a 270M parameter model. However, the distilled T5-Gemma model followed the complex JSON schema (including nested fields) with near-perfect reliability, showing that a task-specific output structure can be learned effectively through fine-tuning alone.

Key Lesson: The “Noisy Ground Truth” Trap

    Since this is a distillation (SFT) pipeline, not Reinforcement Learning, the model has no way to “unlearn” bad habits via negative rewards. It relies entirely on the quality of the teacher’s reasoning.

Transaction: [TRAVEL] SWEETHOME KITCHEN
Local Prediction: Dining
API Prediction: Travel-Dining
Local Reasoning: “The description ‘SWEETHOME KITCHEN’ indicates a restaurant or dining establishment, which falls under the Dining category.”
API Reasoning: “The transaction is for a kitchen/restaurant and occurred while the user was traveling to Pfeiffer Big Sur SP, making it a travel-related dining expense.”

    In this case, the API correctly used the calendar context (“User is in Big Sur”). The local model missed this link. This highlights that simply having the data isn’t enough—the reasoning in the training set must explicitly force the model to look at the context, or it will revert to simple pattern matching (Kitchen = Dining).

Conclusion

We often think we need 70B parameter models for everything. This project shows that for a specific, well-defined task with consistent formatting, a 270M parameter model—fine-tuned on high-quality, distilled data—can punch way above its weight class.

    The key was data quality over quantity. By using the commercial model to “verify” my historical data, I created a dataset that was cleaner than either source alone.

    \ No newline at end of file diff --git a/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/index.html b/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/index.html index 3a8803e..1bc4ad3 100644 --- a/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/index.html +++ b/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/index.html @@ -26,4 +26,4 @@ This article explores the mathematical equivalence between Hinton’s concept of 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/transformer-s-core-mechanics/index.html b/posts/transformer-s-core-mechanics/index.html index 670c75f..cbebc79 100644 --- a/posts/transformer-s-core-mechanics/index.html +++ b/posts/transformer-s-core-mechanics/index.html @@ -36,4 +36,4 @@ In deep learning, a “channel” can be thought of as a feature dimensi 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/unifi-vlan-migration-to-zone-based-architecture/index.html b/posts/unifi-vlan-migration-to-zone-based-architecture/index.html index 4076fbe..5aee0b4 100644 --- a/posts/unifi-vlan-migration-to-zone-based-architecture/index.html +++ b/posts/unifi-vlan-migration-to-zone-based-architecture/index.html @@ -28,4 +28,4 @@ This article documents that journey. It details the pitfalls encountered, the co 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/useful/index.html b/posts/useful/index.html index 7c04ff6..c1815ce 100644 --- a/posts/useful/index.html +++ b/posts/useful/index.html @@ -9,4 +9,4 @@ One-minute read
    • [79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/posts/vattention/index.html b/posts/vattention/index.html index de2e1b3..e66611b 100644 --- a/posts/vattention/index.html +++ b/posts/vattention/index.html @@ -31,4 +31,4 @@ The GPU TLB hierarchy is sensitive to page sizes.

      • 4KB Pages:< 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/series/index.html b/series/index.html index e91bfc1..440892e 100644 --- a/series/index.html +++ b/series/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index e060d20..94f6324 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -1 +1 @@ -https://ericxliu.me/about/2025-12-20T09:52:07-08:00weekly0.5https://ericxliu.me/2025-12-20T09:52:07-08:00weekly0.5https://ericxliu.me/posts/2025-12-19T21:21:55+00:00weekly0.5https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/2025-12-19T21:21:55+00:00weekly0.5https://ericxliu.me/posts/vattention/2025-12-19T21:21:55+00:00weekly0.5https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/2025-10-04T20:41:50+00:00weekly0.5https://ericxliu.me/posts/flashing-jetson-orin-nano-in-virtualized-environments/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/openwrt-mwan3-wireguard-endpoint-exclusion/2025-10-02T08:34:05+00:00weekly0.5https://ericxliu.me/posts/unifi-vlan-migration-to-zone-based-architecture/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/quantization-in-llms/2025-08-20T06:02:35+00:00weekly0.5https://ericxliu.me/posts/breville-barista-pro-maintenance/2025-08-20T06:04:36+00:00weekly0.5https://ericxliu.me/posts/secure-boot-dkms-and-mok-on-proxmox-debian/2025-08-14T06:50:22+00:00weekly0.5https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/2025-08-08T17:36:52+00:00weekly0.5https://ericxliu.me/posts/supabase-deep-dive/2025-08-04T03:59:37+00:00weekly0.5https://ericxliu.me/posts/ppo-for-language-models/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/2025-08-03T06:02:48+00:00weekly0.5https://ericxliu.me/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T03:41:10+00:00weekly0.5https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/2025-08-03T04:20:20+00:00weekly0.5https://ericxliu.me/posts/transformer-s-core-mechanics/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/useful/2025-08-03T08:37:28-07:00weekly0.5https://ericxliu.me/authors/weekly0.5https://ericxliu.me/categories/weekly0.5https://ericxliu.me/series/weekly0.5https://ericxliu.me/tags/weekly0.5 \ No newline at end of file 
+https://ericxliu.me/about/2025-12-20T09:52:07-08:00weekly0.5https://ericxliu.me/2025-12-27T21:18:10+00:00weekly0.5https://ericxliu.me/posts/2025-12-27T21:18:10+00:00weekly0.5https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/2025-12-19T21:21:55+00:00weekly0.5https://ericxliu.me/posts/technical-deep-dive-llm-categorization/2025-12-27T21:18:10+00:00weekly0.5https://ericxliu.me/posts/vattention/2025-12-19T21:21:55+00:00weekly0.5https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/2025-10-04T20:41:50+00:00weekly0.5https://ericxliu.me/posts/flashing-jetson-orin-nano-in-virtualized-environments/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/openwrt-mwan3-wireguard-endpoint-exclusion/2025-10-02T08:34:05+00:00weekly0.5https://ericxliu.me/posts/unifi-vlan-migration-to-zone-based-architecture/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/quantization-in-llms/2025-08-20T06:02:35+00:00weekly0.5https://ericxliu.me/posts/breville-barista-pro-maintenance/2025-08-20T06:04:36+00:00weekly0.5https://ericxliu.me/posts/secure-boot-dkms-and-mok-on-proxmox-debian/2025-08-14T06:50:22+00:00weekly0.5https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/2025-08-08T17:36:52+00:00weekly0.5https://ericxliu.me/posts/supabase-deep-dive/2025-08-04T03:59:37+00:00weekly0.5https://ericxliu.me/posts/ppo-for-language-models/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/2025-08-03T06:02:48+00:00weekly0.5https://ericxliu.me/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T03:41:10+00:00weekly0.5https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/2025-08-03T04:20:20+00:00weekly0.5https://ericxliu.me/posts/transformer-s-core-mechanics/2025-10-02T08:42:39+00:00weekly0.5https://ericxliu.me/posts/useful/2025-08-03T08:37:28-07:00weekly0.5https://ericxliu.me/authors/weekly0.5https://ericxliu.me/categories/weekly0.5https://ericxliu.me/series/weekly0.5https://ericxliu.me/tags/weekly0.5 \ No newline at end of file diff --git a/tags/index.html b/tags/index.html index bcb4bfd..1cf3308 100644 --- a/tags/index.html +++ b/tags/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[79473f5] \ No newline at end of file +[cd4cace] \ No newline at end of file