deploy: f1178d37f5

2025-12-29 07:17:38 +00:00
parent 0d2993f39b
commit 346f1f1450
32 changed files with 129 additions and 37 deletions
--- a/posts/index.xml
+++ b/posts/index.xml
@@ -1,4 +1,5 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on Eric X. Liu's Personal Page</title><link>https://ericxliu.me/posts/</link><description>Recent content in Posts on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 28 Dec 2025 21:21:42 +0000</lastBuildDate><atom:link href="https://ericxliu.me/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>From Gemini-3-Flash to T5-Gemma-2 A Journey in Distilling a Family Finance LLM</title><link>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</link><pubDate>Sat, 27 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</guid><description>&lt;p&gt;Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and &amp;ldquo;wait, was this dinner or &lt;em&gt;vacation&lt;/em&gt; dinner?&amp;rdquo; questions.&lt;/p&gt;
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on Eric X. Liu's Personal Page</title><link>https://ericxliu.me/posts/</link><description>Recent content in Posts on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 29 Dec 2025 07:15:58 +0000</lastBuildDate><atom:link href="https://ericxliu.me/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>How I Got Open WebUI Talking to OpenAI Web Search</title><link>https://ericxliu.me/posts/open-webui-openai-websearch/</link><pubDate>Mon, 29 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/open-webui-openai-websearch/</guid><description>&lt;p&gt;OpenAI promised native web search in GPT‑5, but LiteLLM proxy deployments (and by extension Open WebUI) still choke on it—issue &lt;a href="https://github.com/BerriAI/litellm/issues/13042" class="external-link" target="_blank" rel="noopener"&gt;#13042&lt;/a&gt; tracks the fallout. I needed grounded answers inside Open WebUI anyway, so I built a workaround: route GPT‑5 traffic through the Responses API and mask every &lt;code&gt;web_search_call&lt;/code&gt; before the UI ever sees it.&lt;/p&gt;
+&lt;p&gt;This post documents the final setup, the hotfix script that keeps LiteLLM honest, and the tests that prove Open WebUI now streams cited answers without trying to execute the tool itself.&lt;/p&gt;</description></item><item><title>From Gemini-3-Flash to T5-Gemma-2 A Journey in Distilling a Family Finance LLM</title><link>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</link><pubDate>Sat, 27 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</guid><description>&lt;p&gt;Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and &amp;ldquo;wait, was this dinner or &lt;em&gt;vacation&lt;/em&gt; dinner?&amp;rdquo; questions.&lt;/p&gt;
 &lt;p&gt;For years, I relied on a rule-based system to categorize our credit card transactions. It worked&amp;hellip; mostly. But maintaining &lt;code&gt;if &amp;quot;UBER&amp;quot; in description and amount &amp;gt; 50&lt;/code&gt; style rules is a never-ending battle against the entropy of merchant names and changing habits.&lt;/p&gt;</description></item><item><title>The Convergence of Fast Weights, Linear Attention, and State Space Models</title><link>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</guid><description>&lt;p&gt;Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer’s attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&amp;ldquo;Fast Weights&amp;rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).&lt;/p&gt;
 &lt;p&gt;This article explores the mathematical equivalence between Hinton’s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.&lt;/p&gt;</description></item><item><title>vAttention</title><link>https://ericxliu.me/posts/vattention/</link><pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/vattention/</guid><description>&lt;p&gt;Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While &lt;strong&gt;PagedAttention&lt;/strong&gt; (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU’s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.&lt;/p&gt;
 &lt;h4 id="the-status-quo-pagedattention-and-software-tables"&gt;