# Eric X. Liu's Personal Page

Recent content on Eric X. Liu's Personal Page · https://ericxliu.me/ · Updated Thu, 22 Jan 2026

## Hacking a Chinese Car Stereo to fulfill my Knight Rider dreams
https://ericxliu.me/posts/vibe-coding-from-the-jeep/ · Wed, 21 Jan 2026

"Vibe coding" has become my latest obsession. It's that flow state where the tools disappear and you're just manipulating logic at the speed of thought. Usually this happens in a high-end IDE like Antigravity, but lately I've been chasing a childhood dream.

Growing up in China before the internet age, my window to the outside world was CCTV-6. Along with *Baywatch*, one of the first American TV shows I ever watched was *Knight Rider*. I don't remember the exact plot lines, but the core concept stuck with me forever: KITT, a car that could talk, think, and do things for you.

## How I Built a Blog Agent that Writes About Itself
https://ericxliu.me/posts/reverse-engineering-antigravity-ide/ · Fri, 16 Jan 2026

I've been spending a lot of time "vibe coding" in the Antigravity IDE lately. It's an incredible flow state—intense, iterative, and fast. But it has a major flaw: the context is ephemeral. Once a session is over, that rich history of decisions, wrong turns, and "aha!" moments is locked away in an opaque, internal format.

I wanted to capture that value. I wanted a system that could take my chaotic coding sessions and distill them into structured, technical blog posts (like the one you're reading right now).

## Why I Downgraded Magisk to Root My Pixel 2 XL
https://ericxliu.me/posts/rooting-pixel-2-xl-for-reverse-engineering/ · Wed, 07 Jan 2026

For the past few weeks, I've been stuck in a stalemate with my EcoFlow Bluetooth protocol reverse-engineering project. I have the HCI snoop logs, I have the decompiled APK, and I have a strong suspicion about where the authentication logic is hiding. But suspicion isn't proof.

Static analysis has its limits. I found the "smoking gun" function—a native method responsible for encrypting the login payload—but understanding *how* it constructs that payload within a strict 13-byte limit purely from ARM64 assembly was proving to be a headache.
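Rooting matters here because it unlocks dynamic analysis: hook the suspect native method on the device and dump its arguments while the app logs in. A minimal Frida sketch of that approach follows; the package name, library, and export are hypothetical placeholders, not the actual EcoFlow symbols.

```python
import frida

# Hypothetical identifiers for illustration; the real package, library,
# and export names would come from the decompiled APK.
PACKAGE = "com.ecoflow.app"
HOOK_JS = """
var addr = Module.findExportByName("libencrypt.so", "encryptLoginPayload");
Interceptor.attach(addr, {
    onEnter: function (args) {
        // Assumes args[0] points at the 13-byte payload buffer.
        console.log(hexdump(args[0], { length: 13 }));
    }
});
"""

device = frida.get_usb_device()
session = device.attach(PACKAGE)          # needs a rooted device + frida-server
script = session.create_script(HOOK_JS)
script.on("message", lambda msg, data: print(msg))
script.load()
input("Hook installed; trigger a login in the app, then press Enter.\n")
```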
## Why Your "Resilient" Homelab is Slower Than a Raspberry Pi
https://ericxliu.me/posts/debugging-authentik-performance/ · Fri, 02 Jan 2026

In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running "production" at home, there is only one metric that truly matters: **The Wife Acceptance Factor (WAF)**.

My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was "slow sometimes." She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage.

## How I Got Open WebUI Talking to OpenAI Web Search
https://ericxliu.me/posts/open-webui-openai-websearch/ · Mon, 29 Dec 2025

OpenAI promised native web search in GPT‑5, but LiteLLM proxy deployments (and by extension Open WebUI) still choke on it—issue [#13042](https://github.com/BerriAI/litellm/issues/13042) tracks the fallout. I needed grounded answers inside Open WebUI anyway, so I built a workaround: route GPT‑5 traffic through the Responses API and mask every `web_search_call` before the UI ever sees it.

This post documents the final setup, the hotfix script that keeps LiteLLM honest, and the tests that prove Open WebUI now streams cited answers without trying to execute the tool itself.

## From Gemini-3-Flash to T5-Gemma-2: A Journey in Distilling a Family Finance LLM
https://ericxliu.me/posts/technical-deep-dive-llm-categorization/ · Sat, 27 Dec 2025

Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and "wait, was this dinner or *vacation* dinner?" questions.

For years, I relied on a rule-based system to categorize our credit card transactions. It worked… mostly. But maintaining `if "UBER" in description and amount > 50` style rules is a never-ending battle against the entropy of merchant names and changing habits.

## About
https://ericxliu.me/about/ · Fri, 19 Dec 2025

![Eric Liu](https://ericxliu.me/images/about.jpeg)

Hi, I'm **Eric Liu**. I am a **Staff Software Engineer and Tech Lead Manager (TLM)** at **Google**, based in Sunnyvale, CA.

My work focuses on **Infrastructure Performance and Customer Engineering**, specifically for **GPUs and TPUs**. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it's debugging race conditions across thousands of chips or designing API surfaces for next-gen models.

## The Convergence of Fast Weights, Linear Attention, and State Space Models
https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/ · Fri, 19 Dec 2025

Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer's attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms ("Fast Weights") and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).

This article explores the mathematical equivalence between Hinton's concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.
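The equivalence the article develops can be stated compactly. A sketch of the standard formulation (following the usual linear-attention convention; the decay factor λ is how fast-weight formulations typically write it):

```latex
% Fast-weight / linear-attention recurrence:
% S_t is an associative memory written by key-value outer products,
% read out by the current query.
S_t = \lambda\, S_{t-1} + v_t k_t^{\top}, \qquad o_t = S_t\, q_t
```

With $\lambda = 1$ this is exactly unnormalized linear attention; SSM-style models such as Mamba make the decay input-dependent.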
## vAttention
https://ericxliu.me/posts/vattention/ · Mon, 08 Dec 2025

Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While **PagedAttention** (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU's native hardware Memory Management Unit (MMU) offers a more performant and portable solution.

### The Status Quo: PagedAttention and Software Tables

Prior to PagedAttention, systems allocated contiguous memory for the maximum possible context length, leading to severe fragmentation and wasted memory. PagedAttention addressed this by chunking the KV cache into non-contiguous blocks, managed by a software-defined "page table" (the Block Table) [1].
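To make the software-page-table idea concrete, here is a toy sketch of a block table (illustrative data structures only, not vLLM's actual implementation):

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM uses a similar default)

class BlockTable:
    """Maps a sequence's logical block index to a physical block id."""
    def __init__(self, free_blocks):
        self.free_blocks = free_blocks   # pool of physical block ids
        self.blocks = []                 # logical index -> physical id

    def append_token(self, pos):
        # Allocate a new physical block whenever we cross a block boundary.
        if pos % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop())

    def locate(self, pos):
        # The lookup the attention kernel performs for every token it reads.
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE

table = BlockTable(free_blocks=list(range(1024)))
for pos in range(40):            # a 40-token sequence occupies 3 blocks
    table.append_token(pos)
print(table.locate(37))          # -> (physical block id, offset within block)
```

This indirection (every KV read goes through a software lookup) is precisely the overhead the hardware-MMU approach aims to eliminate.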
## Setting Up Jellyfin SSO with Authentik: Surviving the Beta
https://ericxliu.me/posts/jellyfin-sso-with-authentik/ · Sat, 15 Nov 2025

I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren't immediately obvious.

### The Setup

The configuration is best handled via the API (curl) rather than the UI, as this ensures all fields are correctly typed and persisted.

## Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)
https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/ · Sat, 04 Oct 2025

### Introduction

NVIDIA's Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there's a catch—one that reveals a fundamental tension in modern edge AI hardware design.

After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device's computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn't computation—it's memory bandwidth. This isn't just a quirk of one device; it's a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment.
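The arithmetic behind that conclusion is easy to sanity-check: in single-stream decoding, every generated token must stream all model weights through DRAM once, so memory bandwidth sets a hard ceiling on tokens per second. A rough sketch (68 GB/s is the Orin Nano 8GB's published memory bandwidth; the model numbers are illustrative):

```python
# Roofline-style upper bound for memory-bound, single-stream decoding:
# each generated token reads every model weight from DRAM once.
bandwidth_gb_s = 68        # Jetson Orin Nano 8GB memory bandwidth (GB/s)
params_b = 1.5             # model size in billions of parameters
bytes_per_param = 0.5      # 4-bit quantized weights

weights_gb = params_b * bytes_per_param
ceiling_tok_s = bandwidth_gb_s / weights_gb
print(f"{weights_gb:.2f} GB of weights -> at most ~{ceiling_tok_s:.0f} tok/s")
# 0.75 GB of weights -> at most ~91 tok/s, regardless of the 40 TOPS on paper.
```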
## Flashing Jetson Orin Nano in Virtualized Environments
https://ericxliu.me/posts/flashing-jetson-orin-nano-in-virtualized-environments/ · Thu, 02 Oct 2025

### Introduction

Flashing NVIDIA Jetson devices remotely presents unique challenges when the host machine is virtualized. This article documents the technical challenges, failures, and eventual success of flashing a Jetson Orin Nano Super developer kit using NVIDIA SDK Manager in various virtualized environments, specifically QEMU/KVM virtual machines and LXC containers on Proxmox VE.

## OpenWrt: Fix WireGuard Connectivity with MWAN3 by Excluding the VPN Endpoint
https://ericxliu.me/posts/openwrt-mwan3-wireguard-endpoint-exclusion/ · Sun, 28 Sep 2025

### Overview

When using WireGuard together with MWAN3 on OpenWrt, the tunnel can fail to establish, or it can flap, when the peer's IP is routed into the tunnel itself. This is a classic routing bootstrap problem: WireGuard wants to route 0.0.0.0/0 into the tunnel, but the UDP packets to the peer's public endpoint also get captured, so they never reach the Internet to bring the tunnel up.

## UniFi VLAN Migration to Zone-Based Architecture
https://ericxliu.me/posts/unifi-vlan-migration-to-zone-based-architecture/ · Mon, 22 Sep 2025

Embarking on a network migration to a properly segmented VLAN architecture is a rite of passage for any serious home lab or small business operator. The goal is clear: improve security and organization by separating traffic. However, the path from a flat network to a segmented one is often paved with subtle but critical configuration details that can lead to hours of frustrating troubleshooting.

This article documents that journey. It details the pitfalls encountered, the core networking concepts that were essential to understand, and the best practices that ultimately led to a stable, secure, and logical network design built on a zone-based firewall model.

## Quantization in LLMs
https://ericxliu.me/posts/quantization-in-llms/ · Tue, 19 Aug 2025

The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic toward lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique for reducing model size, accelerating inference, and lowering energy consumption. This article provides a technical overview of quantization theory, its application in modern LLMs, and the ongoing innovations in this domain.
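As a minimal illustration of that mapping, here is textbook symmetric int8 quantization in NumPy (a generic sketch, not any specific library's scheme):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0          # map the largest |w| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```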
## Breville Barista Pro Maintenance
https://ericxliu.me/posts/breville-barista-pro-maintenance/ · Sat, 16 Aug 2025

Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.

### Understanding the Two Primary Maintenance Cycles

The Breville Barista Pro has two distinct, automated maintenance procedures: the **Cleaning (Flush) Cycle** and the **Descale Cycle**. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.

## Fixing GPU Operator Pods Stuck in Init: Secure Boot, DKMS, and MOK on Proxmox + Debian
https://ericxliu.me/posts/secure-boot-dkms-and-mok-on-proxmox-debian/ · Sat, 09 Aug 2025

I hit an issue where all GPU Operator pods on one node were stuck in Init after migrating from Legacy BIOS to UEFI. The common error was NVIDIA components waiting for "toolkit-ready," while the toolkit init container looped with:

- `nvidia-smi` failed to communicate with the NVIDIA driver
- `modprobe nvidia` → "Key was rejected by service"

That message is the tell: Secure Boot is enabled, and the kernel refuses to load modules that are not signed by a trusted key.

## Beyond Words: How RVQ Teaches LLMs to See and Hear
https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/ · Thu, 07 Aug 2025

Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It's a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?

The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is **Residual Vector Quantization (RVQ)**.
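The core loop of RVQ is short enough to sketch outright: quantize, subtract, and quantize the residual again with the next codebook. A toy NumPy version with random codebooks (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Three stages of codebooks, 256 codes each, for 64-dim embeddings.
codebooks = [rng.standard_normal((256, 64)) for _ in range(3)]

def rvq_encode(x, codebooks):
    """Return one code index per stage; each stage quantizes the residual."""
    residual, codes = x, []
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))  # nearest code
        codes.append(idx)
        residual = residual - cb[idx]   # next stage only sees the error
    return codes

def rvq_decode(codes, codebooks):
    return sum(cb[i] for cb, i in zip(codebooks, codes))

x = rng.standard_normal(64)
codes = rvq_encode(x, codebooks)    # e.g. three discrete tokens per vector
print("reconstruction error:", np.linalg.norm(x - rvq_decode(codes, codebooks)))
```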
## Supabase Deep Dive: It's Not Magic, It's Just Postgres
https://ericxliu.me/posts/supabase-deep-dive/ · Sun, 03 Aug 2025

In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what's really going on.

Supabase enters this space with a radically different philosophy: **transparency**. It provides the convenience of a BaaS, but it's built on the world's most trusted relational database, PostgreSQL. The "magic" isn't a proprietary black box; it's a carefully assembled suite of open-source tools that enhance Postgres rather than hide it.

## A Deep Dive into PPO for Language Models
https://ericxliu.me/posts/ppo-for-language-models/ · Sat, 02 Aug 2025

Large Language Models (LLMs) have demonstrated astonishing capabilities, but out of the box they are simply powerful text predictors. They don't inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).

You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.

![RLHF training process](https://ericxliu.me/images/ppo-for-language-models/7713bd3ecf27442e939b9190fa08165d.png)

## Mixture-of-Experts (MoE) Models: Challenges & Solutions in Practice
https://ericxliu.me/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/ · Wed, 02 Jul 2025

Mixture-of-Experts (MoE) models are neural network architectures that allow different parts of the model (called "experts") to specialize in different types of inputs. A "gating network" or "router" learns to dispatch each input (or "token") to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.

### 1. Challenge: Non-Differentiability of Routing Functions

**The Problem:** Many routing mechanisms, especially "Top-K routing," involve a discrete, hard selection process. A common function is `KeepTopK(v, k)`, which selects the top `k` scoring elements from a vector `v` and sets the others to $-\infty$ or $0$.
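A direct NumPy rendering of that function (the $-\infty$ variant, which is applied before a softmax so that unselected experts receive zero gate weight; a generic sketch):

```python
import numpy as np

def keep_top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest logits, set the rest to -inf (pre-softmax masking)."""
    out = np.full_like(v, -np.inf)
    top = np.argpartition(v, -k)[-k:]     # indices of the k largest entries
    out[top] = v[top]
    return out

logits = np.array([0.3, 2.1, -0.5, 1.7, 0.9])
masked = keep_top_k(logits, k=2)          # [-inf, 2.1, -inf, 1.7, -inf]
gates = np.exp(masked) / np.exp(masked).sum()   # softmax zeroes masked experts
print(gates.round(3))                     # only experts 1 and 3 receive weight
```

The hard cutoff is what makes the selection non-differentiable: gradients flow only through the k surviving logits.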
## An Architectural Deep Dive of T5
https://ericxliu.me/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/ · Sun, 01 Jun 2025

In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the "decoder-only" model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.

But to truly understand the field, we must look at the pivotal models that explored different paths. Google's T5, or **Text-to-Text Transfer Transformer**, stands out as one of the most influential. It didn't just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.

## Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso
https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/ · Thu, 01 May 2025

Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.

Our overarching philosophy is simple: **isolate and change only one variable at a time.** While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your **grind size** is your most powerful lever.

## Transformer's Core Mechanics
https://ericxliu.me/posts/transformer-s-core-mechanics/ · Tue, 01 Apr 2025

The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of "channels" to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.

### 1. The "Channel": A Foundational View of `d_model`

In deep learning, a "channel" can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs the analogous concept is the model's primary embedding dimension, commonly referred to as `d_model`.

## Some useful files
https://ericxliu.me/posts/useful/ · Mon, 26 Oct 2020

- [rootCA.pem](https://ericxliu.me/rootCA.crt)