Compare commits: master...gitea-page (127 commits)

Commit SHAs: bd862cb238, 293f0bfa77, c15d37458e, 07438a27e9, 4355096bdc, 9c66ed1b1b, 598c74df0a, 41ec0626e2, 346f1f1450, 0d2993f39b, 786f535c82, ab14cbc592, dc0feb72a8, 8bf55a3b50, c75c89c088, 4c7d9f4905, cde81e78d7, 008e4afff6, ff2b69c081, 2cbf345452, 9616c3681f, 2d7d143cbf, 6e752d8af2, e48bde719b, d9dccae876, 960c082536, 3e84d0613e, 645963ca87, 22b2a53fc9, 184c07ebff, 40a88799ee, 19d2678a16, 175644c1bf, c8d7b92351, 7a88de8adc, 7864b7a14d, 7ff7d71dcb, 48268a2fc1, 4808a62cd0, 811c80144e, ad8faa17fc, 66d0011843, f0b04beb1f, dbe2d5d1b0, 2aadf95801, ea9c28dce4, 1be19a7328, 073fbfe081, ed03d0a873, 798e6c7d75, dff213a604, 238fcb29b4, dc3978a294, 6dfed70e80, 596dc4948b, cb921d30e0, 50e9f52f56, b4e2b7f818, c2b8a4f233, 8d18da2143, 34ee48a56c, df3c006010, c8813b97f3, 52a6e87d0d, 5e1e4efc08, f50ba780e1, a9192dd7da, a50fee0dcf, 9454edc7ed, 9efdd85826, 95df119b6d, a6a4ee4adb, a977deebd1, 8c3be83b91, 76c539f415, c1be16072c, 11b8ac016c, d03a2c49dd, 0ae24eb647, ce7b6b17b2, ef26adac81, fb47a09d9b, b98d88fd0f, 144a1b1692, df6ffb4bc0, 219a24e3a5, 335ed1d107, 8f3c545991, e8ae2242e3, d801fe9307, 20c1888f78, 9603629d20, e60475c8ac, e83c0477c7, e86aa5f8cb, 8832dff8d6, ebed172a21, e7fda8a866, a147bbd8c4, 38518686d9, 38703cd607, 22b4234f06, ef9bc708e1, 61a3e5a38d, 303714c386, 203b36bc6c, 085d1dd3f7, 1f3238519a, 7ab352cdde, e5c7ad2ee3, b14698604d, 482899015a, 396b46d31e, 1d53e2965c, 77bd58c48f, 4e79964a24, 7667b0ebf3, a9765b4d5b, d5b6868b70, bd7fe9345f, f20a18d653, c6d8e2aae6, 685d7272e1, 2f5387a7a3, 4b1dd1a9bf, c05622c64f, b562560bbb
.cursorrules (deleted)
@@ -1,104 +0,0 @@

# Hugo Site Development Rules

## Project Overview

This is a Hugo static site using the hugo-coder theme with Obsidian markdown compatibility.

## Hugo Best Practices

### Content Creation

- **DO** place all content files in `content/` directory
- **DO** use front matter with `title`, `date`, and `draft` fields
- **DO** set `draft: false` for published content
- **DO** use lowercase filenames with hyphens (e.g., `my-post.md`)
- **DON'T** create content files outside the `content/` directory

### Markdown Usage

- **DO** use standard markdown syntax
- **DO** use `$$` for block math and `$` for inline math
- **DO** use `- [ ]` and `- [x]` for task lists
- **DO** use `==text==` for highlighting
- **DO** use footnotes with `[^1]` syntax
- **DON'T** use `$$$$` as special delimiters (not supported)
- **DON'T** rely on Obsidian-specific features like wiki-links `[[]]`

### Theme Customization

- **DO** override theme files by creating matching structure in `layouts/`
- **DO** place custom partials in `layouts/partials/`
- **DO** use `static/` for static assets (images, CSS, JS)
- **DON'T** modify files directly in `themes/` directory
- **DON'T** commit theme modifications

### Configuration

- **DO** use `config.toml` for site configuration
- **DO** test configuration changes locally before deploying
- **DO** enable features in `[markup.goldmark.extensions]` for Obsidian compatibility
- **DON'T** modify theme configuration files directly

### Development Workflow

- **DO** run `hugo server` for local development
- **DO** use `hugo --logLevel info` for detailed build output
- **DO** test builds with `hugo` before deployment
- **DON'T** commit the `public/` directory (build output)
- **DON'T** commit temporary Hugo binaries

### File Organization

```
├── content/       # All markdown content
│   ├── posts/     # Blog posts
│   └── about.md   # Static pages
├── layouts/       # Custom theme overrides
│   └── partials/  # Custom partial templates
├── static/        # Static assets
│   └── images/    # Image files
├── themes/        # Hugo themes (don't modify)
└── config.toml    # Site configuration
```

### Math and Special Content

- **DO** enable math with `math = true` in front matter or site config
- **DO** use KaTeX-compatible LaTeX syntax
- **DO** test math rendering after changes
- **DON'T** assume all LaTeX packages are available

### Performance

- **DO** optimize images before adding to `static/`
- **DO** use appropriate image formats (WebP, PNG, JPG)
- **DO** minimize custom CSS/JS
- **DON'T** add unnecessary JavaScript libraries

### SEO and Metadata

- **DO** include descriptive titles and descriptions
- **DO** use proper heading hierarchy (H1 -> H2 -> H3)
- **DO** add alt text to images
- **DON'T** duplicate titles across pages

### Common Pitfalls to Avoid

- **DON'T** use absolute paths in content (use relative paths)
- **DON'T** assume Obsidian plugins work in Hugo
- **DON'T** use Hugo-specific shortcodes without testing
- **DON'T** modify theme files without creating proper overrides
- **DON'T** forget to set `draft: false` for published content

### Git Workflow

- **DO** commit source files (content, config, layouts)
- **DO** use meaningful commit messages
- **DON'T** commit build artifacts (`public/`, temporary files)
- **DON'T** commit sensitive configuration (API keys, etc.)

### Testing Checklist

Before deployment, verify:

- [ ] All content renders correctly
- [ ] Math formulas display properly
- [ ] Images load correctly
- [ ] Links work (internal and external)
- [ ] Site builds without errors
- [ ] Mobile responsiveness
- [ ] Dark/light theme switching works

### Emergency Fixes

If site breaks:

1. Check `hugo --logLevel info` for build errors
2. Verify `config.toml` syntax
3. Check for missing front matter in content files
4. Ensure all required assets exist in `static/`
5. Test with `hugo server` locally first
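The content and math rules above can be sketched as a single post. This is an illustrative file, not one from the repository; the filename follows the lowercase-hyphen convention (e.g. `content/posts/sample-post.md`) and the date and formula are made up:

```markdown
---
title: "Sample Post"
date: 2024-01-01
draft: false
math: true
---

Inline math uses single dollars, e.g. $e^{i\pi} + 1 = 0$, and block math
uses double dollars:

$$
\int_0^1 x^2 \, dx = \frac{1}{3}
$$

- [ ] An open task
- [x] A finished task, with a footnote[^1]

[^1]: Footnotes use the `[^1]` syntax.
```

Note that `math: true` in the front matter is what triggers the KaTeX assets to load, per the "Math and Special Content" rules.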
@@ -1,63 +0,0 @@

name: Hugo Publish CI

on:
  push:
    branches:
      - master
  workflow_dispatch:

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Add SSH Key
        run: |
          echo "${{ secrets.SSH_KEY }}" > $HOME/.ssh/id_rsa
          chmod 600 $HOME/.ssh/id_rsa
          ssh-keyscan -H git.ericxliu.me >> $HOME/.ssh/known_hosts

      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ssh-key: ${{ secrets.SSH_KEY }}
          persist-credentials: false
          submodules: true # Fetch Hugo themes (true OR recursive)
          fetch-depth: 0 # Fetch all history for .GitInfo and .Lastmod

      - name: Fix nested quotes in post titles
        run: |
          # Find markdown files with titles containing at least 3 double quotes (indicating nesting)
          # and replace internal double quotes with single quotes.
          find content -type f -name "*.md" -print0 | \
            xargs -0 grep -lZ '^title: .*".*".*"' | \
            xargs -0 -r sed -i "/^title: \"/{s/\"/\\\\\"/g;s/^\(title: *\)\\\\\"/\1\"/;s/\\\\\" *$/\"/;}"

      - name: Build site with Hugo
        uses: peaceiris/actions-hugo@v3
        with:
          hugo-version: "latest"
          extended: true

      - name: Build
        run: hugo --minify

      - name: Replace [commit] with short commit hash and hyperlink
        run: |
          SHORT_COMMIT=$(git rev-parse --short HEAD)
          COMMIT_URL="https://git.ericxliu.me/eric/ericxliu-me/commit/$SHORT_COMMIT"
          find ./public -type f -exec sed -i "s|\[commit\]|<a href=\"$COMMIT_URL\">\[$SHORT_COMMIT\]</a>|g" {} +

      - name: Publish
        uses: peaceiris/actions-gh-pages@v4
        with:
          personal_token: ${{ secrets.GIT_PAGES_TOKEN }}
          publish_dir: ./public
          publish_branch: gitea-pages

      - name: Deploy
        run: |
          K8S_TOKEN="${{secrets.K8S_TOKEN}}"
          echo "K8S_TOKEN length: $(echo "$K8S_TOKEN" | wc -c)"
          echo "K8S_TOKEN starts with: $(echo "$K8S_TOKEN" | head -c 20)..."
          curl -X DELETE "https://10.10.0.10:6443/api/v1/namespaces/hugo/pods?labelSelector=app.kubernetes.io/name=hugo" \
            --header "Authorization: Bearer $K8S_TOKEN" \
            --insecure --fail-with-body --show-error --verbose
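The "Fix nested quotes in post titles" step is easiest to see on a concrete title. A minimal sketch of what the sed program does (the sample file and title are made up; the sed expression is the workflow's own, with one level of YAML/shell escaping removed since it is written directly here rather than inside a double-quoted string):

```shell
# A front-matter title containing nested double quotes:
printf 'title: "Notes on "Hugo" quoting"\n' > /tmp/sample-title.md

# The workflow's sed program: it backslash-escapes every quote, then
# restores the opening and closing quotes, leaving only the inner
# quotes escaped.
sed -i '/^title: "/{s/"/\\"/g;s/^\(title: *\)\\"/\1"/;s/\\" *$/"/;}' /tmp/sample-title.md

cat /tmp/sample-title.md
# → title: "Notes on \"Hugo\" quoting"
```

The result is a front-matter line that YAML parsers accept, since the inner quotes are now escaped rather than terminating the string early.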
.gitignore (vendored, deleted)
@@ -1,41 +0,0 @@

# Generated files by hugo
/public/
/resources/_gen/
/assets/jsconfig.json
hugo_stats.json

# Executable may be added to repository
hugo.exe
hugo.darwin
hugo.linux

# Temporary lock file while building
/.hugo_build.lock

# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
.gitmodules (vendored, deleted)
@@ -1,6 +0,0 @@

[submodule "themes/hugo-coder"]
	path = themes/hugo-coder
	url = https://github.com/luizdepra/hugo-coder
[submodule "themes/hugo-cloak-email"]
	path = themes/hugo-cloak-email
	url = https://github.com/martignoni/hugo-cloak-email
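Since `.gitmodules` uses git's own config format, `git config -f` is the standard way to read it. A small sketch (the paths and URLs are the repository's; the temp file is illustrative):

```shell
# Recreate the .gitmodules content above in a temp file.
cat > /tmp/gitmodules <<'EOF'
[submodule "themes/hugo-coder"]
    path = themes/hugo-coder
    url = https://github.com/luizdepra/hugo-coder
[submodule "themes/hugo-cloak-email"]
    path = themes/hugo-cloak-email
    url = https://github.com/martignoni/hugo-cloak-email
EOF

# Query a submodule's URL with the same parser git itself uses.
git config -f /tmp/gitmodules --get submodule.themes/hugo-coder.url
# → https://github.com/luizdepra/hugo-coder
```

After a plain clone, `git submodule update --init --recursive` fetches both themes; this is why the CI checkout step sets `submodules: true`.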
@@ -1,2 +0,0 @@
Pasted image 20250730232756.png|64bfdb4b-678e-4bfc-8b62-0c05c243f6a9.png
Pasted image 20250816140700.png|.png

@@ -1 +0,0 @@
Pasted image 20250819211718.png|.png

@@ -1,3 +0,0 @@
image-b25565d6f47e1ba4ce2deca7e161726b86df356e.png|388f43c3f800483aae5ea487e8f45922.png|387cde4274484063c4c7e1f9f37c185a
image-7913a54157c2f4b8d0b7f961640a9c359b2d2a4f.png|ee04876d75d247f9b27a647462555777.png|2371421b04f856f7910dc8b46a7a6fb9
image-79378d40267258c0d8968238cc62bd197dc894fa.png|16d64bdc9cf14b05b7c40c4718b8091b.png|ff2625e796efd7187614b6e0a8542af6

@@ -1 +0,0 @@
image-7c88938eaa4db1b7eafc437b9067b8790998fc71.png|2803b917b5794452870bc8a0aa896381.png|dd23c4ffd5f4e6bdec5dc03ba85140c8

@@ -1,2 +0,0 @@
Pasted image 20250816140700.png|.png
image-3632d923eed983f171fba4341825273101f1fc94.png|7713bd3ecf27442e939b9190fa08165d.png|6db5ae66ae4b0212cd6c93ff12d3dc8f

@@ -1 +0,0 @@
image-1b23344ea5541d156e5ac20823d12d7c6723b691.png|eedb3be8259a4a70aa7029b78a029364.png|e0fb329f437f21bc3385472bfeb91597

@@ -1,2 +0,0 @@
Pasted image 20250819211718.png|.png
image-c64b0f9df1e4981c4ecdb3b60e8bc78c426ffa68.png|c7fe4af2633840cfbc81d7c4e3e42d0c.png|42301b756414623256388f1cffc6b76f

@@ -1,2 +0,0 @@
image-167d5cef9e79e622fff779f3671492a8a5a343ea.png|472bf0cd504f4cd7ab7a33cd3322a5f1.png|36ef949c96dde80394f9ad066f5972a5
image-4b9dbea5f7ceb0446d517305bc281b74e7f22ffc.png|663d732d14fc4fa8ad051c6926523efb.png|39263412375da54265b588e204fe5f6d

@@ -1 +0,0 @@
image-dfe7c6fbab7a2fede4b64ec3cad6449970a13f05.png|399000b0b5ee4f5e8961e1d76b6c23c8.png|1207ac1bd91b33c00fd086d10fdf3b86
404.html (new file)
@@ -0,0 +1,7 @@
<!doctype html><html lang=en><head><title>Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="404 Page not found"><meta name=twitter:description content="Eric X. Liu - Software & Performance Engineer at Google. 
Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:url" content="https://ericxliu.me/404.html"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="404 Page not found"><meta property="og:description" content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:locale" content="en"><meta property="og:type" content="website"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/404.html><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest 
href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container centered"><div class=error><h1>404</h1><h2>Page Not Found</h2><p>Sorry, this page does not exist.<br>You can head back to the <a href=https://ericxliu.me/>homepage</a>.</p></div></section></div><footer class=footer><section class=container>©
2016 -
2026
Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
AGENTS.md (deleted)
@@ -1,32 +0,0 @@

# Hugo Partials Structure

This site uses the **hugo-coder** theme with the **hugo-cloak-email** plugin.

## Partial Directories

### `layouts/partials/`

Standard Hugo partial override directory. Files here override theme partials when called via `partial "name.html"`.

**Used for:**

- `cloakemail.html` — Overrides hugo-cloak-email's default partial
- `head.html` — Adds font preloading and AdSense
- `footer.html` — Custom footer with `[commit]` placeholder
- `home/social.html` — Integrates cloakemail for email obfuscation
- `home/avatar.html` — Custom avatar rendering

### `layouts/_partials/`

Non-standard directory used by the **hugo-coder** theme. Files here override theme partials when called via `partial "_partials/name.html"`.

**Used for:**

- `csp.html` — Custom Content Security Policy with additional script sources

## How Hugo Resolves Partials

Hugo's lookup order tries user overrides first, then falls back to theme files:

1. `layouts/partials/` or `layouts/_partials/` (your overrides)
2. `themes/<theme>/layouts/partials/` or `themes/<theme>/layouts/_partials/`

## Verification

Run `hugo --templateMetrics` to see which templates are actually being used and their execution counts.
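For illustration, a `layouts/partials/footer.html` override matching the structure above might look like the following. The markup is a sketch, not the repository's actual footer; only the `[commit]` placeholder mechanism is taken from the notes above:

```html
<!-- layouts/partials/footer.html — picked up before the theme's copy -->
<footer class="footer">
  <section class="container">
    © 2016 - {{ now.Year }} Eric X. Liu
    [commit] <!-- literal placeholder; the CI step rewrites it into a linked short hash -->
  </section>
</footer>
```

Because this file sits in `layouts/partials/`, Hugo's lookup order resolves it ahead of `themes/hugo-coder/layouts/partials/footer.html` with no change to the theme itself.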
README.md (deleted)
@@ -1,37 +0,0 @@

# 🌟 ericxliu.me

Welcome to the repository for my personal website! 🚀

## 🛠️ Built With

This website is built using:

- [Hugo](https://gohugo.io/) - A fast and modern static site generator
- [Hugo Coder](https://github.com/luizdepra/hugo-coder/) - A minimalist and elegant Hugo theme

## 🌐 Website

Visit my website at [ericxliu.me](https://ericxliu.me)

## 🚀 Features

- 📱 Responsive design
- 🎨 Clean and minimalist layout
- 📝 Blog section for articles and thoughts
- 👨‍💻 Portfolio showcase
- 📬 Contact information

## 🛠️ Local Development

To run this website locally:

1. Clone this repository
2. Install Hugo (extended version)
3. Navigate to the project directory
4. Run `hugo server -D`
5. Open your browser and visit `http://localhost:1313`

## 📄 License

This project is open source and available under the [MIT License](LICENSE).

Thank you for visiting my website repository! 😊
about/index.html (new file)
@@ -0,0 +1,16 @@
<!doctype html><html lang=en><head><title>About · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="
Hi, I’m Eric Liu.
I am a Staff Software Engineer and Tech Lead Manager (TLM) at Google, based in Sunnyvale, CA.
My work focuses on Infrastructure Performance and Customer Engineering, specifically for GPUs and TPUs. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it’s debugging race conditions across thousands of chips or designing API surfaces for next-gen models."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="About"><meta name=twitter:description content="Hi, I’m Eric Liu.
I am a Staff Software Engineer and Tech Lead Manager (TLM) at Google, based in Sunnyvale, CA.
My work focuses on Infrastructure Performance and Customer Engineering, specifically for GPUs and TPUs. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it’s debugging race conditions across thousands of chips or designing API surfaces for next-gen models."><meta property="og:url" content="https://ericxliu.me/about/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="About"><meta property="og:description" content="Hi, I’m Eric Liu.
I am a Staff Software Engineer and Tech Lead Manager (TLM) at Google, based in Sunnyvale, CA.
My work focuses on Infrastructure Performance and Customer Engineering, specifically for GPUs and TPUs. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it’s debugging race conditions across thousands of chips or designing API surfaces for next-gen models."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:published_time" content="2025-12-19T22:46:12-08:00"><meta property="article:modified_time" content="2025-12-20T09:52:07-08:00"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/about/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest 
href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script><script type=application/ld+json>{"@context":"http://schema.org","@type":"BlogPosting","headline":"About","genre":"Blog","wordcount":"201","url":"https:\/\/ericxliu.me\/about\/","datePublished":"2025-12-19T22:46:12-08:00","dateModified":"2025-12-20T09:52:07-08:00","description":"\u003cimg src=\u0022\/images\/about.jpeg\u0022 alt=\u0022Eric Liu\u0022 width=\u0022300\u0022 style=\u0022float: left; margin-right: 1.5rem; margin-bottom: 1rem; border-radius: 8px;\u0022\/\u003e\n\u003cp\u003eHi, I\u0026rsquo;m \u003cstrong\u003eEric Liu\u003c\/strong\u003e.\u003c\/p\u003e\n\u003cp\u003eI am a \u003cstrong\u003eStaff Software Engineer and Tech Lead Manager (TLM)\u003c\/strong\u003e at \u003cstrong\u003eGoogle\u003c\/strong\u003e, based in Sunnyvale, CA.\u003c\/p\u003e\n\u003cp\u003eMy work focuses on \u003cstrong\u003eInfrastructure Performance and Customer Engineering\u003c\/strong\u003e, specifically for \u003cstrong\u003eGPUs and TPUs\u003c\/strong\u003e. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it\u0026rsquo;s debugging race conditions across thousands of chips or designing API surfaces for next-gen models.\u003c\/p\u003e","author":{"@type":"Person","name":"Eric X. 
Liu"}}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container page"><article><header><h1 class=title><a class=title-link href=https://ericxliu.me/about/>About</a></h1></header><img src=/images/about.jpeg alt="Eric Liu" width=300 style=float:left;margin-right:1.5rem;margin-bottom:1rem;border-radius:8px><p>Hi, I’m <strong>Eric Liu</strong>.</p><p>I am a <strong>Staff Software Engineer and Tech Lead Manager (TLM)</strong> at <strong>Google</strong>, based in Sunnyvale, CA.</p><p>My work focuses on <strong>Infrastructure Performance and Customer Engineering</strong>, specifically for <strong>GPUs and TPUs</strong>. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it’s debugging race conditions across thousands of chips or designing API surfaces for next-gen models.</p><p>Beyond the code, I maintain this “digital garden” where I document my projects and learnings. It serves as my second brain, capturing everything from technical deep dives to random musings. 
I believe in <strong>“learning in public”</strong>—so you’ll find unpolished notes on troubleshooting Kubernetes clusters alongside recipes I’m refining. It’s not just a blog; it’s a living repository of my curiosity.</p><h3 id=personal-interests>Personal Interests
<a class=heading-link href=#personal-interests><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>I’m a tinkerer at heart, whether digital or physical:</p><ul><li><strong>Homelab</strong>: Kubernetes, Proxmox, and self-hosted services. I love over-engineering my home network.</li><li><strong>DIY & Jeep</strong>: Maintaining and modifying my Jeep, and general DIY projects.</li><li><strong>Cooking</strong>: experimenting with new recipes and techniques.</li></ul><p>Welcome to my corner of the internet.</p></article></section><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css integrity=sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0 crossorigin=anonymous><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js integrity=sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4 crossorigin=anonymous></script><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js integrity=sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05 crossorigin=anonymous onload='renderMathInElement(document.body,{delimiters:[{left:"$$",right:"$$",display:!0},{left:"$",right:"$",display:!1},{left:"\\(",right:"\\)",display:!1},{left:"\\[",right:"\\]",display:!0}]})'></script></div><footer class=footer><section class=container>©
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
@@ -1,6 +0,0 @@
---
title: "{{ replace .Name "-" " " | title }}"
date: {{ .Date }}
draft: true
---
7
authors/index.html
Normal file
@@ -0,0 +1,7 @@
<!doctype html><html lang=en><head><title>Authors · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Authors"><meta name=twitter:description content="Eric X. Liu - Software & Performance Engineer at Google. 
Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:url" content="https://ericxliu.me/authors/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Authors"><meta property="og:description" content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:locale" content="en"><meta property="og:type" content="website"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/authors/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link 
rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><link rel=alternate type=application/rss+xml href=/authors/index.xml title="Eric X. Liu's Personal Page"><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container list"><header><h1 class=title><a class=title-link href=https://ericxliu.me/authors/>Authors</a></h1></header><ul></ul></section></div><footer class=footer><section class=container>©
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
1
authors/index.xml
Normal file
@@ -0,0 +1 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Authors on Eric X. Liu's Personal Page</title><link>https://ericxliu.me/authors/</link><description>Recent content in Authors on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><atom:link href="https://ericxliu.me/authors/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
1
authors/page/1/index.html
Normal file
@@ -0,0 +1 @@
<!doctype html><html lang=en><head><title>https://ericxliu.me/authors/</title><link rel=canonical href=https://ericxliu.me/authors/><meta charset=utf-8><meta http-equiv=refresh content="0; url=https://ericxliu.me/authors/"></head></html>
7
categories/index.html
Normal file
@@ -0,0 +1,7 @@
<!doctype html><html lang=en><head><title>Categories · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Categories"><meta name=twitter:description content="Eric X. Liu - Software & Performance Engineer at Google. 
Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:url" content="https://ericxliu.me/categories/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Categories"><meta property="og:description" content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:locale" content="en"><meta property="og:type" content="website"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/categories/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest 
href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><link rel=alternate type=application/rss+xml href=/categories/index.xml title="Eric X. Liu's Personal Page"><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container list"><header><h1 class=title><a class=title-link href=https://ericxliu.me/categories/>Categories</a></h1></header><ul></ul></section></div><footer class=footer><section class=container>©
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
1
categories/index.xml
Normal file
@@ -0,0 +1 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Categories on Eric X. Liu's Personal Page</title><link>https://ericxliu.me/categories/</link><description>Recent content in Categories on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><atom:link href="https://ericxliu.me/categories/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
1
categories/page/1/index.html
Normal file
@@ -0,0 +1 @@
<!doctype html><html lang=en><head><title>https://ericxliu.me/categories/</title><link rel=canonical href=https://ericxliu.me/categories/><meta charset=utf-8><meta http-equiv=refresh content="0; url=https://ericxliu.me/categories/"></head></html>
241
config.toml
@@ -1,241 +0,0 @@
title = "Eric X. Liu's Personal Page"
baseURL = "https://ericxliu.me/"
theme = ["hugo-cloak-email", "hugo-coder"]
languageCode = "en"
defaultcontentlanguage = "en"

pygmentsstyle = "bw"
pygmentscodefences = true
pygmentscodefencesguesssyntax = true
enableEmoji = true
enableTwemoji = true
enableGitInfo = true
enableRobotsTXT = true

[taxonomies]
category = "categories"
series = "series"
tag = "tags"
author = "authors"

# Disqus comments configuration
[services]
[services.disqus]
shortname = "ericxliu-me"

# Goldmark configuration for Obsidian compatibility
[markup]
defaultMarkdownHandler = "goldmark"

[markup.goldmark]
[markup.goldmark.extensions]
# Enable definition lists (useful for Obsidian-style definitions)
definitionList = true
# Enable footnotes (common in Obsidian)
footnote = true
# Enable linkification
linkify = true
# Enable strikethrough
strikethrough = true
# Enable tables
table = true
# Enable task lists (checkboxes)
taskList = true
# Enable typographer for better typography
[markup.goldmark.extensions.typographer]
disable = false
# Enable math via passthrough for LaTeX
[markup.goldmark.extensions.passthrough]
enable = true
[markup.goldmark.extensions.passthrough.delimiters]
# Block math delimiters
block = [["$$", "$$"], ["\\[", "\\]"]]
# Inline math delimiters
inline = [["$", "$"], ["\\(", "\\)"]]
# Enable extra extensions for better compatibility
[markup.goldmark.extensions.extras]
[markup.goldmark.extensions.extras.subscript]
enable = true
[markup.goldmark.extensions.extras.superscript]
enable = true
[markup.goldmark.extensions.extras.mark]
enable = true
[markup.goldmark.extensions.extras.insert]
enable = true
[markup.goldmark.extensions.extras.delete]
enable = true

[markup.goldmark.parser]
# Enable attributes for better styling
[markup.goldmark.parser.attribute]
block = true
title = true
# Auto-generate heading IDs
autoHeadingID = true
autoHeadingIDType = "github"
# Don't wrap standalone images in paragraphs (better for Obsidian compatibility)
wrapStandAloneImageWithinParagraph = false

[markup.goldmark.renderer]
# Allow unsafe HTML (needed for some Obsidian features)
unsafe = true

[markup.highlight]
style = "github-dark"

# Table of contents configuration (compatible with Obsidian heading structure)
[markup.tableOfContents]
startLevel = 1
endLevel = 6
ordered = false

[params] # theme parameters
author = "Eric X. Liu"
info = ["Software & Performance Engineer @Google", "DIY Overlander & Rock Crawler", "Tech Enthusiast"]
description = "Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."
keywords = "software engineer, performance engineering, Google engineer, tech blog, software development, performance optimization, Eric Liu, engineering blog, mountain biking, Jeep enthusiast, overlanding, camping, outdoor adventures"
avatarurl = "images/gravatar.png"
hideFooter = false
hideCredits = true
hideCopyright = false
since = 2016
rtl = false
commit = "https://git.ericxliu.me/eric/ericxliu-me/commit"
colorscheme = "auto"
hideColorSchemeToggle = false

# Series see also post count
maxSeeAlsoItems = 5

# Custom CSS
custom_css = []

# Custom JS
custom_js = []

# Enable math rendering (for LaTeX support including $$$$ blocks)
math = true

# AdSense Configuration
[params.adsense]
client = "ca-pub-3972604619956476"

# Add new SEO-related parameters
[params.seo]
# Enable OpenGraph for better social media sharing
opengraph = true
# Enable Twitter Cards
twitter_cards = true
# Your Twitter handle (optional)
# twitter_handle = "@yourtwitterhandle"
# Default image for social sharing
default_image = "images/gravatar.png"
# Site name for social sharing
site_name = "Eric X. Liu's Personal Page"

# Add structured data for Google Search
[params.schema]
type = "Person"
name = "Eric X. Liu"
description = "Software & Performance Engineer at Google"
sameAs = [
"https://www.linkedin.com/in/eric-x-liu-46648b93/",
"https://git.ericxliu.me/eric"
]

# Add sitemap configuration
[sitemap]
changefreq = "weekly"
filename = "sitemap.xml"
priority = 0.5

# If you want to implement a Content-Security-Policy, add this section
[params.csp]
childsrc = ["'self'"]
fontsrc = ["'self'", "https://fonts.gstatic.com", "https://cdn.jsdelivr.net/"]
formaction = ["'self'"]
framesrc = ["'self'", "https://www.youtube.com", "https://disqus.com"]
imgsrc = ["'self'", "https://referrer.disqus.com", "https://c.disquscdn.com", "https://*.disqus.com"]
objectsrc = ["'none'"]
stylesrc = [
"'self'",
"'unsafe-inline'",
"https://fonts.googleapis.com/",
"https://cdn.jsdelivr.net/",
]
scriptsrc = [
"'self'",
"'unsafe-inline'",
"https://www.google-analytics.com",
"https://cdn.jsdelivr.net/",
"https://pagead2.googlesyndication.com",
"https://static.cloudflareinsights.com",
"https://unpkg.com",
"https://ericxliu-me.disqus.com",
"https://disqus.com",
"https://*.disqus.com",
"https://*.disquscdn.com",
]
prefetchsrc = ["'self'"]
# connect-src directive – defines valid targets for XMLHttpRequest (AJAX), WebSockets or EventSource
connectsrc = ["'self'", "https://www.google-analytics.com", "https://pagead2.googlesyndication.com", "https://cloudflareinsights.com", "ws://localhost:1313", "ws://localhost:*", "wss://localhost:*", "https://links.services.disqus.com", "https://*.disqus.com"]

# Social links
[[params.social]]
name = "Git"
icon = "fa-brands fa-git fa-2x"
weight = 1
url = "https://git.ericxliu.me/eric"
[[params.social]]
name = "linkedin"
icon = "fa-brands fa-linkedin fa-2x"
weight = 2
url = "https://www.linkedin.com/in/eric-x-liu-46648b93/"
[[params.social]]
name = "Personal email"
icon = "fa fa-envelope fa-2x"
weight = 3
email = "eric@ericxliu.me"
[[params.social]]
name = "RSS"
icon = "fa-solid fa-rss fa-2x"
weight = 6
url = "https://ericxliu.me/index.xml"
rel = "alternate"
type = "application/rss+xml"

# Menu links
[languages]
[languages.en]
languagename = "English"
[[languages.en.menu.main]]
name = "Posts"
weight = 1
url = "/posts/"
[[languages.en.menu.main]]
name = "Chat"
weight = 2
url = "https://chat.ericxliu.me"
[[languages.en.menu.main]]
name = "Git"
weight = 3
url = "https://git.ericxliu.me/user/oauth2/Authenitk"
[[languages.en.menu.main]]
name = "Coder"
weight = 4
url = "https://coder.ericxliu.me/api/v2/users/oidc/callback"
[[languages.en.menu.main]]
name = "About"
weight = 5
url = "/about/"
[[languages.en.menu.main]]
name = "|"
weight = 10
[[languages.en.menu.main]]
name = "Sign in"
weight = 11
url = "https://sso.ericxliu.me"

# Cloudflare Web Analytics configuration
[params.cloudflare]
token = "987638e636ce4dbb932d038af74c17d1"
@@ -1,25 +0,0 @@
---
title: "About"
date: 2025-12-19T22:46:12-08:00
draft: false
---

<img src="/images/about.jpeg" alt="Eric Liu" width="300" style="float: left; margin-right: 1.5rem; margin-bottom: 1rem; border-radius: 8px;"/>

Hi, I'm **Eric Liu**.

I am a **Staff Software Engineer and Tech Lead Manager (TLM)** at **Google**, based in Sunnyvale, CA.

My work focuses on **Infrastructure Performance and Customer Engineering**, specifically for **GPUs and TPUs**. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it's debugging race conditions across thousands of chips or designing API surfaces for next-gen models.

Beyond the code, I maintain this "digital garden" where I document my projects and learnings. It serves as my second brain, capturing everything from technical deep dives to random musings. I believe in **"learning in public"**—so you'll find unpolished notes on troubleshooting Kubernetes clusters alongside recipes I'm refining. It's not just a blog; it's a living repository of my curiosity.

### Personal Interests

I'm a tinkerer at heart, whether digital or physical:

* **Homelab**: Kubernetes, Proxmox, and self-hosted services. I love over-engineering my home network.
* **DIY & Jeep**: Maintaining and modifying my Jeep, and general DIY projects.
* **Cooking**: Experimenting with new recipes and techniques.

Welcome to my corner of the internet.
@@ -1,208 +0,0 @@
---
title: "Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)"
date: 2025-10-04
draft: false
---

## Introduction

NVIDIA's Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there's a catch—one that reveals a fundamental tension in modern edge AI hardware design.

After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device's computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn't computation—it's memory bandwidth. This isn't just a quirk of one device; it's a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment.

## The Hardware: What We're Working With

The NVIDIA Jetson Orin Nano 8GB I tested features:

- **GPU**: NVIDIA Ampere architecture with 1024 CUDA cores and 32 Tensor Cores
- **Compute Performance**: 40 TOPS (INT8), 10 TFLOPS (FP16), 5 TFLOPS (FP32)
- **Memory**: 8GB LPDDR5 unified memory with 68 GB/s bandwidth
- **Available VRAM**: Approximately 5.2GB after OS overhead
- **CPU**: 6-core ARM Cortex-A78AE (ARMv8.2, 64-bit)
- **TDP**: 7-25W configurable

The unified memory architecture is a double-edged sword: CPU and GPU share the same physical memory pool, which eliminates PCIe transfer overhead but also means you're working with just 5.2GB of usable VRAM after the OS takes its share. This constraint shapes everything about LLM deployment on this device.

## Testing Methodology

### The Models

I tested seven models ranging from 0.5B to 5.4B parameters—essentially the entire practical deployment range for this hardware. The selection covered two inference backends (Ollama and vLLM) and various quantization strategies:

**Ollama-served models (with quantization):**
- Gemma 3 1B (Q4_K_M, 815MB)
- Gemma 3n E2B (Q4_K_M, 3.5GB, 5.44B total params, 2B effective)
- Qwen 2.5 0.5B (Q4_K_M, 350MB)
- Qwen 3 0.6B (FP8, 600MB)

**vLLM-served models (minimal/no quantization):**
- google/gemma-3-1b-it (FP16, 2GB)
- Qwen/Qwen2.5-0.5B-Instruct (FP16, 1GB)
- Qwen/Qwen3-0.6B-FP8 (FP8, 600MB)

### The Testing Process

Each model faced 10-12 prompts of varying complexity—from simple arithmetic to technical explanations about LLMs themselves. All tests ran with batch size = 1, simulating a single user interacting with a local chatbot—the typical edge deployment scenario. Out of 84 planned tests, 66 completed successfully (78.6% success rate). The failures? Mostly out-of-memory crashes on larger models and occasional inference engine instability.
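
Throughput was derived from each backend's own counters rather than wall-clock guesswork. For Ollama, the final `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds), which is all you need; the sample values below are invented for illustration:

```python
def tokens_per_second(resp: dict) -> float:
    """Decode throughput from an Ollama /api/generate final response:
    eval_count is the number of generated tokens, eval_duration is in ns."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

# Hypothetical response fragment (numbers made up for illustration):
resp = {"eval_count": 194, "eval_duration": 5_000_000_000}
print(f"{tokens_per_second(resp):.1f} t/s")  # 38.8
```

The same arithmetic applies to vLLM's usage statistics; only the field names differ.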

### Understanding the Limits: Roofline Analysis

To understand where performance hits its ceiling, I applied roofline analysis—a method that reveals whether a workload is compute-bound (limited by processing power) or memory-bound (limited by data transfer speed). For each model, I calculated:

- **FLOPs per token**: Approximately 2 × total_parameters (accounting for matrix multiplications in the forward pass)
- **Bytes per token**: model_size × 1.1 (including 10% overhead for activations and KV cache)
- **Operational Intensity (OI)**: FLOPs per token / Bytes per token
- **Theoretical performance**: min(compute_limit, bandwidth_limit)

The roofline model works by comparing a workload's operational intensity (how many calculations you do per byte of data moved) against the device's balance point. If your operational intensity is too low, you're bottlenecked by memory bandwidth—and as we'll see, that's exactly what happens with LLM inference.
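
The arithmetic above is compact enough to sketch directly. A minimal version, using my assumed bytes-per-parameter for each weight format (≈0.5625 for Q4_K_M, 1 for FP8, 2 for FP16—assumptions, not measurements):

```python
BANDWIDTH = 68e9   # bytes/s, LPDDR5 on the Orin Nano
PEAK_INT8 = 40e12  # ops/s, advertised INT8 throughput

def roofline(params: float, bytes_per_param: float):
    """Operational intensity and theoretical batch-1 tokens/s for one model."""
    flops_per_token = 2 * params                      # ~2 FLOPs per parameter per token
    bytes_per_token = params * bytes_per_param * 1.1  # +10% activations / KV cache
    oi = flops_per_token / bytes_per_token
    # Whichever roof is lower wins; for these OIs it is always the bandwidth roof.
    tps = min(PEAK_INT8 / flops_per_token, BANDWIDTH / bytes_per_token)
    return oi, tps

# qwen3:0.6b served as FP8 (1 byte per parameter):
oi, tps = roofline(0.6e9, 1.0)
print(f"OI = {oi:.2f} FLOPs/byte, ceiling = {tps:.2f} t/s")  # OI = 1.82, ceiling = 103.03
```

Plugging in the other models' sizes and formats reproduces the theoretical ceilings in the comparison table later in this post.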

![Roofline Analysis](/images/jetson-orin-nano-llm-inference-analysis/roofline_analysis.png)

## The Results: Speed and Efficiency

### What Actually Runs Fast

Here's how the models ranked by token generation speed:

| Rank | Model | Backend | Avg Speed (t/s) | Std Dev | Success Rate |
|------|-------|---------|-----------------|---------|--------------|
| 1 | qwen3:0.6b | Ollama | 38.84 | 1.42 | 100% |
| 2 | qwen2.5:0.5b | Ollama | 35.24 | 2.72 | 100% |
| 3 | gemma3:1b | Ollama | 26.33 | 2.56 | 100% |
| 4 | Qwen/Qwen2.5-0.5B-Instruct | vLLM | 15.18 | 2.15 | 100% |
| 5 | Qwen/Qwen3-0.6B-FP8 | vLLM | 12.81 | 0.36 | 100% |
| 6 | gemma3n:e2b | Ollama | 8.98 | 1.22 | 100% |
| 7 | google/gemma-3-1b-it | vLLM | 4.59 | 1.52 | 100% |

The standout finding: quantized sub-1B models hit 25-40 tokens/second, with Ollama consistently outperforming vLLM by 2-6× thanks to aggressive quantization and edge-optimized execution. These numbers align well with independent benchmarks from NVIDIA's Jetson AI Lab (Llama 3.2 3B at 27.7 t/s, SmolLM2 at 41 t/s), confirming this is typical performance for the hardware class.
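
The 2-6× spread can be read straight off the table: three of the models were served by both backends.

```python
# Measured averages from the table above: (Ollama t/s, vLLM t/s)
both_backends = {
    "gemma3 1b":    (26.33, 4.59),
    "qwen2.5 0.5b": (35.24, 15.18),
    "qwen3 0.6b":   (38.84, 12.81),
}
for model, (ollama, vllm) in both_backends.items():
    # Ratios come out to roughly 5.7x, 2.3x, and 3.0x respectively
    print(f"{model}: Ollama is {ollama / vllm:.1f}x faster")
```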

![Performance Comparison](/images/jetson-orin-nano-llm-inference-analysis/mermaid-diagram-visual-summary.png)

### Responsiveness: First Token Latency

The time to generate the first output token—a critical metric for interactive applications—varied significantly:

- qwen3:0.6b (Ollama): 0.522 seconds
- gemma3:1b (Ollama): 1.000 seconds
- qwen2.5:0.5b (Ollama): 1.415 seconds
- gemma3n:e2b (Ollama): 1.998 seconds

Smaller, quantized models generally reach that first token faster (though not in strict size order—qwen2.5:0.5b trails gemma3:1b here)—exactly what you want for a chatbot or interactive assistant where perceived responsiveness matters as much as raw throughput.

### The Memory Bottleneck Revealed

When I compared actual performance against theoretical limits, the results were striking:

| Model | Theoretical (t/s) | Actual (t/s) | Efficiency | Bottleneck | OI (FLOPs/byte) |
|-------|-------------------|--------------|------------|------------|-----------------|
| gemma3:1b | 109.90 | 26.33 | 24.0% | Memory | 3.23 |
| qwen3:0.6b | 103.03 | 38.84 | 37.7% | Memory | 1.82 |
| qwen2.5:0.5b | 219.80 | 35.24 | 16.0% | Memory | 3.23 |
| gemma3n:e2b | 54.95 | 8.98 | 16.3% | Memory | 3.23 |
| google/gemma-3-1b-it | 30.91 | 4.59 | 14.9% | Memory | 0.91 |
| Qwen/Qwen3-0.6B-FP8 | 103.03 | 12.81 | 12.4% | Memory | 1.82 |
| Qwen/Qwen2.5-0.5B-Instruct | 61.82 | 15.18 | 24.6% | Memory | 0.91 |

Every single model is memory-bound in this single-stream inference scenario. Average hardware efficiency sits at just 20.8%—meaning the computational units spend most of their time waiting for data rather than crunching numbers. That advertised 40 TOPS? Largely untapped when generating one token at a time for a single user.
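The "Theoretical" column follows directly from the memory-bound model: in single-stream decode, every token requires streaming the full set of weights through the memory bus once, so the ceiling is simply bandwidth divided by model size. A quick sketch of how the table's figures fall out (68 GB/s is the Orin Nano's spec; model sizes are the quantized footprints):

```python
def theoretical_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    """Memory-bound ceiling: each decoded token reads all weights once."""
    return bandwidth_gb_s / model_gb

def efficiency(actual_tps: float, theoretical: float) -> float:
    """Fraction of the memory-bound ceiling actually achieved."""
    return actual_tps / theoretical

# gemma3:1b from the table: 26.33 t/s actual vs 109.90 t/s theoretical
print(f"{efficiency(26.33, 109.90):.1%}")  # -> 24.0%
```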

![Roofline Analysis](/images/jetson-llm-benchmark/roofline_analysis.png)

## What This Actually Means

### Why Memory Bandwidth Dominates (in Single-Stream Inference)

The roofline numbers tell a clear story: operational intensity ranges from 0.91 to 3.23 FLOPs/byte across all tested models during single-token generation (batch size = 1). That is more than two orders of magnitude below what it would take to saturate those 1024 CUDA cores and reach compute-bound operation at the device's 68 GB/s memory bandwidth.

In practice, for a model to actually become compute-bound on this device during single-stream inference, it would need an operational intensity exceeding:

```
OI_threshold = Peak_Compute / Memory_Bandwidth
             = (40 × 10^12 ops/s) / (68 × 10^9 bytes/s)
             = 588 FLOPs/byte
```

Single-stream autoregressive decoding falls roughly 180-650× short of this threshold because each token generation requires loading the entire model from memory (a matrix-vector multiplication) while performing only ~2 FLOPs per parameter. The compute units are idle most of the time, simply waiting for model weights and activations to arrive from memory.
Note: Production LLM serving with large batch sizes (32-256 requests) changes this dynamic dramatically—batching transforms matrix-vector operations into matrix-matrix multiplications, increasing operational intensity by 30-250× and making workloads compute-bound. However, edge devices serving single users cannot exploit this optimization.
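The arithmetic behind that note is simple: in batched decode the weights are read once per step but reused for every sequence in the batch, so operational intensity scales roughly linearly with batch size. A back-of-the-envelope sketch (2 FLOPs per parameter per token; `bytes_per_param` is 0.5 for 4-bit weights, 2.0 for FP16; KV-cache and activation traffic, ignored here, pull the real numbers somewhat lower):

```python
def decode_oi(batch_size: int, bytes_per_param: float) -> float:
    """Approximate FLOPs/byte of one decode step: weights are loaded once
    from memory and reused across the whole batch (~2 FLOPs per param
    per sequence)."""
    return 2.0 * batch_size / bytes_per_param

print(decode_oi(1, 0.5))    # single-stream, 4-bit weights: ~4 FLOPs/byte
print(decode_oi(128, 2.0))  # batch of 128, FP16: 128 FLOPs/byte
```

The measured 3.23 FLOPs/byte for the Q4 models sits a bit below this ideal, likely because cache and activation traffic add bytes without adding reuse.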

The largest model tested—gemma3n:e2b at 3.5GB quantized (5.44B total parameters, 2B effective)—shows only 16.3% efficiency, similar to other quantized models. Despite being the largest model, Q4_K_M quantization keeps its memory footprint manageable, resulting in similar operational intensity (3.23 FLOPs/byte) to the other INT4-quantized models. Its MatFormer architecture with selective parameter activation (only 2B of 5.44B params active per token) actually helps reduce memory traffic, though this benefit is partially offset by the overhead of routing logic.

### What This Means for Edge Deployment

The performance gap between Ollama and vLLM (2.3-5.7×) tells us something important about optimization priorities for single-user edge devices:

**Qwen 2.5 0.5B:** Ollama (Q4_K_M, 350MB) at 35.24 t/s vs vLLM (FP16, 1GB) at 15.18 t/s—2.32× faster

**Qwen 3 0.6B:** Ollama (FP8) at 38.84 t/s vs vLLM (FP8) at 12.81 t/s—3.03× faster despite identical quantization

**Gemma 3 1B:** Ollama (Q4_K_M, 815MB) at 26.33 t/s vs vLLM (FP16, 2GB) at 4.59 t/s—5.74× faster

In single-stream scenarios, quantization delivers near-linear performance gains by directly attacking the memory bandwidth bottleneck. Q4_K_M quantization (4.5 bits/parameter) hits a sweet spot between model quality and speed. Going lower to INT2 might help further, but you'll need to carefully evaluate output quality.
The real insight: Ollama's edge-first design philosophy (GGUF format, streamlined execution, optimized kernels from llama.cpp) is fundamentally better aligned with single-stream, memory-constrained edge scenarios. vLLM's datacenter features—continuous batching, PagedAttention, tensor parallelism—add overhead without providing benefits when serving individual users on unified memory architectures. These features shine in multi-user production serving where batching can be exploited, but hurt performance in the single-stream case.
**What you should actually do**: Stick with Ollama or TensorRT-LLM using Q4_K_M/INT4 quantized models in GGUF format. Target the 0.5-1B parameter range (under 3GB) to leave headroom for KV cache. Focus your optimization efforts on memory access patterns and bandwidth reduction. Watch for emerging techniques like INT4 AWQ, sparse attention, and quantized KV caches.
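Sizing that KV-cache headroom is easy to sanity-check with the standard formula: per layer, the cache holds a K and a V tensor of `heads × head_dim × seq_len` elements. A sketch with illustrative numbers (the config below is a plausible 0.6B-class shape, not the exact spec of any model tested above):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache size: 2 tensors (K and V) per layer, each with
    n_kv_heads * head_dim * seq_len elements."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative config: 28 layers, 8 KV heads of dim 128, 8K context, FP16 cache
gb = kv_cache_bytes(28, 8, 128, 8192) / 1e9
print(f"{gb:.2f} GB")  # roughly 0.9-1 GB on top of the weights
```

Quantizing the cache to 8-bit (`bytes_per_elem=1.0`) halves that figure, which is why it shows up in the optimization list below.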

### Room for Improvement

The 20.8% average efficiency might sound terrible, but it's actually typical for edge AI devices running single-stream inference. Datacenter GPUs hit 60-80% efficiency on optimized workloads—but that's typically with large batch sizes that increase operational intensity. In comparable single-stream scenarios, even high-end GPUs see similar efficiency drops. Edge devices commonly land in the 15-40% range due to architectural tradeoffs and memory bandwidth constraints relative to their compute capability.

Three factors explain the gap:

1. **Architecture**: Unified memory sacrifices bandwidth for integration simplicity. The 4MB L2 cache and 7-15W TDP limit further constrain performance.
2. **Software maturity**: Edge inference frameworks lag behind their datacenter counterparts in optimization.
3. **Runtime overhead**: Quantization/dequantization operations, Python abstractions, and non-optimized kernels all add up.

The consistent 16-24% efficiency across most models suggests there's room for 2-3× speedups through better software optimization—particularly in memory access patterns and kernel implementations. But fundamental performance leaps will require hardware changes—specifically, prioritizing memory bandwidth (200+ GB/s) over raw compute capability in future edge AI chips.

## Where to Go From Here

### Software Optimizations Worth Pursuing

- Optimize memory access patterns in attention and MLP kernels
- Implement quantized KV cache (8-bit or lower)
- Tune for small batch sizes (2-4) to improve memory bus utilization
- Overlap CPU-GPU pipeline operations to hide latency

### Research Directions

- Architectures with higher operational intensity (fewer memory accesses per compute operation)
- Sparse attention patterns to reduce memory movement
- On-device LoRA fine-tuning with frozen, quantized base weights
- Multi-model serving with shared base model weights

### What Edge AI Hardware Designers Should Focus On

Future edge AI devices optimized for local, single-user LLM inference need a fundamental shift in priorities: memory bandwidth over raw compute capability. Specifically:

- 200+ GB/s memory bandwidth (3× current Jetson Orin Nano)
- HBM integration for higher bandwidth density
- 16GB+ capacity to support 7B+ parameter models
- Purpose-built INT4/INT8 accelerators with larger on-chip caches to reduce DRAM traffic

---

## References

1. Williams, S., Waterman, A., & Patterson, D. (2009). "Roofline: An Insightful Visual Performance Model for Multicore Architectures." *Communications of the ACM*, 52(4), 65-76.

2. NVIDIA Corporation. (2024). "Jetson Orin Nano Developer Kit Technical Specifications." [https://developer.nvidia.com/embedded/jetson-orin-nano-developer-kit](https://developer.nvidia.com/embedded/jetson-orin-nano-developer-kit)

3. NVIDIA Jetson AI Lab. "Jetson AI Lab Benchmarks." [https://www.jetson-ai-lab.com/benchmarks.html](https://www.jetson-ai-lab.com/benchmarks.html)

4. Gerganov, G., et al. (2023). "GGML - AI at the edge." *GitHub*. [https://github.com/ggerganov/ggml](https://github.com/ggerganov/ggml)

5. Kwon, W., et al. (2023). "Efficient Memory Management for Large Language Model Serving with PagedAttention." *Proceedings of SOSP 2023*.

6. Gemma Team, Mesnard, T., et al. (2025). "Gemma 3 Technical Report." *arXiv preprint arXiv:2503.19786v1*. [https://arxiv.org/html/2503.19786v1](https://arxiv.org/html/2503.19786v1)

7. Yang, A., et al. (2025). "Qwen3 Technical Report." *arXiv preprint arXiv:2505.09388*. [https://arxiv.org/pdf/2505.09388](https://arxiv.org/pdf/2505.09388)

8. DeepSeek-AI. (2025). "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." *arXiv preprint arXiv:2501.12948v1*. [https://arxiv.org/html/2501.12948v1](https://arxiv.org/html/2501.12948v1)

9. Collabnix. "Running LLMs with TensorRT-LLM on NVIDIA Jetson Orin Nano Super." [https://collabnix.com/running-llms-with-tensorrt-llm-on-nvidia-jetson-orin-nano-super/](https://collabnix.com/running-llms-with-tensorrt-llm-on-nvidia-jetson-orin-nano-super/)

10. Pope, R., et al. (2023). "Efficiently Scaling Transformer Inference." *Proceedings of MLSys 2023*.

11. Frantar, E., et al. (2023). "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers." *Proceedings of ICLR 2023*.

12. Dettmers, T., et al. (2023). "QLoRA: Efficient Finetuning of Quantized LLMs." *Proceedings of NeurIPS 2023*.

13. Lin, J., et al. (2023). "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration." *arXiv preprint arXiv:2306.00978*.
---
title: "Breville Barista Pro Maintenance"
date: 2025-08-16
draft: false
---

Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.

#### **Understanding the Two Primary Maintenance Cycles**

The Breville Barista Pro has two distinct, automated maintenance procedures: the **Cleaning (Flush) Cycle** and the **Descale Cycle**. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.

* **Cleaning Cycle (Flush):** This process is designed to remove coffee oils and granulated residue from the group head, shower screen, and portafilter system.
* **Descale Cycle:** This process targets the internal components of the machine, such as the thermocoil and water lines, to remove mineral and limescale deposits from water.

#### **Procedure 1: The Cleaning (Flush) Cycle**

The machine will indicate when a cleaning cycle is needed by displaying a "FLUSH" alert on the LCD screen. This typically occurs after approximately 200 extractions.

**Required Materials:**
* 1-Cup filter basket
* Grey silicone cleaning disc (provided with the machine)
* One cleaning tablet

**Step-by-Step Instructions:**
1. Insert the 1-cup filter basket into the portafilter.
2. Place the grey silicone cleaning disc inside the basket.
3. Position one cleaning tablet in the center of the disc.
4. Lock the portafilter firmly into the group head.
5. Ensure the drip tray is empty and the water tank is filled.
6. Press the 'MENU' button and use the 'Grind Amount' dial to navigate to the 'FLUSH' option. Press the dial to select it.
7. The '1 CUP' button will illuminate. Press it to initiate the cycle.
8. The cleaning process will last approximately five minutes, with the machine backflushing water under pressure. The remaining time will be displayed on the screen.
9. Upon completion, the machine will beep and return to its ready state.
10. Remove the portafilter and discard the water and dissolved tablet residue. Thoroughly rinse the portafilter, cleaning disc, and filter basket.
11. Re-insert the portafilter (without the disc or tablet) and run a shot of hot water through the group head to rinse any remaining cleaning solution.

#### **Procedure 2: The Descale Cycle**

The machine will alert you when descaling is required. The frequency depends on water hardness and usage but is generally recommended every 2-3 months.

**Required Materials:**
* Breville-recommended descaling solution
* A large container (minimum 2-liter capacity)

**Step-by-Step Instructions:**

**Part A: Preparation**
1. Empty the drip tray and re-insert it.
2. Remove the water filter from the water tank.
3. Pour the descaling solution into the empty water tank and add fresh water up to the indicated "DESCALE" line.
4. Place a large container under the group head, hot water outlet, and steam wand.

**Part B: The Descaling Process**
1. Turn the machine on and press the 'MENU' button. Navigate to the 'DESCALE' option and select it by pressing the dial.
2. Press the illuminated '1 CUP' button to begin.
3. The cycle proceeds in three stages. You must manually advance through them using the steam dial based on the LCD prompts:
    * **Group Head (d3):** The machine descales the coffee brewing components.
    * **Hot Water (d2):** After a beep, the LCD shows "d2". Turn the steam dial to the hot water position.
    * **Steam (d1):** After another beep, the display reads "d1". Turn the dial to the steam position.

**Part C: The Rinse Cycle**
1. Once the descaling solution is expended, the machine will beep and prompt for a rinse cycle ("r").
2. Empty the large container and rinse the water tank thoroughly.
3. Fill the water tank with fresh, cold water to the MAX line and re-insert it.
4. Place the empty container back under the outlets and press the '1 CUP' button.
5. The rinse cycle will mirror the descaling process, prompting you to engage the group head ("r3"), hot water ("r2"), and steam wand ("r1") in sequence.
6. After the rinse is complete, the machine will exit the maintenance mode and return to its ready state.

#### **Routine and Preventative Maintenance Schedule**

In addition to the automated cycles, regular manual cleaning is essential for machine health.

**Daily Tasks:**
* **Purge Group Head:** After the final use of the day, run hot water through the group head (without the portafilter) to clear grounds.
* **Clean Portafilter & Baskets:** Do not let used coffee grounds sit in the portafilter. Rinse with hot water after every use.
* **Clean Steam Wand:** Immediately after texturing milk, wipe the wand with a damp cloth and purge steam for 2-3 seconds to clear internal passages.
* **Empty Drip Tray:** Empty and rinse the drip tray regularly.

**Weekly Tasks:**
* **Soak Components:** Remove the filter basket from the portafilter. Soak both components in a solution of hot water and a cleaning tablet (or specific espresso cleaner) for 20-30 minutes to dissolve accumulated coffee oils. Rinse thoroughly.
* **Clean Grinder:** Empty the bean hopper. Run the grinder to clear any remaining beans, then use a brush and/or vacuum to clean out fines and oil residue from the burrs and chute.

**Periodic Tasks (Every 2-3 Months):**
* **Replace Water Filter:** The water filter located inside the water tank should be replaced every 3 months. This reduces the rate of scale buildup.
* **Inspect Shower Screen:** Use a brush to gently scrub the shower screen inside the group head to remove any stubborn coffee grounds.

By adhering to this comprehensive maintenance schedule, you can ensure your Breville Barista Pro operates at peak performance and consistently produces high-quality espresso.

***

**Reference:**
* Breville Barista Pro Instruction Manual and official manufacturer guidelines.

---
title: 'Why Your "Resilient" Homelab is Slower Than a Raspberry Pi'
date: 2026-01-02
draft: false
---

In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running "production" at home, there is only one metric that truly matters: **The Wife Acceptance Factor (WAF)**.

My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was "slow sometimes." She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage.

Here is a breakdown of the symptoms, the red herrings, and the root cause that was hiding in plain sight.

## The Environment

My homelab is designed for node-level resilience, which adds complexity to the storage layer. It is not running on a single server, but rather a 3-node **Proxmox** cluster where every component is redundant:

- **Orchestration**: Kubernetes (k3s) managed via Flux CD.
- **Storage**: A **Ceph** cluster running on the Proxmox nodes, utilizing enterprise NVMe SSDs (`bluestore`) for OSDs.
- **Database**: Postgres managed by the Zalando Postgres Operator, with persistent volumes (PVCs) provisioned on Ceph RBD (block storage).
- **Identity**: Authentik for SSO.

While the underlying disks are blazing fast NVMe drives, the architecture dictates that a write to a Ceph RBD volume is not complete until it is replicated over the network and acknowledged by multiple OSDs. This setup provides incredible resilience—I can pull the plug on a node and nothing stops—but it introduces unavoidable network latency for synchronous write operations. **Keep this particular trade-off in mind; it plays a starring role in the investigation later.**

## The Symptom

The issue was insidious because it was intermittent. Clicking "Login" would sometimes hang for 5-8 seconds, while other times it was instant. To an engineer, "sometimes slow" is the worst kind of bug because it defies easy reproduction.

The breakthrough came when I put aside the server-side Grafana dashboards and looked at the client side. By opening Chrome DevTools and monitoring the **Network** tab during a slow login attempt, I was able to capture the exact failing request.

I identified the culprit: the `/api/v3/core/applications/` endpoint. It wasn't a connection timeout or a DNS issue; the server was simply taking 5+ seconds to respond to this specific GET request.

Armed with this "smoking gun," I copied the request as cURL (preserving the session cookies) and converted it into a Python benchmark script (`reproduce_latency.py`). This allowed me to reliably trigger the latency on demand, turning an intermittent "heisenbug" into a reproducible test case.
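The script itself is nothing fancy. A minimal sketch of what `reproduce_latency.py` does, with the HTTP client injected so the timing logic stands alone (in real use, `get` would be a `requests.Session().get` carrying the session cookies copied from DevTools, and the URL would be the Authentik endpoint above):

```python
import statistics
import time

def benchmark(get, url, runs=8):
    """Time `runs` sequential GETs of `url`. `get` is any callable that
    performs the request, e.g. requests.Session().get so cookies persist."""
    samples = []
    for i in range(runs):
        start = time.perf_counter()
        get(url)
        elapsed = time.perf_counter() - start
        samples.append(elapsed)
        print(f"Request {i + 1}: {elapsed:.4f}s")
    print(f"Avg Latency: {statistics.mean(samples):.1f}s")
    return samples
```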

The results were validating and horrifying:

```text
Request 1: 2.1642s
Request 2: 8.4321s
Request 3: 5.1234s
...
Avg Latency: 4.8s
```


## Investigation & Red Herrings

### Attempt 1: The Connection Overhead Hypothesis
**The Hypothesis**: Authentik defaults to `CONN_MAX_AGE=0`, meaning it closes the database connection after every request. Since I enforce SSL for the database, I assumed the handshake overhead was killing performance.

**The Fix Attempt**: I updated the Authentik configuration to enable persistent connections:
```yaml
env:
  - name: AUTHENTIK_POSTGRESQL__CONN_MAX_AGE
    value: "600"
```

**The Reality**: The benchmark showed a slight improvement (~4.2s average), but the random 5-8s spikes remained. The 300ms connection setup was a factor, but not the root cause. As a side note, enabling this without configuring TCP keepalives caused the Authentik worker to crash with `OperationalError('the connection is closed')` when firewalls silently dropped idle connections.


### Attempt 2: CPU Starvation
**The Hypothesis**: The pods were CPU throttled during request processing.

**The Reality**: `kubectl top pods` showed the server using only 29m (2.9% of a core). Even increasing the Gunicorn worker count from 2 to 4 did not improve the latency of individual requests, though it did help with concurrency.

## The Root Cause: A Perfect Storm

I was stuck. The CPU was idle, network was fine, and individual database queries were fast (<1ms). Then I looked at the traffic patterns:
1. **Redis**: Almost zero traffic.
2. **Postgres**: High `WALSync` and `WALWrite` wait times.
3. **The Table**: `django_postgres_cache_cacheentry` was getting hammered.

### Insight: The Breaking Change
I checked the release notes for **Authentik 2025.10**:
> *Breaking Change: Redis is no longer used for caching. All caching has been moved to the PostgreSQL database to simplify deployment.*

This architectural shift created a bottleneck specific to my storage backend:
1. **The Change**: Every API request triggers a cache write (session updates) to Postgres instead of Redis.
2. **The Default**: Postgres defaults to `synchronous_commit = on`. A transaction is not considered "committed" until it is flushed to disk.
3. **The Storage**: Ceph RBD replicates data across the network to multiple OSDs.

Every time I loaded the dashboard, Authentik tried to update the cache. Postgres paused until the WAL write was flushed and replicated across the Ceph cluster over the network, and only *then* responded.

## The Solution

I couldn't move the database to local NVMe without losing the failover capabilities I built the cluster for. However, for a cache-heavy workload, I could compromise on strict durability.

I patched the Postgres configuration to disable synchronous commits:

```yaml
spec:
  postgresql:
    parameters:
      synchronous_commit: "off" # The magic switch
```

**What this does**: Postgres returns "Success" to the application as soon as the transaction is in memory. It flushes to disk in the background. In the event of a crash, I might lose the last ~500ms of data (mostly cache entries), which is an acceptable trade-off.
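Because `synchronous_commit` is a per-session setting rather than a server-wide-only switch, there is also a narrower variant of this fix: keep the cluster default at `on` and relax it only for the role (or even individual transactions) doing the cache writes. A sketch in SQL, assuming the application connects as an `authentik` role (the role name is specific to my deployment):

```sql
-- Relax durability only for Authentik's connections
ALTER ROLE authentik SET synchronous_commit = off;

-- Or per transaction, for the truly cautious
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... cache writes ...
COMMIT;
```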

## Verification

I re-ran the benchmark with `synchronous_commit = off`.

| Metric | Before (`sync=on`) | After (`sync=off`) | Improvement |
| -------------------------- | ------------------ | ------------------ | -------------- |
| Sequential x8 stream (Avg) | ~4.8s | **0.40s** | **12x Faster** |
| Parallel x8 stream (Wall) | ~10.5s | **2.45s** | **4x Faster** |

The latency vanished. The login became instant.

## Key Insights

* **Read Release Notes**: The shift from Redis to Postgres for caching was a major architectural change that I missed during the upgrade.
* **Storage Matters**: Distributed storage (Ceph/Longhorn) handles linear writes well, but struggles with latency-sensitive, high-frequency sync operations like WAL updates.
* **Tuning Postgres**: For workloads where immediate durability is less critical than latency (like caching tables), `synchronous_commit = off` is a powerful tool.
* **Observability**: The "Wife Test" is a valid monitoring alert. If a user complains it's slow, investigate the P99 latency, not just the average.

### References
* [Authentik 2025.10 Release Notes](https://docs.goauthentik.io/releases/2025.10/)
* [PostgreSQL Documentation: Synchronous Commit](https://www.postgresql.org/docs/current/wal-async-commit.html)
@@ -1,129 +0,0 @@
|
||||
---
title: "Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso"
date: 2025-05-01
draft: false
---

Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.

Our overarching philosophy is simple: **isolate and change only one variable at a time.** While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your **grind size** is your most powerful lever.

Let's dive in!

---

### **Part 1: The Foundation — Dose (The Weight of Dry Coffee)**

Your dose is the bedrock of your espresso. It's the weight of your ground coffee, and it should be the first variable you set and then keep **constant** during the initial dialing-in process.

**Why Dose Matters:**

* **Basket Size is Key:** Your portafilter basket dictates your ideal dose. Too little coffee (under-dosing) creates excessive "headspace," leading to soupy extractions. Too much (over-dosing) causes the coffee puck to touch the shower screen, preventing even water flow and causing channeling.
* **Extraction "Work":** A higher dose means more coffee mass, requiring more "work" (a finer grind, more water) to extract properly.
* **Coffee Type:**
    * **Light Roasts:** Denser and harder to extract. Consider a **slightly lower dose**.
    * **Dark Roasts:** More brittle and soluble. You can often use a **slightly higher dose**.

**Application for Your Breville Barista Pro (54mm Portafilter):**

* **Your Starting Point:** Always begin with **18 grams**. Use a scale for accuracy!
* **Adjusting for Roast:** For light roasts, if you're struggling, drop to 17g. For dark roasts, you can try 19g.
* **Golden Rule:** Once you choose your starting dose (e.g., 18g), **do not change it** until you've dialed in your grind size.

---

### **Part 2: Defining the Drink — Brew Ratio (Dose vs. Yield)**

The brew ratio defines the relationship between your dry coffee dose and the weight of your liquid espresso yield. Always measure by **weight (grams)**, not volume (mL), as crema can be inconsistent.

**Understanding Ratios:**

* **Ristretto (1:1 – 1:1.5):** E.g., 18g in → 18g to 27g out. Strong, textured, less extracted.
* **Espresso (Normale) (1:1.5 – 1:2.5):** E.g., 18g in → 27g to 45g out. The standard, balanced shot.
* **Lungo (1:2.5+):** E.g., 18g in → 45g+ out. Weaker, less textured, more extracted.

**The Fundamental Trade-Off:**

* **Longer Ratio (more water):** Higher extraction, but lower strength (more diluted).
* **Shorter Ratio (less water):** Lower extraction, but higher strength (more concentrated).

**Application for Your Breville Barista Pro:**

* **Recommended Starting Ratio:** A **1:2 ratio** is the perfect place to begin.
* **Practical Numbers:** With your 18g dose, your target yield is **36 grams** of liquid espresso.
* **Execution:** Place your cup on a scale and use the manual brew function to stop the shot precisely when the scale reads 36g.

---

### **Part 3: The Diagnostic Tool — Brew Time**

Brew time is not something you set directly; it's the **result** of how much resistance your coffee puck provides against the machine's water pressure. Think of it as a **diagnostic tool**.

**The 25-30 Second Guideline:**

This is a benchmark. If your 1:2 ratio shot falls within this time, your grind size is likely in the correct range for a balanced extraction.

* **Too Fast (<25s):** Indicates under-extraction (often tastes sour).
* **Too Slow (>30s):** Indicates over-extraction (often tastes bitter).

**Taste is King:** Remember, if a shot tastes fantastic at 32 seconds, it's a great shot! The time simply becomes part of your successful recipe for that specific coffee.

**Application for Your Breville Barista Pro:**

* **Pre-infusion:** The Barista Pro's low-pressure pre-infusion is **part of your total brew time**. Its purpose is to saturate the puck evenly to prevent channeling. Keep it consistent for every shot while dialing in.

---

### **Part 4: The Primary Control — Grind Setting**

This is where the magic (and sometimes frustration) happens. Grind size is your main tool for controlling the resistance of the coffee puck, which directly dictates your brew time.

**The Dual Impact of Grinding Finer:**

1. **Increases surface area:** Allows for more efficient flavor extraction.
2. **Increases resistance:** Slows down water flow and increases contact time.

**The Risk of Grinding Too Fine (Channeling):**

If the grind is too fine, the puck becomes so dense that high-pressure water can't flow evenly. Instead, it "breaks" the puck and punches an easy path (a channel) through a weak spot. This results in a disastrous shot that is simultaneously:

* **Under-extracted:** Most of the coffee is bypassed.
* **Over-extracted:** The water that does flow blasts through the channel, extracting harsh, bitter compounds.
* **The Taste:** A channeled shot tastes hollow, weak, sour, *and* bitter all at once.

**The Goal:** You want to **grind as fine as you possibly can *without* causing significant channeling**. This is the sweet spot for maximizing surface area and resistance for high, even extraction.

**Grind Retention (Purging):** Most grinders retain some old grounds. When you change your grind setting, always purge a few grams of coffee to ensure your dose is entirely at the new setting.

**Application for Your Breville Barista Pro:**

* **Grinder Mechanism:** The "Grind Amount" dial controls the **TIME** the grinder runs, not the weight. When you adjust the fineness, you **must** re-adjust the grind time to ensure you are still getting your target 18g dose.
* **Tackling Channeling:** The Barista Pro is prone to channeling. To fight this, focus on excellent **puck prep**: use a WDT (Weiss Distribution Technique) tool to break up clumps and evenly distribute the grounds before tamping levelly.

---
|
||||
|
||||
### **The Complete Dialing-In Workflow**
|
||||
|
||||
This systematic process will get you to a delicious shot from your Breville Barista Pro efficiently:
|
||||
|
||||
1. **Set Your Constants:**
|
||||
* **Dose:** **18g**.
|
||||
* **Ratio:** **1:2** (meaning a **Yield** of **36g**).
|
||||
* **Pre-infusion:** Use a consistent method (e.g., manual 8-second hold).
|
||||
2. **Make an Initial Grind:**
|
||||
* Set the grinder to a starting point of **15**.
|
||||
* Adjust the grind **time** until the grinder dispenses exactly 18g.
|
||||
3. **Pull the First Shot:**
|
||||
* Brew manually, stopping at **36g** of liquid in the cup. Note the **total brew time**.
|
||||
4. **Taste and Diagnose:**
|
||||
* **Fast & Sour? (<25s):** Grind is too coarse.
|
||||
* **Slow & Bitter? (>32s):** Grind is too fine.
|
||||
5. **Make ONE Adjustment - THE GRIND SIZE:**
|
||||
* If fast/sour, adjust the grind **finer** (e.g., from 15 down to 13).
|
||||
* If slow/bitter, adjust the grind **coarser** (e.g., from 15 up to 17).
|
||||
6. **Re-adjust and Repeat:**
|
||||
* After changing the grind setting, **purge** a small amount of coffee.
|
||||
* Re-weigh your next dose and **adjust the grind time** to get back to exactly 18g.
|
||||
* Pull another 36g shot. Repeat this process until your shot tastes balanced and the time falls roughly between **25-32 seconds**.
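
The taste-and-time decision in steps 4 and 5 can be sketched as a tiny helper (a toy illustration mirroring the examples above; the two-step dial change is an assumption, not official Breville guidance):

```python
def diagnose_shot(brew_time_s: float, grind_setting: int) -> tuple[str, int]:
    """Suggest a grind adjustment from the total brew time of an 18g -> 36g shot."""
    if brew_time_s < 25:
        # Fast and sour: the grind is too coarse, so go finer (lower number)
        return ("too coarse - grind finer", grind_setting - 2)
    if brew_time_s > 32:
        # Slow and bitter: the grind is too fine, so go coarser (higher number)
        return ("too fine - grind coarser", grind_setting + 2)
    return ("balanced - keep this setting", grind_setting)

print(diagnose_shot(21, 15))   # a 21-second shot at setting 15 suggests trying 13
```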
Happy brewing! With patience and this systematic approach, you'll be pulling consistently delicious espresso shots from your Breville Barista Pro in no time.

---
title: "Flashing Jetson Orin Nano in Virtualized Environments"
date: 2025-10-02
draft: false
---

# Flashing Jetson Orin Nano in Virtualized Environments

## Introduction

Flashing NVIDIA Jetson devices remotely presents unique challenges when the host machine is virtualized. This article documents the technical challenges, failures, and eventual success of flashing a Jetson Orin Nano Super developer kit using NVIDIA SDK Manager in various virtualized environments, specifically focusing on QEMU/KVM virtual machines and LXC containers on Proxmox VE.

![Jetson Orin Nano Developer Kit](featured.png)
### The Constraint: Hypervisor-Only Infrastructure

This project operated under a specific constraint: the only available x86_64 machines were homelab servers running Proxmox VE as bare-metal hypervisors. There was no x86 laptop available, and the primary workstation was an Apple M4 Mac (an ARM64 architecture incompatible with SDK Manager).

Installing SDK Manager directly on the Proxmox host OS was explicitly ruled out for several reasons:

1. **Hypervisor Stability**: The Proxmox hosts run critical infrastructure (Kubernetes clusters, Ceph storage, network services). Installing development tools and potentially conflicting dependencies directly on the hypervisor risks system stability.
2. **Dependency Conflicts**: SDK Manager requires numerous dependencies (QEMU, specific Python versions, USB libraries) that could conflict with Proxmox's carefully managed package versions.
3. **Clean Separation**: Best practices dictate keeping hypervisor hosts minimal, with all workloads running in VMs or containers. This separation simplifies maintenance, updates, and disaster recovery.
4. **Repeatability**: A solution confined to a VM or container can be easily replicated, backed up, and destroyed without affecting the host system.

This constraint made the flashing process significantly more complex, as it required finding a virtualization method that could reliably handle the Jetson's USB communication requirements without installing anything on the Proxmox host beyond standard virtualization features.
## Background: Jetson Flashing Requirements

NVIDIA Jetson devices use a specialized flashing process that requires:

1. **USB Connection**: The device must be connected in Force Recovery Mode (APX mode, USB ID `0955:7523`)
2. **Initrd Flash Method**: Modern Jetson devices boot a temporary Linux kernel over USB (`0955:7035`) that establishes USB networking
3. **USB Network Communication**: The host system must establish network connectivity (typically `192.168.55.1` or IPv6 `fc00:1:1::1`) with the Jetson during the flash process
4. **SDK Manager**: NVIDIA's SDK Manager orchestrates the entire process, requiring specific kernel modules and capabilities

The initrd flash method is particularly sensitive to timing and USB device handling, making it challenging in virtualized environments. The three approaches attempted in this article are summarized below:

| Method | USB Passthrough | Network Namespace | Timing Sensitivity | Result |
|--------|----------------|-------------------|-------------------|---------|
| QEMU VM (device-level) | Emulated | VM-isolated | High latency | ❌ Failed (USB timeout) |
| LXC Container | Host devices | Host namespace | Near-native | ❌ Failed (network isolation) |
| QEMU VM (PCI-level) | Direct hardware | VM-isolated | Native | ✅ Success |
## First Attempt: QEMU/KVM Virtual Machine with USB Passthrough

### Configuration

Given the constraint of not having an x86 laptop, initial attempts used a QEMU/KVM virtual machine running Ubuntu 22.04 x86_64 on an Apple M4 Mac via UTM (a QEMU frontend for macOS). This approach allowed running SDK Manager on an emulated x86_64 system while connecting the Jetson device via USB passthrough configured through UTM's USB settings.

While this satisfied the requirement of having an x86_64 environment without using the Proxmox hosts, it introduced additional virtualization overhead, as the entire x86_64 instruction set was being emulated on ARM64 hardware.

### Issues Encountered

The flash process consistently failed during the USB communication phase with the error:

```
ERROR: might be timeout in USB write.
```

### Root Cause Analysis

QEMU/KVM's USB passthrough implementation has known reliability issues with complex USB protocols. The Jetson's initrd flash process requires:

1. Rapid USB re-enumeration when switching between recovery mode and initrd mode
2. High-throughput data transfer for writing the root filesystem
3. Bidirectional USB network communication with strict timing requirements

Individual USB device passthrough in QEMU emulates USB at the device level, introducing latency and potential timing issues. The Jetson's USB networking during initrd boot is particularly sensitive to these delays, causing the timeout errors.

### Conclusion

This approach was abandoned due to fundamental limitations in QEMU's USB emulation layer: USB passthrough at the device level is insufficient for the Jetson flash process.
## Second Attempt: LXC Container on Proxmox

### Rationale

After the Mac-based VM approach failed, attention shifted to the Proxmox infrastructure. LXC containers provide near-native performance with minimal virtualization overhead compared to full VMs. Unlike running SDK Manager directly on the Proxmox host (which was ruled out for stability reasons), an LXC container offers:

1. **Isolation**: Complete separation from the host OS, with its own filesystem and process space
2. **Near-Native Performance**: Containers share the host kernel, eliminating instruction emulation overhead
3. **Easy Management**: Containers can be created, destroyed, and backed up without affecting the host
4. **USB Access**: Proxmox supports passing USB devices to containers via cgroup device permissions

The hypothesis was that an LXC container with proper USB device access would provide the necessary USB timing characteristics while maintaining the clean separation requirement.

### Configuration Progression

The LXC container (ID 106, Ubuntu 22.04) required extensive configuration on the Proxmox host (`/etc/pve/lxc/106.conf`):

```bash
# Enable mknod capability for creating device nodes
features: nesting=1,mknod=1

# USB device passthrough (Bus 003)
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.mount.entry: /dev/bus/usb/003 dev/bus/usb/003 none bind,optional,create=dir 0 0

# Loop device access for mounting disk images
lxc.cgroup2.devices.allow: b 7:* rwm
lxc.mount.entry: /dev/loop0 dev/loop0 none bind,optional,create=file 0 0
lxc.mount.entry: /dev/loop1 dev/loop1 none bind,optional,create=file 0 0
# ... (loop2-7)
lxc.mount.entry: /dev/loop-control dev/loop-control none bind,optional,create=file 0 0
```
### Issues Encountered and Resolutions

#### 1. mknod Permission Errors

**Error**: `mknod: .../rootfs/dev/random: Operation not permitted`

**Cause**: LXC containers lack the `CAP_MKNOD` capability by default, which the L4T flash scripts need in order to create device nodes in the rootfs.

**Solution**: Enable the `mknod=1` feature on the Proxmox host:

```bash
pct set 106 -features nesting=1,mknod=1
```

#### 2. ARM64 Binary Execution

**Error**: `chroot: failed to run command 'dpkg': Exec format error`

**Cause**: The L4T rootfs contains ARM64 binaries that cannot execute on x86_64 without emulation.

**Solution**: Install and enable `qemu-user-static` and `binfmt-support` on the **Proxmox host** (not the container):

```bash
apt-get install qemu-user-static binfmt-support
update-binfmts --enable qemu-aarch64
```

#### 3. Loop Device Access

**Error**: `losetup: cannot find an unused loop device`

**Cause**: The L4T flash scripts use loop devices to mount disk images. LXC containers don't have loop device access by default.

**Solution**: Add loop device permissions and mount entries to the container configuration.
#### 4. USB Networking Failure

**Error**: `Device failed to boot to the initrd flash kernel`

**Cause**: This was the most complex issue. When the Jetson boots into initrd mode (`0955:7035`), it creates a USB network interface (`enx*` or `usb0`). However, in LXC containers, this interface appeared in the **host's network namespace**, not the container's namespace.

**Attempted Solution**:

1. Loaded USB networking kernel modules on the Proxmox host:
```bash
modprobe rndis_host cdc_ether cdc_ncm cdc_subset
echo "rndis_host" >> /etc/modules
echo "cdc_ether" >> /etc/modules
echo "cdc_ncm" >> /etc/modules
echo "cdc_subset" >> /etc/modules
```

2. Created udev rules to automatically move USB network interfaces to the container:
```bash
# /etc/udev/rules.d/99-jetson-usb-network.rules
ACTION=="add", SUBSYSTEM=="net", KERNEL=="enx*", RUN+="/usr/local/bin/handle-jetson-usb-network.sh %k"
```

3. Created a handler script to move interfaces into the container's namespace:
```bash
#!/bin/bash
INTERFACE=$1
CONTAINER_ID=106
# Resolve the container's init PID as seen from the host, so its
# network namespace can be targeted by `ip link set ... netns <pid>`
CONTAINER_PID=$(lxc-info -n $CONTAINER_ID -p -H)
ip link set "$INTERFACE" netns "$CONTAINER_PID"
pct exec $CONTAINER_ID -- ip link set dev "$INTERFACE" up
pct exec $CONTAINER_ID -- dhclient "$INTERFACE"
```
### Fundamental LXC Limitation

Despite all configuration efforts, the LXC container could not properly handle USB network interfaces due to **network namespace isolation**. LXC containers have separate network namespaces from the host, and moving USB network interfaces between namespaces proved unreliable and often failed to establish proper connectivity.

The initrd flash process has strict timing requirements:

1. The Jetson boots into initrd mode
2. The USB network interface must appear and be configured within seconds
3. An SSH connection must be established for flash commands

Even when the interface was successfully moved to the container's namespace, DHCP configuration often failed, causing the flash process to time out.

### Conclusion

LXC containers, despite their near-native performance, have fundamental limitations for this use case due to network namespace isolation. USB networking devices created dynamically during the flash process cannot be reliably handled.
## Final Solution: Proxmox VM with PCI USB Controller Passthrough

### Architecture Change

With both the Mac-based VM (due to QEMU USB emulation issues) and the LXC container (due to network namespace isolation) ruled out, the final approach combined the best aspects of both previous attempts while working within the Proxmox infrastructure constraint:

1. **Use a VM** (not a container) so that USB network interfaces created during the flash appear directly in the guest's own network stack
2. **Pass through the entire USB controller at the PCI level** (not individual USB devices) to eliminate emulation overhead and any potential timing issues
3. **Keep the host OS clean** by running SDK Manager only within the disposable VM

This approach leverages Proxmox's PCI passthrough capability, a feature designed exactly for scenarios where VMs need direct hardware access without installing drivers or tools on the hypervisor host.

### Implementation

#### 1. Identify USB Controller

```bash
# Find which bus the Jetson (recovery mode) is on; plain `lsusb -t`
# does not print vendor:product IDs, so filter by device ID instead
lsusb -d 0955:7523

# Map USB buses to PCI addresses
for bus in {1..8}; do
    pci=$(readlink /sys/bus/usb/devices/usb$bus 2>/dev/null | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9]')
    echo "USB Bus $bus → PCI $pci"
done
```
Result: the Jetson was on Bus 4, controlled by PCI device `0000:22:00.3`.

Verification that no other critical devices shared this controller:

```bash
lsusb | grep "Bus 003"   # Empty except the root hub
lsusb | grep "Bus 004"   # Only the Jetson device
```
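
Before attaching the controller, it is also worth confirming that it sits in its own IOMMU group, since everything in a group is passed through together. A hedged sketch for the Proxmox host (the PCI address is the one identified above):

```shell
# Given a sysfs IOMMU device path, print the group directory it belongs to
iommu_group_of() {
    echo "${1%/devices/*}"
}

# List everything sharing the USB controller's IOMMU group (run on the host);
# the guard makes the loop a no-op on machines without this exact device
for dev in /sys/kernel/iommu_groups/*/devices/0000:22:00.3; do
    [ -e "$dev" ] || continue
    ls "$(iommu_group_of "$dev")/devices"
done
```

If the listing shows more than the controller itself, the extra devices would move to the VM as well.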
#### 2. Create VM with PCI Passthrough

```bash
# Create the VM
qm create 200 --name jetson-flash --memory 4096 --cores 4 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci

# Set machine type to q35 (required for PCIe passthrough)
qm set 200 --machine q35

# Import the Ubuntu cloud image
qm importdisk 200 ubuntu-22.04-server-cloudimg-amd64.img local-lvm

# Configure disk and cloud-init drive
qm set 200 --scsi0 local-lvm:vm-200-disk-0 --boot order=scsi0 \
  --ide2 local-lvm:cloudinit

# Configure cloud-init credentials and networking
qm set 200 --ciuser sdkmanager --cipassword sdkmanager \
  --ipconfig0 ip=dhcp --sshkeys ~/.ssh/authorized_keys

# Add PCI passthrough for the USB controller
qm set 200 --hostpci0 0000:22:00.3,pcie=1

# Resize disk for the JetPack installation
qm resize 200 scsi0 +30G

# Start the VM
qm start 200
```
#### 3. Critical: USB Networking Kernel Modules

The Ubuntu cloud image does not include USB networking kernel modules by default. This is critical because when the Jetson boots into initrd mode, the host must already have these modules loaded.

**Solution**: Install and load the modules before starting the flash:

```bash
# Install the extra kernel modules package
apt-get install linux-modules-extra-$(uname -r)

# Load the USB networking modules
modprobe rndis_host
modprobe cdc_ether
modprobe cdc_ncm
modprobe cdc_subset

# Verify the modules loaded
lsmod | grep -E 'rndis|cdc'
```
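
To make these modules persist across reboots of the flashing VM, they can also be listed in a `modules-load.d` entry (the file name here is an arbitrary choice):

```bash
# /etc/modules-load.d/jetson-flash.conf
rndis_host
cdc_ether
cdc_ncm
cdc_subset
```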
When the Jetson transitions to initrd mode (`0955:7035`), the USB network interface (`usb0`) now appears immediately in the VM's network namespace.

#### 4. Network Configuration

The Jetson's initrd uses **IPv6** for USB networking by default:

```bash
# The interface appears automatically
ip addr show usb0
# Output:
# usb0: inet6 fc00:1:1::1/64 scope global

# Test connectivity
ping6 -c 3 fc00:1:1:0::2   # the Jetson's IPv6 address
```

The SDK Manager automatically detects and uses IPv6 connectivity for SSH and flash operations.
### Flash Process Timeline

1. **07:49:38** - Flash component started, 30-second pre-check wait
2. **07:50:12** - Board detected as `jetson-orin-nano-devkit-super`
3. **07:51:25** - System image created (rootfs populated)
4. **07:52:24** - Converting to sparse image format
5. **07:54:54** - Device rebooted into initrd mode (`0955:7035`)
6. **07:55:05** - USB network interface `usb0` appeared immediately
7. **07:55:16** - SSH connection established via IPv6
8. **07:55:16-07:59:05** - QSPI flash (boot firmware) written
9. **07:59:05-08:00:28** - eMMC flash (system partitions) written
10. **08:00:28** - Flash successful, device rebooted to normal mode (`0955:7020`)
11. **08:00:28-08:02:58** - First-boot auto-configuration
12. **08:03:00** - Installation completed successfully

Total flash time: **~13 minutes**
### Why PCI Passthrough Succeeded

1. **Direct Hardware Access**: The VM has complete control over the USB controller, with no emulation layer
2. **Timing Precision**: USB protocol timing is maintained at the hardware level
3. **Network Namespace**: The VM's own network stack directly handles the USB network interfaces
4. **No Virtualization Overhead**: USB transactions happen at native speed
## Key Lessons Learned

1. **USB Device Passthrough vs. Controller Passthrough**: Passing through individual USB devices adds emulation overhead. PCI-level controller passthrough provides native hardware access.

2. **LXC Network Namespace Limitations**: LXC containers cannot reliably handle dynamically created USB network interfaces due to network namespace isolation. Even with udev rules to move interfaces, timing and configuration issues persist.

3. **Kernel Module Requirements**: USB networking kernel modules must be loaded **before** the Jetson enters initrd mode. Cloud images and minimal installations often lack these modules.

4. **IPv6 Support**: Modern Jetson initrd images prefer IPv6 for USB networking. Ensure the host system has IPv6 enabled and properly configured.

5. **Timing Sensitivity**: The Jetson's initrd flash process has strict timing requirements. USB network interfaces must appear and be configured within seconds of the mode transition.

6. **PCI Passthrough Machine Type**: QEMU/KVM requires the `q35` machine type for PCIe device passthrough. The default `i440fx` machine type does not support it.
## Recommendations

For flashing Jetson devices in production or automated environments:

1. **Use PCI USB Controller Passthrough**: If virtualization is required, pass through the entire USB controller at the PCI level to a VM.

2. **Pre-load USB Networking Modules**: Ensure the `rndis_host`, `cdc_ether`, `cdc_ncm`, and `cdc_subset` kernel modules are loaded before starting the flash process.

3. **Verify USB Controller Isolation**: Before passthrough, ensure no other critical devices share the USB controller.

4. **Use Physical Machines When Possible**: For development and testing, a physical Linux machine provides the most reliable flashing experience.

5. **Monitor USB Device Transitions**: Use `lsusb` and `dmesg` to monitor device state transitions:
   - `0955:7523` = Recovery mode (APX)
   - `0955:7035` = Initrd flash mode
   - `0955:7020` = Normal operation mode
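
A small polling loop makes those transitions easy to watch during a flash (a sketch; `watch_jetson` runs until interrupted):

```shell
# Map a Jetson USB product ID to the flashing phase it indicates
jetson_phase() {
    case "$1" in
        0955:7523) echo "recovery (APX)" ;;
        0955:7035) echo "initrd flash" ;;
        0955:7020) echo "normal operation" ;;
        *)         echo "unknown" ;;
    esac
}

# Report each phase transition with a timestamp until interrupted
watch_jetson() {
    local last="" id
    while true; do
        id=$(lsusb | grep -o '0955:7[0-9a-f]\{3\}' | head -n1)
        if [ "$id" != "$last" ]; then
            echo "$(date +%T) -> ${id:-disconnected}: $(jetson_phase "$id")"
            last="$id"
        fi
        sleep 1
    done
}
```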
## References

- NVIDIA Jetson Linux Developer Guide: https://docs.nvidia.com/jetson/
- NVIDIA SDK Manager Documentation: https://developer.nvidia.com/sdk-manager
- Proxmox VE PCI Passthrough: https://pve.proxmox.com/wiki/PCI_Passthrough
- Linux USB Networking Drivers: https://www.kernel.org/doc/html/latest/usb/
- QEMU USB Documentation: https://www.qemu.org/docs/master/system/devices/usb.html
- LXC Container Configuration: https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html
---
title: "Beyond Words: How RVQ Teaches LLMs to See and Hear"
date: 2025-08-07
draft: false
---
Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It's a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?

The answer lies in creating a universal language: a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is **Residual Vector Quantization (RVQ)**.

This article dives deep into RVQ, exploring how it turns raw data into meaningful semantic IDs and how these IDs, in turn, unlock multi-modal understanding in LLMs.
#### **What is Residual Vector Quantization? The Art of Smart Compression**

At its core, Vector Quantization (VQ) is a compression technique. It maps a high-dimensional vector (such as a data embedding) to the single closest vector in a predefined dictionary, called a **codebook**. You then only need to store the index of that chosen vector. The problem? To represent complex data accurately, you'd need a codebook with an astronomical number of entries, which is computationally infeasible.

This is where **Residual** Vector Quantization shines. Instead of one giant codebook, RVQ uses a series of smaller codebooks in stages.

1. **Stage 1 (Coarse Quantization):** The input vector is quantized by the first codebook. This finds the broadest, most general category for the data.
2. **Calculate the Residual:** The system calculates the error, or "residual," between the original vector and its quantized version from Stage 1. This residual vector represents the information that was lost in the first coarse approximation.
3. **Stage 2 (Refinement):** This residual vector is then quantized by the *second* codebook. This stage doesn't re-evaluate the whole vector; it focuses only on correcting the error from the previous stage.
4. **Iterate:** This process repeats for several stages, with each subsequent codebook quantizing the residual error from the previous one, adding a finer and finer layer of detail.

The final compressed representation is simply the sequence of indices from each codebook, producing an ID like `[8, 5, 4, 1]`. The magic of this approach is that it creates a **hierarchical ID**. The first digit `[8]` might represent "Sports," the next `[5]` refines it to "Court Sports," `[4]` to "Beach Volleyball," and the final `[1]` distinguishes a specific match. Videos with similar content will naturally share a longer prefix in their Semantic ID.
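
The stage-by-stage procedure above can be sketched in a few lines of NumPy (a toy version with random, untrained codebooks; the names and sizes are illustrative):

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize x through successive codebooks; return stage indices and the reconstruction."""
    residual = x.astype(float)
    indices, recon = [], np.zeros_like(residual)
    for cb in codebooks:                      # cb has shape (codebook_size, dim)
        dists = np.linalg.norm(cb - residual, axis=1)
        i = int(np.argmin(dists))             # nearest codeword to the current residual
        indices.append(i)
        recon = recon + cb[i]
        residual = residual - cb[i]           # the next stage quantizes this leftover error
    return indices, recon

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]  # 4 stages, 16 entries each
x = rng.normal(size=8)
ids, x_hat = rvq_encode(x, codebooks)
print(ids)   # a hierarchical 4-part Semantic ID, one index per stage
```

With trained codebooks, each stage shaves off more of the residual error; here the codebooks are random, so only the mechanics are shown.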
#### **Learning What Matters: The Trainable VQ-Autoencoder**

A key insight is that RVQ is not a fixed algorithm but a **trainable neural network component**. Its codebooks are not predefined; they are learned. This learning happens within a **Vector-Quantized Autoencoder (VQ-AE)** architecture.

1. **Encoder:** A powerful neural network (e.g., a Transformer or CNN) takes the raw data (like video frames and audio) and converts it into a continuous semantic embedding.
2. **RVQ Bottleneck:** This embedding is fed into the RVQ module, which quantizes it into the sequence of discrete IDs.
3. **Decoder:** The decoder takes these discrete IDs, looks up the corresponding codebook vectors, sums them to get a reconstructed embedding, and attempts to rebuild the original video/audio.

The entire system is trained end-to-end. The **reconstruction loss** (the difference between the original and reconstructed data) is used to update the parameters of the Encoder, the Decoder, and, most importantly, **the codebook vectors within the RVQ module**. Initially random, the codebook vectors are gradually pushed to become meaningful "anchors" for the core concepts present in the training data.
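
In VQ-VAE-style training (a common concrete instantiation of this idea; the stop-gradient operator $\mathrm{sg}[\cdot]$ and the commitment weight $\beta$ come from that literature, not from this article), the loss for a single quantizer stage is:

$$
L = \underbrace{\lVert x - \hat{x} \rVert_2^2}_{\text{reconstruction}} + \underbrace{\lVert \mathrm{sg}[z_e(x)] - e \rVert_2^2}_{\text{codebook}} + \beta \, \underbrace{\lVert z_e(x) - \mathrm{sg}[e] \rVert_2^2}_{\text{commitment}}
$$

The codebook term pulls the chosen codeword $e$ toward the encoder output $z_e(x)$, while the commitment term keeps the encoder from drifting away from its codeword; the same shape applies at each residual stage.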
#### **From Implicit to Explicit: Controlling Semantics with Contrastive Learning**

A standard VQ-AE learns implicit semantics. It gets good at reconstruction, but we can't control *what* concepts it learns. To make the Semantic IDs truly meaningful and aligned with human language, we introduce **contrastive learning**.

The architecture is enhanced with a parallel text encoder (like BERT's or CLIP's). The model is then trained with a joint loss function:

`L_total = L_reconstruction + λ * L_contrastive`

* **Reconstruction Loss** ensures the RVQ codes contain enough information to rebuild the input.
* **Contrastive Loss** forces the media embedding (from the video/audio encoder) to be mathematically "close" to the text embedding of its description, and "far" from the embeddings of unrelated text descriptions.

This dual goal forces the model to organize its embedding space according to the semantics of human language. The codebook vectors now learn to represent concepts that are not just useful for reconstruction, but are also tied to explicit textual descriptions.
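
A minimal sketch of `L_contrastive` in the InfoNCE/CLIP style (NumPy; the temperature value is an assumption from that literature):

```python
import numpy as np

def contrastive_loss(media_emb, text_emb, temperature=0.07):
    """InfoNCE over a batch: row i of media_emb should match row i of text_emb."""
    m = media_emb / np.linalg.norm(media_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (m @ t.T) / temperature                      # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(m)
    return float(-log_softmax[np.arange(n), np.arange(n)].mean())

# Matched pairs score a much lower loss than shuffled (mismatched) ones
paired = contrastive_loss(np.eye(4), np.eye(4))
shuffled = contrastive_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
print(paired < shuffled)   # True
```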
#### **Integrating with LLMs: Two Powerful Paths to Multi-Modality**

Once we have a contrastively-trained VQ-AE, we can use its output to give LLMs the ability to see and hear. There are two primary strategies for this.

**Path 1: The Tokenizer Approach - Teaching the LLM a New Language**

This path treats the RVQ IDs as a new vocabulary. It's a two-stage process ideal for high-fidelity content generation.

1. **Create a Neural Codec:** The trained VQ-AE serves as a powerful "codec." You can take any piece of media (e.g., a song) and use the codec to compress it into a sequence of discrete RVQ tokens (e.g., `[8, 5, 4, 1, 8, 5, 9, 2, ...]`).
2. **Train a Generative LLM:** A new Transformer model is trained auto-regressively on a massive dataset of these media-derived tokens. Its sole purpose is to learn the patterns and predict the next token in a sequence.

**Use Case:** This is the architecture behind models like Meta's MusicGen. A user provides a text prompt, which conditions the Transformer to generate a new sequence of RVQ tokens. These tokens are then fed to the VQ-AE's decoder to synthesize the final audio waveform.

**Path 2: The Adapter Approach - Translating for a Language Expert**

This path is used to augment a powerful, pre-trained, text-only LLM without the astronomical cost of retraining it.

1. **Freeze the LLM:** A massive, pre-trained LLM (like LLaMA) is frozen. Its deep language understanding is preserved.
2. **Use the Pre-Quantized Embedding:** Instead of using the discrete RVQ tokens, we take the rich, continuous embedding vector produced by our media encoder *just before* it enters the RVQ module.
3. **Train a Small Adapter:** A small, lightweight projection layer (or "adapter") is trained. Its only job is to translate the media embedding into a vector that has the same format and structure as the LLM's own word embeddings. It learns to map visual concepts to their corresponding "word" concepts in the LLM's latent space.

**Use Case:** This is the principle behind models like DeepMind's Flamingo. To answer a question about an image, the image is passed through the media encoder and adapter. The resulting "vision-as-a-word" vector is inserted into the prompt sequence alongside the text tokens. The frozen LLM can now "reason" about the visual input because it has been translated into a format it already understands.
---
title: "Setting Up Jellyfin SSO with Authentik: Surviving the Beta"
date: 2025-11-15
draft: false
---
I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase: the logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren't immediately obvious.

## The Setup

The configuration is best handled via the API (curl) rather than the UI, as this ensures all fields are correctly typed and persisted.
### 1. Authentik (Terraform)

Let Authentik manage the secrets. Don't hardcode them.

```hcl
resource "authentik_provider_oauth2" "jellyfin" {
  name      = "Jellyfin"
  client_id = "jellyfin-ericxliu-me"
  # client_secret omitted -> auto-generated
  property_mappings = [
    authentik_scope_mapping.openid.id,
    authentik_scope_mapping.profile.id,
    authentik_scope_mapping.email.id,
    authentik_scope_mapping.groups.id
  ]
  # ...
}
```
### 2. Jellyfin Plugin (Bash/Curl)

```bash
# ... (retrieve secret from terraform) ...
curl -X POST "https://jellyfin.ericxliu.me/SSO/OID/Add/authentik" ... -d '{
  "OidClientId": "jellyfin-ericxliu-me",
  "OidSecret": "'"${SECRET}"'",
  "OidScopes": ["openid", "profile", "email", "groups"],
  "SchemeOverride": "https",
  "RoleClaim": "groups"
}'
```
## Obscure Errors & Fixes

Because the plugin is still maturing, it doesn't always handle configuration errors gracefully. Here are the main "cryptic" failures I encountered.

### 1. The "Value cannot be null" Crash

**The Symptom**:
You attempt to start the SSO flow and get a generic 500 error. The Jellyfin logs show a C# exception:

```
System.ArgumentNullException: Value cannot be null. (Parameter 'source')
   at System.Linq.Enumerable.Prepend[TSource](IEnumerable`1 source, TSource element)
   at Jellyfin.Plugin.SSO.Api.SSOController.OidChallenge(...)
```

**The Reality**:
This looks like a deep internal failure, but it's actually a simple configuration miss. The plugin code attempts to prepend "openid profile" to your configured scopes without checking whether your scopes array exists first.

**The Fix**:
You **must** explicitly provide `"OidScopes"` in your JSON configuration. It cannot be null or omitted.

```json
"OidScopes": ["openid", "profile", "email", "groups"]
```
### 2. The HTTP/HTTPS Mismatch (Redirect Loop)

**The Symptom**:
Authentik rejects the authorization request with "Redirect URI mismatch", or the browser enters a redirect loop.

**The Reality**:
Jellyfin often sits behind a reverse proxy (Ingress/Traefik) terminating TLS. Use your browser's developer tools to inspect the network requests: you will likely see the `redirect_uri` parameter encoded as `http://jellyfin...` instead of `https://`.

**The Fix**:
Do not rely on header-forwarding magic. Force the scheme in the plugin configuration:

```json
"SchemeOverride": "https"
```
### 3. Case Sensitivity in JSON

**The Symptom**: Configuration seems to be ignored, or fields remain empty after a POST.

**The Reality**: The plugin's API controller keys are case-sensitive in some versions/contexts.

**The Fix**: Stick to PascalCase for the keys (`OidEndpoint`, `AdminRoles`) as seen in the C# DTOs, rather than camelCase (`oidEndpoint`), unless the documentation for your specific version explicitly states otherwise. When in doubt, checking the source code (`SSOController.cs`) is often faster than trusting the README.
|
||||
|
||||
## Summary

When debugging Jellyfin SSO, don't trust the UI to tell you what's wrong.

1. **Check the logs** (`kubectl logs`) for C# stack traces.
2. **Sanitize your JSON** inputs (arrays can't be null).
3. **Inspect the URL parameters** in your browser to see what Redirect URI is actually being generated.

### References

- Jellyfin SSO Plugin Repository: `https://github.com/9p4/jellyfin-plugin-sso`
- Authentik Documentation: `https://goauthentik.io/docs/providers/oauth2/`
- Jellyfin API Documentation: `https://api.jellyfin.org/`

@@ -1,117 +0,0 @@

---
title: "Mixture-of-Experts (MoE) Models: Challenges & Solutions in Practice"
date: 2025-07-02
draft: false
---

Mixture-of-Experts (MoE) models are neural network architectures that allow different parts of the model (called "experts") to specialize in different types of inputs. A "gating network" or "router" learns to dispatch each input (or "token") to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.

### 1. Challenge: Non-Differentiability of Routing Functions

**The Problem:**

Many routing mechanisms, especially "Top-K routing," involve a discrete, hard selection process. A common function is `KeepTopK(v, k)`, which selects the top `k` scoring elements from a vector `v` and sets others to $-\infty$ or $0$.

$$
KeepTopK(v, k)_i = \begin{cases} v_i & \text{if } v_i \text{ is in the top } k \text{ elements of } v \\ -\infty & \text{otherwise.} \end{cases}
$$

This function is **not differentiable**. Its gradient is zero almost everywhere and undefined at the threshold points, making it impossible to directly train the gating network's parameters (e.g., $W_g$) using standard gradient descent.
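
To see the hard cutoff concretely, here is a minimal NumPy sketch of the `KeepTopK` formula above (the function name and scores are illustrative):

```python
import numpy as np

def keep_top_k(v, k):
    # Hard Top-K: keep the k largest scores, mask the rest to -inf.
    # Which indices survive is piecewise constant in v, so the gradient
    # with respect to v is zero almost everywhere.
    out = np.full_like(v, -np.inf, dtype=float)
    top_idx = np.argsort(v)[-k:]
    out[top_idx] = v[top_idx]
    return out

scores = np.array([0.1, 2.0, -0.5, 1.2])
print(keep_top_k(scores, 2))  # masks 0.1 and -0.5 to -inf, keeps 2.0 and 1.2
```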

**Solutions (Stochastic Approximations):**

To enable end-to-end training, non-differentiable routing decisions must be approximated with differentiable or stochastic methods.

* **Stochastic Scoring (e.g., Shazeer et al. 2017):**
  The expert score $H(x)_i = (x \cdot W_g)_i + \text{StandardNormal}() \cdot \text{Softplus}((x \cdot W_{noise})_i)$ introduces Gaussian noise. This makes the scores themselves stochastic, which can be leveraged with other methods.

* **Gumbel-Softmax Trick (or Concrete Distribution):**
  This method allows for differentiable sampling from categorical distributions. Instead of directly picking the top-k, Gumbel noise is added to the scores, and a Softmax (with a temperature parameter) is applied. This provides a continuous, differentiable approximation of a discrete choice, allowing gradients to flow back.

* **REINFORCE (Score Function Estimator):**
  This is a policy gradient method from reinforcement learning. The routing decision is treated as an action, and the gating network's parameters are updated based on the "reward" (e.g., the model's performance). Gradients are estimated by sampling routing choices and weighting them by their outcomes.

* **Straight-Through Estimator (STE):**
  A simpler approximation where, during the backward pass, gradients are computed as if the non-differentiable operation were an identity function or a simple smooth function.

* **Softmax after TopK (e.g., Mixtral, DBRX, DeepSeek v3):**
  Instead of `Softmax(KeepTopK(...))`, some models apply a Softmax *only to the scores of the selected TopK experts*, and then assign $0$ to the rest. This provides differentiable weights for the selected experts while still enforcing sparsity.
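
The "Softmax after TopK" variant can be sketched in a few lines of NumPy (scores and `k` are illustrative):

```python
import numpy as np

def topk_softmax_gates(scores, k):
    # Softmax over only the selected Top-K scores; all other experts get 0.
    # The k surviving gate weights are differentiable w.r.t. their scores.
    top_idx = np.argsort(scores)[-k:]
    gates = np.zeros_like(scores, dtype=float)
    shifted = scores[top_idx] - scores[top_idx].max()  # numerical stability
    e = np.exp(shifted)
    gates[top_idx] = e / e.sum()
    return gates

gates = topk_softmax_gates(np.array([2.0, 1.0, 0.1, -1.0]), k=2)
print(gates)  # two non-zero weights summing to 1, zeros elsewhere
```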

### 2. Challenge: Uneven Expert Utilization (Balancing Loss)

**The Problem:**

Left unchecked, the gating network might learn to heavily favor a few experts, leaving others underutilized. This leads to:

* **System Inefficiency:** Overloaded experts become bottlenecks, while underutilized experts waste computational resources.
* **Suboptimal Learning:** Experts might not specialize effectively if they don't receive diverse data.

**Solution: Heuristic Balancing Losses (e.g., from Switch Transformer, Fedus et al. 2022)**

An auxiliary loss is added to the total model loss during training to encourage more even expert usage.

$$ \text{loss}_{\text{auxiliary}} = \alpha \cdot N \cdot \sum_{i=1}^{N} f_i \cdot P_i $$

Where:

* $\alpha$: A hyperparameter controlling the strength of the auxiliary loss.
* $N$: Total number of experts.
* $f_i$: The **fraction of tokens *actually dispatched* to expert $i$** in the current batch $B$ of $T$ tokens:
  $$ f_i = \frac{1}{T} \sum_{x \in B} \mathbf{1}\{\text{argmax } p(x) = i\} $$
  ($p(x)$ here refers to the output of the gating network, which could be $s_{i,t}$ in the DeepSeek/classic router. The $\text{argmax}$ means it counts hard assignments to expert $i$.)
* $P_i$: The **fraction of the router *probability mass* allocated to expert $i$** in the current batch $B$:
  $$ P_i = \frac{1}{T} \sum_{x \in B} p_i(x) $$
  ($p_i(x)$ is the learned probability (or soft score) from the gating network for token $x$ and expert $i$.)

**How it works:**

Since $\sum_i f_i = \sum_i P_i = 1$, the sum $\sum_i f_i \cdot P_i$ is minimized when both distributions are uniform (each entry near $1/N$), so the loss pushes expert usage toward an even spread. If an expert $i$ is overused (high $f_i$ and $P_i$), its term in the sum contributes significantly to the loss. The derivative with respect to $p_i(x)$ reveals that "more frequent use = stronger downweighting," meaning the gating network is penalized for sending too much traffic to an already busy expert.
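
A small NumPy sketch of this auxiliary loss (batch shape and $\alpha$ are illustrative):

```python
import numpy as np

def switch_balance_loss(probs, alpha=0.01):
    # probs: (T, N) router probabilities for T tokens over N experts.
    T, N = probs.shape
    hard_assign = np.argmax(probs, axis=1)          # hard assignment per token
    f = np.bincount(hard_assign, minlength=N) / T   # fraction dispatched to each expert
    P = probs.mean(axis=0)                          # mean router probability per expert
    return alpha * N * np.sum(f * P)

uniform = np.full((64, 4), 0.25)                     # balanced router: loss equals alpha
skewed = np.tile([0.97, 0.01, 0.01, 0.01], (64, 1))  # collapsed router: larger loss
print(switch_balance_loss(uniform), switch_balance_loss(skewed))
```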

**Relationship to Gating Network:**

* **$p_i(x)$ (or $s_{i,t}$):** This is the output of the **learned gating network** (e.g., from a linear layer followed by Softmax). The gating network's parameters are updated via gradient descent, influenced by this auxiliary loss.
* **$P_i$:** This is *calculated* from the outputs of the learned gating network for the current batch. It's not a pre-defined value.

**Limitation ("Second Best" Scenario):**

Even with this loss, an expert can remain imbalanced if it's consistently the "second best" option (high $P_i$) but never the *absolute top choice* that gets counted in $f_i$ (especially if $K=1$). This is because $f_i$ strictly counts hard assignments based on `argmax`. This limitation highlights why "soft" routing or "softmax after TopK" approaches can be more effective for truly even distribution.

### 3. Challenge: Overfitting during Fine-tuning

**The Problem:**

Sparse MoE models, despite only activating a few experts per token, possess a very large total number of parameters. When fine-tuning these models on **smaller datasets**, they are highly prone to **overfitting**. The model's vast capacity allows it to memorize the limited fine-tuning data, leading to poor generalization performance on unseen validation data. This is evident when training loss continues to decrease, but validation loss stagnates or increases.

**Solutions:**

* **Zoph et al. Solution – Fine-tune non-MoE MLPs:**
  * This strategy involves freezing a portion of the MoE model's parameters during fine-tuning, specifically the large expert weights.
  * Instead, only the "non-MoE" parameters (e.g., attention layers, adapter layers, or the gating network itself) are updated.
  * This reduces the effective number of trainable parameters during fine-tuning, thereby mitigating the risk of overfitting on small datasets. It assumes the experts are already well pre-trained for general tasks.

* **DeepSeek Solution – Use Lots of Data (1.4M SFT):**
  * This approach tackles the problem by providing the model with a very large and diverse dataset for Supervised Fine-Tuning (SFT).
  * With abundant data (e.g., 1.4 million examples covering a wide range of tasks and languages), the model's large capacity can be effectively utilized for specialized learning rather than memorization. The diversity and volume of data prevent individual experts from overfitting to specific examples.

**Conclusion:**

MoE models offer significant advantages in terms of model capacity and computational efficiency, but their unique sparse activation pattern introduces challenges in training and fine-tuning. Overcoming non-differentiability in routing and ensuring balanced expert utilization are crucial for effective pre-training. During fine-tuning, managing the model's vast parameter count to prevent overfitting on smaller datasets requires either strategic parameter freezing or access to very large and diverse fine-tuning data.

The **Top-K routing** mechanism is a core component in many modern Mixture-of-Experts (MoE) models. It involves selecting a fixed number (`K`) of experts for each input based on relevance scores.

---

**Traditional Top-K (Deterministic Selection):**

* **How it works:**
  1. Calculate relevance scores (`s_{i,t}`) for each expert `i` and input `t`.
  2. Identify the `K` experts with the highest scores.
  3. Experts *within* the Top-K are assigned their scores (`g_{i,t} = s_{i,t}`).
  4. Experts *outside* the Top-K are assigned a score of `0` (`g_{i,t} = 0`).
  5. The output is a weighted sum of the selected experts' outputs.
* **Pros:** Predictable, deterministic, selects the "best" experts based on current scores.
* **Cons:** Can lead to expert imbalance, where a few popular experts are always chosen, starving others of training.

**Alternative: Sampling from Softmax (Probabilistic Selection):**

* **How it works:**
  1. Calculate relevance scores (`s_{i,t}`) which are treated as probabilities (after softmax).
  2. **Randomly sample** `K` unique expert indices from the distribution defined by these probabilities.
  3. Selected experts contribute; unselected experts do not.
* **Why it's suggested:**
  * **Load Balancing:** Prevents expert collapse by ensuring all experts get a chance to be selected, even those with slightly lower scores. This promotes more even training across the entire expert pool.
  * **Diversity & Exploration:** Introduces randomness, potentially leading to better generalization and robustness by exploring different expert combinations.
* **Pros:** Better load balancing, prevents expert starvation, encourages exploration.
* **Cons:** Stochastic (non-deterministic routing), can make debugging harder, might not pick the absolute "best" expert in a single instance (but better for long-term training).

**Key Takeaway:** While deterministic Top-K is simpler and directly picks the "highest-scoring" experts, sampling from the softmax offers a more robust training dynamic by ensuring that all experts receive training data, thereby preventing some experts from becoming unused ("dead experts").
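
The contrast can be demonstrated with a short NumPy simulation (scores and expert count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_k_experts(scores, k):
    # Softmax the scores, then draw k unique expert indices from that distribution.
    e = np.exp(scores - scores.max())
    probs = e / e.sum()
    return rng.choice(len(scores), size=k, replace=False, p=probs)

# Over many batches, even the lowest-scoring expert is occasionally selected,
# whereas deterministic Top-K with these fixed scores would always pick 0 and 1.
counts = np.zeros(4)
for _ in range(1000):
    counts[sample_k_experts(np.array([2.0, 1.0, 0.5, 0.1]), k=2)] += 1
print(counts)
```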

---

@@ -1,135 +0,0 @@

---
title: "How I Got Open WebUI Talking to OpenAI Web Search"
date: 2025-12-29
draft: false
---

OpenAI promised native web search in GPT‑5, but LiteLLM proxy deployments (and by extension Open WebUI) still choke on it—issue [#13042](https://github.com/BerriAI/litellm/issues/13042) tracks the fallout. I needed grounded answers inside Open WebUI anyway, so I built a workaround: route GPT‑5 traffic through the Responses API and mask every `web_search_call` before the UI ever sees it.

This post documents the final setup, the hotfix script that keeps LiteLLM honest, and the tests that prove Open WebUI now streams cited answers without trying to execute the tool itself.

## Why Open WebUI Broke

1. **Wrong API surface.** `/v1/chat/completions` still rejects `type: "web_search"` with `Invalid value: 'web_search'. Supported values are: 'function' and 'custom'.`
2. **LiteLLM tooling gap.** The OpenAI TypedDicts in `litellm/types/llms/openai.py` only allow `Literal["function"]`. Even if the backend call succeeded, streaming would crash when it saw a new tool type.
3. **Open WebUI assumptions.** The UI eagerly parses every tool delta, so when LiteLLM streamed the raw `web_search_call` chunk, the UI tried to execute it, failed to parse the arguments, and aborted the chat.

Fixing all three required touching both the proxy configuration and the LiteLLM transformation path.

## Step 1 – Route GPT‑5 Through the Responses API

LiteLLM’s Responses bridge activates whenever the backend model name starts with `openai/responses/`. I added a dedicated alias, `gpt-5.2-search`, that hardcodes the Responses API plus web search metadata. Existing models (reasoning, embeddings, TTS) stay untouched.

```yaml
# proxy-config.yaml (sanitized)
model_list:
  - model_name: gpt-5.2-search
    litellm_params:
      model: openai/responses/openai/gpt-5.2
      api_key: <OPENAI_API_KEY>
      reasoning_effort: high
      merge_reasoning_content_in_choices: true
      tools:
        - type: web_search
          user_location:
            type: approximate
            country: US
```

Any client (Open WebUI included) can now request `model: "gpt-5.2-search"` over the standard `/v1/chat/completions` endpoint, and LiteLLM handles the Responses API hop transparently.

## Step 2 – Mask `web_search_call` Chunks Inside LiteLLM

Even with the right API, LiteLLM still needs to stream deltas Open WebUI can digest. My [hotfix.py](https://ericxliu.me/hotfix.py) script copies the LiteLLM source into `/tmp/patch/litellm`, then rewrites two files. This script runs as part of the Helm release’s init hook so I can inject fixes directly into the container filesystem at pod start. That saves me from rebuilding and pushing new images every time LiteLLM upstream changes (or refuses a patch), which is critical while waiting for issue #13042 to land. I’ll try to upstream the fix, but this is admittedly hacky, so timelines are uncertain.

1. **`openai.py` TypedDicts**: extend the tool chunk definitions to accept `Literal["web_search"]`.
2. **`litellm_responses_transformation/transformation.py`**: intercept every streaming item and short-circuit anything with `type == "web_search_call"`, returning an empty assistant delta instead of a tool call.

```python
# Excerpt from hotfix.py
tool_call_chunk_original = (
    'class ChatCompletionToolCallChunk(TypedDict): # result of /chat/completions call\n'
    '    id: Optional[str]\n'
    '    type: Literal["function"]'
)
tool_call_chunk_patch = tool_call_chunk_original.replace(
    'Literal["function"]', 'Literal["function", "web_search"]'
)
...
if tool_call_chunk_original in content:
    content = content.replace(tool_call_chunk_original, tool_call_chunk_patch, 1)
```

```python
added_block = """                elif output_item.get("type") == "web_search_call":
                    # Mask the call: Open WebUI should never see tool metadata
                    action_payload = output_item.get("action")
                    verbose_logger.debug(
                        "Chat provider: masking web_search_call (added) call_id=%s action=%s",
                        output_item.get("call_id"),
                        action_payload,
                    )
                    return ModelResponseStream(
                        choices=[
                            StreamingChoices(
                                index=0,
                                delta=Delta(content=""),
                                finish_reason=None,
                            )
                        ]
                    )
"""
```

These patches ensure LiteLLM never emits a `tool_calls` delta for `web_search`. Open WebUI only receives assistant text chunks, so it happily renders the model response and the inline citations the Responses API already provides.

## Step 3 – Prove It with cURL (and Open WebUI)

I keep a simple smoke test (`litellm_smoke_test.sh`) that hits the public ingress with and without streaming. The only secrets are placeholders here, but the structure is the same.

```bash
#!/usr/bin/env bash
set -euo pipefail

echo "Testing non-streaming..."
curl "https://api.ericxliu.me/v1/chat/completions" \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2-search",
    "messages": [{"role": "user", "content": "Find the sunset time in Tokyo today."}]
  }'

echo -e "\n\nTesting streaming..."
curl "https://api.ericxliu.me/v1/chat/completions" \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2-search",
    "stream": true,
    "messages": [{"role": "user", "content": "What is the weather in NYC right now?"}]
  }'
```

Each request now returns grounded answers with citations (`url_citation` annotations) via Open WebUI, and the SSE feed never stalls because the UI isn’t asked to interpret tool calls.

## Lessons & Pitfalls

- **The Responses API is non-negotiable (and syntax-sensitive).** `/v1/chat/completions` still rejects `web_search`. Always test against `/v1/responses` directly before wiring LiteLLM into the loop. Furthermore, the syntax for `reasoning` is different: while Chat Completions uses the top-level `reasoning_effort` parameter, the Responses API requires a nested object: `"reasoning": {"effort": "medium"}`.
- **The Native Model Trap.** Models like `gpt-5-search-api` exist and support web search via standard Chat Completions, but they are often less flexible—for instance, rejecting `reasoning_effort` entirely. Routing a standard model through LiteLLM's Responses bridge offers more control over formatting and fallbacks.
- **Magic strings control routing.** LiteLLM has hardcoded logic (deep in `main.py`) that only triggers the Responses-to-Chat bridge if the backend model name starts with `openai/responses/`. Without that specific prefix, LiteLLM bypasses its internal transformation layer entirely, leading to cryptic 404s or "model not found" errors.
- **Synthesized Sovereignty: The Call ID Crisis.** Open WebUI is a "well-behaved" OpenAI client, yet it often omits the `id` field in `tool_calls` when sending assistant messages back to the server. LiteLLM's Responses bridge initially exploded with a `KeyError: 'id'` because it assumed an ID would always be present. The fix: synthesizing predictable IDs like `auto_tool_call_N` on the fly to satisfy the server-side schema.
- **The Argument Delta Void.** In streaming mode, the Responses API sometimes skips sending `response.function_call_arguments.delta` entirely if the query is simple. If the proxy only waits for deltas, the client receives an empty `{}` for tool arguments. The solution is to fall back and synthesize the `arguments` string from the `action` payload (e.g., `output_item['action']['query']`) when deltas are missing.
- **Streaming State Machines are Fragile.** Open WebUI is highly sensitive to the exact state of a tool call. If it sees a `web_search_call` with `status: "in_progress"`, its internal parser chokes, assuming it's an uncompleted "function" call. These intermediate state chunks must be intercepted and handled before they reach the UI.
- **Defensive Masking is the Final Boss.** To stop Open WebUI from entering an infinite client-side loop (thinking it needs to execute a tool it doesn't have), LiteLLM must "mask" the `web_search_call` chunks. By emitting empty content deltas instead of tool chunks, we hide the server-side search mechanics from the UI, allowing it to stay focused on the final answer.
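
The call-ID synthesis described above can be sketched in a few lines; this helper and the dict shape are a simplified illustration, not LiteLLM's actual internals:

```python
def ensure_tool_call_ids(tool_calls):
    # Hypothetical helper: backfill missing ids with predictable values
    # (auto_tool_call_N) so server-side schema validation never hits KeyError: 'id'.
    for n, call in enumerate(tool_calls):
        call.setdefault("id", f"auto_tool_call_{n}")
    return tool_calls

calls = ensure_tool_call_ids([
    {"type": "function", "function": {"name": "web_search"}},  # id omitted by the client
    {"id": "call_abc", "type": "function", "function": {"name": "web_search"}},
])
print([c["id"] for c in calls])  # ['auto_tool_call_0', 'call_abc']
```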

With those guardrails in place, GPT‑5’s native web search works end-to-end inside Open WebUI, complete with citations, without waiting for LiteLLM upstream fixes.

## References

- [LiteLLM Documentation - OpenAI Responses API Bridge](https://docs.litellm.ai/docs/proxy/openai_responses)
- [OpenAI Documentation - Responses API](https://platform.openai.com/docs/api-reference/responses)
- [LiteLLM GitHub Issue #13042](https://github.com/BerriAI/litellm/issues/13042)
- [Open WebUI Documentation](https://docs.openwebui.com/)
- [The hotfix.py Script](https://ericxliu.me/hotfix.py)

@@ -1,178 +0,0 @@

---
title: "OpenWrt: Fix WireGuard Connectivity with MWAN3 by Excluding the VPN Endpoint"
date: 2025-09-28
draft: false
---

### Overview

When using WireGuard together with MWAN3 on OpenWrt, the tunnel can fail to establish or flap when the peer's IP is routed into the tunnel itself. This is a classic routing bootstrap problem: WireGuard wants to route 0.0.0.0/0 into the tunnel, but the UDP packets to the peer's public endpoint also get captured, so they never reach the Internet to bring the tunnel up.

This article explains the symptoms, root cause, and a minimal, robust fix: add an MWAN3 rule that forces traffic to the WireGuard peer endpoint IP to go out via a physical WAN policy (not the tunnel). Optionally, assign a metric to the WireGuard interface to keep tables predictable.

### Environment

- OpenWrt 24.10.x
- MWAN3 for multi-WAN policy routing
- WireGuard interface configured with broad `allowed_ips` covering default route (`0.0.0.0/1` and `128.0.0.0/1` or `0.0.0.0/0`)

### Symptoms

- `wg show` indicates the interface is listening, but `transfer: 0 B received` persists after bringing the tunnel up.
- Intermittent reachability to public IPs until routing settles.
- `ip route` shows multiple defaults via WANs; a host route to the peer IP may exist but is still overridden by policy routing once MWAN3 applies rules.

Example observations (sanitized):

```bash
wg show
interface: wireguard
  public key: <redacted>
  listening port: 39345

peer: <peer-public-key-redacted>
  endpoint: 203.0.113.55:51821
  allowed ips: 0.0.0.0/1, 128.0.0.0/1
  transfer: 0 B received, 5.6 KiB sent

ip route
default via 192.0.2.1 dev wan0 proto static src 192.0.2.10
default via 198.51.100.1 dev wan1 proto static src 198.51.100.10 metric 20
203.0.113.55 via 198.51.100.1 dev wan1 proto static metric 20
```

### Root Cause

With default-route `allowed_ips`, WireGuard installs routes so that all outbound traffic prefers the tunnel. MWAN3 then applies policy rules that also match “all traffic,” including the UDP packets to the WireGuard peer’s public IP. If those packets are selected to go via the `wireguard` interface (or a table whose default is the tunnel), the handshake cannot succeed. This creates a chicken-and-egg dependency.

### Fix: Exclude the WireGuard Endpoint from MWAN3 Default Policy

Force traffic to the WireGuard peer public endpoint to use a physical WAN policy. This guarantees the handshake packets always reach the Internet outside of the tunnel.

Steps:

1) Resolve the peer endpoint IP (if you only have a hostname)

```bash
nslookup vpn.example.com
# => use the returned A/AAAA address(es) in the rule below
```

2) Add an MWAN3 rule targeting the endpoint IP

Edit `/etc/config/mwan3` and place this rule before the default v4 rule so it takes precedence:

```conf
config rule 'wireguard_endpoint'
	option dest_ip '203.0.113.55'    # peer public IP
	option proto 'udp'
	option use_policy 'wan_only'     # a policy that prefers a physical WAN
	option family 'ipv4'
```

Notes:

- Use the actual public IP of your WireGuard server. MWAN3 rules match IPs, not hostnames.
- If you have multiple WAN policies (e.g., `wan_only`, `wphone_only`), choose the one that must carry the VPN handshake.

3) (Optional) Assign a metric on the WireGuard interface

This is not strictly required for the fix but keeps routing behavior deterministic when multiple defaults exist.

Edit `/etc/config/network`:

```conf
config interface 'wireguard'
	option proto 'wireguard'
	option private_key '<redacted>'
	list addresses '192.168.3.2/32'
	option metric '5'
```

4) Apply changes

```bash
/etc/init.d/network restart && /etc/init.d/mwan3 restart
```

### Validation

Confirm that the endpoint is routed via a physical WAN and that the tunnel is passing traffic.

```bash
# Verify policy routing for the endpoint
ip route get 203.0.113.55

# MWAN3 status should show your WANs online
mwan3 status | sed -n '1,120p'

# WireGuard should show RX increasing after a few seconds
wg show
```

Expected results:

- `ip route get <peer-ip>` resolves to a physical WAN device/policy, not the `wireguard` interface.
- `wg show` shows non-zero bytes received and a recent handshake time.

### Operational Considerations

- Endpoint IP changes: If the server endpoint is behind DDNS, you must update the rule when its IP changes. Options include:
  - Use a small script triggered by DDNS updates to modify the MWAN3 rule and reload.
  - Maintain an IP set and populate it from DDNS; match the set in firewall/PBR and keep MWAN3 in sync.
- IPv6: Repeat the approach with an IPv6 rule if your peer uses IPv6. Ensure `family 'ipv6'` and the correct policy are set.
- Multiple peers: Create one rule per peer endpoint IP.
- Ordering: Keep the endpoint rule above broad default rules so it always wins.
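
For the IPv6 case above, the mirrored rule might look like this sketch (the address is from the IPv6 documentation range; substitute your peer's real IPv6 endpoint):

```conf
config rule 'wireguard_endpoint_v6'
	option dest_ip '2001:db8::55'    # peer public IPv6 (example address)
	option proto 'udp'
	option use_policy 'wan_only'
	option family 'ipv6'
```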

### Minimal Example Config Snippets (Sanitized)

`/etc/config/network` (relevant parts):

```conf
config interface 'wireguard'
	option proto 'wireguard'
	option private_key '<redacted>'
	list addresses '192.168.3.2/32'
	option metric '5'

config wireguard_wireguard
	option description 'peer-1'
	option public_key '<peer-public-key-redacted>'
	option endpoint_host 'vpn.example.com'
	option endpoint_port '51821'
	list allowed_ips '0.0.0.0/1'
	list allowed_ips '128.0.0.0/1'
	option route_allowed_ips '1'
```

`/etc/config/mwan3` (relevant parts):

```conf
config policy 'wan_only'
	list use_member 'wan_m1_w3'
	option last_resort 'unreachable'

config rule 'wireguard_endpoint'
	option dest_ip '203.0.113.55'
	option proto 'udp'
	option use_policy 'wan_only'
	option family 'ipv4'

config rule 'default_rule_v4'
	option dest_ip '0.0.0.0/0'
	option use_policy 'wan_only'
	option family 'ipv4'
	option proto 'all'
	option sticky '0'
```

### Why This Works

The explicit MWAN3 rule ensures that traffic to the peer’s public IP bypasses any routes that prefer the tunnel. This breaks the bootstrap loop and guarantees handshake packets traverse a real WAN uplink. Once the tunnel is established, the broad `allowed_ips` continue to route general traffic through WireGuard as intended.

### References

- Session log and configs (internal): `~/Downloads/chat-MWAN3 WireGuard Routing Fix 🌐.txt`
- OpenWrt MWAN3 documentation: `https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3`
- WireGuard documentation: `https://www.wireguard.com/`
- OpenWrt WireGuard (UG): `https://openwrt.org/docs/guide-user/services/vpn/wireguard`

@@ -1,107 +0,0 @@

---
title: "A Deep Dive into PPO for Language Models"
date: 2025-08-02
draft: false
---

Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don't inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).

You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.

![PPO Diagram](/images/posts/ppo-rlhf-diagram.png)

This post will decode that diagram, piece by piece. We'll explore the "why" behind each component, moving from high-level concepts to the deep technical reasoning that makes this process work.

### Translating RL to a Conversation

The first step is to understand how the traditional language of reinforcement learning maps to the world of text generation.

* **State (`s_t`)**: In a chat setting, the "state" is the context of the conversation so far. It's the initial prompt (`x`) plus all the text the model has generated up to the current moment (`y₁, ..., y_{t-1}`).
* **Action (`a_t`)**: The "action" is the model's decision at each step. For an LLM, this means generating the very next token (`y_t`). A full response is a sequence of these actions.
* **Reward (`r`)**: The "reward" is a numeric score that tells the model how good its full response (`y`) was. This score comes from a separate **Reward Model**, which has been trained on a large dataset of human preference comparisons (e.g., humans rating which of two responses is better). This reward is often only awarded at the end of the entire generated sequence.

Let's make this concrete. If a user provides the prompt **(x)**: *"The best thing about AI is"*, and the model generates the response **(y)**: *"its potential to solve problems."*, here is how it's broken down for training:

* **State 1**: "The best thing about AI is"
* **Action 1**: "its"
* **State 2**: "The best thing about AI is its"
* **Action 2**: " potential"
* **State 3**: "The best thing about AI is its potential"
* **Action 3**: " to"
* ...and so on for every generated token.

This breakdown transforms a single prompt-response pair into a rich trajectory of state-action pairs, which becomes the raw data for our learning algorithm.
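
The expansion above can be sketched in a few lines of Python (token strings stand in for real tokenizer ids):

```python
def to_transitions(prompt_tokens, response_tokens):
    # Expand one prompt/response pair into per-token (state, action) pairs.
    transitions = []
    state = list(prompt_tokens)
    for token in response_tokens:
        transitions.append((tuple(state), token))  # state before acting, action taken
        state.append(token)                        # the action extends the next state
    return transitions

pairs = to_transitions(["The", "best", "thing", "about", "AI", "is"],
                       ["its", "potential", "to"])
for state, action in pairs:
    print(" ".join(state), "->", action)
```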
|
||||
|
||||
### The Cast of Models: An Actor-Critic Ensemble

The PPO process doesn't rely on a single model but on an ensemble where each member has a distinct role.

1. **The Actor (Policy LM)**: This is the star of the show—the LLM we are actively fine-tuning. Its role is to take a state (the current text) and decide on an action (the next token). We refer to its decision-making process as its "policy" (`π`).
2. **The Critic (Value Model)**: This is the Actor's coach. The Critic doesn't generate text. Instead, it observes a state and estimates the *potential future reward* the Actor is likely to receive from that point onward. This estimate is called the "value" (`V(s_t)`). The Critic's feedback helps the Actor understand whether it's in a promising or a dead-end situation, which is a much more immediate learning signal than waiting for the final reward.
3. **The Reward Model**: This is the ultimate judge. As mentioned, it's a separate model trained on human preference data that provides the final score for a complete generation. Its judgment is treated as the ground truth for training both the Actor and the Critic.
### The Challenge of Credit Assignment: Generalized Advantage Estimation (GAE)

A key problem in RL is assigning credit. If a 20-token response gets a high reward, was it because of the first token, the last one, or all of them? The Critic helps solve this. By comparing the reward at each step with the Critic's value estimate, we can calculate the **Advantage (`Â`)**.

A simple advantage calculation is the one-step TD error: `Advantage = reward + γ * Value_of_next_state - Value_of_current_state`.

However, this can be noisy. PPO uses a more sophisticated technique called **Generalized Advantage Estimation (GAE)**. The formula looks complex, but the idea is intuitive:

`Â(s_t, a_t) = Σ_{l=0}^{∞} (γλ)^l * δ_{t+l}`

where `δ_t = r_t + γV(s_{t+1}) - V(s_t)`

* **γ (gamma)** is a discount factor (e.g., 0.99), which values immediate rewards slightly more than distant ones.
* **λ (lambda)** is a smoothing parameter that balances the trade-off between bias and variance. It creates a weighted average of advantages over multiple future time steps.

In essence, GAE provides a more stable and accurate estimate of how much better a specific action was compared to the policy's average behavior in that state.
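The backward recursion that computes this is compact. A pure-Python sketch for one trajectory; the convention that `values` carries one extra entry, `V(s_T)` for the state after the last action (0 if the episode ends there), is an implementation choice, not from a specific library:

```python
# GAE over one trajectory. rewards[t] and values[t] are per-token;
# values has len(rewards) + 1 entries (the last one is V of the final state).
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):                     # fold in later steps first
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error δ_t
        gae = delta + gamma * lam * gae                         # Â_t = δ_t + γλ·Â_{t+1}
        advantages[t] = gae
    return advantages
```

With `lam=0` this reduces to the noisy one-step TD error; with `lam=1` it becomes the full discounted return minus the value baseline, which is exactly the bias-variance dial described above.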
### The Heart of PPO: The Quest for Stable Updates

Now we arrive at the core innovation of PPO. We want to update our Actor model to take actions with higher advantages. The naive way to do this is to re-weight our training objective by an **importance sampling ratio**: `(π_new / π_old)`. This corrects for the fact that the data we are learning from was generated by a slightly older version of our policy.

However, this ratio is dangerous. If the new policy drifts far from the old one, the ratio can explode, leading to massive, unstable gradient updates that destroy the model.

PPO solves this with its signature **Clipped Surrogate Objective**. The PPO loss function is:

`L_CLIP(θ) = Ê_t [ min( r_t(θ)Â_t, clip(r_t(θ), 1 - ε, 1 + ε)Â_t ) ]`

Let's translate this from math to English:

* `r_t(θ)` is the probability ratio `π_new(a_t|s_t) / π_old(a_t|s_t)`.
* The goal is to increase the objective by an amount proportional to the advantage `Â_t`.
* **The `clip` function is the crucial safeguard.** It forbids the probability ratio from moving outside a small window (e.g., `[0.8, 1.2]` for `ε = 0.2`).

This means the algorithm says: "Let's update our policy to favor this good action. But if the required update would change the policy too drastically from the old one, we'll 'clip' the update to a more modest size." This creates a "trust region," ensuring stable, incremental improvements.
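The per-token objective is small enough to write out directly. A scalar sketch for clarity; real implementations vectorize this over a batch of tokens:

```python
# Per-token PPO clipped surrogate objective.
def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Clip the importance ratio r_t(θ) into [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the min keeps the pessimistic (lower-bound) estimate of the two.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a good action (positive advantage), the upside is capped once the ratio leaves `[0.8, 1.2]`; for a bad action, the penalty is never clipped away, which is exactly the asymmetry the `min` creates.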
### Avoiding Amnesia: The Pretraining Loss

There's one final problem. If we only optimize for the PPO loss, the model might learn to "hack" the reward model by generating repetitive or nonsensical text that gets a high score. In doing so, it could suffer from **catastrophic forgetting**, losing its fundamental grasp of grammar and facts.

To prevent this, we introduce a second loss term. As seen in the diagram, we mix in data from the original **Pretraining Data** (or the dataset used for Supervised Fine-Tuning). We calculate a standard next-token prediction loss (`LM Loss`) on this high-quality data.

The final loss for the Actor is a combination of both objectives:

**Total Loss = Loss_PPO + `λ_ptx` * Loss_LM**

This balances two goals:

1. The `Loss_PPO` pushes the model towards behaviors that align with human preferences.
2. The `Loss_LM` acts as a regularizer, pulling the model back towards its core language capabilities and preventing it from drifting into gibberish.
### The Full Training Loop

Now, we can assemble the entire process into a clear, iterative loop:

1. **Collect**: The current Actor policy `π_k` generates responses to a batch of prompts. These experiences—`(state, action, probability, reward, value)`—are stored in an **Experience Buffer**.
2. **Calculate**: Once the buffer is full, we use the collected data to compute the advantage estimates `Â_t` for every single token-generation step.
3. **Optimize**: For a few epochs, we repeatedly sample mini-batches from the buffer and update the Actor and Critic models. The Actor is updated using the combined `PPO-clip Loss` and `LM Loss`. The Critic is updated to improve its value predictions.
4. **Flush and Repeat**: After the optimization phase, the entire experience buffer is discarded. The data is now "stale" because our policy has changed. The newly updated policy `π_{k+1}` becomes the new Actor, and we return to step 1 to collect fresh data.
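The four steps can be written as a skeleton. This is structure only: the three callables are stand-ins for the real rollout, GAE, and gradient-update code, and the buffer is assumed to be a list-like object with `.clear()`:

```python
# Skeleton of the PPO outer loop; the callables are toy stand-ins.
def ppo_training_loop(collect_experience, compute_advantages, optimize,
                      num_iterations, ppo_epochs=4):
    for _ in range(num_iterations):
        buffer = collect_experience()   # 1. Collect: roll out the current policy
        compute_advantages(buffer)      # 2. Calculate: GAE for every token step
        for _ in range(ppo_epochs):     # 3. Optimize: several passes over the buffer
            optimize(buffer)            #    (Actor: clip + LM loss; Critic: value loss)
        buffer.clear()                  # 4. Flush: the data is stale once the policy moves
```

The detail worth noticing is step 4: unlike supervised learning, the dataset is thrown away every iteration because it was generated by a policy that no longer exists.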
This cycle of collection and optimization allows the language model to gradually and safely steer its behavior towards human-defined goals, creating the helpful and aligned AI assistants we interact with today.
***

**References:**

1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). *Proximal Policy Optimization Algorithms*. arXiv preprint arXiv:1707.06347.
2. Schulman, J., Moritz, P., Levine, S., Jordan, M., & Abbeel, P. (2015). *High-Dimensional Continuous Control Using Generalized Advantage Estimation*. arXiv preprint arXiv:1506.02438.
3. Ouyang, L., et al. (2022). *Training language models to follow instructions with human feedback*. Advances in Neural Information Processing Systems 35.
---
title: "Quantization in LLMs"
date: 2025-08-19
draft: false
---

The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.

**The Fundamentals of Quantization**

At its core, quantization seeks to represent model weights and activations using fewer bits. Three primary approaches form the theoretical foundation:

1. **K-Means-based Quantization (Non-uniform):** This method clusters floating-point weights into a predefined number of groups. Each weight is then replaced by the centroid of its assigned cluster. While effective for storage compression by storing a small "codebook" of centroids and integer indices, its direct computational benefits during inference are limited unless specialized hardware for lookup tables is employed.
2. **Linear (Affine) Quantization:** The most prevalent form, linear quantization maps a floating-point range to a fixed integer range using a simple linear transformation: `r = S * (q - Z)`. Here, `r` is the real value, `q` is the quantized integer, `S` is the scale factor, and `Z` is the zero-point (offset). This approach directly enables integer arithmetic, which is significantly faster and more energy-efficient on modern hardware.
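A minimal pure-Python sketch of this round trip (asymmetric variant; it assumes the input spans a non-degenerate range, i.e. `r_max > r_min`):

```python
# Affine quantization round trip: r ≈ S * (q - Z), INT8 by default.
def quantize(values, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    r_min, r_max = min(values), max(values)
    scale = (r_max - r_min) / (qmax - qmin)   # S: real units per integer step
    zero_point = round(qmin - r_min / scale)  # Z: the integer that maps to r = 0
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [scale * (qi - zero_point) for qi in q]
```

The round-trip error per element is bounded by roughly half a step, which is why the scale (and therefore the clipping range discussed later) matters so much.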
3. **Binary and Ternary Quantization (Extreme Low-Bit):** These push quantization to its limits by constraining weights and/or activations to only two (e.g., +1, -1) or three (e.g., +1, 0, -1) values. While offering maximal compression and enabling bitwise operations instead of multiplications, they often incur substantial accuracy degradation for complex LLMs. For instance, BinaryConnect enabled training deep neural networks with binary weights, showing near state-of-the-art results on image classification tasks. XNOR-Net further extended this by binarizing both weights and inputs, achieving significant speedups and memory savings. Ternary Weight Networks (TWNs) and Trained Ternary Quantization (TTQ) improve upon binary methods by introducing a zero value or learnable scaling factors, respectively, mitigating some accuracy loss.
**Quantization Strategies: Bridging Accuracy and Efficiency**

The practical application of quantization involves distinct strategies:

1. **Post-Training Quantization (PTQ):** This approach applies quantization to an already trained, full-precision model without any further training or fine-tuning.
    * **Quantization Granularity:** The precision of quantization can vary across a model.
        * **Per-Tensor Quantization** applies a single scale and zero-point to an entire tensor.
        * **Per-Channel Quantization** assigns unique scale and zero-point parameters to each output channel of a layer, crucial for handling diverse value distributions.
        * **Group Quantization** provides an intermediate granularity, where scales and zero-points are applied to smaller groups of weights within a channel or layer. This balances fine-grained control with hardware efficiency.
    * **Dynamic Range Clipping (Calibration):** A critical aspect of PTQ is determining the optimal range (`r_min`, `r_max`) for quantization, especially for activations, which often exhibit outliers. Methods include:
        * **Min-Max:** Simply using the observed minimum and maximum values.
        * **Exponential Moving Averages (EMA):** Tracking ranges using a smoothed average during a calibration run.
        * **Kullback-Leibler (KL) Divergence Minimization:** Selecting clipping thresholds that minimize the information loss between the original and quantized distributions.
        * **Mean Square Error (MSE) Minimization:** Optimizing scale and zero-point parameters to minimize the reconstruction error. Adaptive rounding techniques, such as AdaRound, further refine this by optimizing rounding decisions for individual weights.
2. **Quantization-Aware Training (QAT):** This method integrates the quantization process directly into the training or fine-tuning loop. By simulating the effects of low-precision arithmetic during training, the model learns to be robust to quantization noise. The **Straight-Through Estimator (STE)** is commonly used to approximate gradients for the non-differentiable quantization operations, enabling backpropagation. QAT generally yields higher accuracy than PTQ, particularly for aggressive low-bit quantization.
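The forward pass of QAT's "fake quantization" can be sketched as follows. The backward pass is where the STE comes in, treating this whole operation as identity so gradients flow through; that part lives in the autograd engine and is not shown in this sketch:

```python
# Fake quantization: quantize-then-dequantize in the forward pass, so the
# network trains against the quantization error it will see at inference.
def fake_quantize(values, scale, zero_point=0, qmin=-128, qmax=127):
    out = []
    for v in values:
        q = max(qmin, min(qmax, round(v / scale) + zero_point))  # simulated INT8
        out.append(scale * (q - zero_point))                     # back to float
    return out
```

The output stays in floating point, but only values on the quantization grid are reachable, which is exactly the noise QAT teaches the model to tolerate.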
**Emerging Techniques for Modern LLMs**

The scale and complexity of LLMs necessitate advanced quantization strategies:

1. **One-Shot Post-Training Quantization (e.g., GPTQ, AWQ):** These techniques aim to achieve near-QAT accuracy with PTQ's convenience, requiring only a small, unlabelled calibration dataset and no full retraining. GPTQ quantizes weights layer-by-layer by minimizing output MSE, leveraging Hessian-aware information. AWQ identifies and scales "important" weights based on activation magnitudes before quantization. These methods have been instrumental in enabling 4-bit LLM inference on consumer-grade hardware.

2. **Sparsity-Quantization Hybrid (e.g., SpQR):** These approaches combine model pruning (removing redundant connections) with quantization to achieve even greater compression. SpQR prunes weights and then quantizes the remaining non-zero weights, often with special handling for critical outlier weights.

3. **Quantization for Efficient Fine-tuning (e.g., QLoRA):** QLoRA quantizes the base LLM weights (e.g., to 4-bit) and freezes them, then fine-tunes only small, low-rank adapter modules in full precision. This drastically reduces the memory requirements for fine-tuning large models on limited hardware.

4. **Hardware-Optimized Quantization Formats:** Beyond bit-width, specialized floating-point formats and efficient kernels are being developed. MXFP4 (Microscaling FP4), NVIDIA's FP8 (E4M3/E5M2), and GGUF's K-quants are examples of block-wise floating-point formats and hierarchical quantization schemes optimized for high performance on modern accelerators like NVIDIA's Blackwell GPUs. These formats offer superior dynamic range compared to fixed-point integers at very low bit-widths.
**Multi-Level Scaling in Group Quantization: A Deeper Dive**

Modern group quantization approaches often employ multi-level scaling to achieve an optimal balance between precision and compression. Consider a generalized formula for reconstructing a real value `r` from a quantized value `q`:

`r = (q - z) * s_l0 * s_l1 * ...`

where `z` is the zero-point (often 0 for symmetric quantization), and `s_l0`, `s_l1` are scale factors at different hierarchical levels. The "Effective Bit Width" reflects the average number of bits per weight after accounting for both the quantized value and its associated scales.

Let's dissect a representative table of such schemes:

| Quantization Approach | Data Type (q) | L0 Group Size | L0 Scale Data Type | L1 Group Size | L1 Scale Data Type | Effective Bit Width |
| :-------------------- | :------------ | :------------ | :----------------- | :------------ | :----------------- | :------------------ |
| Per-Channel Quant     | INT4          | Per Channel   | FP16               | -             | -                  | 4                   |
| VSQ                   | INT4          | 16            | UINT4              | Per Channel   | FP16               | 4 + 4/16 = 4.25     |
| MX4                   | S1M2          | 2             | E1M0               | 16            | E8M0               | 3 + 1/2 + 8/16 = 4  |
| MX6                   | S1M4          | 2             | E1M0               | 16            | E8M0               | 5 + 1/2 + 8/16 = 6  |
| MX9                   | S1M7          | 2             | E1M0               | 16            | E8M0               | 8 + 1/2 + 8/16 = 9  |
* **Data Types Explanation:**
    * `INT4`: Standard 4-bit integer.
    * `UINT4`: 4-bit *unsigned* integer.
    * `FP16`: 16-bit floating-point number.
    * `S1M2`: A custom 3-bit floating-point-like format (1 sign bit, 2 mantissa bits), with its exponent effectively derived from shared scales.
    * `S1M4`: A custom 5-bit format (1 sign bit, 4 mantissa bits).
    * `S1M7`: A custom 8-bit format (1 sign bit, 7 mantissa bits).
    * `E1M0`: A custom 1-bit exponent-only floating-point scale (1 exponent bit, 0 mantissa bits).
    * `E8M0`: A custom 8-bit exponent-only floating-point scale (8 exponent bits, 0 mantissa bits).
* **Row-by-Row Analysis:**
    1. **Per-Channel Quant:** This represents a baseline. Each individual value (`q`) is stored as a 4-bit integer. A single 16-bit FP16 scale (`s_l0`) is applied *per channel*. Since a channel contains many weights, the overhead of the 16-bit scale is amortized, making the effective bit width approximately 4 bits per weight.
    2. **VSQ (Per-Vector Scaled Quantization):** This scheme introduces a two-level scaling hierarchy. The core quantized value (`q`) is a 4-bit integer. A finer-grained 4-bit unsigned integer scale (`s_l0` in `UINT4`) is applied to groups of 16 quantized values. A coarser 16-bit FP16 scale (`s_l1`) is applied per channel. The effective bit width is calculated as: (4 bits for `q`) + (4 bits for `s_l0` / 16 elements) = 4 + 0.25 = 4.25 bits/weight. The `FP16 s_l1` scale overhead per channel is negligible, hence not included in the fraction.
    3. **MX4 (Mixed-Precision with Microexponents, 4-bit effective):** This is a key example of specialized floating-point quantization. The base quantized value (`q`) uses a compact 3-bit `S1M2` format. A 1-bit `E1M0` scale (`s_l0`) is applied to very small groups of 2 `q` values. A coarser 8-bit `E8M0` scale (`s_l1`) is applied to groups of 16 `q` values. The effective bit width is: (3 bits for `q`) + (1 bit for `s_l0` / 2 elements) + (8 bits for `s_l1` / 16 elements) = 3 + 0.5 + 0.5 = 4 bits/weight. This allows for a wider dynamic range, typical of floating-point numbers, while maintaining a very low average bit-width.
    4. **MX6:** Similar to MX4, but uses a 5-bit `S1M4` format for `q`. The effective bit width becomes: 5 + 0.5 + 0.5 = 6 bits/weight, offering higher precision at the cost of a slight increase in size.
    5. **MX9:** Uses an 8-bit `S1M7` format for `q`. The effective bit width is: 8 + 0.5 + 0.5 = 9 bits/weight, providing near-INT8 precision while retaining the floating-point-like dynamic range benefits.

These multi-level, mixed-precision, floating-point quantization schemes represent a significant advancement, enabling LLMs to run efficiently on diverse hardware while maintaining high accuracy, especially for managing the ubiquitous outlier values in LLM activations and weights.
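The "Effective Bit Width" column is just an amortization sum, reproducible with a one-line helper (as in the table, per-channel scale overhead is amortized over so many weights that it is treated as ~0):

```python
# Effective bits/weight = q bits + (scale bits / group size) at each level.
def effective_bit_width(q_bits, l0_scale_bits=0, l0_group=1,
                        l1_scale_bits=0, l1_group=1):
    return q_bits + l0_scale_bits / l0_group + l1_scale_bits / l1_group

vsq = effective_bit_width(4, l0_scale_bits=4, l0_group=16)   # → 4.25
mx4 = effective_bit_width(3, 1, 2, 8, 16)                    # → 4.0
mx9 = effective_bit_width(8, 1, 2, 8, 16)                    # → 9.0
```

This makes the design trade-off explicit: shrinking the group size buys finer-grained scaling, but every scale bit is divided by fewer weights, so the amortized overhead grows.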
**Current Trends and Future Outlook**

The field of LLM quantization is characterized by rapid innovation.

* **Linear (Affine) Quantization** remains the foundational principle, with most advancements focusing on refining its application.
* **Per-channel** and especially **Group/Block-wise Quantization** are indispensable for LLMs due to their heterogeneous weight distributions.
* **Post-Training Quantization (PTQ)**, particularly advanced one-shot methods like GPTQ and AWQ, is highly relevant for efficient deployment of LLMs without the extensive resources required for QAT.
* **Quantization-Aware Training (QAT)** is the benchmark for achieving peak accuracy at very low bit-widths, particularly when PTQ falls short.
* **Mixed-Precision Quantization** is crucial for balancing accuracy and efficiency across the massive, varying layers of LLMs.
* **Hardware-optimized quantization formats** (like MXFP4, FP8) represent a significant step towards co-designing models and silicon for maximum performance.

Conversely, methods like pure K-means quantization (where computation requires fetching float centroids) and general-purpose binary/ternary quantization are less commonly adopted as primary strategies for high-accuracy LLM inference, primarily because of their greater accuracy challenges and the lack of widespread hardware acceleration compared to optimized integer or block-floating-point operations. The trajectory indicates a continuous push for lower effective bit-widths, driven by clever scaling strategies, specialized data formats, and a hardware-aware approach to model optimization.
---

**References**

Courbariaux, M., Bengio, Y., & David, J. P. (2015). BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. *NeurIPS Proceedings*.

Dai, S., Venkatesan, R., Ren, H., Zimmer, B., Dally, W. J., & Khailany, B. (2021). VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. *arXiv preprint arXiv:2102.04503*.

Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. *European Conference on Computer Vision (ECCV)*.

Zhu, C., Han, S., Mao, H., & Dally, W. J. (2017). Trained Ternary Quantization. *International Conference on Learning Representations (ICLR)*.

Migacz, S. (2017). 8-bit Inference with TensorRT. *NVIDIA GTC Presentation*.

Krishnamoorthi, R. (2018). Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper. *arXiv preprint arXiv:1806.08342*.

Li, F., Liu, B., Wang, X., Zhang, B., & Yan, J. (2016). Ternary Weight Networks. *arXiv preprint arXiv:1605.04711*.

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2018). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*.

Nagel, M., van Baalen, T., Blankevoort, T., & Louizos, C. (2019). Data-Free Quantization Through Weight Equalization and Bias Correction. *Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)*.

Han, S., Mao, H., & Dally, W. J. (2016). Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. *International Conference on Learning Representations (ICLR)*.
---
title: "How I Built a Blog Agent that Writes About Itself"
date: 2026-01-16
draft: false
---

I've been spending a lot of time "vibe coding" in the Antigravity IDE lately. It's an incredible flow state—intense, iterative, and fast. But it has a major flaw: the context is ephemeral. Once the session is over, that rich history of decisions, wrong turns, and "aha!" moments is locked away in an opaque, internal format.

I wanted to capture that value. I wanted a system that could take my chaotic coding sessions and distill them into structured, technical blog posts (like the one you're reading right now).

But getting the data out turned into a much deeper rabbit hole than I expected.
## The Challenge: Check the Database?

My first instinct was simple: it's an Electron app, so there's probably a SQLite database.

I found it easily enough at `~/Library/Application Support/Antigravity/User/globalStorage/state.vscdb`. But when I opened it up, I hit a wall. The data wasn't plain text; it was stored in the `ItemTable` under keys like `antigravityUnifiedStateSync.trajectorySummaries` as Base64-encoded strings.

Decoding them revealed raw Protobuf wire formats, not JSON.
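Reading such a value back is a few lines with the standard library. A sketch: the table and key names are the ones found in `state.vscdb`, but the decoded bytes are raw Protobuf and still need a schema-aware decoder afterwards:

```python
import base64
import sqlite3

# Pull one Base64-encoded value out of a VS Code-style state store.
def read_state_blob(db_path, key):
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT value FROM ItemTable WHERE key = ?", (key,)
        ).fetchone()
    finally:
        conn.close()
    return base64.b64decode(row[0]) if row else None
```

Calling it with a key like `antigravityUnifiedStateSync.trajectorySummaries` yields the raw wire-format bytes, which is where the real trouble described below begins.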
### The "Wire-Walking" Dead End

I spent a few hours writing a Python script to "wire-walk" the Protobuf data without a schema. I managed to extract some human-readable strings, but it was a mess:

1. **Missing Context**: I got fragments of text, but the user prompts and cohesive flow were gone.
2. **Encryption**: The actual conversation files (ending in `.pb`) in `~/.gemini/antigravity/conversations/` were encrypted.

It turns out Antigravity uses Electron’s `safeStorage` API, which interfaces directly with the macOS Keychain. Without the app's private key (which is hardware-bound), that data is effectively random noise. I even tried using Frida to hook `safeStorage.decryptString()`, but macOS SIP (System Integrity Protection) and code signing shut that down immediately.

I was stuck. I couldn't decrypt the local files, and I couldn't parse the database effectively.
||||
## The Breakthrough: Living Off the Land

When you can't break the front door, look for the side entrance. I realized I wasn't the only one trying to read this state—the official extensions had to do it too.

I started poking around the source code of the `vscode-antigravity-cockpit` extension, specifically a file named `local_auth_importer.ts`. That's where I found the golden ticket.

The extension *doesn't* decrypt the local files. Instead, it reads a specific key from the SQLite database: `jetskiStateSync.agentManagerInitState`.

When I decoded field #6 of this Protobuf structure, I found an `OAuthTokenInfo` message. It contained the user’s active `accessToken` and `refreshToken`.

### Shifting Strategy: Don't Crack it, Join it

This changed everything. I didn't need to reverse-engineer the local storage encryption; I just needed to impersonate the IDE.

By "piggybacking" on this existing auth mechanism, I could extract a valid OAuth token directly from the local state. But I still needed the endpoints.

Instead of guessing, I opened the **Developer Tools** inside Antigravity itself (it is Electron, after all). By enabling network tracing in the Chrome DevTools and triggering an export manually, I caught the request in the act.

I saw the exact call to `exa.language_server_pb.LanguageServerService/ConvertTrajectoryToMarkdown`.

It was perfect. By sending a gRPC-over-HTTP request to this endpoint using the stolen token, the server—which *does* have access to the unencrypted history—returned a perfectly formatted Markdown document of my entire coding session.
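A sketch of how such a request might be assembled. Only the service/method path comes from the captured traffic above; the base URL, header names, and JSON body shape are illustrative assumptions, since the real client speaks a gRPC/Connect wire framing:

```python
# Hypothetical helper: build (but do not send) the export call.
SERVICE = "exa.language_server_pb.LanguageServerService"

def build_export_request(base_url, access_token, trajectory_id):
    return {
        "url": f"{base_url}/{SERVICE}/ConvertTrajectoryToMarkdown",
        "headers": {
            "Authorization": f"Bearer {access_token}",  # token lifted from state.vscdb
            "Content-Type": "application/json",
        },
        "body": {"trajectoryId": trajectory_id},  # assumed field name
    }
```

Separating "build the request" from "send it" also made the piece easy to unit-test without a live language server.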
## The Architecture: The Blog-Agent

Once I had the data extraction solved, building the rest of the "blog-agent" was straightforward. I built a **Node.js** stack to automate the pipeline:

* **Backend**: An **Express** server handles the routing, session imports, and post generation.
* **Frontend**: A clean **EJS** interface to list sessions, view summaries, and "publish" them to the filesystem.
* **Storage**: A local SQLite database (`data/sessions.sqlite`) acts as a cache. (I learned my lesson: always cache your LLM inputs).
* **The Brain**: I use the **OpenAI SDK** (pointing to a LiteLLM proxy) to interface with `gemini-3-flash`. I wrote a map-reduce style prompt that first extracts technical decisions from the raw conversation log, then synthesizes them into a narrative.
* **Persistence**: The final posts are saved with YAML front matter into a `generated_posts/` directory.
## Key Insights

* **Don't Fight the OS**: Trying to break macOS Keychain/SIP encryption is a losing battle for a weekend project.
* **Follow the Tokens**: Applications often store auth tokens in less-secure places (like plain SQLite or weaker encryption) than the user content itself.
* **Extensions are Open Books**: If an app has extensions, their source code is often the best documentation for the internal API.

In a satisfyingly self-referential loop, **this very article was generated by the blog-agent itself**, analyzing the "vibe coding" session where I built it.
## References

- `server.js`: The Express server and API implementation.
- `services/antigravity.js`: The client for the Antigravity gRPC-over-HTTP API.
- [vscode-antigravity-cockpit](https://github.com/jlcodes99/vscode-antigravity-cockpit): The extension that leaked the auth logic.
---
title: "Why I Downgraded Magisk to Root My Pixel 2 XL"
date: 2026-01-07
draft: false
---

For the past few weeks, I've been stuck in a stalemate with my EcoFlow Bluetooth Protocol Reverse Engineering Project. I have the hci snoop logs, I have the decompiled APK, and I have a strong suspicion about where the authentication logic is hiding. But suspicion isn't proof.

Static analysis has its limits. I found the "smoking gun" function—a native method responsible for encrypting the login payload—but understanding *how* it constructs that payload within a strict 13-byte limit purely from assembly (ARM64) was proving to be a headache.

I needed to move from **static analysis** to **dynamic analysis**. I needed to hook the function at runtime, inspect the memory, and see the data before it gets encrypted. To do that, I needed a rooted Android device.

The only candidate in my drawer? An 8-year-old **Google Pixel 2 XL ("taimen")** that hadn't been turned on since 2017.

## The Objective

Bring this relic back to life, update it to the final official firmware, and gain `su` access to install Frida and tcpdump. It sounds simple, but 2026 tools don't always play nice with 2017 hardware.
## Phase 1: The "I Forgot My Password" Hurdle

The first problem was mundane: I didn't remember the PIN. My only way in was a physical **Hard Reset**, which relies on a specific sequence of hardware button inputs:

1. **Fastboot Mode**: Hold `Power` + `Vol Down` until the familiar bootloader screen appears.
2. **Recovery Mode**: Use volume keys to select "Recovery Mode".
3. **The "No Command" Trick**: The phone reboots to a broken Android logo. To get the actual menu, you have to hold `Power` and tap `Vol Up` *once*.
4. **Wipe**: Select `Wipe data/factory reset`.

**The Catch**: This triggers **Factory Reset Protection (FRP)**. Upon boot, the device required authentication with the Google Account previously synced to the hardware. Since I verified my identity using the original credentials, I could proceed; otherwise, bypassing this security feature would have been a significant roadblock.
## Phase 2: The Update Trap

Once in, I checked the version: `Android 10 (QP1A.190711.020)`. This was ancient. The Pixel 2 XL officially supports Android 11, and I wanted the latest possible base for compatibility with modern tools.

I tried the easy route: **Settings > System Update**.

**The Result**: Failure. The phone refused to pull the final OTA (`RP1A.201005.004.A1`), likely due to the Google update servers no longer prioritizing this EOL device.

### The Fix: Manual Flashing

I had to bypass the OTA system entirely. I downloaded the [final Factory Image](https://developers.google.com/android/images) from Google.

```bash
# Don't rely on OTA. Flash the whole valid state.
fastboot -w update image-taimen-rp1a.201005.004.a1.zip
```

*Note: I used the `-w` flag here since I had just wiped the device anyway. This gave me a pristine, stock Android 11 environment to break.*
## Phase 3: The Magisk "Time Travel"

This is where "modern tools meet old hardware" caused the most pain.

**The Hypothesis**: Rooting a Pixel is standard procedure.

1. Extract `boot.img` from the factory zip.
2. Patch it with the latest **Magisk** app.
3. Flash it back.

**The Reality**: Bootloop.

I used **Magisk v30.6** (the latest as of writing). The patch process "succeeded," but flashing the resulting image caused the phone to immediately crash back to the bootloader with a "Cannot find valid operating system" error.
### Debugging the Bootloop

I suspected a regression in how modern Magisk handles the antiquated boot partition structure of the Pixel 2 (A/B partitions, but pre-GKI).

I decided to perform some "software archaeology" and use a version of Magisk that was contemporary with the device's lifespan. I grabbed **Magisk v25.0** (released around 2022).

1. **Repatch**: I patched the *exact same* stock `boot.img` using the v25.0 app.
2. **Reflash**:

```bash
# Flash to both slots to be safe
fastboot flash boot_a magisk_patched_25000.img
fastboot flash boot_b magisk_patched_25000.img
```

**The Result**: Success. The phone booted, and the Magisk app confirmed `Installed: 25.0`.

```bash
❯ adb shell "su -c id"
uid=0(root) gid=0(root) groups=0(root) context=u:r:magisk:s0
```
## Key Insights

* **Don't Trust OTAs on EOL Devices**: If you're reviving old hardware, the OTA mechanism is likely broken or unreliable. Go straight to the factory images.
* **Version Matching Matters**: Tools like Magisk evolve. Using a 2026 root method on a 2017 kernel is a recipe for instability. Sometimes, downgrading your tools is the only way forward.
* **A/B Partitions**: Always flash your patched boot image to *both* slots (`boot_a` and `boot_b`) to avoid active slot mismatches causing boot failures.

With root access secured, the path is now clear to install Frida and finally intercept those elusive EcoFlow authentication packets.
## References

1. [Google Pixel Factory Images](https://developers.google.com/android/images)
2. [Magisk Installation Guide](https://topjohnwu.github.io/Magisk/install.html)
3. [Magisk GitHub Releases](https://github.com/topjohnwu/Magisk/releases)
4. [XDA Guide: Unlock/Flash/Root Pixel 2 XL](https://xdaforums.com/t/guide-unlock-flash-root-for-the-pixel-2-xl-taimen.3702418/)
@@ -1,111 +0,0 @@
|
||||
---
title: "Fixing GPU Operator Pods Stuck in Init: Secure Boot, DKMS, and MOK on Proxmox + Debian"
date: 2025-08-09
draft: false
---

I hit an issue where all GPU Operator pods on one node were stuck in Init after migrating from Legacy BIOS to UEFI. The common error was NVIDIA components waiting for "toolkit-ready," while the toolkit init container looped with:
- nvidia-smi failed to communicate with the NVIDIA driver
- modprobe nvidia → "Key was rejected by service"

That message is the tell: Secure Boot is enabled and the kernel refuses to load modules not signed by a trusted key.

### Environment
- Proxmox VM (QEMU/KVM) 8.4.9
- Debian 12 (bookworm), kernel 6.1
- GPU: NVIDIA Tesla V100 (GV100GL)
- NVIDIA driver installed via Debian packages (nvidia-driver, nvidia-kernel-dkms)

### Root Cause
- Secure Boot enabled (verified with `mokutil --sb-state`)
- NVIDIA DKMS modules were built, but the signing key was not trusted by the UEFI shim/firmware
- The VM booted via the fallback "UEFI QEMU HARDDISK" path (not shim), so MOK requests didn't run; no MOK screen appeared

### Strategy
Keep Secure Boot on; get the modules trusted. That requires:
1) Ensure the VM boots via shim (so MOK can work)
2) Make sure DKMS signs modules with a MOK key/cert
3) Enroll that MOK into the firmware via shim's MokManager

### Step 1 — Boot via shim and persist EFI variables
In Proxmox (VM stopped):
- BIOS: OVMF (UEFI)
- Add an EFI Disk (stores OVMF VARS; required for MOK)
- Machine: q35
- Enable Secure Boot (the option only appears with OVMF + EFI Disk)

Inside Debian:
- Ensure the ESP is mounted at `/boot/efi`
- Install the signed boot stack:
```bash
sudo apt install shim-signed grub-efi-amd64-signed efibootmgr mokutil
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian
sudo update-grub
```
- Create/verify a boot entry that points to shim:
```bash
sudo efibootmgr -c -d /dev/sda -p 15 -L "debian" -l '\EFI\debian\shimx64.efi'
sudo efibootmgr -o 0002,0001,0000  # make shim (0002) first
sudo efibootmgr -n 0002            # BootNext shim for the next reboot
```
Tip: If NVRAM resets or the fallback boot path is used, copy shim into the fallback location:
```bash
sudo mkdir -p /boot/efi/EFI/BOOT
sudo cp /boot/efi/EFI/debian/shimx64.efi /boot/efi/EFI/BOOT/BOOTX64.EFI
sudo cp /boot/efi/EFI/debian/{mmx64.efi,grubx64.efi} /boot/efi/EFI/BOOT/
```

### Step 2 — Make DKMS sign NVIDIA modules with a MOK
Debian already generated a DKMS key at `/var/lib/dkms/mok.key`. Create an X.509 cert in DER format:
```bash
sudo openssl req -new -x509 \
  -key /var/lib/dkms/mok.key \
  -out /var/lib/dkms/mok.der \
  -outform DER \
  -subj "/CN=DKMS MOK/" \
  -days 36500
```
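Before queueing the import, it's worth confirming that the key and the DER cert actually form a matching pair. A minimal sketch of the check, demonstrated on throwaway files in a temp directory (substitute the real `/var/lib/dkms/mok.key` and `/var/lib/dkms/mok.der`):

```shell
# Generate a throwaway key + self-signed DER cert mirroring the Step 2 command
tmp=$(mktemp -d)
openssl req -new -x509 -newkey rsa:2048 -nodes \
  -keyout "$tmp/mok.key" -out "$tmp/mok.der" -outform DER \
  -subj "/CN=DKMS MOK/" -days 36500 2>/dev/null

# Derive the public key from each file and compare fingerprints
key_pub=$(openssl pkey -in "$tmp/mok.key" -pubout -outform DER | sha256sum | cut -d' ' -f1)
crt_pub=$(openssl x509 -inform DER -in "$tmp/mok.der" -pubkey -noout \
          | openssl pkey -pubin -pubout -outform DER | sha256sum | cut -d' ' -f1)
[ "$key_pub" = "$crt_pub" ] && echo "key and cert match"
```

If the fingerprints differ, DKMS will sign with one key while you enroll another, and `modprobe` will keep failing even after MOK enrollment.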
Enable DKMS signing:
```bash
sudo sed -i 's|^mok_signing_key=.*|mok_signing_key=/var/lib/dkms/mok.key|' /etc/dkms/framework.conf
sudo sed -i 's|^mok_certificate=.*|mok_certificate=/var/lib/dkms/mok.der|' /etc/dkms/framework.conf
```
Rebuild and install the modules (DKMS now signs them):
```bash
sudo dkms build nvidia/$(modinfo -F version nvidia) -k $(uname -r) --force
sudo dkms install nvidia/$(modinfo -F version nvidia) -k $(uname -r) --force
```

### Step 3 — Enroll the MOK via shim (MokManager)
Queue the cert and set a longer prompt timeout:
```bash
sudo mokutil --revoke-import
sudo mokutil --import /var/lib/dkms/mok.der
sudo mokutil --timeout 30
sudo efibootmgr -n 0002  # ensure the next boot goes through shim
```
Reboot to the VM console (not SSH). In the blue MOK UI:
- Enroll MOK → Continue → Yes → enter password → reboot

If arrow keys don't work in Proxmox noVNC:
- Use SPICE (virt-viewer), or
- From the Proxmox host, send keys:
  - `qm sendkey <VMID> down`, `qm sendkey <VMID> ret`, `qm sendkey <VMID> esc`

### Verification
```bash
sudo mokutil --test-key /var/lib/dkms/mok.der  # expect "already enrolled"
sudo modprobe nvidia
nvidia-smi
kubectl -n gpu-operator get pods -o wide
```
Once the module loads, the GPU Operator pods on that node leave Init and become Ready.

### Key Insights
- "Key was rejected by service" during `modprobe nvidia` means Secure Boot rejected an untrusted module.
- Without shim in the boot path (or without a persistent EFI vars disk), `mokutil --import` won't surface a MOK screen.
- DKMS will not sign modules unless configured; set `mok_signing_key` and `mok_certificate` in `/etc/dkms/framework.conf`.
- If you cannot or don't want to use MOK, the pragmatic dev choice is to disable Secure Boot in OVMF. For production, prefer shim+MOK.

### References
- Proxmox Secure Boot setup (shim + MOK, EFI vars, DKMS): [Proxmox docs](https://pve.proxmox.com/wiki/Secure_Boot_Setup#Setup_instructions_for_shim_+_MOK_variant)
---
title: "Supabase Deep Dive: It's Not Magic, It's Just Postgres"
date: 2025-08-03
draft: false
---

In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what's really going on.

Supabase enters this space with a radically different philosophy: **transparency**. It provides the convenience of a BaaS, but it's built on the world's most trusted relational database: PostgreSQL. The "magic" isn't a proprietary black box; it's a carefully assembled suite of open-source tools that enhance Postgres, not hide it.

This deep dive will deconstruct that suite. We will move beyond the basics to explore the architectural patterns, security models, and development workflows that allow you to build robust, scalable applications. We will cover:

* **The Supabase Blueprint:** A procedural guide to designing your application.
* **The Pillars of Supabase:** A detailed look at Auth, Storage, Functions, and Realtime.
* **Transactional Realtime:** How Supabase guarantees data consistency in a live environment.
* **Best Practices:** The practical knowledge you need before writing a single line of code.

### The Guiding Philosophy: Your Database as the Source of Truth

The most critical shift when adopting Supabase is to see your database as more than just a data store. It is your **single source of truth**. This means your database schema is responsible for:

* **Structure:** The tables and columns that define your data.
* **Relationships:** The foreign keys that link tables together.
* **Integrity:** The constraints (`NOT NULL`, `UNIQUE`) that ensure your data is always valid.
* **Security:** The access control rules that define who can do what.

By leveraging PostgreSQL's native power, you get **full ACID compliance** (Atomicity, Consistency, Isolation, Durability) out of the box. You don't need to worry about application-level code to prevent orphan records or inconsistent states; the database guarantees it for you.

### The Supabase Design Blueprint: A Procedural Guide

To build a scalable application, follow a structured design process that moves from abstract ideas to concrete implementation.

#### Phase 1: Conceptual Modeling (The Blueprint)
Before touching the Supabase dashboard, map out your application on paper.
1. **Identify the "Nouns":** These are your core data objects, which will become your database tables. For a project management app, they are `projects`, `tasks`, `users`, `comments`.
2. **Define the "Verbs":** These are the user actions. "A user *creates* a task." "A user *assigns* a task to another user." These actions will inform your security policies and APIs.
3. **Map Relationships:** How do the nouns connect? A `task` belongs to one `project`. A `user` can have many `tasks`. A `project` can have many `users` (a many-to-many relationship, requiring a `project_users` join table).
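The many-to-many mapping in step 3 can be sketched concretely with a join table. A minimal illustration using in-memory SQLite (the Postgres DDL is analogous; the table names follow the article, the sample rows are made up):

```python
import sqlite3

# A project_users join table resolves the many-to-many between projects and users.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project_users (
    project_id INTEGER REFERENCES projects(id),
    user_id    INTEGER REFERENCES users(id),
    PRIMARY KEY (project_id, user_id)
);
""")
db.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
db.execute("INSERT INTO projects VALUES (10, 'website')")
db.executemany("INSERT INTO project_users VALUES (?, ?)", [(10, 1), (10, 2)])

# "Which users are members of project 10?" goes through the join table.
members = db.execute("""
    SELECT u.name FROM users u
    JOIN project_users pu ON pu.user_id = u.id
    WHERE pu.project_id = 10 ORDER BY u.name
""").fetchall()
print(members)  # [('alice',), ('bob',)]
```

The same `project_users` table is exactly what the RLS policies in Phase 3 consult to decide task visibility.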
#### Phase 2: The Foundation (Schema & Migrations)
Translate your model into SQL. For any serious project, use the **Supabase CLI** to manage this process.
1. **Develop Locally:** Run a full Supabase stack on your machine with `supabase start`.
2. **Create Migration Files:** Write your `CREATE TABLE` statements in SQL files. Define columns, data types, and foreign key `REFERENCES` to enforce your relationships.
3. **Version Control:** Commit these migration files to Git. Your database schema is now version-controlled alongside your application code.
4. **Deploy:** Use `supabase db push` to apply your migrations to your live production database. This workflow is safe, repeatable, and professional.

#### Phase 3: The Security Layer (Row Level Security)
This is not an optional step. RLS is the cornerstone of Supabase security.
1. **Deny by Default:** For any table holding user data, immediately enable RLS. This blocks all access until you explicitly grant it.
```sql
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
```
2. **Write "Allow" Policies:** Create policies based on your user stories. Policies are SQL rules that the database enforces on every single query.
```sql
-- Users can see tasks in projects they are a member of.
CREATE POLICY "Allow read access to tasks in user's projects"
ON tasks FOR SELECT
USING (
  EXISTS (
    SELECT 1 FROM project_users
    WHERE project_users.project_id = tasks.project_id
    AND project_users.user_id = auth.uid()
  )
);

-- Users can only insert tasks for themselves.
CREATE POLICY "Allow users to create their own tasks"
ON tasks FOR INSERT
WITH CHECK ( auth.uid() = tasks.assignee_id );
```
The `auth.uid()` function is a special Supabase utility that securely returns the ID of the logged-in user making the request.

#### Phase 4: The APIs (Data Access)
With your data structured and secured, you can now build the access points.
* **For Simple CRUD:** Use Supabase's auto-generated API. It's convenient, respects all your RLS policies, and is perfect for simple reads and writes on a single table.
```javascript
const { data, error } = await supabase.from('tasks').select('*');
```
* **For Complex Logic:** Use PostgreSQL Functions (RPC). Encapsulate complex `JOIN`s or multi-step transactions into a single, callable function. This reduces network chattiness and keeps your business logic secure on the server.
```sql
-- A function to get a task and its project name in one call
CREATE OR REPLACE FUNCTION get_task_with_project(task_id_input int)
RETURNS TABLE (task_title text, project_name text) AS $$
BEGIN
  RETURN QUERY
  SELECT tasks.title, projects.name
  FROM tasks
  JOIN projects ON tasks.project_id = projects.id
  WHERE tasks.id = task_id_input;
END;
$$ LANGUAGE plpgsql;
```
```javascript
// Called simply from the frontend
const { data, error } = await supabase.rpc('get_task_with_project', { task_id_input: 123 });
```

### A Tour of the Core Services

Beyond the database, Supabase provides a suite of essential tools.

#### Authentication
A complete user management system that integrates directly with your database. When a user signs up, a corresponding entry is created in the managed `auth.users` table, which you can then reference in your own tables.
```javascript
// Sign up a new user...
const { data, error } = await supabase.auth.signUp({ email, password });
// ...or handle social logins with ease
const { data: oauthData, error: oauthError } = await supabase.auth.signInWithOAuth({ provider: 'github' });
```

#### Storage
A simple, S3-compatible object store for managing files like user avatars or documents. It's integrated with Postgres and RLS, allowing you to write fine-grained access policies on files and folders (buckets).
```javascript
// Upload a user avatar to a public 'avatars' bucket
const { error } = await supabase.storage
  .from('avatars')
  .upload(`public/${userId}.png`, file);
```

#### Edge Functions vs. Database Functions
It's critical to know when to use which.
* **Database Functions (SQL):** For data-intensive logic *inside* your database.
* **Edge Functions (TypeScript/Deno):** For connecting to the outside world. Use them to call third-party APIs (like Stripe for payments) or run computations that are not well-suited for SQL. This is where you use your secret `service_role` key, as the function runs in a trusted server environment.

### The Realtime Engine: A Pub/Sub System for Postgres

Supabase's Realtime engine is a powerful feature for building live, interactive experiences.

#### How it Works: Logical Replication
It's not magic; it leverages a core PostgreSQL feature.
1. When you enable Realtime on a table, Supabase creates a **Publication** for it.
2. The Realtime server subscribes to this publication via a **Logical Replication Slot**.
3. When a transaction is **successfully committed** to your database, the change is written to Postgres's Write-Ahead Log (WAL).
4. The WAL change is then sent to the Realtime server through the replication slot.
5. The server converts this database event into a JSON payload and broadcasts it over a WebSocket to all subscribed clients.

#### Transactional Integrity
The most important guarantee of this system is its relationship with database transactions. An event is **only broadcast *after* a transaction is fully and successfully committed.** If a transaction is rolled back due to an error, the replication slot receives nothing, and no Realtime event is ever sent. This means you can trust that every Realtime message you receive corresponds to data that is permanently and consistently stored in your database.
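A toy model of that commit-gating invariant (not Supabase's implementation, just the behavior it guarantees): events buffered during a transaction are delivered only on commit, and a rollback discards them.

```python
# Hypothetical sketch: a commit-gated event bus mimicking WAL-driven broadcast.
class CommitGatedBus:
    def __init__(self):
        self.pending = []    # changes inside the open transaction
        self.delivered = []  # what subscribers actually see

    def emit(self, event):
        self.pending.append(event)

    def commit(self):
        # Only now do buffered changes reach subscribers
        self.delivered.extend(self.pending)
        self.pending = []

    def rollback(self):
        # A failed transaction produces no events at all
        self.pending = []

bus = CommitGatedBus()
bus.emit("INSERT messages id=1"); bus.commit()
bus.emit("INSERT messages id=2"); bus.rollback()
print(bus.delivered)  # only the committed change is ever broadcast
```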
#### Use Cases and Limitations
* **Use For:** Small, JSON-based messages like chat messages, live notifications, activity feeds, and presence indicators ("who's online"). Use the `broadcast` feature for ephemeral data like cursor positions that you don't need to save.
* **Do NOT Use For:** Large, continuous data streams. It is **not** a replacement for WebRTC for video/audio calls. The system is designed for small, infrequent payloads.

```javascript
const channel = supabase.channel('public:messages');

// Subscribe to new rows in the 'messages' table
channel
  .on(
    'postgres_changes',
    { event: 'INSERT', schema: 'public', table: 'messages' },
    (payload) => {
      console.log('New message received!', payload.new);
      // Update your UI here
    }
  )
  .subscribe();
```

### Final Words of Advice

* **Frontend Freedom:** Supabase is frontend-agnostic, but meta-frameworks like **Next.js** and **SvelteKit** offer a "golden path" with Auth Helpers that simplify server-side rendering and data fetching.
* **Embrace the CLI:** Use the Supabase CLI for a professional, safe, and repeatable development workflow. Don't manage your production schema by clicking in the UI.
* **Know Your Keys:** Use the public `anon` key in the browser. Guard the secret `service_role` key and only use it in secure server environments like Edge Functions.
* **Indexes Matter:** For fast queries on large tables, `CREATE INDEX` on frequently queried columns. Performance is not automatic.

By understanding these principles, you can leverage Supabase not as a simple BaaS, but as a powerful, transparent, and scalable platform for building next-generation applications on the solid foundation of PostgreSQL.

---
title: "An Architectural Deep Dive of T5"
date: 2025-06-01
draft: false
---

In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the "decoder-only" model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.

But to truly understand the field, we must look at the pivotal models that explored different paths. Google's T5, or **Text-to-Text Transfer Transformer**, stands out as one of the most influential. It didn't just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.

### The Core Philosophy: Everything is a Text-to-Text Problem

The genius of T5 lies in its unifying framework. Instead of building different models or fine-tuning procedures for various NLP tasks, T5 reframes every task as a text-to-text problem. The model takes a string as input and generates a string as output, regardless of the underlying objective.

This is accomplished by adding a **task prefix** to the input. These prefixes are not conversational prompts like a GPT "system prompt"; they are learned triggers that the model is explicitly fine-tuned to recognize.

| Task | T5 Input | Expected T5 Output |
| :--- | :--- | :--- |
| Translation | `translate English to German: The cat is cute.` | `Die Katze ist süß.` |
| Summarization | `summarize: [A long news article...]` | `[A concise summary.]` |
| Classification | `cola sentence: The boys is walking.` | `unacceptable` |
| Similarity | `stsb sentence1: The car is red. sentence2: The auto is crimson.` | `4.8` |

This elegant approach turns even classification into a generation task, where the model learns to generate the text of the correct label.
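The framing can be sketched in a few lines; the prefixes mirror the table above, but the helper function itself is purely illustrative (T5 just consumes the raw prefixed string):

```python
# Illustrative sketch: every task becomes string -> string via a task prefix.
def make_t5_input(task: str, **fields) -> str:
    if task == "translate":
        return f"translate {fields['src']} to {fields['tgt']}: {fields['text']}"
    if task == "summarize":
        return f"summarize: {fields['text']}"
    if task == "cola":
        return f"cola sentence: {fields['text']}"
    raise ValueError(f"unknown task: {task}")

print(make_t5_input("translate", src="English", tgt="German", text="The cat is cute."))
# -> translate English to German: The cat is cute.
```

The model's output is likewise always a string, whether it happens to be a translation, a summary, or the label `unacceptable`.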
### The Engine: A Two-Window Encoder-Decoder Architecture

To execute this text-to-text mission, T5 uses the original Transformer's **encoder-decoder architecture**. This is the most significant point of divergence from modern decoder-only LLMs. The inference process works in two distinct stages:

#### Stage 1: The Encoder (The "Understanding" Window)
When T5 receives an input like `summarize: [article text]`, the entire string is fed into the **encoder**.

* **Bidirectional Context:** The encoder processes the input bidirectionally. Every token can see every other token in the input text simultaneously. This allows the model to build a deep, holistic understanding of the entire prompt and its context.
* **Static Representation:** The encoder's final output is not text. It's a set of numerical representations (hidden states) that encapsulates the meaning and intent of the input. This representation is generated once and remains static for the entire generation process.

#### Stage 2: The Decoder (The "Writing" Window)
The decoder is responsible for generating the output string token by token.

* **Autoregressive Generation:** It begins with a `start-of-sequence` token and generates the output one word at a time.
* **Cross-Attention:** At each step, the decoder does two things: it looks at the text it has generated so far (its own "decoder context"), and crucially, it uses a mechanism called **cross-attention** to look back at the static representation created by the encoder. This allows the decoder's generation to be guided by the encoder's complete understanding of the prompt.
* **Growing Context:** The decoder's context window grows with each token it generates until it produces an `end-of-sequence` token, signaling that the task is complete.

This two-window system is a powerful design, especially for tasks that require a full understanding of a source document before generating a new one (like translation or summarization).

### Architectural Divergence: T5 vs. The Modern LLM Playbook

Beyond its core architecture, T5 made several specific design choices that contrast with today's standards.

#### 1. Positional Embeddings: Relative (RPE) vs. Rotary (RoPE)
How a model knows the order of words is critical.

* **T5's Approach (RPE):** T5 uses a form of **Relative Positional Embedding**. Instead of adding a position signal to the word embeddings, it adds a learned bias directly to the attention scores based on the relative distance between tokens. It's a clever way to encode position that is independent of sequence length.
* **The Modern Standard (RoPE):** Most modern LLMs (LLaMA, PaLM, Mistral) use **Rotary Positional Embeddings**. As detailed in the CS336 slides, RoPE works by mathematically *rotating* the Query and Key vectors based on their absolute position. This method has proven exceptionally effective for long sequences and is considered the current state-of-the-art.
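The relative-bias idea can be sketched in a few lines. This is a simplification (plain distance clipping instead of T5's logarithmic bucketing, random values standing in for learned parameters), but it shows the key property: the bias depends only on the offset `j - i`, never on absolute position.

```python
import random

# Simplified sketch of relative-position attention bias (T5-style, hypothetical values).
random.seed(0)
seq_len, max_dist = 4, 2
# One learned scalar per (clipped) relative distance; learned per head in T5.
bias_table = {d: random.gauss(0, 1) for d in range(-max_dist, max_dist + 1)}

def rel_bias(i: int, j: int) -> float:
    d = max(-max_dist, min(max_dist, j - i))  # clip; T5 uses log-spaced buckets instead
    return bias_table[d]

# The same relative offset always gets the same bias, regardless of position:
assert rel_bias(0, 1) == rel_bias(2, 3)

# Added directly to the pre-softmax attention logits:
bias_matrix = [[rel_bias(i, j) for j in range(seq_len)] for i in range(seq_len)]
```

Because the bias is indexed by offset rather than absolute position, the same table applies to sequences of any length.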
#### 2. The Feed-Forward Network: An Extreme Experiment
The Feed-Forward Network (FFN) inside each Transformer block typically has a hidden size 4 times the model's hidden dimension (`d_model`). The original T5 11B model took a radical departure from this rule.

* **T5 11B's Choice:** It used a small hidden dimension (`d_model = 1024`) but an astoundingly large FFN dimension (`d_ff = 65,536`), a **64-times multiplier**. The rationale was that modern accelerators (like Google's TPUs) are highly efficient at large, dense matrix multiplications.
* **The Modern Standard:** This experiment was not widely adopted. Later models, including T5's own successor **T5 v1.1**, reverted to the standard 4x multiplier (or ~2.66x when using GLU activations) for a better balance of parameters and performance.
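The scale of that choice is easy to verify with back-of-envelope arithmetic for a standard two-matrix FFN (up and down projections, biases ignored):

```python
# Per-layer FFN weight counts for a two-matrix FFN: W_in (d_model x d_ff)
# plus W_out (d_ff x d_model).
def ffn_params(d_model: int, d_ff: int) -> int:
    return 2 * d_model * d_ff

t5_11b   = ffn_params(1024, 65536)     # 64x multiplier: 134,217,728 weights/layer
standard = ffn_params(1024, 4 * 1024)  # usual 4x rule:     8,388,608 weights/layer
assert t5_11b // standard == 16        # 64x vs 4x -> 16x more FFN weights per layer
```

With almost all of its capacity packed into these enormous dense matrices, T5 11B was well matched to TPU matrix units but poorly balanced by later scaling-law standards.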
#### 3. Denoising: Span Corruption vs. Iterative Diffusion
While T5's pre-training is called "denoising," it's conceptually different from the denoising in modern diffusion models.

* **T5's Denoising:** This is **span corruption**. The model is shown a sentence with chunks of text masked out and learns to predict exactly what was removed in a single step. It's a fill-in-the-blanks task to learn rich language representations.
* **Diffusion Denoising:** This is a multi-step generative process. A clean text is gradually corrupted with noise, and the model learns to reverse this process step-by-step, allowing it to generate high-fidelity text from pure noise.

### Where T5 Was Ahead of its Time

Despite its differences, the "T5 v1.1" variant pioneered several techniques that are now standard practice in the most advanced LLMs:

* **RMSNorm:** It was one of the first major models to adopt Root Mean Square Normalization instead of LayerNorm, a choice now used by LLaMA, Mistral, and others for its efficiency and stability.
* **Pre-Normalization:** T5 applies the normalization layer *before* the attention and FFN blocks, a critical technique for enabling stable training of very deep networks.
* **No Bias Terms:** T5 v1.1 removed the bias parameters from its normalization and FFN layers, a small but important optimization for memory and stability that modern models follow.
* **Gated Activations (GeGLU):** While the original T5 used ReLU, T5 v1.1 adopted a Gated Linear Unit (GeGLU), presaging the move to GLU-family activations (like SwiGLU) that is now ubiquitous.

### Conclusion: The Lasting Legacy

T5 represents a different evolutionary branch in the Transformer family tree. While the field has largely converged on the decoder-only architecture for its scalability in general-purpose models, T5's design remains a masterclass in purpose-built engineering.

Its text-to-text framework was revolutionary, its encoder-decoder structure is still a go-to for tasks like translation, and its refined T5 v1.1 architecture laid the groundwork for many of the stability and efficiency tricks we see in today's state-of-the-art models. T5 is more than just a model; it's a crucial case study in the architectural trade-offs that continue to shape the future of artificial intelligence.

---
title: "From Gemini-3-Flash to T5-Gemma-2: A Journey in Distilling a Family Finance LLM"
date: 2025-12-27
draft: false
---

Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and "wait, was this dinner or *vacation* dinner?" questions.

For years, I relied on a rule-based system to categorize our credit card transactions. It worked... mostly. But maintaining `if "UBER" in description and amount > 50` style rules is a never-ending battle against the entropy of merchant names and changing habits.

Recently, I decided to modernize this stack using Large Language Models (LLMs). This post details the technical journey from using an off-the-shelf commercial model to distilling that knowledge into a small, efficient local model (`google/t5gemma-2-270m`) that runs on my own hardware while maintaining high accuracy.

## Phase 1: The Proof of Concept with Commercial LLMs

My first step was to replace the spaghetti code of regex rules with a prompt. I used **Gemini-3-Flash** (via `litellm`) as my categorization engine.

The core challenge was context. A transaction like `MCDONALDS` could be:
- **Dining**: A quick lunch during work.
- **Travel-Dining**: A meal while on a road trip.

To solve this, I integrated my **private Google Calendar** (via `.ics` export). The prompt doesn't just see the transaction; it sees *where I was* and *what I was doing* on that day.

### The "God Prompt"
The system prompt was designed to return strict JSON, adhering to a schema of Categories (e.g., `Dining`, `Travel`, `Bills`) and Sub-Categories (e.g., `Travel` -> `Accommodation`).

```json
{
  "Category": "Travel",
  "Travel Category": "Dining",
  "Reasoning": "User is on 'Trip: 34TH ARCH CANYON 2025', distinguishing this from regular dining."
}
```

This worked well. The "Reasoning" field even gave me explanations for why it flagged something as `Entertainment` vs `Shopping`. But relying on an external API for every single transaction felt like overkill for a personal project, and I wanted to own the stack.

## Phase 2: Distilling Knowledge

I wanted to train a smaller model to mimic Gemini's performance. But I didn't want to manually label thousands of transactions.

### Consistency Filtering
I had a massive CSV of historical transactions (years of data). However, that data was "noisy"—some manual labels were outdated or inconsistent.

I built a **Distillation Pipeline** (`distill_reasoning.py`) that uses the Teacher Model (Gemini) to re-label the historical data. But here's the twist: I only added a data point to my training set if the **Teacher's prediction matched the Historical Ground Truth**.

```python
# Pseudo-code for consistency filtering
teacher_pred = gemini.categorize(transaction)
historical_label = row['Category']

if teacher_pred.category == historical_label:
    # High-confidence sample!
    training_data.append({
        "input": format_transaction(transaction),
        "output": teacher_pred.to_json()
    })
else:
    # Discard: either history is wrong OR the teacher hallucinated.
    log_fail(transaction)
```

This filtered out the noise, leaving me with ~2,000 high-quality, "verified" examples where both the human (me, years ago) and the AI agreed.

## Phase 3: Training the Little Guy

For the local model, I chose **google/t5gemma-2-270m**. This is a Seq2Seq model, which fits the "Text-to-JSON" task perfectly, and it's tiny (270M parameters), meaning it can run on almost anything.

### The Stack
- **Library**: `transformers`, `peft`, `bitsandbytes`
- **Technique**: **LoRA** (Low-Rank Adaptation). I targeted all linear layers (`q_proj`, `k_proj`, `v_proj`, etc.) with `r=16`.
- **Optimization**: `AdamW` with linear decay.
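The parameter arithmetic behind LoRA with `r=16` is worth seeing. A sketch for a single hypothetical 1024x1024 linear layer (shapes follow the standard LoRA formulation, not this project's exact config):

```python
# LoRA idea: freeze W (d_out x d_in) and train a low-rank update dW = B @ A,
# with A: (r x d_in) and B: (d_out x r), B initialized to zero so dW starts at 0.
d_out, d_in, r = 1024, 1024, 16

full_params = d_out * d_in         # training W directly: 1,048,576 weights
lora_params = r * (d_in + d_out)   # training A and B:       32,768 weights
assert lora_params / full_params == 0.03125  # ~3% of the full matrix, per layer
```

That ~97% reduction per targeted layer is what makes fine-tuning even a small model comfortable on modest hardware.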
### Pitfall #1: The "Loss is 0" Initial Panic
My first training run showed a loss of exactly `0.000` almost immediately. In deep learning, if it looks too good to be true, it's a bug.
It turned out to be an error in the arguments passed to the `Trainer` (or rather, to my custom loop). Once fixed, the loss looked "healthy"—starting high and decaying noisily.

### Pitfall #2: Stability vs. Noise
The loss curve was initially extremely erratic. The batch size on my GPU was limited (Physical Batch Size = 4).
**The Fix**: I implemented **Gradient Accumulation** (accumulating over 8 steps) to simulate a batch size of 32. This smoothed out the optimization landscape significantly.
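Why this works: when the loss is a mean over samples, averaging the gradients of 8 micro-batches of 4 reproduces the gradient of a single batch of 32. A self-contained check on a toy least-squares problem (made-up data, not the training code):

```python
import random

# Toy 1-D least-squares problem: loss = mean over samples of (w*x - y)^2.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(32)]
ys = [2.0 * x + random.gauss(0, 0.1) for x in xs]
w = 0.0

def grad(xb, yb):
    # d/dw of the mean squared error over one (micro-)batch
    return sum(2 * (w * x - y) * x for x, y in zip(xb, yb)) / len(xb)

full = grad(xs, ys)                                          # one batch of 32
micro = [grad(xs[i:i+4], ys[i:i+4]) for i in range(0, 32, 4)]  # 8 micro-batches of 4
accum = sum(micro) / len(micro)
assert abs(full - accum) < 1e-9  # accumulated update == large-batch update
```

Memory per step stays at the micro-batch size; only the optimizer step is deferred.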
|
||||

|
||||
|
||||
### Pitfall #3: Overfitting
|
||||
With a small dataset (~2k samples), overfitting is a real risk. I employed a multi-layered defense strategy:
|
||||
|
||||
1. **Data Quality First**: The "Consistency Filtering" phase was the most critical step. By discarding ambiguous samples where the teacher model disagreed with history, I prevented the model from memorizing noise.
|
||||
2. **Model Regularization**:
|
||||
* **LoRA Dropout**: I set `lora_dropout=0.1`, randomly dropping 10% of the trainable adapter connections during training to force robust feature learning.
|
||||
* **Gradient Clipping**: We capped the gradient norm at `1.0`. This prevents the "exploding gradient" problem and keeps weight updates stable.
|
||||
* **AdamW**: Using the AdamW optimizer adds decoupled weight decay, implicitly penalizing overly complex weights.
|
||||
|
||||
I also set up a rigorous evaluation loop (10% validation split, eval every 50 steps) to monitor the `Train Loss` vs `Eval Loss` in real-time. The final curves showed them tracking downwards together, confirming generalization.
|
||||
|
||||
## Phase 4: Results and The "Travel" Edge Case
|
||||
|
||||
The distilled model is surprisingly capable. It learned the JSON schema very well. Although I included a regex fallback in the inference script as a safety net, the model generates valid JSON the vast majority of the time.
### Head-to-Head: Local Model vs Gemini-Flash

I ran a blind evaluation on 20 random unseen transactions.

- **Gemini-3-Flash Accuracy**: 90% (18/20)
- **Local T5-Gemma-2 Accuracy**: 85% (17/20)

The gap is surprisingly small. In fact, the local model sometimes outperformed the API because it was fine-tuned on *my* specific data distribution.

**Win for Local Model:**

> **Transaction**: `XX RANCH #1702`
> **Local Prediction**: `Groceries` (Correct)
> **API Prediction**: `Gas` (Incorrect)
> **Local Reasoning**: "XX RANCH refers to a well-known supermarket chain."
> **API Reasoning**: "XX RANCH is a known convenience store and gas station chain."
> **Analysis**: The local model "knows" (from training data) that XX Ranch is an Asian grocery store I frequent, whereas the general-purpose API assumed it was a gas station based on the name pattern.

**Win for API (World Knowledge):**

> **Transaction**: `LOVE'S #0792`
> **Local Prediction**: `Dining` (Hallucination)
> **API Prediction**: `Travel-Gas` (Correct)
> **Local Reasoning**: "Love's is a well-known restaurant chain, which falls under the Dining category."
> **API Reasoning**: "Love's is a well-known gas station chain, and the transaction occurred during a trip to Moab, categorizing it as travel-related fuel."
> **Analysis**: The API knows "Love's" is a major gas station chain. The small local model lacks this world knowledge and hallucinates it as a restaurant, highlighting the pure "Knowledge Gap" between a 270M and a 70B+ model. Additionally, Gemini Flash has **Google Search grounding** enabled, allowing it to verify real-world entities in real-time—a capability our isolated local model intrinsically lacks.

### Surprise Win: JSON Stability

One pleasant surprise was the **format adherence**. I initially feared I'd need constrained generation tools like `outlines` or a simplified schema for a 270M parameter model. However, the distilled T5-Gemma model followed the complex JSON schema (including nested fields) with near-perfect reliability, proving that specific structure can be learned effectively through fine-tuning alone.

### Key Lesson: The "Noisy Ground Truth" Trap

Since this is a **distillation (SFT)** pipeline, not Reinforcement Learning, the model has no way to "unlearn" bad habits via negative rewards. It relies entirely on the quality of the teacher's reasoning.

> **Transaction**: `[TRAVEL] SWEETHOME KITCHEN`
> **Local Prediction**: `Dining`
> **API Prediction**: `Travel-Dining`
> **Local Reasoning**: "The description 'SWEETHOME KITCHEN' indicates a restaurant or dining establishment, which falls under the Dining category."
> **API Reasoning**: "The transaction is for a kitchen/restaurant and occurred while the user was traveling to Pfeiffer Big Sur SP, making it a travel-related dining expense."

In this case, the API correctly used the calendar context ("User is in Big Sur"). The local model missed this link. This highlights that simply having the data isn't enough—the *reasoning* in the training set must explicitly force the model to look at the context, or it will revert to simple pattern matching (Kitchen = Dining).

## Conclusion

We often think we need 70B parameter models for everything. This experiment shows that for a specific, well-defined task with consistent formatting, a **270M parameter model**—fine-tuned on high-quality, distilled data—can punch way above its weight class.

The key was **data quality over quantity**. By using the commercial model to "verify" my historical data, I created a dataset that was cleaner than either source alone.

---
title: "The Convergence of Fast Weights, Linear Attention, and State Space Models"
date: 2025-12-19
draft: false
---

Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer's attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms ("Fast Weights") and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).

This article explores the mathematical equivalence between Hinton's concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.

## 1. The Standard Transformer Bottleneck

To understand the motivation for Fast Weights, one must first identify the inefficiency in standard Transformers. The core operation is **Self-Attention**, defined as:

$$ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^T}{\sqrt{d}}\right) V $$

During inference (generating tokens one by one), the model computes a Query ($Q$) for the current token and compares it against the Keys ($K$) and Values ($V$) of all previous tokens.

* **Computational Cost:** Quadratic $O(N^2)$ during training; Linear $O(N)$ per step during inference.
* **Memory Cost:** The KV Cache. To calculate the softmax, the model must explicitly store the $K$ and $V$ vectors for the entire history in GPU memory. For long contexts (e.g., 1 million tokens), this memory footprint becomes prohibitive.

The **Softmax** function is the culprit. It introduces a non-linearity that binds $Q$ and $K$ together, preventing the mathematical separation of the current query from the historical context.

## 2. Fast Weights as Associative Memory

Geoffrey Hinton proposes that the brain does not maintain a "digital buffer" of past activations (like a KV cache). Instead, it relies on **Fast Weights**.

In this framework, neural connections possess two timescales:

1. **Slow Weights:** The standard parameters learned over long periods (training).
2. **Fast Weights:** Synaptic strengths that change rapidly during a forward pass to store temporary context.

Hinton formalizes this temporary storage as an **Associative Memory**. When a network encounters a new key-value pair ($k, v$), it does not store the vectors in a list. Instead, it updates a fast weight matrix $W_{fast}$ using the Hebbian learning rule (outer product):

$$ W_{fast} \leftarrow \lambda W_{fast} + (v \otimes k) $$

Here, $\lambda$ is a decay factor ($0 < \lambda < 1$) representing forgetfulness. This matrix $W_{fast}$ compresses the history into a fixed-size representation of size $d \times d$, regardless of the sequence length.

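
The write-then-read behavior of this outer-product memory can be sketched in a few lines of NumPy (a toy demo with $\lambda = 1$ and orthonormal keys, under which recall is exact because the cross-terms vanish):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Orthonormal keys (columns of a QR factor) and random values to store.
keys = np.linalg.qr(rng.normal(size=(d, d)))[0]
values = rng.normal(size=(d, 3))

# Write three (k, v) pairs with the Hebbian outer-product rule: W += v ⊗ k.
W_fast = np.zeros((d, d))
for i in range(3):
    W_fast += np.outer(values[:, i], keys[:, i])

# Read: querying with a stored key retrieves its value, since k_i · k_j = 0
# for i != j and k_j · k_j = 1.
recalled = W_fast @ keys[:, 1]
assert np.allclose(recalled, values[:, 1])
```

With correlated keys or $\lambda < 1$ the recall becomes approximate—exactly the "compression loss" discussed later.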
## 3. Mathematical Unification: Linear Attention

The connection between Fast Weights and Transformers is established by removing the softmax function from the attention mechanism, a technique known as **Linear Attention**.

If we treat the interaction between $Q$ and $K$ as linear, the attention equation becomes:

$$ \text{LinearAttention} = (Q K^T) V $$

Using the associative property of matrix multiplication, we can reorder the operations:

$$ Q (K^T V) $$

This reordering fundamentally alters the mechanism:

* **Left Side $(Q K^T) V$:** Compare Query to all Keys, then multiply by Values. Requires storing history.
* **Right Side $Q (K^T V)$:** Compute the summation of Key-Value outer products first.

The term $(K^T V)$ represents the summation of all past associations. This term **is** the Fast Weight matrix $W_{fast}$ described by Hinton.

$$ \text{State}_t = \sum_{i=1}^t k_i v_i^T $$

Thus, Linear Attention is effectively a system where the "state" is a matrix of Fast Weights that is updated at every time step.

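
Both orderings—and the streaming fast-weight form—can be verified numerically (a sketch with arbitrary random matrices and no decay, i.e. $\lambda = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 128, 8                       # sequence length, head dimension
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

# Left side: (Q K^T) V — materializes a T x T attention matrix.
left = (Q @ K.T) @ V

# Right side: Q (K^T V) — the d x d term K^T V is the fast-weight state.
right = Q @ (K.T @ V)
assert np.allclose(left, right)

# Streaming form: update the d x d state one token at a time, O(1) memory.
S = np.zeros((d, d))
outputs = []
for t in range(T):
    S += np.outer(K[t], V[t])       # State_t = sum_i k_i v_i^T
    outputs.append(Q[t] @ S)        # y_t = q_t · State_t

# After the last token the state holds the full history, so the streaming
# output matches the batched product.
assert np.allclose(outputs[-1], left[-1])
```

Note the streaming loop is causal (each output sees only the prefix), which is exactly the autoregressive inference setting.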
## 4. State Space Models (SSMs) as Recurrent Fast Weights

State Space Models (like S4 and Mamba) typically define sequence modeling through continuous control theory, discretized into a recurrence:

$$ h_t = \bar{A} h_{t-1} + \bar{B} x_t $$
$$ y_t = \bar{C} h_t $$

While derived differently, this recurrence is mathematically equivalent to the Linear Attention/Fast Weight mechanism. We can demonstrate this by "unrolling" the SSM recursion to see how the output $y_t$ depends on the history.

The output at time $t$ is the sum of inputs weighted by decaying powers of $\bar{A}$:

$$ y_t = \sum_{j=1}^t \bar{C} (\bar{A}^{t-j}) (\bar{B} x_j) $$

Comparing this to the Linear Attention formulation with decay $\lambda$:

$$ \text{Attention}_t = q_t \sum_{j=1}^t (\lambda^{t-j}) (k_j^T v_j) $$

The mapping between architectures becomes clear:

* **Query ($q_t$)** $\leftrightarrow$ Output Matrix **$\bar{C}$**
* **Key/Value ($k_j^T v_j$)** $\leftrightarrow$ Input Matrix **$\bar{B} x_j$** (Input Projection)
* **Decay Factor ($\lambda$)** $\leftrightarrow$ State Matrix **$\bar{A}$**
* **Fast Weight Matrix ($S_t$)** $\leftrightarrow$ Hidden State **$h_t$**

Therefore, an SSM is mechanically a Transformer that uses Fast Weights (a fixed-size recurrent state) rather than a KV Cache (a growing buffer) to handle attention.

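
The unrolling step can be checked directly: the $O(1)$-state recurrence and the "attention-like" sum over history produce identical outputs. A sketch with arbitrary small matrices (0-based indices instead of the 1-based sum above):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 50, 4                         # sequence length, state size
A_bar = 0.9 * np.eye(n)              # decaying state matrix (plays the role of lambda)
B_bar = rng.normal(size=(n, 1))
C_bar = rng.normal(size=(1, n))
x = rng.normal(size=T)

# Recurrent form: h_t = A h_{t-1} + B x_t, y_t = C h_t  (constant memory).
h = np.zeros((n, 1))
ys_rec = []
for t in range(T):
    h = A_bar @ h + B_bar * x[t]
    ys_rec.append((C_bar @ h).item())

# Unrolled form: y_t = sum_j C A^{t-j} B x_j  (the "attention over history" view).
ys_unrolled = []
for t in range(T):
    y = sum((C_bar @ np.linalg.matrix_power(A_bar, t - j) @ B_bar).item() * x[j]
            for j in range(t + 1))
    ys_unrolled.append(y)

assert np.allclose(ys_rec, ys_unrolled)
```

The recurrent form costs $O(T)$ total work with a fixed-size state; the unrolled form makes the decayed-attention interpretation explicit but costs $O(T^2)$.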
## 5. Implications for Inference Optimization

This theoretical convergence has significant implications for inference efficiency.

### Standard Transformer

* **Mechanism:** Stores history in a KV Cache.
* **Memory:** $O(N)$ (grows linearly with sequence length).
* **Performance:** High recall/precision because it retains the exact history.

### Fast Weight / SSM (Mamba / RWKV)

* **Mechanism:** Compresses history into a single matrix/vector state.
* **Memory:** $O(1)$ (constant memory, regardless of sequence length).
* **Performance:** Historically lower than Transformers due to "compression loss" (trying to stuff an unbounded history into a finite matrix).

**The Solution:** Modern SSMs like Mamba improve upon basic Linear Attention by introducing **Selectivity**. Instead of compressing *all* history equally (which blurs the memory), Mamba allows the model to dynamically gate the inputs—choosing to store relevant information and reset/forget irrelevant noise. This allows the Fast Weight approach to compete with the accuracy of explicit Attention while maintaining constant memory usage.

### References

1. **Hinton, G. E., & Plaut, D. C. (1987).** "Using Fast Weights to Deblur Old Memories." *Proceedings of the 9th Annual Conference of the Cognitive Science Society.*
2. **Ba, J., Hinton, G. E., et al. (2016).** "Using Fast Weights to Attend to the Recent Past." *Advances in Neural Information Processing Systems (NeurIPS).*
3. **Katharopoulos, A., et al. (2020).** "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention." *International Conference on Machine Learning (ICML).*
4. **Gu, A., & Dao, T. (2023).** "Mamba: Linear-Time Sequence Modeling with Selective State Spaces." *arXiv preprint arXiv:2312.00752.*
5. **Vaswani, A., et al. (2017).** "Attention Is All You Need." *Advances in Neural Information Processing Systems (NeurIPS).*

---
title: "Transformer's Core Mechanics"
date: 2025-04-01
draft: false
---

The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of "channels" to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.

### 1. The "Channel": A Foundational View of `d_model`

In deep learning, a "channel" can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model's primary embedding dimension, commonly referred to as `d_model`.

An input text is first tokenized, and each token is mapped to a vector of size `d_model` (e.g., 4096). Each of the 4096 dimensions in this vector can be considered a "channel," representing a different semantic or syntactic feature of the token.

As this data, represented by a tensor of shape `[batch_size, sequence_length, d_model]`, progresses through the layers of the Transformer, these channels are continuously transformed. However, a critical design choice is that the output dimension of every main sub-layer (like the attention block or the FFN block) is also `d_model`. This consistency is essential for enabling **residual connections**, where the input to a block is added to its output (`output = input + SubLayer(input)`). This technique is vital for training the extremely deep networks common today.

### 2. The Building Blocks: Dimensions of Key Layers

A Transformer layer is primarily composed of two sub-layers: a Multi-Head Attention block and a position-wise Feed-Forward Network (FFN). The parameters for these are stored in several key weight matrices. Understanding their dimensions is crucial.

Let's define our variables:

* `d_model`: The core embedding dimension.
* `d_ff`: The inner dimension of the FFN, typically `4 * d_model`.
* `h`: The number of attention heads.
* `d_head`: The dimension of each attention head, where `d_model = h * d_head`.

The dimensions of the weight matrices are as follows:

| Layer | Weight Matrix | Input Vector Shape | Output Vector Shape | **Weight Matrix Dimension** |
| ----------------------------- | ------------- | ------------------ | ------------------- | ------------------------- |
| **Attention Projections** | | | | |
| Query | `W_Q` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| Key | `W_K` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| Value | `W_V` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| Output | `W_O` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| **Feed-Forward Network** | | | | |
| Layer 1 (Up-projection) | `W_ff1` | `d_model` | `d_ff` | **`[d_model, d_ff]`** |
| Layer 2 (Down-projection) | `W_ff2` | `d_ff` | `d_model` | **`[d_ff, d_model]`** |

### 3. Deconstructing Multi-Head Attention (MHA)

The core innovation of the Transformer is Multi-Head Attention. It allows the model to weigh the importance of different tokens in the sequence from multiple perspectives simultaneously.



#### 3.1. The "Why": Beyond a Single Attention

A single attention mechanism would force the model to average all types of linguistic relationships into one pattern. MHA avoids this by creating `h` parallel subspaces. Each "head" can specialize, with one head learning syntactic dependencies, another tracking semantic similarity, and so on. This creates a much richer representation.

#### 3.2. An Encoding/Decoding Analogy

A powerful way to conceptualize the attention calculation is as a two-stage process:

1. **Encoding Relationships:** The first part of the calculation, `softmax(Q @ K.T)`, can be seen as an encoding step. It does not use the actual "content" of the tokens (the `V` vectors). Instead, it uses the Queries and Keys to build a dynamic "relationship map" between tokens in the sequence. This map, a matrix of attention scores, answers the question: "For each token, how important is every other token right now?"
2. **Decoding via Information Retrieval:** The second part, `scores @ V`, acts as a decoding step. It uses the relationship map to retrieve and synthesize information. For each token, it creates a new vector by taking a weighted sum of all the `V` vectors in the sequence, using the scores as the precise mixing recipe. It decodes the relational structure into a new, context-aware representation.

#### 3.3. The "How": A Step-by-Step Flow

The MHA process is designed for maximum computational efficiency.

1. **Initial Projections:** The input vectors (shape `[seq_len, d_model]`) are multiplied by `W_Q`, `W_K`, and `W_V`. These matrices are all `[d_model, d_model]` not to create one large query, but to **efficiently compute the vectors for all `h` heads at once**. The single large output vector is then reshaped into `h` separate vectors, each of size `d_head`.
2. **Attention Score Calculation:** For each head `i`, a score matrix is calculated: `scores_i = softmax( (Q_i @ K_i.T) / sqrt(d_head) )`. Note that `Q_i` and `K_i` have dimensions `[seq_len, d_head]`, so the resulting `scores_i` matrix has a dimension of **`[seq_len, seq_len]`**.
3. **Weighted Value Calculation:** The scores are used to create a weighted sum of the Value vectors for each head: `output_i = scores_i @ V_i`. Since `scores_i` is `[seq_len, seq_len]` and `V_i` is `[seq_len, d_head]`, the resulting `output_i` has a dimension of **`[seq_len, d_head]`**. This is the final output of a single head.
4. **Concatenation and Final Projection:** The outputs of all `h` heads are concatenated along the last dimension. This produces a single large matrix of shape `[seq_len, h * d_head]`, which is equivalent to `[seq_len, d_model]`. This matrix is then passed through the final output projection layer, `W_O` (shape `[d_model, d_model]`), to produce the attention block's final output. The `W_O` matrix learns the optimal way to mix the information from all the specialized heads into a single, unified representation.

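
The four steps above can be sketched end to end in NumPy; the shapes at each stage match the dimensions in the walkthrough (random weights for illustration only):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mha(x, W_Q, W_K, W_V, W_O, h):
    seq_len, d_model = x.shape
    d_head = d_model // h
    # 1. One big projection each, then reshape into h heads: [h, seq_len, d_head].
    Q = (x @ W_Q).reshape(seq_len, h, d_head).transpose(1, 0, 2)
    K = (x @ W_K).reshape(seq_len, h, d_head).transpose(1, 0, 2)
    V = (x @ W_V).reshape(seq_len, h, d_head).transpose(1, 0, 2)
    # 2. Per-head scores: [h, seq_len, seq_len].
    scores = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))
    # 3. Weighted values: [h, seq_len, d_head].
    out = scores @ V
    # 4. Concatenate heads back to [seq_len, d_model] and mix with W_O.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ W_O

rng = np.random.default_rng(0)
d_model, h, seq_len = 64, 8, 10
Ws = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
y = mha(rng.normal(size=(seq_len, d_model)), *Ws, h=h)
assert y.shape == (seq_len, d_model)
```

Note the output shape equals the input shape, which is what makes the residual connection `x + mha(x, ...)` possible.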
### 4. Optimizing Attention: GQA and MQA

During inference, storing the Key and Value vectors for all previous tokens (the KV Cache) is a major memory bottleneck. **Grouped-Query Attention (GQA)** and **Multi-Query Attention (MQA)** are architectural modifications that address this by allowing multiple Query heads to share the same Key and Value heads.

Let's use a concrete example, similar to Llama 2 7B:

* `d_model` = 4096
* `h` = 32 Q heads
* `d_head` = 128
* `g` = 8 KV head groups for GQA

The key insight is that only the dimensions of the `W_K` and `W_V` matrices change, which in turn reduces the size of the KV cache. The `W_Q` and `W_O` matrices remain `[4096, 4096]`.

| Attention Type | No. of Q Heads | No. of KV Heads | `W_K` & `W_V` Dimension | Relative KV Cache Size |
| ------------------- | -------------- | --------------- | ----------------------- | ---------------------- |
| **MHA** (Multi-Head)| 32 | 32 | `[4096, 32*128]` = `[4096, 4096]` | 1x (Baseline) |
| **GQA** (Grouped) | 32 | 8 | `[4096, 8*128]` = `[4096, 1024]` | 1/4x |
| **MQA** (Multi-Query)| 32 | 1 | `[4096, 1*128]` = `[4096, 128]` | 1/32x |

GQA provides a robust balance, significantly reducing the memory and bandwidth requirements for the KV cache with negligible impact on model performance, making it a popular choice in modern LLMs.

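
The table's arithmetic is simple enough to recompute directly (same example numbers as above):

```python
d_model, d_head, n_q_heads = 4096, 128, 32

def kv_proj_shape(n_kv_heads):
    """Shape of W_K (and W_V) when n_kv_heads key/value heads are used."""
    return (d_model, n_kv_heads * d_head)

for name, n_kv in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    shape = kv_proj_shape(n_kv)
    # The KV cache stores one K and one V vector per KV head per token, so
    # its size relative to MHA is simply n_kv / 32.
    print(f"{name}: W_K = {shape}, relative KV cache = {n_kv / 32:g}x")
```

This makes the design trade-off explicit: the number of KV heads is the only knob, and the cache shrinks linearly with it while the Query side is untouched.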
### 5. MHA vs. Mixture of Experts (MoE): A Clarification

While both MHA and MoE use the concept of "experts," they are functionally and architecturally distinct.

* **MHA:** The "experts" are the **attention heads**. All heads are active for every token to build a rich representation within the attention layer. This is akin to a board meeting where every member analyzes and contributes to every decision.
* **MoE:** The "experts" are full **Feed-Forward Networks**. A routing network selects a small subset of these FFNs for each token. This is a scaling strategy to increase a model's parameter count for greater capacity while keeping the computational cost fixed. It replaces the standard FFN block, whereas MHA *is* the attention block.

By understanding these technical details, from the basic concept of a channel to the sophisticated interplay of heads and experts, one can build a more complete and accurate mental model of how LLMs truly operate.

---

### References

1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. *Advances in neural information processing systems*, 30.
2. Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. *arXiv preprint arXiv:1701.06538*.
3. Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebrón, F., & Sanghai, S. (2023). GQA: Training generalized multi-query transformer models from multi-head checkpoints. *arXiv preprint arXiv:2305.13245*.

---
title: "UniFi VLAN Migration to Zone-Based Architecture"
date: 2025-09-22
draft: false
---

Embarking on a network migration to a properly segmented VLAN architecture is a rite of passage for any serious home lab or small business operator. The goal is clear: improve security and organization by separating traffic. However, the path from a flat network to a segmented one is often paved with subtle but critical configuration details that can lead to hours of frustrating troubleshooting.

This article documents that journey. It details the pitfalls encountered, the core networking concepts that were essential to understand, and the best practices that ultimately led to a stable, secure, and logical network design built on a zone-based firewall model.

### Lesson 1: Demystifying the Native VLAN

The most significant source of initial problems was a fundamental misunderstanding of the "Native VLAN" setting on a switch port.

**The Misconception:** It's easy to assume that the "Native Network" on a port should be set to the VLAN you want the connected device to be on. For example, if a switch should be on the "corp" network (VLAN 10), one might set its management VLAN to `corp` and the upstream switch port's Native Network to `corp` as well.

**The Reality:** The Native VLAN on a trunk port has a specific purpose: it determines which VLAN any **untagged** traffic belongs to. A trunk port is designed to carry traffic for multiple VLANs by adding a "tag" to each packet. The one exception is the traffic for the Native VLAN, which is sent *without* a tag.

This leads to a critical rule: **for a trunk link to function correctly, the Native VLAN must be the same on both ends of the connection.** When they mismatch, management traffic from devices like switches and access points gets lost, sending them offline.

### Lesson 2: The Power of a Dedicated Management VLAN

This realization about the Native VLAN led directly to the next critical architectural decision: isolating the network's control plane. The initial plan involved using VLAN 1 for a DMZ, but this is a significant security risk, as VLAN 1 is often the default "catch-all" network.

**The Best Practice:** The industry-standard solution is to create a dedicated **Management VLAN**. This network's sole purpose is to be the home for the management interfaces of your router, switches, and access points.

The final, secure architecture was as follows:

1. A new network, "Management" (e.g., VLAN 1, `192.168.1.0/24`), was created.
2. This network was assigned to its own "Management" firewall zone with highly restrictive rules.
3. All trunk ports connecting switches and access points were configured with "Management" as the **Native VLAN**.
4. All other user-facing VLANs (`corp`, `iot`, `dmz`) were configured as **Tagged VLANs** on these trunk ports.

This isolates the network's control plane from the data plane, vastly improving the security posture.

### Lesson 3: Mastering Inter-VLAN Communication

With traffic properly segmented at Layer 2, the next challenge was controlling communication at Layer 3. This is the job of the router and its firewall, and it presented a common challenge: providing DHCP to clients when the server resides in a different VLAN.

DHCP requests are broadcasts and are not passed between VLANs by a router. The solution is to use a **DHCP Relay**.

1. On the network configuration for a client VLAN (e.g., `corp`), the DHCP mode was changed from "Server" to "Relay".
2. The IP address of the actual DHCP server was specified.

This instructs the router to listen for DHCP broadcasts, catch them, and forward them as unicast packets directly to the DHCP server. For this to work, the firewall must allow this traffic, and the DHCP server itself must be configured with a "scope" or pool of IP addresses for the client's subnet.

### The Final Architecture: A Zone-Based Firewall Model

The culmination of these lessons is a network architecture defined by clear, logical zones, each with a distinct purpose and trust level. This model simplifies firewall management and provides a robust security posture that is easy to understand at a glance.

#### Network Zones and Their Roles

The final configuration groups the individual VLANs into distinct zones, forming the foundation of the security policy.

* **Internal:** Contains the `corp` network. This is the most trusted zone for daily work.
* **DMZ:** Contains the `dns` and `prod` networks for semi-trusted, exposed services.
* **IoT:** Contains the `iot` network. This is a low-trust zone for smart devices.
* **Management:** Contains the `management` network. This is a highly privileged, isolated zone for network infrastructure.



#### The Security Policy Matrix

The true power of this model is realized in the firewall's zone matrix, which dictates the default traffic flow between each zone.



This matrix enforces the desired security policy with clear, high-level rules:

* **Complete IoT Isolation:** The `IoT` row shows that devices in this zone are blocked from initiating any communication with any other internal zone. Their only allowed path is out to the internet.
* **Protected Management Plane:** The `management` row and column are almost entirely red. The critical network infrastructure is blocked from initiating contact with any user-facing zone, and vice versa, following the principle of least privilege.
* **Controlled DMZ Access:** The `DMZ` is prevented from initiating connections to the trusted `Internal` zone, preventing a compromised public-facing server from being used as a pivot point to attack internal devices.

#### Granular Intra-Zone Control

Beyond the high-level zone policies, the configuration also implements granular rules to control traffic *within* a single zone, providing defense-in-depth.

These rules explicitly define the communication paths between services. For instance, one rule allows a specific device to access a Kubernetes load balancer while another allows general DNS access within the zone. This ensures that even within a semi-trusted zone, services can only communicate in expected and necessary ways, further reducing the potential attack surface.

By adhering to these principles, what began as a day of frustrating troubleshooting evolved into a robust, layered, and logically segmented network that balances simplicity with strong security practices.

***

### References

* [Troubleshooting UniFi Device Connectivity](https://help.ui.com/hc/en-us/articles/7258465146519-Troubleshooting-UniFi-Device-Connectivity)
* [Virtual Network (VLAN) Troubleshooting](https://help.ui.com/hc/en-us/articles/9592924981911-Virtual-Network-VLAN-Troubleshooting)

+++
date = 2020-10-26T04:14:43Z
title = "Some useful files"
description = ""
slug = ""
tags = []
categories = []
externalLink = ""
series = []
+++

* [rootCA.pem](/rootCA.crt)

---
title: "vAttention"
date: 2025-12-08
draft: false
---

Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While **PagedAttention** (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU's native hardware Memory Management Unit (MMU) offers a more performant and portable solution.

#### The Status Quo: PagedAttention and Software Tables

Prior to PagedAttention, systems allocated contiguous memory for the maximum possible context length, leading to severe fragmentation and wasted memory. PagedAttention addressed this by chunking the KV cache into non-contiguous blocks, managed by a software-defined "page table" (the Block Table) [1].

While effective at reducing fragmentation, this approach introduces significant complexity:

* **Kernel Rewriting:** Because the KV cache is no longer contiguous in virtual memory, standard attention kernels (like cuDNN SDPA or vanilla FlashAttention) cannot be used directly. Developers must rewrite kernels to manually dereference block tables [1].
* **Software Overhead:** The system must manage virtual-to-physical mapping in user space, duplicating work typically handled by the OS. This adds runtime overhead to the critical path of both the CPU (managing tables) and the GPU (performing lookups) [1].
* **Performance Penalties:** PagedAttention-based kernels have been observed to be slower than their non-paged counterparts. For example, vLLM's paged kernel has been shown to be up to 2.8x slower than FlashAttention-2 in specific tests [1].

#### The Hardware-Native Alternative: vAttention

**vAttention** proposes returning the responsibility of memory management to the OS and hardware. By utilizing the CUDA Virtual Memory Management (VMM) APIs, it is possible to decouple the allocation of virtual memory from physical memory [1].

**How it works:**

1. **Virtual Contiguity:** The system reserves a large, contiguous range of virtual addresses for the KV cache at request start.
2. **Physical Paging:** Physical memory pages are allocated and mapped to this virtual range only on demand (dynamically) as the token sequence grows [1].
3. **Hardware Lookups:** Because the GPU sees a contiguous virtual address range, the hardware Translation Lookaside Buffer (TLB) handles the address translation. This allows the use of unmodified, high-performance kernels like FlashAttention-2 or FlashAttention-3 without custom paging logic [1].

#### Technical Challenges and Solutions
|
||||
Historically, using the GPU native virtual memory for high-frequency token generation faced two major bottlenecks: **Control Plane Latency** and **Page Granularity**.
|
||||
|
||||
**1. Control Plane Latency (The API Bottleneck)**

Standard memory allocation (`cudaMalloc`) is monolithic: it allocates virtual and physical memory simultaneously. The more granular driver API, `cuMemMap`, allows separating these steps but involves expensive round-trips to the OS driver. Invoking these APIs synchronously during decoding (which generates one token at a time) would stall the GPU execution pipeline [1].

To solve this, vAttention utilizes **execution overlap**:

* Because LLM decoding is autoregressive and predictable, the system knows exactly when new memory will be needed (one token ahead).
* The CPU initiates the memory mapping for the *next* token asynchronously while the GPU is still computing the *current* token. By the time the GPU reaches the next step, the TLB and page tables are already updated, effectively hiding the driver latency [1].
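The overlap pattern can be mimicked with ordinary threads. This is a CPU-only sketch; the 10 ms step costs are arbitrary stand-ins for the real kernel and driver latencies:

```python
import threading
import time

def map_next_pages():
    time.sleep(0.01)  # stand-in for the cuMemMap round-trip to the driver

def compute_current_token():
    time.sleep(0.01)  # stand-in for the GPU running the decode kernel

def decode(n_tokens):
    for _ in range(n_tokens):
        # Kick off the mapping for token i+1, then compute token i in parallel.
        mapper = threading.Thread(target=map_next_pages)
        mapper.start()
        compute_current_token()
        mapper.join()  # usually a no-op: the mapping finished during compute

start = time.perf_counter()
decode(10)
print(f"{time.perf_counter() - start:.2f}s")  # ~0.1s overlapped vs ~0.2s serial
```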
**2. Page Size Granularity (The Fragmentation Bottleneck)**

The GPU TLB hierarchy is sensitive to page size:

* **4KB pages:** Too small. Mapping gigabytes of KV cache with 4KB pages causes "TLB thrashing," degrading performance.
* **2MB huge pages:** The standard for large CUDA allocations. However, allocating 2MB for a single token update causes massive internal fragmentation, negating the benefits of dynamic allocation.

Research identified **64KB** as the optimal page size, balancing TLB efficiency against memory utilization. While standard CUDA APIs default to 2MB, vAttention utilizes modified driver calls to enable 64KB pages, eliminating TLB thrashing without incurring the fragmentation cost of huge pages [1].
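Back-of-the-envelope arithmetic shows why granularity matters. Assuming each active sequence's last physical page is on average half-full (illustrative numbers, not measurements from the paper):

```python
# Average internal fragmentation: each active sequence wastes about half of
# its final physical page, so waste scales linearly with page size.
KB, MB = 1024, 1024 * 1024

def avg_waste(page_size, n_sequences):
    return n_sequences * page_size // 2

for label, page in [("2MB huge pages", 2 * MB), ("64KB pages", 64 * KB)]:
    mib = avg_waste(page, n_sequences=100) / MB
    print(f"{label}: ~{mib:.1f} MiB wasted across 100 sequences")
# 2MB huge pages: ~100.0 MiB wasted across 100 sequences
# 64KB pages: ~3.1 MiB wasted across 100 sequences
```

4KB pages would waste even less, but at the cost of the TLB thrashing described above; 64KB is the compromise.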
#### Performance and Portability Implications

Moving memory management from software (PagedAttention) to hardware (vAttention) yields measurable benefits:

* **Throughput:** In prefill-heavy workloads, vAttention outperforms PagedAttention-based systems (such as vLLM and FlashInfer) by up to 1.23x due to the elimination of software lookup overheads. In decoding, it matches or exceeds the performance of optimized paged kernels [1].
* **Portability:** A significant advantage is software compatibility. When FlashAttention-3 (optimized for NVIDIA Hopper H100 GPUs) was released, it did not initially support PagedAttention. vAttention enabled the immediate use of FlashAttention-3 with dynamic memory support, achieving up to 1.5x higher throughput than PagedAttention-based FlashAttention-2 [1].
#### Conclusion

While PagedAttention solved the critical issue of memory fragmentation in LLM serving, it necessitated a complex software abstraction layer. By leveraging low-level CUDA VMM APIs, handling allocations asynchronously to hide driver latency, and optimizing page sizes, it is possible to achieve dynamic memory management using the GPU's native hardware. This restores the illusion of contiguous memory, simplifies kernel development, and improves inference performance.

### References

[1] R. Prabhu et al., "vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention," in *Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '25)*, 2025.
@@ -1,70 +0,0 @@

---
title: "Hacking a Chinese Car Stereo to fulfill my Knight Rider dreams"
date: 2026-01-21
draft: false
---
"Vibe coding" has become my latest obsession. It's that flow state where the tools disappear, and you're just manipulating logic at the speed of thought. Usually, this happens in a high-end IDE like Antigravity. But lately, I've been trying to answer a childhood dream.
|
||||
|
||||
Growing up in China before the internet age, my window to the outside world was CCTV-6. Along with *Baywatch*, one of the first American TV shows I ever watched was *Knight Rider*. I don't remember the exact plot lines, but the core concept stuck with me forever: KITT. A car that could talk, think, and do things for you.
|
||||
|
||||
Decades later, I'm sitting in my Jeep, wondering: Can I build my own KITT? Can I take the vibe on the road?
|
||||
|
||||
I already updated the head unit in my Jeep to an aftermarket unit. It features a **K706 (UIS7862S)** chipset with an **8-core CPU and 8GB of RAM**, essentially making it a reasonably powerful Android tablet hardwired into the dashboard.
|
||||
|
||||
## The Objective
|
||||
Turn this car accessory into a legitimate dev environment. I wanted a physical keyboard, a real terminal, and access to my AI coding assistants. I wanted to push code while parked on a trail.
|
||||
|
||||
## The Hardware Blocker: Getting Input

The first hurdle was mundane but blocking: my Bluetooth keyboard wouldn't pair. The head unit could see other devices, but refused to connect to my keyboard.

### Attempt 1: The USB Dongle Bypass

My first instinct was to blame the cheap Chinese head unit hardware. I grabbed a spare TP-Link USB Bluetooth dongle and plugged it in, hoping to bypass the internal stack entirely.

The device showed up in `lsusb`, but it remained inert. A quick check of the kernel config via `zcat /proc/config.gz` revealed the bad news:

```bash
# CONFIG_BT is not set
```

The kernel was compiled without generic Bluetooth driver support (`btusb`). Even with root access, I couldn't load the drivers because they simply didn't exist in the firmware. I was stuck with the internal hardware.
### Attempt 2: The "Dual Bluetooth" Fix

Forced back to the built-in Bluetooth, I tried to diagnose why it was ignoring my keyboard. Standard debugging tools painted a grim picture:

```bash
❯ hciconfig -a
# (Empty output - no standard HCI interface found)

❯ ps -A | grep -iE "goc|ivt|syu"
u0_a50 3456 ... com.goc.sdk # Accessing the proprietary BT chip
```

The diagnosis was clear: the internal Bluetooth chip acts in **Slave Mode** (client), managed by a proprietary `com.goc.sdk` service instead of the standard Android Bluetooth stack. It's designed to *be* a speaker for your phone, not to *host* a keyboard.

**The Fix**: Hidden deep in the Factory Settings (password `8888`) is a toggle called **"Dual Bluetooth"**. Enabling it flips the proprietary stack to expose a standard Host interface, and suddenly my mechanical keyboard connected instantly.
## The Software: Termux + Claude

With input sorted, the software setup was surprisingly straightforward. **Termux** was the obvious choice for a terminal.

I discovered that **Claude Code** works on Termux with zero hassle. The setup was shockingly simple:

```bash
pkg install nodejs git ripgrep
npm install -g @anthropic-ai/claude-code
```

Authentication via `claude login` worked out of the box. Now I have a fully capable coding agent running directly on my dashboard. I can pull a repo, ask Claude to refactor a module, and push the changes, all without opening a laptop.

|
||||
|
||||
## Key Insights
|
||||
|
||||
* **Head Units are just Weird Tablets**: They have quirks (like Slave-only Bluetooth), but they are standard Android under the hood. `adb root` is your best friend for diagnosing them.
|
||||
* **Check the Kernel Config**: Before buying hardware peripherals for embedded Android devices, always check `/proc/config.gz`. If the support isn't compiled in, you're dead in the water.
|
||||
* **The Vibe is Portable**: With tools like Termux and Claude Code, the "dev environment" is no longer a heavy laptop. It's anywhere you have a terminal.
|
||||
|
||||
## References
|
||||
1. [Reddit: Claude Code on Termux](https://www.reddit.com/r/termux/comments/1jd4y4y/claude_code_is_easy_to_install_on_termux/)
|
||||
@@ -1,7 +0,0 @@

{
  "folders": [
    {
      "path": "."
    }
  ]
}
165  fonts/LICENSE.txt  Normal file
@@ -0,0 +1,165 @@
Fonticons, Inc. (https://fontawesome.com)

--------------------------------------------------------------------------------

Font Awesome Free License

Font Awesome Free is free, open source, and GPL friendly. You can use it for
commercial projects, open source projects, or really almost whatever you want.
Full Font Awesome Free license: https://fontawesome.com/license/free.

--------------------------------------------------------------------------------

# Icons: CC BY 4.0 License (https://creativecommons.org/licenses/by/4.0/)

The Font Awesome Free download is licensed under a Creative Commons
Attribution 4.0 International License and applies to all icons packaged
as SVG and JS file types.

--------------------------------------------------------------------------------

# Fonts: SIL OFL 1.1 License

In the Font Awesome Free download, the SIL OFL license applies to all icons
packaged as web and desktop font files.

Copyright (c) 2024 Fonticons, Inc. (https://fontawesome.com)
with Reserved Font Name: "Font Awesome".

This Font Software is licensed under the SIL Open Font License, Version 1.1.
This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL
SIL OPEN FONT LICENSE
Version 1.1 - 26 February 2007

PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font creation
efforts of academic and linguistic communities, and to provide a free and
open framework in which fonts may be shared and improved in partnership
with others.

The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply
to any document created using the fonts or their derivatives.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.

"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).

"Original Version" refers to the collection of Font Software components as
distributed by the Copyright Holder(s).

"Modified Version" refers to any derivative made by adding to, deleting,
or substituting — in part or in whole — any of the components of the
Original Version, by changing formats or by porting the Font Software to a
new environment.

"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.
PERMISSION & CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed, modify,
redistribute, and sell modified and unmodified copies of the Font
Software, subject to the following conditions:

1) Neither the Font Software nor any of its individual components,
in Original or Modified Versions, may be sold by itself.

2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.

3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the corresponding
Copyright Holder. This restriction only applies to the primary font name as
presented to the users.

4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.

5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created
using the Font Software.

TERMINATION
This license becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.

--------------------------------------------------------------------------------

# Code: MIT License (https://opensource.org/licenses/MIT)

In the Font Awesome Free download, the MIT license applies to all non-font and
non-icon files.

Copyright 2024 Fonticons, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in the
Software without restriction, including without limitation the rights to use, copy,
modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
# Attribution

Attribution is required by MIT, SIL OFL, and CC BY licenses. Downloaded Font
Awesome Free files already contain embedded comments with sufficient
attribution, so you shouldn't need to do anything additional when using these
files normally.

We've kept attribution comments terse, so we ask that you do not actively work
to remove them from files, especially code. They're a great way for folks to
learn about Font Awesome.

--------------------------------------------------------------------------------

# Brand Icons

All brand icons are trademarks of their respective owners. The use of these
trademarks does not indicate endorsement of the trademark holder by Font
Awesome, nor vice versa. **Please do not use brand logos for any purpose except
to represent the company, product, or service to which they refer.**
BIN  fonts/fa-brands-400.ttf  Normal file
BIN  fonts/fa-brands-400.woff2  Normal file
BIN  fonts/fa-regular-400.ttf  Normal file
BIN  fonts/fa-regular-400.woff2  Normal file
BIN  fonts/fa-solid-900.ttf  Normal file
BIN  fonts/fa-solid-900.woff2  Normal file
8  index.html  Normal file
@@ -0,0 +1,8 @@
<!doctype html><html lang=en><head><title>Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Eric X. Liu's Personal Page"><meta name=twitter:description content="Eric X. Liu - Software & Performance Engineer at Google. 
Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:url" content="https://ericxliu.me/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Eric X. Liu's Personal Page"><meta property="og:description" content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:locale" content="en"><meta property="og:type" content="website"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest 
href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><link rel=alternate type=application/rss+xml href=/index.xml title="Eric X. Liu's Personal Page"><meta name=generator content="Hugo 0.154.5"><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container centered"><div class=about><div class=avatar><img src=/images/gravatar.png alt=avatar width=200 height=200></div><h1>Eric X. Liu</h1><h2 id=typeit-info></h2><script src=https://unpkg.com/typeit@8.7.1/dist/index.umd.js></script><script>document.addEventListener("DOMContentLoaded",function(){new TypeIt("#typeit-info",{strings:["Software & Performance Engineer @Google","DIY Overlander & Rock Crawler","Tech Enthusiast"],speed:50,loop:!0,breakLines:!1,nextStringDelay:2e3,deleteSpeed:50,startDelay:500,lifeLike:!0}).go()})</script><ul><li><a href=https://git.ericxliu.me/eric aria-label=Git><i class="fa-brands fa-git fa-2x" aria-hidden=true></i></a></li><li><a href=https://www.linkedin.com/in/eric-x-liu-46648b93/ aria-label=linkedin><i class="fa-brands fa-linkedin fa-2x" aria-hidden=true></i></a></li><li><style>#span-17968cae.cloaked-e-mail{display:none}</style> <span class=cloaked-e-mail data-user=cire data-domain=em.uilxcire data-display="PGkgY2xhc3M9ImZhIGZhLWVudmVsb3BlIGZhLTJ4IiBhcmlhLWhpZGRlbj0idHJ1ZSI+PC9pPg==" id=span-17968cae></span>
<script id=script-17968cae>var span,scriptTag=document.getElementById("script-17968cae"),link=document.createElement("a"),address="cire".split("").reverse().join("")+"@"+"em.uilxcire".split("").reverse().join("");link.href="mailto:"+address,span=document.getElementById("span-17968cae"),link.innerHTML=atob(span.getAttribute("data-display")),scriptTag.parentElement.insertBefore(link,scriptTag.previousElementSibling),scriptTag.parentElement.removeChild(scriptTag.previousElementSibling)</script></li><li><a href=https://ericxliu.me/index.xml aria-label=RSS rel=alternate type=application/rss+xml><i class="fa-solid fa-rss fa-2x" aria-hidden=true></i></a></li></ul></div></section></div><footer class=footer><section class=container>©
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
95  index.xml  Normal file
@@ -0,0 +1,95 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eric X. Liu's Personal Page</title><link>https://ericxliu.me/</link><description>Recent content on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 22 Jan 2026 06:48:07 +0000</lastBuildDate><atom:link href="https://ericxliu.me/index.xml" rel="self" type="application/rss+xml"/><item><title>Hacking a Chinese Car Stereo to fulfill my Knight Rider dreams</title><link>https://ericxliu.me/posts/vibe-coding-from-the-jeep/</link><pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/vibe-coding-from-the-jeep/</guid><description><p>&ldquo;Vibe coding&rdquo; has become my latest obsession. It&rsquo;s that flow state where the tools disappear, and you&rsquo;re just manipulating logic at the speed of thought. Usually, this happens in a high-end IDE like Antigravity. But lately, I&rsquo;ve been trying to answer a childhood dream.</p>
<p>Growing up in China before the internet age, my window to the outside world was CCTV-6. Along with <em>Baywatch</em>, one of the first American TV shows I ever watched was <em>Knight Rider</em>. I don&rsquo;t remember the exact plot lines, but the core concept stuck with me forever: KITT. A car that could talk, think, and do things for you.</p></description></item><item><title>How I Built a Blog Agent that Writes About Itself</title><link>https://ericxliu.me/posts/reverse-engineering-antigravity-ide/</link><pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/reverse-engineering-antigravity-ide/</guid><description><p>I&rsquo;ve been spending a lot of time &ldquo;vibe coding&rdquo; in the Antigravity IDE lately. It&rsquo;s an incredible flow state—intense, iterative, and fast. But it has a major flaw: the context is ephemeral. Once the session is over, that rich history of decisions, wrong turns, and &ldquo;aha!&rdquo; moments is locked away in an opaque, internal format.</p>
<p>I wanted to capture that value. I wanted a system that could take my chaotic coding sessions and distill them into structured, technical blog posts (like the one you&rsquo;re reading right now).</p></description></item><item><title>Why I Downgraded Magisk to Root My Pixel 2 XL</title><link>https://ericxliu.me/posts/rooting-pixel-2-xl-for-reverse-engineering/</link><pubDate>Wed, 07 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/rooting-pixel-2-xl-for-reverse-engineering/</guid><description><p>For the past few weeks, I&rsquo;ve been stuck in a stalemate with my EcoFlow Bluetooth Protocol Reverse Engineering Project. I have the hci snoop logs, I have the decompiled APK, and I have a strong suspicion about where the authentication logic is hiding. But suspicion isn&rsquo;t proof.</p>
<p>Static analysis has its limits. I found the &ldquo;smoking gun&rdquo; function—a native method responsible for encrypting the login payload—but understanding <em>how</em> it constructs that payload within a strict 13-byte limit purely from assembly (ARM64) was proving to be a headache.</p></description></item><item><title>Why Your "Resilient" Homelab is Slower Than a Raspberry Pi</title><link>https://ericxliu.me/posts/debugging-authentik-performance/</link><pubDate>Fri, 02 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/debugging-authentik-performance/</guid><description><p>In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running &ldquo;production&rdquo; at home, there is only one metric that truly matters: <strong>The Wife Acceptance Factor (WAF)</strong>.</p>
<p>My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was &ldquo;slow sometimes.&rdquo; She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage.</p></description></item><item><title>How I Got Open WebUI Talking to OpenAI Web Search</title><link>https://ericxliu.me/posts/open-webui-openai-websearch/</link><pubDate>Mon, 29 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/open-webui-openai-websearch/</guid><description><p>OpenAI promised native web search in GPT‑5, but LiteLLM proxy deployments (and by extension Open WebUI) still choke on it—issue <a href="https://github.com/BerriAI/litellm/issues/13042" class="external-link" target="_blank" rel="noopener">#13042</a> tracks the fallout. I needed grounded answers inside Open WebUI anyway, so I built a workaround: route GPT‑5 traffic through the Responses API and mask every <code>web_search_call</code> before the UI ever sees it.</p>
<p>This post documents the final setup, the hotfix script that keeps LiteLLM honest, and the tests that prove Open WebUI now streams cited answers without trying to execute the tool itself.</p></description></item><item><title>From Gemini-3-Flash to T5-Gemma-2: A Journey in Distilling a Family Finance LLM</title><link>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</link><pubDate>Sat, 27 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</guid><description><p>Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and &ldquo;wait, was this dinner or <em>vacation</em> dinner?&rdquo; questions.</p>
<p>For years, I relied on a rule-based system to categorize our credit card transactions. It worked&hellip; mostly. But maintaining <code>if &quot;UBER&quot; in description and amount &gt; 50</code> style rules is a never-ending battle against the entropy of merchant names and changing habits.</p></description></item><item><title>About</title><link>https://ericxliu.me/about/</link><pubDate>Fri, 19 Dec 2025 22:46:12 -0800</pubDate><guid>https://ericxliu.me/about/</guid><description><img src="https://ericxliu.me/images/about.jpeg" alt="Eric Liu" width="300" style="float: left; margin-right: 1.5rem; margin-bottom: 1rem; border-radius: 8px;"/>
<p>Hi, I&rsquo;m <strong>Eric Liu</strong>.</p>
<p>I am a <strong>Staff Software Engineer and Tech Lead Manager (TLM)</strong> at <strong>Google</strong>, based in Sunnyvale, CA.</p>
<p>My work focuses on <strong>Infrastructure Performance and Customer Engineering</strong>, specifically for <strong>GPUs and TPUs</strong>. I lead teams that bridge the gap between cutting-edge AI hardware and the latest ML models (like Gemini), ensuring optimal performance and reliability at Google Cloud scale. I thrive in the ambiguous space where hardware constraints meet software ambition—whether it&rsquo;s debugging race conditions across thousands of chips or designing API surfaces for next-gen models.</p></description></item><item><title>The Convergence of Fast Weights, Linear Attention, and State Space Models</title><link>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</guid><description><p>Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer’s attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&ldquo;Fast Weights&rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).</p>
<p>This article explores the mathematical equivalence between Hinton’s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.</p></description></item><item><title>vAttention</title><link>https://ericxliu.me/posts/vattention/</link><pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/vattention/</guid><description><p>Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While <strong>PagedAttention</strong> (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU’s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.</p>
<h4 id="the-status-quo-pagedattention-and-software-tables">
The Status Quo: PagedAttention and Software Tables
<a class="heading-link" href="#the-status-quo-pagedattention-and-software-tables">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h4>
<p>Prior to PagedAttention, systems allocated contiguous memory for the maximum possible context length, leading to severe fragmentation and wasted memory. PagedAttention addressed this by chunking the KV cache into non-contiguous blocks, managed by a software-defined &ldquo;page table&rdquo; (the Block Table) [1].</p></description></item><item><title>Setting Up Jellyfin SSO with Authentik: Surviving the Beta</title><link>https://ericxliu.me/posts/jellyfin-sso-with-authentik/</link><pubDate>Sat, 15 Nov 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/jellyfin-sso-with-authentik/</guid><description><p>I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren&rsquo;t immediately obvious.</p>
<h2 id="the-setup">
The Setup
<a class="heading-link" href="#the-setup">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>The configuration is best handled via API (curl) rather than the UI, as it ensures all fields are correctly typed and persistent.</p></description></item><item><title>Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)</title><link>https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/</link><pubDate>Sat, 04 Oct 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/</guid><description><h2 id="introduction">
Introduction
<a class="heading-link" href="#introduction">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>NVIDIA&rsquo;s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there&rsquo;s a catch—one that reveals a fundamental tension in modern edge AI hardware design.</p>
<p>After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device&rsquo;s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn&rsquo;t computation—it&rsquo;s memory bandwidth. This isn&rsquo;t just a quirk of one device; it&rsquo;s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment.</p></description></item><item><title>Flashing Jetson Orin Nano in Virtualized Environments</title><link>https://ericxliu.me/posts/flashing-jetson-orin-nano-in-virtualized-environments/</link><pubDate>Thu, 02 Oct 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/flashing-jetson-orin-nano-in-virtualized-environments/</guid><description><h1 id="flashing-jetson-orin-nano-in-virtualized-environments">
Flashing Jetson Orin Nano in Virtualized Environments
<a class="heading-link" href="#flashing-jetson-orin-nano-in-virtualized-environments">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h1>
<h2 id="introduction">
Introduction
<a class="heading-link" href="#introduction">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>Flashing NVIDIA Jetson devices remotely presents unique challenges when the host machine is virtualized. This article documents the technical challenges, failures, and eventual success of flashing a Jetson Orin Nano Super developer kit using NVIDIA SDK Manager in various virtualized environments, specifically focusing on QEMU/KVM virtual machines and LXC containers on Proxmox VE.</p></description></item><item><title>OpenWrt: Fix WireGuard Connectivity with MWAN3 by Excluding the VPN Endpoint</title><link>https://ericxliu.me/posts/openwrt-mwan3-wireguard-endpoint-exclusion/</link><pubDate>Sun, 28 Sep 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/openwrt-mwan3-wireguard-endpoint-exclusion/</guid><description><h3 id="overview">
Overview
<a class="heading-link" href="#overview">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p>When using WireGuard together with MWAN3 on OpenWrt, the tunnel can fail to establish or flap when the peer&rsquo;s IP is routed into the tunnel itself. This is a classic routing bootstrap problem: WireGuard wants to route 0.0.0.0/0 into the tunnel, but the UDP packets to the peer&rsquo;s public endpoint also get captured, so they never reach the Internet to bring the tunnel up.</p></description></item><item><title>UniFi VLAN Migration to Zone-Based Architecture</title><link>https://ericxliu.me/posts/unifi-vlan-migration-to-zone-based-architecture/</link><pubDate>Mon, 22 Sep 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/unifi-vlan-migration-to-zone-based-architecture/</guid><description><p>Embarking on a network migration to a properly segmented VLAN architecture is a rite of passage for any serious home lab or small business operator. The goal is clear: improve security and organization by separating traffic. However, the path from a flat network to a segmented one is often paved with subtle but critical configuration details that can lead to hours of frustrating troubleshooting.</p>
<p>This article documents that journey. It details the pitfalls encountered, the core networking concepts that were essential to understand, and the best practices that ultimately led to a stable, secure, and logical network design built on a zone-based firewall model.</p></description></item><item><title>Quantization in LLMs</title><link>https://ericxliu.me/posts/quantization-in-llms/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/quantization-in-llms/</guid><description><p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.</p></description></item><item><title>Breville Barista Pro Maintenance</title><link>https://ericxliu.me/posts/breville-barista-pro-maintenance/</link><pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/breville-barista-pro-maintenance/</guid><description><p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p>
<h4 id="understanding-the-two-primary-maintenance-cycles">
<strong>Understanding the Two Primary Maintenance Cycles</strong>
<a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h4>
<p>The Breville Barista Pro has two distinct, automated maintenance procedures: the <strong>Cleaning (Flush) Cycle</strong> and the <strong>Descale Cycle</strong>. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.</p></description></item><item><title>Fixing GPU Operator Pods Stuck in Init: Secure Boot, DKMS, and MOK on Proxmox + Debian</title><link>https://ericxliu.me/posts/secure-boot-dkms-and-mok-on-proxmox-debian/</link><pubDate>Sat, 09 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/secure-boot-dkms-and-mok-on-proxmox-debian/</guid><description><p>I hit an issue where all GPU Operator pods on one node were stuck in Init after migrating from Legacy BIOS to UEFI. The common error was NVIDIA components waiting for “toolkit-ready,” while the toolkit init container looped with:</p>
<ul>
<li>nvidia-smi failed to communicate with the NVIDIA driver</li>
<li>modprobe nvidia → “Key was rejected by service”</li>
</ul>
<p>That message is the tell: Secure Boot is enabled and the kernel refuses to load modules not signed by a trusted key.</p></description></item><item><title>Beyond Words: How RVQ Teaches LLMs to See and Hear</title><link>https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/</link><pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/</guid><description><p>Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?</p>
<p>The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is <strong>Residual Vector Quantization (RVQ)</strong>.</p></description></item><item><title>Supabase Deep Dive: It's Not Magic, It's Just Postgres</title><link>https://ericxliu.me/posts/supabase-deep-dive/</link><pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/supabase-deep-dive/</guid><description><p>In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what&rsquo;s really going on.</p>
<p>Supabase enters this space with a radically different philosophy: <strong>transparency</strong>. It provides the convenience of a BaaS, but it’s built on the world&rsquo;s most trusted relational database: PostgreSQL. The &ldquo;magic&rdquo; isn&rsquo;t a proprietary black box; it&rsquo;s a carefully assembled suite of open-source tools that enhance Postgres, not hide it.</p></description></item><item><title>A Deep Dive into PPO for Language Models</title><link>https://ericxliu.me/posts/ppo-for-language-models/</link><pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/ppo-for-language-models/</guid><description><p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p>
<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.
<img src="https://ericxliu.me/images/ppo-for-language-models/7713bd3ecf27442e939b9190fa08165d.png" alt="S3 File"></p></description></item><item><title>Mixture-of-Experts (MoE) Models Challenges & Solutions in Practice</title><link>https://ericxliu.me/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/</link><pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/</guid><description><p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &ldquo;experts&rdquo;) to specialize in different types of inputs. A &ldquo;gating network&rdquo; or &ldquo;router&rdquo; learns to dispatch each input (or &ldquo;token&rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.</p>
<h3 id="1-challenge-non-differentiability-of-routing-functions">
1. Challenge: Non-Differentiability of Routing Functions
<a class="heading-link" href="#1-challenge-non-differentiability-of-routing-functions">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p><strong>The Problem:</strong>
Many routing mechanisms, especially &ldquo;Top-K routing,&rdquo; involve a discrete, hard selection process. A common function is <code>KeepTopK(v, k)</code>, which selects the top <code>k</code> scoring elements from a vector <code>v</code> and sets others to $-\infty$ or $0$.</p></description></item><item><title>An Architectural Deep Dive of T5</title><link>https://ericxliu.me/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/</link><pubDate>Sun, 01 Jun 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/</guid><description><p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p>
<p>But to truly understand the field, we must look at the pivotal models that explored different paths. Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p></description></item><item><title>Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso</title><link>https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/</link><pubDate>Thu, 01 May 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/</guid><description><p>Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.</p>
<p>Our overarching philosophy is simple: <strong>isolate and change only one variable at a time.</strong> While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your <strong>grind size</strong> is your most powerful lever.</p></description></item><item><title>Transformer's Core Mechanics</title><link>https://ericxliu.me/posts/transformer-s-core-mechanics/</link><pubDate>Tue, 01 Apr 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/transformer-s-core-mechanics/</guid><description><p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p>
<h3 id="1-the-channel-a-foundational-view-of-d_model">
1. The &ldquo;Channel&rdquo;: A Foundational View of <code>d_model</code>
<a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p></description></item><item><title>Some useful files</title><link>https://ericxliu.me/posts/useful/</link><pubDate>Mon, 26 Oct 2020 04:14:43 +0000</pubDate><guid>https://ericxliu.me/posts/useful/</guid><description><ul>
<li><a href="https://ericxliu.me/rootCA.crt" >rootCA.pem</a></li>
</ul></description></item></channel></rss>
@@ -0,0 +1 @@
const body=document.body,darkModeToggle=document.getElementById("dark-mode-toggle"),darkModeMediaQuery=window.matchMedia("(prefers-color-scheme: dark)");localStorage.getItem("colorscheme")?setTheme(localStorage.getItem("colorscheme")):setTheme(body.classList.contains("colorscheme-light")||body.classList.contains("colorscheme-dark")?body.classList.contains("colorscheme-dark")?"dark":"light":darkModeMediaQuery.matches?"dark":"light"),darkModeToggle&&darkModeToggle.addEventListener("click",()=>{let e=body.classList.contains("colorscheme-dark")?"light":"dark";setTheme(e),rememberTheme(e)}),darkModeMediaQuery.addListener(e=>{setTheme(e.matches?"dark":"light")}),document.addEventListener("DOMContentLoaded",function(){let e=document.querySelector(".preload-transitions");e.classList.remove("preload-transitions")});function setTheme(e){body.classList.remove("colorscheme-auto");let n=e==="dark"?"light":"dark";body.classList.remove("colorscheme-"+n),body.classList.add("colorscheme-"+e),document.documentElement.style["color-scheme"]=e;function t(e){return new Promise(t=>{if(document.querySelector(e))return t(document.querySelector(e));const n=new MutationObserver(s=>{document.querySelector(e)&&(t(document.querySelector(e)),n.disconnect())});n.observe(document.body,{childList:!0,subtree:!0})})}if(e==="dark"){const e={type:"set-theme",theme:"github-dark"};t(".utterances-frame").then(t=>{t.contentWindow.postMessage(e,"https://utteranc.es")})}else{const e={type:"set-theme",theme:"github-light"};t(".utterances-frame").then(t=>{t.contentWindow.postMessage(e,"https://utteranc.es")})}function s(e){const t=document.querySelector("iframe.giscus-frame");if(!t)return;t.contentWindow.postMessage({giscus:e},"https://giscus.app")}s({setConfig:{theme:e}});const o=new Event("themeChanged");document.dispatchEvent(o)}function rememberTheme(e){localStorage.setItem("colorscheme",e)}
@@ -1,13 +0,0 @@
<!-- CSP OVERRIDE ACTIVE -->
{{ $policy := "default-src 'self';" }}
{{ if not hugo.IsServer }}
{{ $policy = "upgrade-insecure-requests; block-all-mixed-content; default-src 'self';" }}
{{ end }}
{{ $scriptsrc := printf "%s https://unpkg.com" (delimit .Site.Params.csp.scriptsrc " ") }}
{{ printf `
<meta http-equiv="Content-Security-Policy"
content="%s child-src %s; font-src %s; form-action %s; frame-src %s; img-src %s; object-src %s; style-src %s; script-src %s; connect-src %s;">
` $policy (delimit .Site.Params.csp.childsrc " ") (delimit .Site.Params.csp.fontsrc " ") (delimit
.Site.Params.csp.formaction " ") (delimit .Site.Params.csp.framesrc " ") (delimit .Site.Params.csp.imgsrc " ") (delimit
.Site.Params.csp.objectsrc " ") (delimit .Site.Params.csp.stylesrc " ") $scriptsrc (delimit .Site.Params.csp.connectsrc
" ") | safeHTML }}
@@ -1,21 +0,0 @@
<h1>{{ .Site.Params.author }}</h1>

{{ if .Site.Params.info }}
<h2 id="typeit-info"></h2>

<script src="https://unpkg.com/typeit@8.7.1/dist/index.umd.js"></script>
<script>
document.addEventListener("DOMContentLoaded", function () {
new TypeIt("#typeit-info", {
strings: {{ .Site.Params.info | jsonify | safeJS }},
speed: 50,
loop: true,
breakLines: false,
nextStringDelay: 2000,
deleteSpeed: 50,
startDelay: 500,
lifeLike: true
}).go();
});
</script>
{{ end }}
@@ -1,48 +0,0 @@
{{/* Get address, protocol and other parameters */}}
{{- $address := .address -}}
{{- $protocol := .protocol | default "mailto" -}}
{{- $class := .class -}}
{{- $displaytext := .display -}}
{{- $parts := split $address "@" -}}
{{- $user := (index $parts 0) -}}
{{- $domain := (index $parts 1) | default "" -}}
{{- $query := .query | default "" -}}
{{/* Compute md5 fingerprint */}}
{{- $fingerprint := md5 (print $address $protocol (index (seq 999 | shuffle) 0)) | truncate 8 "" -}}
{{/* Hide the placeholder span when display text is provided (e.g., icons) */}}
{{- if $displaytext }}
<style>
#span-{{ $fingerprint }}.cloaked-e-mail {
display: none;
}
</style>
{{- else }}
{{/* Set via CSS what is displayed when Javascript is disabled. Query is never displayed */}}
<style>
#span-{{ $fingerprint }}.cloaked-e-mail:before {
content:{{ with $domain }}attr(data-domain) "\0040" {{ end }}attr(data-user);
unicode-bidi:bidi-override;
direction:rtl;
}
</style>
{{- end }}
 <span class="cloaked-e-mail" data-user="{{ range $index := seq (sub (len $user) 1) 0}}{{ substr $user $index 1}}{{ end }}"{{ with $domain }} data-domain="{{ range $index := seq (sub (len $domain) 1) 0}}{{ substr $domain $index 1}}{{ end }}"{{ end }}{{ with $displaytext }} data-display="{{ . | base64Encode }}"{{ end }} id="span-{{ $fingerprint }}"></span> 
{{/* Alter display with Javascript by changing DOM */}}
<script id="script-{{ $fingerprint }}">
var scriptTag = document.getElementById("script-{{ $fingerprint }}");
var link = document.createElement("a");
var address = "{{ range $index := seq (sub (len $user) 1) 0}}{{ substr $user $index 1}}{{ end }}".split('').reverse().join(''){{ with $domain }} + "@" + "{{ range $index := seq (sub (len $domain) 1) 0}}{{ substr $domain $index 1}}{{ end }}".split('').reverse().join(''){{ with $query }} + "?" + "{{ range $index := seq (sub (len $query) 1) 0}}{{ substr $query $index 1}}{{ end }}".split('').reverse().join(''){{ end }}{{ end }};
link.href = {{ $protocol }} + ":" + address;
{{- with $displaytext }}
var span = document.getElementById("span-{{ $fingerprint }}");
link.innerHTML = atob(span.getAttribute("data-display"));
{{- else }}
link.innerText = address.split('?')[0];
{{- end }}
{{- with $class }}
link.className = "{{ $class }}";
{{- end }}
scriptTag.parentElement.insertBefore(link, scriptTag.previousElementSibling);
scriptTag.parentElement.removeChild(scriptTag.previousElementSibling);
</script>
{{/* The end */}}
@@ -1,22 +0,0 @@
{{ if not .Site.Params.hideFooter | default false }}
<footer class="footer">
<section class="container">
{{ with .Site.Params.footerContent | safeHTML }}
<p>{{ . }}</p>
{{ end }}
{{ if not .Site.Params.hideCopyright }}
©
{{ if (and (.Site.Params.since) (lt .Site.Params.since now.Year)) }}
{{ .Site.Params.since }} -
{{ end }}
{{ now.Year }}
{{ with .Site.Params.author }} {{ . }} {{ end }}
{{ end }}
{{ if not .Site.Params.hideCredits }}
{{ if not .Site.Params.hideCopyright }} · {{ end }}
{{ i18n "powered_by" }} <a href="https://gohugo.io/">Hugo</a> & <a href="https://github.com/luizdepra/hugo-coder/">Coder</a>.
{{ end }}
[commit]
</section>
</footer>
{{ end }}
@@ -1,31 +0,0 @@
{{ partial "head/meta-tags.html" . }}
<link rel="preload" href="/fonts/fa-solid-900.woff2" as="font" type="font/woff2" crossorigin>
<link rel="preload" href="/fonts/fa-brands-400.woff2" as="font" type="font/woff2" crossorigin>

{{ if .Params.canonicalUrl }}
<link rel="canonical" href="{{ .Params.canonicalUrl }}">
{{ else }}
<link rel="canonical" href="{{ .Permalink }}">
{{ end }}

{{ partialCached "head/theme-styles.html" . }}

{{ partialCached "head/color-scheme.html" . }}

{{ partialCached "head/custom-styles.html" . }}

{{ partialCached "head/custom-icons.html" . }}

{{ partial "head/alternative-output-formats.html" . }}

{{ if .IsHome }}{{ partial "head/hugo-generator.html" . }}{{ end }}

{{ partial "head/extensions.html" . }}

{{ if .Site.Params.adsense.client }}
<script async
src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client={{ .Site.Params.adsense.client }}"
crossorigin="anonymous"></script>
{{ end }}

{{ partial "head/json-ld.html" . }}
@@ -1,34 +0,0 @@
{{ if .Site.Params.schema }}
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "{{ .Site.Params.schema.type }}",
"name": "{{ .Site.Params.schema.name }}",
"url": "{{ .Site.BaseURL }}",
"description": "{{ .Site.Params.schema.description }}",
"sameAs": [
{{ range $index, $url := .Site.Params.schema.sameAs }}{{ if $index }}, {{ end }}"{{ $url }}"{{ end }}
]
}
</script>
{{ end }}

{{ if .IsPage }}
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "{{ .Title }}",
"genre": "{{ .Params.categories | default "Blog" }}",
"wordcount": "{{ .WordCount }}",
"url": "{{ .Permalink }}",
"datePublished": "{{ .Date.Format "2006-01-02T15:04:05-07:00" }}",
"dateModified": "{{ .Lastmod.Format "2006-01-02T15:04:05-07:00" }}",
"description": "{{ .Description | default .Summary }}",
"author": {
"@type": "Person",
"name": "{{ .Site.Params.author }}"
}
}
</script>
{{ end }}
@@ -1,9 +0,0 @@
{{ if and (isset .Site.Params "avatarurl") (not (isset .Site.Params "gravatar")) }}
{{ with .Site.Params.avatarURL }}
<div class="avatar"><img src="{{ . | relURL }}" alt="avatar" width="200" height="200"></div>
{{ end }}
{{ end }}
{{ with .Site.Params.gravatar }}
<div class="avatar"><img src="https://www.gravatar.com/avatar/{{md5 .}}?s=240&d=mp" alt="gravatar" width="200"
height="200"></div>
{{ end }}
@@ -1,28 +0,0 @@
|
||||
{{ with .Site.Params.social }}
<ul>
{{ range sort . "weight" }}
{{ if .icon }}
<li>
{{ if .email }}
{{ $iconHTML := printf "<i class=\"%s\" aria-hidden=\"true\"></i>" .icon }}
{{ partial "cloakemail" (dict "address" .email "protocol" "mailto" "display" $iconHTML) }}
{{ else }}
<a href="{{ .url | safeURL }}" aria-label="{{ .name }}" {{ if .rel }}rel="{{ .rel }}" {{ end }} {{ if .target }}target="{{ .target }}" {{ end }} {{ if .type }}type="{{ .type }}" {{ end }}>
<i class="{{ .icon }}" aria-hidden="true"></i>
</a>
{{ end }}
</li>
{{ else }}
<li>
{{ if .email }}
{{ partial "cloakemail" (dict "address" .email "protocol" "mailto" "display" .name) }}
{{ else }}
<a href="{{ .url | safeURL }}" aria-label="{{ .name }}" {{ if .rel }}rel="{{ .rel }}" {{ end }} {{ if .target }}target="{{ .target }}" {{ end }}>{{ .name }}</a>
{{ end }}
</li>
{{ end }}
{{ end }}
</ul>
{{ end }}
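The loop above sorts `.Site.Params.social` by `weight` and branches on whether each entry has an `icon` and/or an `email`. Entries might look like this — a sketch with hypothetical names, icon classes, and addresses; only the param keys (`name`, `icon`, `weight`, `url`, `email`, `rel`, `target`) come from the template itself:

```toml
# Hypothetical [[params.social]] entries consumed by the loop above.
[[params.social]]
  name = "Gitea"
  icon = "fa-brands fa-git-alt fa-2x"
  weight = 1
  url = "https://git.ericxliu.me/"
  target = "_blank"
  rel = "noopener"

[[params.social]]
  name = "Email"
  icon = "fa-solid fa-envelope fa-2x"
  weight = 2
  email = "user@example.com"   # routed through the cloakemail partial instead of a plain link
```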
65
posts/benchmarking-llms-on-jetson-orin-nano/index.html
Normal file
@@ -0,0 +1,65 @@
<!doctype html><html lang=en><head><title>Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI) · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="
Introduction
Link to heading
NVIDIA’s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there’s a catch—one that reveals a fundamental tension in modern edge AI hardware design.
After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device’s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn’t computation—it’s memory bandwidth. This isn’t just a quirk of one device; it’s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)"><meta name=twitter:description content="Introduction Link to heading NVIDIA’s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there’s a catch—one that reveals a fundamental tension in modern edge AI hardware design.
After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device’s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn’t computation—it’s memory bandwidth. This isn’t just a quirk of one device; it’s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment."><meta property="og:url" content="https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)"><meta property="og:description" content="Introduction Link to heading NVIDIA’s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there’s a catch—one that reveals a fundamental tension in modern edge AI hardware design.
After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device’s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn’t computation—it’s memory bandwidth. This isn’t just a quirk of one device; it’s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:section" content="posts"><meta property="article:published_time" content="2025-10-04T00:00:00+00:00"><meta property="article:modified_time" content="2026-01-10T20:10:48+00:00"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon 
href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script><script type=application/ld+json>{"@context":"http://schema.org","@type":"BlogPosting","headline":"Why Your Jetson Orin Nano\u0027s 40 TOPS Goes Unused (And What That Means for Edge AI)","genre":"Blog","wordcount":"1866","url":"https:\/\/ericxliu.me\/posts\/benchmarking-llms-on-jetson-orin-nano\/","datePublished":"2025-10-04T00:00:00\u002b00:00","dateModified":"2026-01-10T20:10:48\u002b00:00","description":"\u003ch2 id=\u0022introduction\u0022\u003e\n Introduction\n \u003ca class=\u0022heading-link\u0022 href=\u0022#introduction\u0022\u003e\n \u003ci class=\u0022fa-solid fa-link\u0022 aria-hidden=\u0022true\u0022 title=\u0022Link to heading\u0022\u003e\u003c\/i\u003e\n \u003cspan class=\u0022sr-only\u0022\u003eLink to heading\u003c\/span\u003e\n \u003c\/a\u003e\n\u003c\/h2\u003e\n\u003cp\u003eNVIDIA\u0026rsquo;s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. 
But there\u0026rsquo;s a catch—one that reveals a fundamental tension in modern edge AI hardware design.\u003c\/p\u003e\n\u003cp\u003eAfter running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device\u0026rsquo;s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn\u0026rsquo;t computation—it\u0026rsquo;s memory bandwidth. This isn\u0026rsquo;t just a quirk of one device; it\u0026rsquo;s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment.\u003c\/p\u003e","author":{"@type":"Person","name":"Eric X. Liu"}}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container post"><article><header><div class=post-title><h1 class=title><a class=title-link href=https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/>Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)</a></h1></div><div class=post-meta><div class=date><span class=posted-on><i class="fa-solid fa-calendar" aria-hidden=true></i>
<time datetime=2025-10-04T00:00:00Z>October 4, 2025
</time></span><span class=reading-time><i class="fa-solid fa-clock" aria-hidden=true></i>
9-minute read</span></div></div></header><div class=post-content><h2 id=introduction>Introduction
<a class=heading-link href=#introduction><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>NVIDIA’s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there’s a catch—one that reveals a fundamental tension in modern edge AI hardware design.</p><p>After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device’s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn’t computation—it’s memory bandwidth. This isn’t just a quirk of one device; it’s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment.</p><h2 id=the-hardware-what-were-working-with>The Hardware: What We’re Working With
<a class=heading-link href=#the-hardware-what-were-working-with><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>The NVIDIA Jetson Orin Nano 8GB I tested features:</p><ul><li><strong>GPU</strong>: NVIDIA Ampere architecture with 1024 CUDA cores and 32 Tensor Cores</li><li><strong>Compute Performance</strong>: 40 TOPS (INT8), 10 TFLOPS (FP16), 5 TFLOPS (FP32)</li><li><strong>Memory</strong>: 8GB LPDDR5 unified memory with 68 GB/s bandwidth</li><li><strong>Available VRAM</strong>: Approximately 5.2GB after OS overhead</li><li><strong>CPU</strong>: 6-core ARM Cortex-A78AE (ARMv8.2, 64-bit)</li><li><strong>TDP</strong>: 7-25W configurable</li></ul><p>The unified memory architecture is a double-edged sword: CPU and GPU share the same physical memory pool, which eliminates PCIe transfer overhead but also means you’re working with just 5.2GB of usable VRAM after the OS takes its share. This constraint shapes everything about LLM deployment on this device.</p><h2 id=testing-methodology>Testing Methodology
<a class=heading-link href=#testing-methodology><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><h3 id=the-models>The Models
<a class=heading-link href=#the-models><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>I tested seven models ranging from 0.5B to 5.4B parameters—essentially the entire practical deployment range for this hardware. The selection covered two inference backends (Ollama and vLLM) and various quantization strategies:</p><p><strong>Ollama-served models (with quantization):</strong></p><ul><li>Gemma 3 1B (Q4_K_M, 815MB)</li><li>Gemma 3n E2B (Q4_K_M, 3.5GB, 5.44B total params, 2B effective)</li><li>Qwen 2.5 0.5B (Q4_K_M, 350MB)</li><li>Qwen 3 0.6B (FP8, 600MB)</li></ul><p><strong>vLLM-served models (minimal/no quantization):</strong></p><ul><li>google/gemma-3-1b-it (FP16, 2GB)</li><li>Qwen/Qwen2.5-0.5B-Instruct (FP16, 1GB)</li><li>Qwen/Qwen3-0.6B-FP8 (FP8, 600MB)</li></ul><h3 id=the-testing-process>The Testing Process
<a class=heading-link href=#the-testing-process><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>Each model faced 10-12 prompts of varying complexity—from simple arithmetic to technical explanations about LLMs themselves. All tests ran with batch size = 1, simulating a single user interacting with a local chatbot—the typical edge deployment scenario. Out of 84 planned tests, 66 completed successfully (78.6% success rate). The failures? Mostly out-of-memory crashes on larger models and occasional inference engine instability.</p><h3 id=understanding-the-limits-roofline-analysis>Understanding the Limits: Roofline Analysis
<a class=heading-link href=#understanding-the-limits-roofline-analysis><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>To understand where performance hits its ceiling, I applied roofline analysis—a method that reveals whether a workload is compute-bound (limited by processing power) or memory-bound (limited by data transfer speed). For each model, I calculated:</p><ul><li><strong>FLOPs per token</strong>: Approximately 2 × total_parameters (accounting for matrix multiplications in forward pass)</li><li><strong>Bytes per token</strong>: model_size × 1.1 (including 10% overhead for activations and KV cache)</li><li><strong>Operational Intensity (OI)</strong>: FLOPs per token / Bytes per token</li><li><strong>Theoretical performance</strong>: min(compute_limit, bandwidth_limit)</li></ul><p>The roofline model works by comparing a workload’s operational intensity (how many calculations you do per byte of data moved) against the device’s balance point. If your operational intensity is too low, you’re bottlenecked by memory bandwidth—and as we’ll see, that’s exactly what happens with LLM inference.</p><p><img src=/images/benchmarking-llms-on-jetson-orin-nano/16d64bdc9cf14b05b7c40c4718b8091b.png alt="S3 File"></p><h2 id=the-results-speed-and-efficiency>The Results: Speed and Efficiency
<a class=heading-link href=#the-results-speed-and-efficiency><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><h3 id=what-actually-runs-fast>What Actually Runs Fast
<a class=heading-link href=#what-actually-runs-fast><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>Here’s how the models ranked by token generation speed:</p><table><thead><tr><th>Rank</th><th>Model</th><th>Backend</th><th>Avg Speed (t/s)</th><th>Std Dev</th><th>Success Rate</th></tr></thead><tbody><tr><td>1</td><td>qwen3:0.6b</td><td>Ollama</td><td>38.84</td><td>1.42</td><td>100%</td></tr><tr><td>2</td><td>qwen2.5:0.5b</td><td>Ollama</td><td>35.24</td><td>2.72</td><td>100%</td></tr><tr><td>3</td><td>gemma3:1b</td><td>Ollama</td><td>26.33</td><td>2.56</td><td>100%</td></tr><tr><td>4</td><td>Qwen/Qwen2.5-0.5B-Instruct</td><td>vLLM</td><td>15.18</td><td>2.15</td><td>100%</td></tr><tr><td>5</td><td>Qwen/Qwen3-0.6B-FP8</td><td>vLLM</td><td>12.81</td><td>0.36</td><td>100%</td></tr><tr><td>6</td><td>gemma3n:e2b</td><td>Ollama</td><td>8.98</td><td>1.22</td><td>100%</td></tr><tr><td>7</td><td>google/gemma-3-1b-it</td><td>vLLM</td><td>4.59</td><td>1.52</td><td>100%</td></tr></tbody></table><p>The standout finding: quantized sub-1B models hit 25-40 tokens/second, with Ollama consistently outperforming vLLM by 2-6× thanks to aggressive quantization and edge-optimized execution. These numbers align well with independent benchmarks from NVIDIA’s Jetson AI Lab (Llama 3.2 3B at 27.7 t/s, SmolLM2 at 41 t/s), confirming this is typical performance for the hardware class.
<img src=/images/benchmarking-llms-on-jetson-orin-nano/ee04876d75d247f9b27a647462555777.png alt="S3 File"></p><h3 id=responsiveness-first-token-latency>Responsiveness: First Token Latency
<a class=heading-link href=#responsiveness-first-token-latency><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>The time to generate the first output token—a critical metric for interactive applications—varied significantly:</p><ul><li>qwen3:0.6b (Ollama): 0.522 seconds</li><li>gemma3:1b (Ollama): 1.000 seconds</li><li>qwen2.5:0.5b (Ollama): 1.415 seconds</li><li>gemma3n:e2b (Ollama): 1.998 seconds</li></ul><p>Smaller, quantized models get to that first token faster—exactly what you want for a chatbot or interactive assistant where perceived responsiveness matters as much as raw throughput.</p><h3 id=the-memory-bottleneck-revealed>The Memory Bottleneck Revealed
<a class=heading-link href=#the-memory-bottleneck-revealed><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>When I compared actual performance against theoretical limits, the results were striking:</p><table><thead><tr><th>Model</th><th>Theoretical (t/s)</th><th>Actual (t/s)</th><th>Efficiency</th><th>Bottleneck</th><th>OI (FLOPs/byte)</th></tr></thead><tbody><tr><td>gemma3:1b</td><td>109.90</td><td>26.33</td><td>24.0%</td><td>Memory</td><td>3.23</td></tr><tr><td>qwen3:0.6b</td><td>103.03</td><td>38.84</td><td>37.7%</td><td>Memory</td><td>1.82</td></tr><tr><td>qwen2.5:0.5b</td><td>219.80</td><td>35.24</td><td>16.0%</td><td>Memory</td><td>3.23</td></tr><tr><td>gemma3n:e2b</td><td>54.95</td><td>8.98</td><td>16.3%</td><td>Memory</td><td>3.23</td></tr><tr><td>google/gemma-3-1b-it</td><td>30.91</td><td>4.59</td><td>14.9%</td><td>Memory</td><td>0.91</td></tr><tr><td>Qwen/Qwen3-0.6B-FP8</td><td>103.03</td><td>12.81</td><td>12.4%</td><td>Memory</td><td>1.82</td></tr><tr><td>Qwen/Qwen2.5-0.5B-Instruct</td><td>61.82</td><td>15.18</td><td>24.6%</td><td>Memory</td><td>0.91</td></tr></tbody></table><p>Every single model is memory-bound in this single-stream inference scenario. Average hardware efficiency sits at just 20.8%—meaning the computational units spend most of their time waiting for data rather than crunching numbers. That advertised 40 TOPS? Largely untapped when generating one token at a time for a single user.
<img src=/images/benchmarking-llms-on-jetson-orin-nano/ee04876d75d247f9b27a647462555777.png alt="S3 File"></p><h2 id=what-this-actually-means>What This Actually Means
<a class=heading-link href=#what-this-actually-means><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><h3 id=why-memory-bandwidth-dominates-in-single-stream-inference>Why Memory Bandwidth Dominates (in Single-Stream Inference)
<a class=heading-link href=#why-memory-bandwidth-dominates-in-single-stream-inference><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>The roofline numbers tell a clear story: operational intensity ranges from 0.91 to 3.23 FLOPs/byte across all tested models during single-token generation (batch size = 1). To actually saturate those 1024 CUDA cores and hit compute-bound operation, you’d need an operational intensity around 147 FLOPs/byte at the device’s 68 GB/s memory bandwidth.</p><p>In practice, for a model to actually become compute-bound on this device during single-stream inference, it would need an operational intensity exceeding:</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-fallback data-lang=fallback><span style=display:flex><span>OI_threshold = Peak_Compute / Memory_Bandwidth
</span></span><span style=display:flex><span> = (40 × 10^12 ops/s) / (68 × 10^9 bytes/s)
</span></span><span style=display:flex><span> = 588 FLOPs/byte
</span></span></code></pre></div><p>Single-stream autoregressive decoding falls 100-600× short of this threshold because each token generation requires loading the entire model from memory (matrix-vector multiplication) while performing only ~2 FLOPs per parameter. The compute units are idle most of the time, simply waiting for model weights and activations to arrive from memory.</p><p>Note: Production LLM serving with large batch sizes (32-256 requests) changes this dynamic dramatically—batching transforms matrix-vector operations into matrix-matrix multiplications, increasing operational intensity by 30-250× and making workloads compute-bound. However, edge devices serving single users cannot exploit this optimization.</p><p>The largest model tested—gemma3n:e2b at 3.5GB quantized (5.44B total parameters, 2B effective)—shows only 16.3% efficiency, similar to other quantized models. Despite being the largest model, Q4_K_M quantization keeps its memory footprint manageable, resulting in similar operational intensity (3.23 FLOPs/byte) to the other INT4-quantized models. Its MatFormer architecture with selective parameter activation (only 2B of 5.44B params active per token) actually helps reduce memory traffic, though this benefit is partially offset by the overhead of routing logic.</p><h3 id=what-this-means-for-edge-deployment>What This Means for Edge Deployment
<a class=heading-link href=#what-this-means-for-edge-deployment><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>The performance gap between Ollama and vLLM (2.3-5.7×) tells us something important about optimization priorities for single-user edge devices:</p><p><strong>Qwen 2.5 0.5B:</strong> Ollama (Q4_K_M, 350MB) at 35.24 t/s vs vLLM (FP16, 1GB) at 15.18 t/s—2.32× faster
<strong>Qwen 3 0.6B:</strong> Ollama (FP8) at 38.84 t/s vs vLLM (FP8) at 12.81 t/s—3.03× faster despite identical quantization
<strong>Gemma 3 1B:</strong> Ollama (Q4_K_M, 815MB) at 26.33 t/s vs vLLM (FP16, 2GB) at 4.59 t/s—5.74× faster</p><p>In single-stream scenarios, quantization delivers near-linear performance gains by directly attacking the memory bandwidth bottleneck. Q4_K_M quantization (4.5 bits/parameter) hits a sweet spot between model quality and speed. Going lower to INT2 might help further, but you’ll need to carefully evaluate output quality.</p><p>The real insight: Ollama’s edge-first design philosophy (GGUF format, streamlined execution, optimized kernels from llama.cpp) is fundamentally better aligned with single-stream, memory-constrained edge scenarios. vLLM’s datacenter features—continuous batching, PagedAttention, tensor parallelism—add overhead without providing benefits when serving individual users on unified memory architectures. These features shine in multi-user production serving where batching can be exploited, but hurt performance in the single-stream case.</p><p><strong>What you should actually do</strong>: Stick with Ollama or TensorRT-LLM using Q4_K_M/INT4 quantized models in GGUF format. Target the 0.5-1B parameter range (under 3GB) to leave headroom for KV cache. Focus your optimization efforts on memory access patterns and bandwidth reduction. Watch for emerging techniques like INT4 AWQ, sparse attention, and quantized KV caches.</p><h3 id=room-for-improvement>Room for Improvement
<a class=heading-link href=#room-for-improvement><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>The 20.8% average efficiency might sound terrible, but it’s actually typical for edge AI devices running single-stream inference. Datacenter GPUs hit 60-80% efficiency on optimized workloads—but that’s typically with large batch sizes that increase operational intensity. In comparable single-stream scenarios, even high-end GPUs see similar efficiency drops. Edge devices commonly land in the 15-40% range due to architectural tradeoffs and memory bandwidth constraints relative to their compute capability.</p><p>Three factors explain the gap:</p><ol><li><strong>Architecture</strong>: Unified memory sacrifices bandwidth for integration simplicity. The 4MB L2 cache and 7-15W TDP limit further constrain performance.</li><li><strong>Software maturity</strong>: Edge inference frameworks lag behind their datacenter counterparts in optimization.</li><li><strong>Runtime overhead</strong>: Quantization/dequantization operations, Python abstractions, and non-optimized kernels all add up.</li></ol><p>The consistent 16-24% efficiency across most models suggests there’s room for 2-3× speedups through better software optimization—particularly in memory access patterns and kernel implementations. But fundamental performance leaps will require hardware changes—specifically, prioritizing memory bandwidth (200+ GB/s) over raw compute capability in future edge AI chips.</p><h2 id=where-to-go-from-here>Where to Go From Here
<a class=heading-link href=#where-to-go-from-here><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><h3 id=software-optimizations-worth-pursuing>Software Optimizations Worth Pursuing
<a class=heading-link href=#software-optimizations-worth-pursuing><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><ul><li>Optimize memory access patterns in attention and MLP kernels</li><li>Implement quantized KV cache (8-bit or lower)</li><li>Tune for small batch sizes (2-4) to improve memory bus utilization</li><li>Overlap CPU-GPU pipeline operations to hide latency</li></ul><h3 id=research-directions>Research Directions
<a class=heading-link href=#research-directions><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><ul><li>Architectures with higher operational intensity (fewer memory accesses per compute operation)</li><li>Sparse attention patterns to reduce memory movement</li><li>On-device LoRA fine-tuning with frozen, quantized base weights</li><li>Multi-model serving with shared base model weights</li></ul><h3 id=what-edge-ai-hardware-designers-should-focus-on>What Edge AI Hardware Designers Should Focus On
<a class=heading-link href=#what-edge-ai-hardware-designers-should-focus-on><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>Future edge AI devices optimized for local, single-user LLM inference need a fundamental shift in priorities: memory bandwidth over raw compute capability. Specifically:</p><ul><li>200+ GB/s memory bandwidth (3× current Jetson Orin Nano)</li><li>HBM integration for higher bandwidth density</li><li>16GB+ capacity to support 7B+ parameter models</li><li>Purpose-built INT4/INT8 accelerators with larger on-chip caches to reduce DRAM traffic</li></ul><hr><h2 id=references>References
<a class=heading-link href=#references><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><ol><li><p>Williams, S., Waterman, A., & Patterson, D. (2009). “Roofline: An Insightful Visual Performance Model for Multicore Architectures.” <em>Communications of the ACM</em>, 52(4), 65-76.</p></li><li><p>NVIDIA Corporation. (2024). “Jetson Orin Nano Developer Kit Technical Specifications.” <a href=https://developer.nvidia.com/embedded/jetson-orin-nano-developer-kit class=external-link target=_blank rel=noopener>https://developer.nvidia.com/embedded/jetson-orin-nano-developer-kit</a></p></li><li><p>“Jetson AI Lab Benchmarks.” NVIDIA Jetson AI Lab. <a href=https://www.jetson-ai-lab.com/benchmarks.html class=external-link target=_blank rel=noopener>https://www.jetson-ai-lab.com/benchmarks.html</a></p></li><li><p>Gerganov, G., et al. (2023). “GGML - AI at the edge.” <em>GitHub</em>. <a href=https://github.com/ggerganov/ggml class=external-link target=_blank rel=noopener>https://github.com/ggerganov/ggml</a></p></li><li><p>Kwon, W., et al. (2023). “Efficient Memory Management for Large Language Model Serving with PagedAttention.” <em>Proceedings of SOSP 2023</em>.</p></li><li><p>Team, G., Mesnard, T., et al. (2025). “Gemma 3: Technical Report.” <em>arXiv preprint arXiv:2503.19786v1</em>. <a href=https://arxiv.org/html/2503.19786v1 class=external-link target=_blank rel=noopener>https://arxiv.org/html/2503.19786v1</a></p></li><li><p>Yang, A., et al. (2025). “Qwen3 Technical Report.” <em>arXiv preprint arXiv:2505.09388</em>. <a href=https://arxiv.org/pdf/2505.09388 class=external-link target=_blank rel=noopener>https://arxiv.org/pdf/2505.09388</a></p></li><li><p>DeepSeek-AI. (2025). “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” <em>arXiv preprint arXiv:2501.12948v1</em>. 
<a href=https://arxiv.org/html/2501.12948v1 class=external-link target=_blank rel=noopener>https://arxiv.org/html/2501.12948v1</a></p></li><li><p>“Running LLMs with TensorRT-LLM on NVIDIA Jetson Orin Nano Super.” Collabnix. <a href=https://collabnix.com/running-llms-with-tensorrt-llm-on-nvidia-jetson-orin-nano-super/ class=external-link target=_blank rel=noopener>https://collabnix.com/running-llms-with-tensorrt-llm-on-nvidia-jetson-orin-nano-super/</a></p></li><li><p>Pope, R., et al. (2022). “Efficiently Scaling Transformer Inference.” <em>Proceedings of MLSys 2022</em>.</p></li><li><p>Frantar, E., et al. (2023). “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.” <em>Proceedings of ICLR 2023</em>.</p></li><li><p>Dettmers, T., et al. (2023). “QLoRA: Efficient Finetuning of Quantized LLMs.” <em>Proceedings of NeurIPS 2023</em>.</p></li><li><p>Lin, J., et al. (2023). “AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.” <em>arXiv preprint arXiv:2306.00978</em>.</p></li></ol></div><footer><div id=disqus_thread></div><script>window.disqus_config=function(){},function(){if(["localhost","127.0.0.1"].indexOf(window.location.hostname)!=-1){document.getElementById("disqus_thread").innerHTML="Disqus comments not available by default when the website is previewed locally.";return}var t=document,e=t.createElement("script");e.async=!0,e.src="//ericxliu-me.disqus.com/embed.js",e.setAttribute("data-timestamp",+new Date),(t.head||t.body).appendChild(e)}(),document.addEventListener("themeChanged",function(){document.readyState=="complete"&&DISQUS.reset({reload:!0,config:disqus_config})})</script></footer></article><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css integrity=sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0 crossorigin=anonymous><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js 
integrity=sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4 crossorigin=anonymous></script><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js integrity=sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05 crossorigin=anonymous onload='renderMathInElement(document.body,{delimiters:[{left:"$$",right:"$$",display:!0},{left:"$",right:"$",display:!1},{left:"\\(",right:"\\)",display:!1},{left:"\\[",right:"\\]",display:!0}]})'></script></section></div><footer class=footer><section class=container>©
2016 -
2026
Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
28
posts/breville-barista-pro-maintenance/index.html
Normal file
@@ -0,0 +1,28 @@
<!doctype html><html lang=en><head><title>Breville Barista Pro Maintenance · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.
Understanding the Two Primary Maintenance Cycles
Link to heading
The Breville Barista Pro has two distinct, automated maintenance procedures: the Cleaning (Flush) Cycle and the Descale Cycle. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Breville Barista Pro Maintenance"><meta name=twitter:description content="Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.
Understanding the Two Primary Maintenance Cycles Link to heading The Breville Barista Pro has two distinct, automated maintenance procedures: the Cleaning (Flush) Cycle and the Descale Cycle. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine."><meta property="og:url" content="https://ericxliu.me/posts/breville-barista-pro-maintenance/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Breville Barista Pro Maintenance"><meta property="og:description" content="Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.
Understanding the Two Primary Maintenance Cycles Link to heading The Breville Barista Pro has two distinct, automated maintenance procedures: the Cleaning (Flush) Cycle and the Descale Cycle. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:section" content="posts"><meta property="article:published_time" content="2025-08-16T00:00:00+00:00"><meta property="article:modified_time" content="2025-08-20T06:04:36+00:00"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/posts/breville-barista-pro-maintenance/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg 
color=#5bbad5><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script><script type=application/ld+json>{"@context":"http://schema.org","@type":"BlogPosting","headline":"Breville Barista Pro Maintenance","genre":"Blog","wordcount":"920","url":"https:\/\/ericxliu.me\/posts\/breville-barista-pro-maintenance\/","datePublished":"2025-08-16T00:00:00\u002b00:00","dateModified":"2025-08-20T06:04:36\u002b00:00","description":"\u003cp\u003eProper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.\u003c\/p\u003e\n\u003ch4 id=\u0022understanding-the-two-primary-maintenance-cycles\u0022\u003e\n \u003cstrong\u003eUnderstanding the Two Primary Maintenance Cycles\u003c\/strong\u003e\n \u003ca class=\u0022heading-link\u0022 href=\u0022#understanding-the-two-primary-maintenance-cycles\u0022\u003e\n \u003ci class=\u0022fa-solid fa-link\u0022 aria-hidden=\u0022true\u0022 title=\u0022Link to heading\u0022\u003e\u003c\/i\u003e\n \u003cspan class=\u0022sr-only\u0022\u003eLink to heading\u003c\/span\u003e\n \u003c\/a\u003e\n\u003c\/h4\u003e\n\u003cp\u003eThe Breville Barista Pro has two distinct, automated maintenance procedures: the \u003cstrong\u003eCleaning (Flush) Cycle\u003c\/strong\u003e and the \u003cstrong\u003eDescale Cycle\u003c\/strong\u003e. 
It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.\u003c\/p\u003e","author":{"@type":"Person","name":"Eric X. Liu"}}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container post"><article><header><div class=post-title><h1 class=title><a class=title-link href=https://ericxliu.me/posts/breville-barista-pro-maintenance/>Breville Barista Pro Maintenance</a></h1></div><div class=post-meta><div class=date><span class=posted-on><i class="fa-solid fa-calendar" aria-hidden=true></i>
<time datetime=2025-08-16T00:00:00Z>August 16, 2025
</time></span><span class=reading-time><i class="fa-solid fa-clock" aria-hidden=true></i>
5-minute read</span></div></div></header><div class=post-content><p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p><h4 id=understanding-the-two-primary-maintenance-cycles><strong>Understanding the Two Primary Maintenance Cycles</strong>
<a class=heading-link href=#understanding-the-two-primary-maintenance-cycles><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>The Breville Barista Pro has two distinct, automated maintenance procedures: the <strong>Cleaning (Flush) Cycle</strong> and the <strong>Descale Cycle</strong>. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.</p><ul><li><strong>Cleaning Cycle (Flush):</strong> This process is designed to remove coffee oils and granulated residue from the group head, shower screen, and portafilter system.</li><li><strong>Descale Cycle:</strong> This process targets the internal components of the machine, such as the thermocoil and water lines, to remove mineral and limescale deposits from water.</li></ul><h4 id=procedure-1-the-cleaning-flush-cycle><strong>Procedure 1: The Cleaning (Flush) Cycle</strong>
<a class=heading-link href=#procedure-1-the-cleaning-flush-cycle><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>The machine will indicate when a cleaning cycle is needed by displaying a “FLUSH” alert on the LCD screen. This typically occurs after approximately 200 extractions.</p><p><strong>Required Materials:</strong></p><ul><li>1-Cup filter basket</li><li>Grey silicone cleaning disc (provided with the machine)</li><li>One cleaning tablet</li></ul><p><strong>Step-by-Step Instructions:</strong></p><ol><li>Insert the 1-cup filter basket into the portafilter.</li><li>Place the grey silicone cleaning disc inside the basket.</li><li>Position one cleaning tablet in the center of the disc.</li><li>Lock the portafilter firmly into the group head.</li><li>Ensure the drip tray is empty and the water tank is filled.</li><li>Press the ‘MENU’ button and use the ‘Grind Amount’ dial to navigate to the ‘FLUSH’ option. Press the dial to select it.</li><li>The ‘1 CUP’ button will illuminate. Press it to initiate the cycle.</li><li>The cleaning process will last approximately five minutes, with the machine backflushing water under pressure. The remaining time will be displayed on the screen.</li><li>Upon completion, the machine will beep and return to its ready state.</li><li>Remove the portafilter and discard the water and dissolved tablet residue. Thoroughly rinse the portafilter, cleaning disc, and filter basket.</li><li>Re-insert the portafilter (without the disc or tablet) and run a shot of hot water through the group head to rinse any remaining cleaning solution.</li></ol><h4 id=procedure-2-the-descale-cycle><strong>Procedure 2: The Descale Cycle</strong>
<a class=heading-link href=#procedure-2-the-descale-cycle><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>The machine will alert you when descaling is required. The frequency depends on water hardness and usage but is generally recommended every 2-3 months.</p><p><strong>Required Materials:</strong></p><ul><li>Breville-recommended descaling solution</li><li>A large container (minimum 2-liter capacity)</li></ul><p><strong>Step-by-Step Instructions:</strong></p><p><strong>Part A: Preparation</strong></p><ol><li>Empty the drip tray and re-insert it.</li><li>Remove the water filter from the water tank.</li><li>Pour the descaling solution into the empty water tank and add fresh water up to the indicated “DESCALE” line.</li><li>Place a large container under the group head, hot water outlet, and steam wand.</li></ol><p><strong>Part B: The Descaling Process</strong></p><ol><li>Turn the machine on and press the ‘MENU’ button. Navigate to the ‘DESCALE’ option and select it by pressing the dial.</li><li>Press the illuminated ‘1 CUP’ button to begin.</li><li>The cycle proceeds in three stages. You must manually advance through them using the steam dial based on the LCD prompts:<ul><li><strong>Group Head (d3):</strong> The machine descales the coffee brewing components.</li><li><strong>Hot Water (d2):</strong> After a beep, the LCD shows “d2”. Turn the steam dial to the hot water position.</li><li><strong>Steam (d1):</strong> After another beep, the display reads “d1”. 
Turn the dial to the steam position.</li></ul></li></ol><p><strong>Part C: The Rinse Cycle</strong></p><ol><li>Once the descaling solution is expended, the machine will beep and prompt for a rinse cycle (“r”).</li><li>Empty the large container and rinse the water tank thoroughly.</li><li>Fill the water tank with fresh, cold water to the MAX line and re-insert it.</li><li>Place the empty container back under the outlets and press the ‘1 CUP’ button.</li><li>The rinse cycle will mirror the descaling process, prompting you to engage the group head (“r3”), hot water (“r2”), and steam wand (“r1”) in sequence.</li><li>After the rinse is complete, the machine will exit the maintenance mode and return to its ready state.</li></ol><h4 id=routine-and-preventative-maintenance-schedule><strong>Routine and Preventative Maintenance Schedule</strong>
<a class=heading-link href=#routine-and-preventative-maintenance-schedule><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>In addition to the automated cycles, regular manual cleaning is essential for machine health.</p><p><strong>Daily Tasks:</strong></p><ul><li><strong>Purge Group Head:</strong> After the final use of the day, run hot water through the group head (without the portafilter) to clear grounds.</li><li><strong>Clean Portafilter & Baskets:</strong> Do not let used coffee grounds sit in the portafilter. Rinse with hot water after every use.</li><li><strong>Clean Steam Wand:</strong> Immediately after texturing milk, wipe the wand with a damp cloth and purge steam for 2-3 seconds to clear internal passages.</li><li><strong>Empty Drip Tray:</strong> Empty and rinse the drip tray regularly.</li></ul><p><strong>Weekly Tasks:</strong></p><ul><li><strong>Soak Components:</strong> Remove the filter basket from the portafilter. Soak both components in a solution of hot water and a cleaning tablet (or specific espresso cleaner) for 20-30 minutes to dissolve accumulated coffee oils. Rinse thoroughly.</li><li><strong>Clean Grinder:</strong> Empty the bean hopper. Run the grinder to clear any remaining beans, then use a brush and/or vacuum to clean out fines and oil residue from the burrs and chute.</li></ul><p><strong>Periodic Tasks (Every 2-3 Months):</strong></p><ul><li><strong>Replace Water Filter:</strong> The water filter located inside the water tank should be replaced every 3 months. 
This reduces the rate of scale buildup.</li><li><strong>Inspect Shower Screen:</strong> Use a brush to gently scrub the shower screen inside the group head to remove any stubborn coffee grounds.</li></ul><p>By adhering to this comprehensive maintenance schedule, you can ensure your Breville Barista Pro operates at peak performance and consistently produces high-quality espresso.</p><hr><p><strong>Reference:</strong></p><ul><li>Breville Barista Pro Instruction Manual and official manufacturer guidelines.</li></ul></div><footer><div id=disqus_thread></div><script>window.disqus_config=function(){},function(){if(["localhost","127.0.0.1"].indexOf(window.location.hostname)!=-1){document.getElementById("disqus_thread").innerHTML="Disqus comments not available by default when the website is previewed locally.";return}var t=document,e=t.createElement("script");e.async=!0,e.src="//ericxliu-me.disqus.com/embed.js",e.setAttribute("data-timestamp",+new Date),(t.head||t.body).appendChild(e)}(),document.addEventListener("themeChanged",function(){document.readyState=="complete"&&DISQUS.reset({reload:!0,config:disqus_config})})</script></footer></article><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css integrity=sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0 crossorigin=anonymous><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js integrity=sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4 crossorigin=anonymous></script><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js integrity=sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05 crossorigin=anonymous onload='renderMathInElement(document.body,{delimiters:[{left:"$$",right:"$$",display:!0},{left:"$",right:"$",display:!1},{left:"\\(",right:"\\)",display:!1},{left:"\\[",right:"\\]",display:!0}]})'></script></section></div><footer class=footer><section 
class=container>©
2016 -
2026
Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
47
posts/debugging-authentik-performance/index.html
Normal file
@@ -0,0 +1,47 @@
<!doctype html><html lang=en><head><title>Why Your "Resilient" Homelab is Slower Than a Raspberry Pi · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running “production” at home, there is only one metric that truly matters: The Wife Acceptance Factor (WAF).
My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was “slow sometimes.” She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content='Why Your "Resilient" Homelab is Slower Than a Raspberry Pi'><meta name=twitter:description content="In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running “production” at home, there is only one metric that truly matters: The Wife Acceptance Factor (WAF).
My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was “slow sometimes.” She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage."><meta property="og:url" content="https://ericxliu.me/posts/debugging-authentik-performance/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content='Why Your "Resilient" Homelab is Slower Than a Raspberry Pi'><meta property="og:description" content="In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running “production” at home, there is only one metric that truly matters: The Wife Acceptance Factor (WAF).
My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was “slow sometimes.” She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:section" content="posts"><meta property="article:published_time" content="2026-01-02T00:00:00+00:00"><meta property="article:modified_time" content="2026-01-03T06:57:12+00:00"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/posts/debugging-authentik-performance/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><script async 
src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script><script type=application/ld+json>{"@context":"http://schema.org","@type":"BlogPosting","headline":"Why Your \u0022Resilient\u0022 Homelab is Slower Than a Raspberry Pi","genre":"Blog","wordcount":"1031","url":"https:\/\/ericxliu.me\/posts\/debugging-authentik-performance\/","datePublished":"2026-01-02T00:00:00\u002b00:00","dateModified":"2026-01-03T06:57:12\u002b00:00","description":"\u003cp\u003eIn the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running \u0026ldquo;production\u0026rdquo; at home, there is only one metric that truly matters: \u003cstrong\u003eThe Wife Acceptance Factor (WAF)\u003c\/strong\u003e.\u003c\/p\u003e\n\u003cp\u003eMy detailed Grafana dashboards said everything was fine. But my wife said the SSO login was \u0026ldquo;slow sometimes.\u0026rdquo; She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage.\u003c\/p\u003e","author":{"@type":"Person","name":"Eric X. Liu"}}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container post"><article><header><div class=post-title><h1 class=title><a class=title-link href=https://ericxliu.me/posts/debugging-authentik-performance/>Why Your "Resilient" Homelab is Slower Than a Raspberry Pi</a></h1></div><div class=post-meta><div class=date><span class=posted-on><i class="fa-solid fa-calendar" aria-hidden=true></i>
<time datetime=2026-01-02T00:00:00Z>January 2, 2026
</time></span><span class=reading-time><i class="fa-solid fa-clock" aria-hidden=true></i>
5-minute read</span></div></div></header><div class=post-content><p>In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running “production” at home, there is only one metric that truly matters: <strong>The Wife Acceptance Factor (WAF)</strong>.</p><p>My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was “slow sometimes.” She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage.</p><p>Here is a breakdown of the symptoms, the red herrings, and the root cause that was hiding in plain sight.</p><h2 id=the-environment>The Environment
<a class=heading-link href=#the-environment><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>My homelab is designed for node-level resilience, which adds complexity to the storage layer. It is not running on a single server, but rather a 3-node <strong>Proxmox</strong> cluster where every component is redundant:</p><ul><li><strong>Orchestration</strong>: Kubernetes (k3s) managed via Flux CD.</li><li><strong>Storage</strong>: A <strong>Ceph</strong> cluster running on the Proxmox nodes, utilizing enterprise NVMe SSDs (<code>bluestore</code>) for OSDs.</li><li><strong>Database</strong>: Postgres managed by the Zalando Postgres Operator, with persistent volumes (PVCs) provisioned on Ceph RBD (block storage).</li><li><strong>Identity</strong>: Authentik for SSO.</li></ul><p>While the underlying disks are blazing fast NVMe drives, the architecture dictates that a write to a Ceph RBD volume is not complete until it is replicated over the network and acknowledged by multiple OSDs. This setup provides incredible resilience—I can pull the plug on a node and nothing stops—but it introduces unavoidable network latency for synchronous write operations. <strong>Keep this particular trade-off in mind; it plays a starring role in the investigation later.</strong></p><h2 id=the-symptom>The Symptom
<a class=heading-link href=#the-symptom><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>The issue was insidious because it was intermittent. Clicking “Login” would sometimes hang for 5-8 seconds, while other times it was instant. To an engineer, “sometimes slow” is the worst kind of bug because it defies easy reproduction.</p><p>The breakthrough came when I put aside the server-side Grafana dashboards and looked at the client side. By opening Chrome DevTools and monitoring the <strong>Network</strong> tab during a slow login attempt, I was able to capture the exact failing request.</p><p>I identified the culprit: the <code>/api/v3/core/applications/</code> endpoint. It wasn’t a connection timeout or a DNS issue; the server was simply taking 5+ seconds to respond to this specific GET request.</p><p>Armed with this “smoking gun,” I copied the request as cURL (preserving the session cookies) and converted it into a Python benchmark script (<code>reproduce_latency.py</code>). This allowed me to reliably trigger the latency on demand, turning an intermittent “heisenbug” into a reproducible test case.</p><p>The results were validating and horrifying:</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-text data-lang=text><span style=display:flex><span>Request 1: 2.1642s
</span></span><span style=display:flex><span>Request 2: 8.4321s
</span></span><span style=display:flex><span>Request 3: 5.1234s
</span></span><span style=display:flex><span>...
</span></span><span style=display:flex><span>Avg Latency: 4.8s
</span></span></code></pre></div><h2 id=investigation--red-herrings>Investigation & Red Herrings
<a class=heading-link href=#investigation--red-herrings><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><h3 id=attempt-1-the-connection-overhead-hypothesis>Attempt 1: The Connection Overhead Hypothesis
<a class=heading-link href=#attempt-1-the-connection-overhead-hypothesis><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p><strong>The Hypothesis</strong>: Authentik defaults to <code>CONN_MAX_AGE=0</code>, meaning it closes the database connection after every request. Since I enforce SSL for the database, I assumed the handshake overhead was killing performance.</p><p><strong>The Fix Attempt</strong>: I updated the Authentik configuration to enable persistent connections:</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-yaml data-lang=yaml><span style=display:flex><span><span style=color:#7ee787>env</span>:<span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span>- <span style=color:#7ee787>name</span>:<span style=color:#6e7681> </span><span style=color:#a5d6ff>AUTHENTIK_POSTGRESQL__CONN_MAX_AGE</span><span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#7ee787>value</span>:<span style=color:#6e7681> </span><span style=color:#a5d6ff>"600"</span><span style=color:#6e7681>
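</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#8b949e;font-style:italic># Assumed companion setting (verify against your authentik version&#39;s docs):</span><span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#8b949e;font-style:italic># ping pooled connections before reuse so silently-dropped sockets are replaced, not raised</span><span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span>- <span style=color:#7ee787>name</span>:<span style=color:#6e7681> </span><span style=color:#a5d6ff>AUTHENTIK_POSTGRESQL__CONN_HEALTH_CHECKS</span><span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#7ee787>value</span>:<span style=color:#6e7681> </span><span style=color:#a5d6ff>"true"</span><span style=color:#6e7681>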
</span></span></span></code></pre></div><p><strong>The Reality</strong>: The benchmark showed a slight improvement (~4.2s average), but the random 5-8s spikes remained. The 300ms connection setup was a factor, but not the root cause. As a side note, enabling this without configuring TCP Keepalives caused the Authentik worker to crash with <code>OperationalError('the connection is closed')</code> when firewalls silently dropped idle connections.</p><h3 id=attempt-2-cpu-starvation>Attempt 2: CPU Starvation
<a class=heading-link href=#attempt-2-cpu-starvation><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p><strong>The Hypothesis</strong>: The pods were CPU throttled during request processing.</p><p><strong>The Reality</strong>: <code>kubectl top pods</code> showed the server using only 29m (2.9% of a core). Even increasing the Gunicorn worker count from 2 to 4 did not improve the latency of individual requests, though it did help with concurrency.</p><h2 id=the-root-cause-a-perfect-storm>The Root Cause: A Perfect Storm
<a class=heading-link href=#the-root-cause-a-perfect-storm><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>I was stuck. The CPU was idle, network was fine, and individual database queries were fast (<1ms). Then I looked at the traffic patterns:</p><ol><li><strong>Redis</strong>: Almost zero traffic.</li><li><strong>Postgres</strong>: High <code>WALSync</code> and <code>WALWrite</code> wait times.</li><li><strong>The Table</strong>: <code>django_postgres_cache_cacheentry</code> was getting hammered.</li></ol><h3 id=insight-the-breaking-change>Insight: The Breaking Change
<a class=heading-link href=#insight-the-breaking-change><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>I checked the release notes for <strong>Authentik 2025.10</strong>:</p><blockquote><p><em>Breaking Change: Redis is no longer used for caching. All caching has been moved to the PostgreSQL database to simplify deployment.</em></p></blockquote><p>This architectural shift created a bottleneck specific to my storage backend:</p><ol><li><strong>The Change</strong>: Every API request triggers a cache write (session updates) to Postgres instead of Redis.</li><li><strong>The Default</strong>: Postgres defaults to <code>synchronous_commit = on</code>. A transaction is not considered “committed” until it is flushed to disk.</li><li><strong>The Storage</strong>: Ceph RBD replicates data across the network to multiple OSDs.</li></ol><p>Every time I loaded the dashboard, Authentik tried to update the cache. Postgres paused, verified the WAL write was replicated across all three nodes over the network (WAL Sync), and <em>then</em> responded.</p><h2 id=the-solution>The Solution
<a class=heading-link href=#the-solution><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>I couldn’t move the database to local NVMe without losing the failover capabilities I built the cluster for. However, for a cache-heavy workload, I could compromise on strict durability.</p><p>I patched the Postgres configuration to disable synchronous commits:</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-yaml data-lang=yaml><span style=display:flex><span><span style=color:#7ee787>spec</span>:<span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#7ee787>postgresql</span>:<span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#7ee787>parameters</span>:<span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#7ee787>synchronous_commit</span>:<span style=color:#6e7681> </span><span style=color:#a5d6ff>"off"</span><span style=color:#6e7681> </span><span style=color:#8b949e;font-style:italic># The magic switch</span><span style=color:#6e7681>
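</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#8b949e;font-style:italic># Per the Postgres docs, async-commit data loss is bounded by ~3x wal_writer_delay (200ms default)</span><span style=color:#6e7681>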
</span></span></span></code></pre></div><p><strong>What this does</strong>: Postgres returns “Success” to the application as soon as the commit record is queued in the in-memory WAL buffers; the WAL writer flushes it to disk in the background. In the event of a crash, I might lose the last ~500ms of data (mostly cache entries), which is an acceptable trade-off.</p><h2 id=verification>Verification
<a class=heading-link href=#verification><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>I re-ran the benchmark with <code>synchronous_commit = off</code>.</p><table><thead><tr><th>Metric</th><th>Before (<code>sync=on</code>)</th><th>After (<code>sync=off</code>)</th><th>Improvement</th></tr></thead><tbody><tr><td>Sequential x8 stream (Avg)</td><td>~4.8s</td><td><strong>0.40s</strong></td><td><strong>12x Faster</strong></td></tr><tr><td>Parallel x8 stream (Wall)</td><td>~10.5s</td><td><strong>2.45s</strong></td><td><strong>4x Faster</strong></td></tr></tbody></table><p>The latency vanished. The login became instant.</p><h2 id=key-insights>Key Insights
<a class=heading-link href=#key-insights><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><ul><li><strong>Read Release Notes</strong>: The shift from Redis to Postgres for caching was a major architectural change that I missed during the upgrade.</li><li><strong>Storage Matters</strong>: Distributed storage (Ceph/Longhorn) handles linear writes well, but struggles with latency-sensitive, high-frequency sync operations like WAL updates.</li><li><strong>Tuning Postgres</strong>: For workloads where immediate durability is less critical than latency (like caching tables), <code>synchronous_commit = off</code> is a powerful tool.</li><li><strong>Observability</strong>: The “Wife Test” is a valid monitoring alert. If a user complains it’s slow, investigate the P99 latency, not just the average.</li></ul><h3 id=references>References
<a class=heading-link href=#references><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><ul><li><a href=https://docs.goauthentik.io/releases/2025.10/ class=external-link target=_blank rel=noopener>Authentik 2025.10 Release Notes</a></li><li><a href=https://www.postgresql.org/docs/current/wal-async-commit.html class=external-link target=_blank rel=noopener>PostgreSQL Documentation: Synchronous Commit</a></li></ul></div><footer><div id=disqus_thread></div><script>window.disqus_config=function(){},function(){if(["localhost","127.0.0.1"].indexOf(window.location.hostname)!=-1){document.getElementById("disqus_thread").innerHTML="Disqus comments not available by default when the website is previewed locally.";return}var t=document,e=t.createElement("script");e.async=!0,e.src="//ericxliu-me.disqus.com/embed.js",e.setAttribute("data-timestamp",+new Date),(t.head||t.body).appendChild(e)}(),document.addEventListener("themeChanged",function(){document.readyState=="complete"&&DISQUS.reset({reload:!0,config:disqus_config})})</script></footer></article><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css integrity=sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0 crossorigin=anonymous><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js integrity=sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4 crossorigin=anonymous></script><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js integrity=sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05 crossorigin=anonymous onload='renderMathInElement(document.body,{delimiters:[{left:"$$",right:"$$",display:!0},{left:"$",right:"$",display:!1},{left:"\\(",right:"\\)",display:!1},{left:"\\[",right:"\\]",display:!0}]})'></script></section></div><footer class=footer><section class=container>©
2016 -
2026
Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
@@ -0,0 +1,23 @@
<!doctype html><html lang=en><head><title>Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.
Our overarching philosophy is simple: isolate and change only one variable at a time. While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your grind size is your most powerful lever."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso"><meta name=twitter:description content="Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.
Our overarching philosophy is simple: isolate and change only one variable at a time. While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your grind size is your most powerful lever."><meta property="og:url" content="https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso"><meta property="og:description" content="Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.
Our overarching philosophy is simple: isolate and change only one variable at a time. While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your grind size is your most powerful lever."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:section" content="posts"><meta property="article:published_time" content="2025-05-01T00:00:00+00:00"><meta property="article:modified_time" content="2025-08-03T04:20:20+00:00"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><script async 
src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script><script type=application/ld+json>{"@context":"http://schema.org","@type":"BlogPosting","headline":"Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso","genre":"Blog","wordcount":"1125","url":"https:\/\/ericxliu.me\/posts\/espresso-theory-application-a-guide-for-the-breville-barista-pro\/","datePublished":"2025-05-01T00:00:00\u002b00:00","dateModified":"2025-08-03T04:20:20\u002b00:00","description":"\u003cp\u003eAre you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.\u003c\/p\u003e\n\u003cp\u003eOur overarching philosophy is simple: \u003cstrong\u003eisolate and change only one variable at a time.\u003c\/strong\u003e While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your \u003cstrong\u003egrind size\u003c\/strong\u003e is your most powerful lever.\u003c\/p\u003e","author":{"@type":"Person","name":"Eric X. Liu"}}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container post"><article><header><div class=post-title><h1 class=title><a class=title-link href=https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/>Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso</a></h1></div><div class=post-meta><div class=date><span class=posted-on><i class="fa-solid fa-calendar" aria-hidden=true></i>
<time datetime=2025-05-01T00:00:00Z>May 1, 2025
</time></span><span class=reading-time><i class="fa-solid fa-clock" aria-hidden=true></i>
6-minute read</span></div></div></header><div class=post-content><p>Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.</p><p>Our overarching philosophy is simple: <strong>isolate and change only one variable at a time.</strong> While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your <strong>grind size</strong> is your most powerful lever.</p><p>Let’s dive in!</p><hr><h3 id=part-1-the-foundation--dose-the-weight-of-dry-coffee><strong>Part 1: The Foundation — Dose (The Weight of Dry Coffee)</strong>
<a class=heading-link href=#part-1-the-foundation--dose-the-weight-of-dry-coffee><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>Your dose is the bedrock of your espresso. It’s the weight of your ground coffee, and it should be the first variable you set and then keep <strong>constant</strong> during the initial dialing-in process.</p><p><strong>Why Dose Matters:</strong></p><ul><li><strong>Basket Size is Key:</strong> Your portafilter basket dictates your ideal dose. Too little coffee (under-dosing) creates excessive “headspace,” leading to soupy extractions. Too much (over-dosing) causes the coffee puck to touch the shower screen, preventing even water flow and causing channeling.</li><li><strong>Extraction “Work”:</strong> A higher dose means more coffee mass, requiring more “work” (a finer grind, more water) to extract properly.</li><li><strong>Coffee Type:</strong><ul><li><strong>Light Roasts:</strong> Denser and harder to extract. Consider a <strong>slightly lower dose</strong>.</li><li><strong>Dark Roasts:</strong> More brittle and soluble. You can often use a <strong>slightly higher dose</strong>.</li></ul></li></ul><p><strong>Application for Your Breville Barista Pro (54mm Portafilter):</strong></p><ul><li><strong>Your Starting Point:</strong> Always begin with <strong>18 grams</strong>. Use a scale for accuracy!</li><li><strong>Adjusting for Roast:</strong> For light roasts, if you’re struggling, drop to 17g. For dark roasts, you can try 19g.</li><li><strong>Golden Rule:</strong> Once you choose your starting dose (e.g., 18g), <strong>do not change it</strong> until you’ve dialed in your grind size.</li></ul><hr><h3 id=part-2-defining-the-drink--brew-ratio-dose-vs-yield><strong>Part 2: Defining the Drink — Brew Ratio (Dose vs. Yield)</strong>
<a class=heading-link href=#part-2-defining-the-drink--brew-ratio-dose-vs-yield><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>The brew ratio defines the relationship between your dry coffee dose and the weight of your liquid espresso yield. Always measure by <strong>weight (grams)</strong>, not volume (mL), as crema can be inconsistent.</p><p><strong>Understanding Ratios:</strong></p><ul><li><strong>Ristretto (1:1 – 1:1.5):</strong> E.g., 18g in → 18g to 27g out. Strong, textured, less extracted.</li><li><strong>Espresso (Normale) (1:1.5 – 1:2.5):</strong> E.g., 18g in → 27g to 45g out. The standard, balanced shot.</li><li><strong>Lungo (1:2.5+):</strong> E.g., 18g in → 45g+ out. Weaker, less textured, more extracted.</li></ul><p><strong>The Fundamental Trade-Off:</strong></p><ul><li><strong>Longer Ratio (more water):</strong> Higher extraction, but lower strength (more diluted).</li><li><strong>Shorter Ratio (less water):</strong> Lower extraction, but higher strength (more concentrated).</li></ul><p><strong>Application for Your Breville Barista Pro:</strong></p><ul><li><strong>Recommended Starting Ratio:</strong> A <strong>1:2 ratio</strong> is the perfect place to begin.</li><li><strong>Practical Numbers:</strong> With your 18g dose, your target yield is <strong>36 grams</strong> of liquid espresso.</li><li><strong>Execution:</strong> Place your cup on a scale and use the manual brew function to stop the shot precisely when the scale reads 36g.</li></ul><hr><h3 id=part-3-the-diagnostic-tool--brew-time><strong>Part 3: The Diagnostic Tool — Brew Time</strong>
<a class=heading-link href=#part-3-the-diagnostic-tool--brew-time><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>Brew time is not something you set directly; it’s the <strong>result</strong> of how much resistance your coffee puck provides against the machine’s water pressure. Think of it as a <strong>diagnostic tool</strong>.</p><p><strong>The 25-30 Second Guideline:</strong></p><p>This is a benchmark. If your 1:2 ratio shot falls within this time, your grind size is likely in the correct range for a balanced extraction.</p><ul><li><strong>Too Fast (<25s):</strong> Indicates under-extraction (often tastes sour).</li><li><strong>Too Slow (>30s):</strong> Indicates over-extraction (often tastes bitter).</li></ul><p><strong>Taste is King:</strong> Remember, if a shot tastes fantastic at 32 seconds, it’s a great shot! The time simply becomes part of your successful recipe for that specific coffee.</p><p><strong>Application for Your Breville Barista Pro:</strong></p><ul><li><strong>Pre-infusion:</strong> The Barista Pro’s low-pressure pre-infusion is <strong>part of your total brew time</strong>. Its purpose is to saturate the puck evenly to prevent channeling. Keep it consistent for every shot while dialing in.</li></ul><hr><h3 id=part-4-the-primary-control--grind-setting><strong>Part 4: The Primary Control — Grind Setting</strong>
<a class=heading-link href=#part-4-the-primary-control--grind-setting><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>This is where the magic (and sometimes frustration) happens. Grind size is your main tool for controlling the resistance of the coffee puck, which directly dictates your brew time.</p><p><strong>The Dual Impact of Grinding Finer:</strong></p><ol><li><strong>Increases surface area:</strong> Allows for more efficient flavor extraction.</li><li><strong>Increases resistance:</strong> Slows down water flow and increases contact time.</li></ol><p><strong>The Risk of Grinding Too Fine (Channeling):</strong></p><p>If the grind is too fine, the puck becomes so dense that high-pressure water can’t flow evenly. Instead, it “breaks” the puck and punches an easy path (a channel) through a weak spot. This results in a disastrous shot that is simultaneously:</p><ul><li><strong>Under-extracted:</strong> Most of the coffee is bypassed.</li><li><strong>Over-extracted:</strong> The water that does flow blasts through the channel, extracting harsh, bitter compounds.</li><li><strong>The Taste:</strong> A channeled shot tastes hollow, weak, sour, <em>and</em> bitter all at once.</li></ul><p><strong>The Goal:</strong> You want to <strong>grind as fine as you possibly can <em>without</em> causing significant channeling</strong>. This is the sweet spot for maximizing surface area and resistance for high, even extraction.</p><p><strong>Grind Retention (Purging):</strong> Most grinders retain some old grounds. When you change your grind setting, always purge a few grams of coffee to ensure your dose is entirely at the new setting.</p><p><strong>Application for Your Breville Barista Pro:</strong></p><ul><li><strong>Grinder Mechanism:</strong> The “Grind Amount” dial controls the <strong>TIME</strong> the grinder runs, not the weight. 
When you adjust the fineness, you <strong>must</strong> re-adjust the grind time to ensure you are still getting your target 18g dose.</li><li><strong>Tackling Channeling:</strong> The Barista Pro is prone to channeling. To fight this, focus on excellent <strong>puck prep</strong>: use a WDT (Weiss Distribution Technique) tool to break up clumps and evenly distribute the grounds before tamping levelly.</li></ul><hr><h3 id=the-complete-dialing-in-workflow><strong>The Complete Dialing-In Workflow</strong>
<a class=heading-link href=#the-complete-dialing-in-workflow><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>This systematic process will get you to a delicious shot from your Breville Barista Pro efficiently:</p><ol><li><strong>Set Your Constants:</strong><ul><li><strong>Dose:</strong> <strong>18g</strong>.</li><li><strong>Ratio:</strong> <strong>1:2</strong> (meaning a <strong>Yield</strong> of <strong>36g</strong>).</li><li><strong>Pre-infusion:</strong> Use a consistent method (e.g., manual 8-second hold).</li></ul></li><li><strong>Make an Initial Grind:</strong><ul><li>Set the grinder to a starting point of <strong>15</strong>.</li><li>Adjust the grind <strong>time</strong> until the grinder dispenses exactly 18g.</li></ul></li><li><strong>Pull the First Shot:</strong><ul><li>Brew manually, stopping at <strong>36g</strong> of liquid in the cup. Note the <strong>total brew time</strong>.</li></ul></li><li><strong>Taste and Diagnose:</strong><ul><li><strong>Fast & Sour? (<25s):</strong> Grind is too coarse.</li><li><strong>Slow & Bitter? (>32s):</strong> Grind is too fine.</li></ul></li><li><strong>Make ONE Adjustment - THE GRIND SIZE:</strong><ul><li>If fast/sour, adjust the grind <strong>finer</strong> (e.g., from 15 down to 13).</li><li>If slow/bitter, adjust the grind <strong>coarser</strong> (e.g., from 15 up to 17).</li></ul></li><li><strong>Re-adjust and Repeat:</strong><ul><li>After changing the grind setting, <strong>purge</strong> a small amount of coffee.</li><li>Re-weigh your next dose and <strong>adjust the grind time</strong> to get back to exactly 18g.</li><li>Pull another 36g shot. Repeat this process until your shot tastes balanced and the time falls roughly between <strong>25-32 seconds</strong>.</li></ul></li></ol><p>Happy brewing! 
With patience and this systematic approach, you’ll be pulling consistently delicious espresso shots from your Breville Barista Pro in no time.</p></div><footer><div id=disqus_thread></div><script>window.disqus_config=function(){},function(){if(["localhost","127.0.0.1"].indexOf(window.location.hostname)!=-1){document.getElementById("disqus_thread").innerHTML="Disqus comments not available by default when the website is previewed locally.";return}var t=document,e=t.createElement("script");e.async=!0,e.src="//ericxliu-me.disqus.com/embed.js",e.setAttribute("data-timestamp",+new Date),(t.head||t.body).appendChild(e)}(),document.addEventListener("themeChanged",function(){document.readyState=="complete"&&DISQUS.reset({reload:!0,config:disqus_config})})</script></footer></article><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css integrity=sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0 crossorigin=anonymous><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js integrity=sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4 crossorigin=anonymous></script><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js integrity=sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05 crossorigin=anonymous onload='renderMathInElement(document.body,{delimiters:[{left:"$$",right:"$$",display:!0},{left:"$",right:"$",display:!1},{left:"\\(",right:"\\)",display:!1},{left:"\\[",right:"\\]",display:!0}]})'></script></section></div><footer class=footer><section class=container>©
2016 -
2026
Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
21 posts/how-rvq-teaches-llms-to-see-and-hear/index.html Normal file
@@ -0,0 +1,21 @@
<!doctype html><html lang=en><head><title>Beyond Words: How RVQ Teaches LLMs to See and Hear · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?
The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is Residual Vector Quantization (RVQ)."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Beyond Words: How RVQ Teaches LLMs to See and Hear"><meta name=twitter:description content="Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?
The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is Residual Vector Quantization (RVQ)."><meta property="og:url" content="https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Beyond Words: How RVQ Teaches LLMs to See and Hear"><meta property="og:description" content="Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?
The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is Residual Vector Quantization (RVQ)."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:section" content="posts"><meta property="article:published_time" content="2025-08-07T00:00:00+00:00"><meta property="article:modified_time" content="2025-08-08T17:36:52+00:00"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><script async 
src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script><script type=application/ld+json>{"@context":"http://schema.org","@type":"BlogPosting","headline":"Beyond Words: How RVQ Teaches LLMs to See and Hear","genre":"Blog","wordcount":"1150","url":"https:\/\/ericxliu.me\/posts\/how-rvq-teaches-llms-to-see-and-hear\/","datePublished":"2025-08-07T00:00:00\u002b00:00","dateModified":"2025-08-08T17:36:52\u002b00:00","description":"\u003cp\u003eLarge Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?\u003c\/p\u003e\n\u003cp\u003eThe answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is \u003cstrong\u003eResidual Vector Quantization (RVQ)\u003c\/strong\u003e.\u003c\/p\u003e","author":{"@type":"Person","name":"Eric X. Liu"}}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container post"><article><header><div class=post-title><h1 class=title><a class=title-link href=https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/>Beyond Words: How RVQ Teaches LLMs to See and Hear</a></h1></div><div class=post-meta><div class=date><span class=posted-on><i class="fa-solid fa-calendar" aria-hidden=true></i>
<time datetime=2025-08-07T00:00:00Z>August 7, 2025
</time></span><span class=reading-time><i class="fa-solid fa-clock" aria-hidden=true></i>
6-minute read</span></div></div></header><div class=post-content><p>Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?</p><p>The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is <strong>Residual Vector Quantization (RVQ)</strong>.</p><p>This article dives deep into RVQ, exploring how it turns raw data into meaningful semantic IDs and how these IDs, in turn, unlock multi-modal understanding in LLMs.</p><h4 id=what-is-residual-vector-quantization-the-art-of-smart-compression><strong>What is Residual Vector Quantization? The Art of Smart Compression</strong>
<a class=heading-link href=#what-is-residual-vector-quantization-the-art-of-smart-compression><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>At its core, Vector Quantization (VQ) is a compression technique. It maps a high-dimensional vector (like a data embedding) to the single closest vector in a predefined dictionary, called a <strong>codebook</strong>. You then only need to store the index of that chosen vector. The problem? To represent complex data accurately, you’d need a codebook with an astronomical number of entries, which is computationally infeasible.</p><p>This is where <strong>Residual</strong> Vector Quantization shines. Instead of one giant codebook, RVQ uses a series of smaller codebooks in stages.</p><ol><li><strong>Stage 1 (Coarse Quantization):</strong> The input vector is quantized by the first codebook. This finds the broadest, most general category for the data.</li><li><strong>Calculate the Residual:</strong> The system calculates the error, or “residual,” between the original vector and its quantized version from Stage 1. This residual vector represents the information that was lost in the first coarse approximation.</li><li><strong>Stage 2 (Refinement):</strong> This residual vector is then quantized by the <em>second</em> codebook. This stage doesn’t re-evaluate the whole vector; it focuses only on correcting the error from the previous stage.</li><li><strong>Iterate:</strong> This process repeats for several stages, with each subsequent codebook quantizing the residual error from the previous one, adding a finer and finer layer of detail.</li></ol><p>The final compressed representation is simply the sequence of indices from each codebook, for example an ID like <code>[8, 5, 4, 1]</code>. The magic of this approach is that it creates a <strong>hierarchical ID</strong>. The first digit <code>[8]</code> might represent “Sports,” the next <code>[5]</code> refines it to “Court Sports,” <code>[4]</code> to “Beach Volleyball,” and the final <code>[1]</code> distinguishes a specific match. 
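The staged encode/decode loop can be sketched in a few lines of NumPy. This is a toy illustration only: the codebooks here are random and the helper names (<code>rvq_encode</code>, <code>rvq_decode</code>) are hypothetical, whereas real codebooks are learned, as described next.

```python
# Toy sketch of RVQ: each stage quantizes the residual left over by the
# previous stage. Random codebooks for illustration; real ones are learned.
import numpy as np

rng = np.random.default_rng(0)
# 4 stages, each with a small 16-entry codebook of 8-dim vectors.
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]

def rvq_encode(x, codebooks):
    """Return one index per stage; each stage quantizes the residual so far."""
    residual = np.asarray(x, dtype=float).copy()
    ids = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        ids.append(idx)
        residual -= cb[idx]  # keep only what this stage failed to capture
    return ids

def rvq_decode(ids, codebooks):
    """Reconstruction is simply the sum of the chosen codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, ids))

x = rng.normal(size=8)
ids = rvq_encode(x, codebooks)      # a 4-digit hierarchical ID
x_hat = rvq_decode(ids, codebooks)  # approximate reconstruction of x
```

Two vectors that agree at the coarse stages but differ in the fine ones share an ID prefix, which is exactly the hierarchy described above.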
Videos with similar content will naturally share a longer prefix in their Semantic ID.</p><h4 id=learning-what-matters-the-trainable-vq-autoencoder><strong>Learning What Matters: The Trainable VQ-Autoencoder</strong>
<a class=heading-link href=#learning-what-matters-the-trainable-vq-autoencoder><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>A key insight is that RVQ is not a fixed algorithm but a <strong>trainable neural network component</strong>. Its codebooks are not predefined; they are learned. This learning happens within a <strong>Vector-Quantized Autoencoder (VQ-AE)</strong> architecture.</p><ol><li><strong>Encoder:</strong> A powerful neural network (e.g., a Transformer or CNN) takes the raw data (like video frames and audio) and converts it into a continuous semantic embedding.</li><li><strong>RVQ Bottleneck:</strong> This embedding is fed into the RVQ module, which quantizes it into the sequence of discrete IDs.</li><li><strong>Decoder:</strong> The decoder takes these discrete IDs, looks up the corresponding codebook vectors, sums them up to get a reconstructed embedding, and attempts to rebuild the original video/audio.</li></ol><p>The entire system is trained end-to-end. The <strong>reconstruction loss</strong> (the difference between the original and reconstructed data) is used to update the parameters of the Encoder, the Decoder, and, most importantly, <strong>the codebook vectors within the RVQ module</strong>. Initially random, the codebook vectors are gradually pushed to become meaningful “anchors” for the core concepts present in the training data.</p><h4 id=from-implicit-to-explicit-controlling-semantics-with-contrastive-learning><strong>From Implicit to Explicit: Controlling Semantics with Contrastive Learning</strong>
<a class=heading-link href=#from-implicit-to-explicit-controlling-semantics-with-contrastive-learning><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>A standard VQ-AE learns implicit semantics. It gets good at reconstruction, but we can’t control <em>what</em> concepts it learns. To make the Semantic IDs truly meaningful and aligned with human language, we introduce <strong>contrastive learning</strong>.</p><p>The architecture is enhanced with a parallel text encoder (like BERT or CLIP’s). The model is then trained with a joint loss function:</p><p><code>L_total = L_reconstruction + λ * L_contrastive</code></p><ul><li><strong>Reconstruction Loss</strong> ensures the RVQ codes contain enough information to rebuild the input.</li><li><strong>Contrastive Loss</strong> forces the media embedding (from the video/audio encoder) to be mathematically “close” to the text embedding of its description, and “far” from the embeddings of unrelated text descriptions.</li></ul><p>This dual goal forces the model to organize its embedding space according to the semantics of human language. The codebook vectors now learn to represent concepts that are not just useful for reconstruction, but are also tied to explicit textual descriptions.</p><h4 id=integrating-with-llms-two-powerful-paths-to-multi-modality><strong>Integrating with LLMs: Two Powerful Paths to Multi-Modality</strong>
<a class=heading-link href=#integrating-with-llms-two-powerful-paths-to-multi-modality><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h4><p>Once we have a contrastively-trained VQ-AE, we can use its output to give LLMs the ability to see and hear. There are two primary strategies for this.</p><p><strong>Path 1: The Tokenizer Approach - Teaching the LLM a New Language</strong></p><p>This path treats the RVQ IDs as a new vocabulary. It’s a two-stage process ideal for high-fidelity content generation.</p><ol><li><strong>Create a Neural Codec:</strong> The trained VQ-AE serves as a powerful “codec.” You can take any piece of media (e.g., a song) and use the codec to compress it into a sequence of discrete RVQ tokens (e.g., <code>[8, 5, 4, 1, 8, 5, 9, 2, ...]</code>).</li><li><strong>Train a Generative LLM:</strong> A new Transformer model is trained auto-regressively on a massive dataset of these media-derived tokens. Its sole purpose is to learn the patterns and predict the next token in a sequence.</li></ol><p><strong>Use Case:</strong> This is the architecture behind models like Meta’s MusicGen. A user provides a text prompt, which conditions the Transformer to generate a new sequence of RVQ tokens. These tokens are then fed to the VQ-AE’s decoder to synthesize the final audio waveform.</p><p><strong>Path 2: The Adapter Approach - Translating for a Language Expert</strong></p><p>This path is used to augment a powerful, pre-trained, text-only LLM without the astronomical cost of retraining it.</p><ol><li><strong>Freeze the LLM:</strong> A massive, pre-trained LLM (like LLaMA) is frozen. Its deep language understanding is preserved.</li><li><strong>Use the Pre-Quantized Embedding:</strong> Instead of using the discrete RVQ tokens, we take the rich, continuous embedding vector produced by our media encoder <em>just before</em> it enters the RVQ module.</li><li><strong>Train a Small Adapter:</strong> A small, lightweight projection layer (or “adapter”) is trained. 
Its only job is to translate the media embedding into a vector that has the same format and structure as the LLM’s own word embeddings. It learns to map visual concepts to their corresponding “word” concepts in the LLM’s latent space.</li></ol><p><strong>Use Case:</strong> This is the principle behind models like Google’s Flamingo. To answer a question about an image, the image is passed through the media encoder and adapter. The resulting “vision-as-a-word” vector is inserted into the prompt sequence alongside the text tokens. The frozen LLM can now “reason” about the visual input because it has been translated into a format it already understands.</p></div><footer><div id=disqus_thread></div><script>window.disqus_config=function(){},function(){if(["localhost","127.0.0.1"].indexOf(window.location.hostname)!=-1){document.getElementById("disqus_thread").innerHTML="Disqus comments not available by default when the website is previewed locally.";return}var t=document,e=t.createElement("script");e.async=!0,e.src="//ericxliu-me.disqus.com/embed.js",e.setAttribute("data-timestamp",+new Date),(t.head||t.body).appendChild(e)}(),document.addEventListener("themeChanged",function(){document.readyState=="complete"&&DISQUS.reset({reload:!0,config:disqus_config})})</script></footer></article><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css integrity=sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0 crossorigin=anonymous><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js integrity=sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4 crossorigin=anonymous></script><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js integrity=sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05 crossorigin=anonymous 
onload='renderMathInElement(document.body,{delimiters:[{left:"$$",right:"$$",display:!0},{left:"$",right:"$",display:!1},{left:"\\(",right:"\\)",display:!1},{left:"\\[",right:"\\]",display:!0}]})'></script></section></div><footer class=footer><section class=container>©
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
17 posts/index.html Normal file
@@ -0,0 +1,17 @@
<!doctype html><html lang=en><head><title>Posts · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Posts"><meta name=twitter:description content="Eric X. Liu - Software & Performance Engineer at Google. 
Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:url" content="https://ericxliu.me/posts/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Posts"><meta property="og:description" content="Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."><meta property="og:locale" content="en"><meta property="og:type" content="website"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/posts/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link 
rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><link rel=alternate type=application/rss+xml href=/posts/index.xml title="Eric X. Liu's Personal Page"><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container list"><header><h1 class=title><a class=title-link href=https://ericxliu.me/posts/>Posts</a></h1></header><ul><li><span class=date>January 21, 2026</span>
<a class=title href=/posts/vibe-coding-from-the-jeep/>Hacking a Chinese Car Stereo to fulfill my Knight Rider dreams</a></li><li><span class=date>January 16, 2026</span>
<a class=title href=/posts/reverse-engineering-antigravity-ide/>How I Built a Blog Agent that Writes About Itself</a></li><li><span class=date>January 7, 2026</span>
<a class=title href=/posts/rooting-pixel-2-xl-for-reverse-engineering/>Why I Downgraded Magisk to Root My Pixel 2 XL</a></li><li><span class=date>January 2, 2026</span>
<a class=title href=/posts/debugging-authentik-performance/>Why Your "Resilient" Homelab is Slower Than a Raspberry Pi</a></li><li><span class=date>December 29, 2025</span>
<a class=title href=/posts/open-webui-openai-websearch/>How I Got Open WebUI Talking to OpenAI Web Search</a></li><li><span class=date>December 27, 2025</span>
<a class=title href=/posts/technical-deep-dive-llm-categorization/>From Gemini-3-Flash to T5-Gemma-2: A Journey in Distilling a Family Finance LLM</a></li><li><span class=date>December 19, 2025</span>
<a class=title href=/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/>The Convergence of Fast Weights, Linear Attention, and State Space Models</a></li><li><span class=date>December 8, 2025</span>
<a class=title href=/posts/vattention/>vAttention</a></li><li><span class=date>November 15, 2025</span>
<a class=title href=/posts/jellyfin-sso-with-authentik/>Setting Up Jellyfin SSO with Authentik: Surviving the Beta</a></li><li><span class=date>October 4, 2025</span>
<a class=title href=/posts/benchmarking-llms-on-jetson-orin-nano/>Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)</a></li></ul><ul class=pagination><li>1</li><li><a href=/posts/page/2/>2</a></li><li><a href=/posts/page/3/>3</a></li><li class=hidden><a href=/posts/page/2/>›</a></li><li><a href=/posts/page/3/>»</a></li></ul></section></div><footer class=footer><section class=container>©
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
92 posts/index.xml Normal file
@@ -0,0 +1,92 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on Eric X. Liu's Personal Page</title><link>https://ericxliu.me/posts/</link><description>Recent content in Posts on Eric X. Liu's Personal Page</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 22 Jan 2026 06:48:07 +0000</lastBuildDate><atom:link href="https://ericxliu.me/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Hacking a Chinese Car Stereo to fulfill my Knight Rider dreams</title><link>https://ericxliu.me/posts/vibe-coding-from-the-jeep/</link><pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/vibe-coding-from-the-jeep/</guid><description><p>&ldquo;Vibe coding&rdquo; has become my latest obsession. It&rsquo;s that flow state where the tools disappear, and you&rsquo;re just manipulating logic at the speed of thought. Usually, this happens in a high-end IDE like Antigravity. But lately, I&rsquo;ve been trying to answer a childhood dream.</p>
<p>Growing up in China before the internet age, my window to the outside world was CCTV-6. Along with <em>Baywatch</em>, one of the first American TV shows I ever watched was <em>Knight Rider</em>. I don&rsquo;t remember the exact plot lines, but the core concept stuck with me forever: KITT. A car that could talk, think, and do things for you.</p></description></item><item><title>How I Built a Blog Agent that Writes About Itself</title><link>https://ericxliu.me/posts/reverse-engineering-antigravity-ide/</link><pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/reverse-engineering-antigravity-ide/</guid><description><p>I&rsquo;ve been spending a lot of time &ldquo;vibe coding&rdquo; in the Antigravity IDE lately. It&rsquo;s an incredible flow state—intense, iterative, and fast. But it has a major flaw: the context is ephemeral. Once the session is over, that rich history of decisions, wrong turns, and &ldquo;aha!&rdquo; moments is locked away in an opaque, internal format.</p>
<p>I wanted to capture that value. I wanted a system that could take my chaotic coding sessions and distill them into structured, technical blog posts (like the one you&rsquo;re reading right now).</p></description></item><item><title>Why I Downgraded Magisk to Root My Pixel 2 XL</title><link>https://ericxliu.me/posts/rooting-pixel-2-xl-for-reverse-engineering/</link><pubDate>Wed, 07 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/rooting-pixel-2-xl-for-reverse-engineering/</guid><description><p>For the past few weeks, I&rsquo;ve been stuck in a stalemate with my EcoFlow Bluetooth Protocol Reverse Engineering Project. I have the hci snoop logs, I have the decompiled APK, and I have a strong suspicion about where the authentication logic is hiding. But suspicion isn&rsquo;t proof.</p>
<p>Static analysis has its limits. I found the &ldquo;smoking gun&rdquo; function—a native method responsible for encrypting the login payload—but understanding <em>how</em> it constructs that payload within a strict 13-byte limit purely from assembly (ARM64) was proving to be a headache.</p></description></item><item><title>Why Your "Resilient" Homelab is Slower Than a Raspberry Pi</title><link>https://ericxliu.me/posts/debugging-authentik-performance/</link><pubDate>Fri, 02 Jan 2026 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/debugging-authentik-performance/</guid><description><p>In the world of self-hosting, there are many metrics for success: 99.9% uptime, sub-second latency, or a perfect GitOps pipeline. But for those of us running &ldquo;production&rdquo; at home, there is only one metric that truly matters: <strong>The Wife Acceptance Factor (WAF)</strong>.</p>
<p>My detailed Grafana dashboards said everything was fine. But my wife said the SSO login was &ldquo;slow sometimes.&rdquo; She was right. Debugging it took me down a rabbit hole of connection pooling, misplaced assumptions, and the harsh reality of running databases on distributed storage.</p></description></item><item><title>How I Got Open WebUI Talking to OpenAI Web Search</title><link>https://ericxliu.me/posts/open-webui-openai-websearch/</link><pubDate>Mon, 29 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/open-webui-openai-websearch/</guid><description><p>OpenAI promised native web search in GPT‑5, but LiteLLM proxy deployments (and by extension Open WebUI) still choke on it—issue <a href="https://github.com/BerriAI/litellm/issues/13042" class="external-link" target="_blank" rel="noopener">#13042</a> tracks the fallout. I needed grounded answers inside Open WebUI anyway, so I built a workaround: route GPT‑5 traffic through the Responses API and mask every <code>web_search_call</code> before the UI ever sees it.</p>
<p>This post documents the final setup, the hotfix script that keeps LiteLLM honest, and the tests that prove Open WebUI now streams cited answers without trying to execute the tool itself.</p></description></item><item><title>From Gemini-3-Flash to T5-Gemma-2: A Journey in Distilling a Family Finance LLM</title><link>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</link><pubDate>Sat, 27 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/technical-deep-dive-llm-categorization/</guid><description><p>Running a family finance system is surprisingly complex. What starts as a simple spreadsheet often evolves into a web of rules, exceptions, and &ldquo;wait, was this dinner or <em>vacation</em> dinner?&rdquo; questions.</p>
<p>For years, I relied on a rule-based system to categorize our credit card transactions. It worked&hellip; mostly. But maintaining <code>if &quot;UBER&quot; in description and amount &gt; 50</code> style rules is a never-ending battle against the entropy of merchant names and changing habits.</p></description></item><item><title>The Convergence of Fast Weights, Linear Attention, and State Space Models</title><link>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/the-convergence-of-fast-weights-linear-attention-and-state-space-models/</guid><description><p>Modern Large Language Models (LLMs) are dominated by the Transformer architecture. However, as context windows grow, the computational cost of the Transformer’s attention mechanism has become a primary bottleneck. Recent discussions in the AI community—most notably by Geoffrey Hinton—have highlighted a theoretical link between biological memory mechanisms (&ldquo;Fast Weights&rdquo;) and efficient engineering solutions like Linear Transformers and State Space Models (SSMs).</p>
<p>This article explores the mathematical equivalence between Hinton’s concept of Fast Weights as Associative Memory and the recurrence mechanisms found in models such as Mamba and RWKV.</p></description></item><item><title>vAttention</title><link>https://ericxliu.me/posts/vattention/</link><pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/vattention/</guid><description><p>Large Language Model (LLM) inference is memory-bound, primarily due to the Key-Value (KV) cache—a store of intermediate state that grows linearly with sequence length. Efficient management of this memory is critical for throughput. While <strong>PagedAttention</strong> (popularized by vLLM) became the industry standard by solving memory fragmentation via software, recent research suggests that leveraging the GPU’s native hardware Memory Management Unit (MMU) offers a more performant and portable solution.</p>
<h4 id="the-status-quo-pagedattention-and-software-tables">
The Status Quo: PagedAttention and Software Tables
<a class="heading-link" href="#the-status-quo-pagedattention-and-software-tables">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h4>
<p>Prior to PagedAttention, systems allocated contiguous memory for the maximum possible context length, leading to severe fragmentation and wasted memory. PagedAttention addressed this by chunking the KV cache into non-contiguous blocks, managed by a software-defined &ldquo;page table&rdquo; (the Block Table) [1].</p></description></item><item><title>Setting Up Jellyfin SSO with Authentik: Surviving the Beta</title><link>https://ericxliu.me/posts/jellyfin-sso-with-authentik/</link><pubDate>Sat, 15 Nov 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/jellyfin-sso-with-authentik/</guid><description><p>I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren&rsquo;t immediately obvious.</p>
<h2 id="the-setup">
The Setup
<a class="heading-link" href="#the-setup">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>The configuration is best handled via API (curl) rather than the UI, as it ensures all fields are correctly typed and persistent.</p></description></item><item><title>Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)</title><link>https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/</link><pubDate>Sat, 04 Oct 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/</guid><description><h2 id="introduction">
Introduction
<a class="heading-link" href="#introduction">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>NVIDIA&rsquo;s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there&rsquo;s a catch—one that reveals a fundamental tension in modern edge AI hardware design.</p>
<p>After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device&rsquo;s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn&rsquo;t computation—it&rsquo;s memory bandwidth. This isn&rsquo;t just a quirk of one device; it&rsquo;s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment.</p></description></item><item><title>Flashing Jetson Orin Nano in Virtualized Environments</title><link>https://ericxliu.me/posts/flashing-jetson-orin-nano-in-virtualized-environments/</link><pubDate>Thu, 02 Oct 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/flashing-jetson-orin-nano-in-virtualized-environments/</guid><description><h1 id="flashing-jetson-orin-nano-in-virtualized-environments">
Flashing Jetson Orin Nano in Virtualized Environments
<a class="heading-link" href="#flashing-jetson-orin-nano-in-virtualized-environments">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h1>
<h2 id="introduction">
Introduction
<a class="heading-link" href="#introduction">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>Flashing NVIDIA Jetson devices remotely presents unique challenges when the host machine is virtualized. This article documents the technical challenges, failures, and eventual success of flashing a Jetson Orin Nano Super developer kit using NVIDIA SDK Manager in various virtualized environments, specifically focusing on QEMU/KVM virtual machines and LXC containers on Proxmox VE.</p></description></item><item><title>OpenWrt: Fix WireGuard Connectivity with MWAN3 by Excluding the VPN Endpoint</title><link>https://ericxliu.me/posts/openwrt-mwan3-wireguard-endpoint-exclusion/</link><pubDate>Sun, 28 Sep 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/openwrt-mwan3-wireguard-endpoint-exclusion/</guid><description><h3 id="overview">
Overview
<a class="heading-link" href="#overview">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p>When using WireGuard together with MWAN3 on OpenWrt, the tunnel can fail to establish or flap when the peer&rsquo;s IP is routed into the tunnel itself. This is a classic routing bootstrap problem: WireGuard wants to route 0.0.0.0/0 into the tunnel, but the UDP packets to the peer&rsquo;s public endpoint also get captured, so they never reach the Internet to bring the tunnel up.</p></description></item><item><title>UniFi VLAN Migration to Zone-Based Architecture</title><link>https://ericxliu.me/posts/unifi-vlan-migration-to-zone-based-architecture/</link><pubDate>Mon, 22 Sep 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/unifi-vlan-migration-to-zone-based-architecture/</guid><description><p>Embarking on a network migration to a properly segmented VLAN architecture is a rite of passage for any serious home lab or small business operator. The goal is clear: improve security and organization by separating traffic. However, the path from a flat network to a segmented one is often paved with subtle but critical configuration details that can lead to hours of frustrating troubleshooting.</p>
<p>This article documents that journey. It details the pitfalls encountered, the core networking concepts that were essential to understand, and the best practices that ultimately led to a stable, secure, and logical network design built on a zone-based firewall model.</p></description></item><item><title>Quantization in LLMs</title><link>https://ericxliu.me/posts/quantization-in-llms/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/quantization-in-llms/</guid><description><p>The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theories, their application in modern LLMs, and highlights the ongoing innovations in this domain.</p></description></item><item><title>Breville Barista Pro Maintenance</title><link>https://ericxliu.me/posts/breville-barista-pro-maintenance/</link><pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/breville-barista-pro-maintenance/</guid><description><p>Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.</p>
<h4 id="understanding-the-two-primary-maintenance-cycles">
<strong>Understanding the Two Primary Maintenance Cycles</strong>
<a class="heading-link" href="#understanding-the-two-primary-maintenance-cycles">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h4>
<p>The Breville Barista Pro has two distinct, automated maintenance procedures: the <strong>Cleaning (Flush) Cycle</strong> and the <strong>Descale Cycle</strong>. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.</p></description></item><item><title>Fixing GPU Operator Pods Stuck in Init: Secure Boot, DKMS, and MOK on Proxmox + Debian</title><link>https://ericxliu.me/posts/secure-boot-dkms-and-mok-on-proxmox-debian/</link><pubDate>Sat, 09 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/secure-boot-dkms-and-mok-on-proxmox-debian/</guid><description><p>I hit an issue where all GPU Operator pods on one node were stuck in Init after migrating from Legacy BIOS to UEFI. The common error was NVIDIA components waiting for “toolkit-ready,” while the toolkit init container looped with:</p>
<ul>
<li>nvidia-smi failed to communicate with the NVIDIA driver</li>
<li>modprobe nvidia → “Key was rejected by service”</li>
</ul>
<p>That message is the tell: Secure Boot is enabled and the kernel refuses to load modules not signed by a trusted key.</p></description></item><item><title>Beyond Words: How RVQ Teaches LLMs to See and Hear</title><link>https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/</link><pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/how-rvq-teaches-llms-to-see-and-hear/</guid><description><p>Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?</p>
<p>The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is <strong>Residual Vector Quantization (RVQ)</strong>.</p></description></item><item><title>Supabase Deep Dive: It's Not Magic, It's Just Postgres</title><link>https://ericxliu.me/posts/supabase-deep-dive/</link><pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/supabase-deep-dive/</guid><description><p>In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what&rsquo;s really going on.</p>
<p>Supabase enters this space with a radically different philosophy: <strong>transparency</strong>. It provides the convenience of a BaaS, but it’s built on the world&rsquo;s most trusted relational database: PostgreSQL. The &ldquo;magic&rdquo; isn&rsquo;t a proprietary black box; it&rsquo;s a carefully assembled suite of open-source tools that enhance Postgres, not hide it.</p></description></item><item><title>A Deep Dive into PPO for Language Models</title><link>https://ericxliu.me/posts/ppo-for-language-models/</link><pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/ppo-for-language-models/</guid><description><p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p>
<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.
<img src="https://ericxliu.me/images/ppo-for-language-models/7713bd3ecf27442e939b9190fa08165d.png" alt="S3 File"></p></description></item><item><title>Mixture-of-Experts (MoE) Models Challenges & Solutions in Practice</title><link>https://ericxliu.me/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/</link><pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/</guid><description><p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &ldquo;experts&rdquo;) to specialize in different types of inputs. A &ldquo;gating network&rdquo; or &ldquo;router&rdquo; learns to dispatch each input (or &ldquo;token&rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.</p>
<h3 id="1-challenge-non-differentiability-of-routing-functions">
1. Challenge: Non-Differentiability of Routing Functions
<a class="heading-link" href="#1-challenge-non-differentiability-of-routing-functions">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p><strong>The Problem:</strong>
Many routing mechanisms, especially &ldquo;Top-K routing,&rdquo; involve a discrete, hard selection process. A common function is <code>KeepTopK(v, k)</code>, which selects the top <code>k</code> scoring elements from a vector <code>v</code> and sets others to $-\infty$ or $0$.</p></description></item><item><title>An Architectural Deep Dive of T5</title><link>https://ericxliu.me/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/</link><pubDate>Sun, 01 Jun 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/</guid><description><p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p>
<p>But to truly understand the field, we must look at the pivotal models that explored different paths. Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p></description></item><item><title>Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso</title><link>https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/</link><pubDate>Thu, 01 May 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/</guid><description><p>Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.</p>
<p>Our overarching philosophy is simple: <strong>isolate and change only one variable at a time.</strong> While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your <strong>grind size</strong> is your most powerful lever.</p></description></item><item><title>Transformer's Core Mechanics</title><link>https://ericxliu.me/posts/transformer-s-core-mechanics/</link><pubDate>Tue, 01 Apr 2025 00:00:00 +0000</pubDate><guid>https://ericxliu.me/posts/transformer-s-core-mechanics/</guid><description><p>The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of &ldquo;channels&rdquo; to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.</p>
<h3 id="1-the-channel-a-foundational-view-of-d_model">
1. The &ldquo;Channel&rdquo;: A Foundational View of <code>d_model</code>
<a class="heading-link" href="#1-the-channel-a-foundational-view-of-d_model">
<i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p>In deep learning, a &ldquo;channel&rdquo; can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model&rsquo;s primary embedding dimension, commonly referred to as <code>d_model</code>.</p></description></item><item><title>Some useful files</title><link>https://ericxliu.me/posts/useful/</link><pubDate>Mon, 26 Oct 2020 04:14:43 +0000</pubDate><guid>https://ericxliu.me/posts/useful/</guid><description><ul>
<li><a href="https://ericxliu.me/rootCA.crt" >rootCA.pem</a></li>
</ul></description></item></channel></rss>
posts/jellyfin-sso-with-authentik/index.html
@@ -0,0 +1,74 @@
<!doctype html><html lang=en><head><title>Setting Up Jellyfin SSO with Authentik: Surviving the Beta · Eric X. Liu's Personal Page</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta http-equiv=Content-Security-Policy content="upgrade-insecure-requests; block-all-mixed-content; default-src 'self'; child-src 'self'; font-src 'self' https://fonts.gstatic.com https://cdn.jsdelivr.net/; form-action 'self'; frame-src 'self' https://www.youtube.com https://disqus.com; img-src 'self' https://referrer.disqus.com https://c.disquscdn.com https://*.disqus.com; object-src 'none'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com/ https://cdn.jsdelivr.net/; script-src 'self' 'unsafe-inline' https://www.google-analytics.com https://cdn.jsdelivr.net/ https://pagead2.googlesyndication.com https://static.cloudflareinsights.com https://unpkg.com https://ericxliu-me.disqus.com https://disqus.com https://*.disqus.com https://*.disquscdn.com https://unpkg.com; connect-src 'self' https://www.google-analytics.com https://pagead2.googlesyndication.com https://cloudflareinsights.com ws://localhost:1313 ws://localhost:* wss://localhost:* https://links.services.disqus.com https://*.disqus.com;"><meta name=author content="Eric X. Liu"><meta name=description content="I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren’t immediately obvious.
The Setup
Link to heading
The configuration is best handled via API (curl) rather than the UI, as it ensures all fields are correctly typed and persistent."><meta name=keywords content="software engineer,performance engineering,Google engineer,tech blog,software development,performance optimization,Eric Liu,engineering blog,mountain biking,Jeep enthusiast,overlanding,camping,outdoor adventures"><meta name=twitter:card content="summary"><meta name=twitter:title content="Setting Up Jellyfin SSO with Authentik: Surviving the Beta"><meta name=twitter:description content="I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren’t immediately obvious.
The Setup Link to heading The configuration is best handled via API (curl) rather than the UI, as it ensures all fields are correctly typed and persistent."><meta property="og:url" content="https://ericxliu.me/posts/jellyfin-sso-with-authentik/"><meta property="og:site_name" content="Eric X. Liu's Personal Page"><meta property="og:title" content="Setting Up Jellyfin SSO with Authentik: Surviving the Beta"><meta property="og:description" content="I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren’t immediately obvious.
The Setup Link to heading The configuration is best handled via API (curl) rather than the UI, as it ensures all fields are correctly typed and persistent."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:section" content="posts"><meta property="article:published_time" content="2025-11-15T00:00:00+00:00"><meta property="article:modified_time" content="2025-12-28T21:21:42+00:00"><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=canonical href=https://ericxliu.me/posts/jellyfin-sso-with-authentik/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.4b392a85107b91dbdabc528edf014a6ab1a30cd44cafcd5325c8efe796794fca.css integrity="sha256-SzkqhRB7kdvavFKO3wFKarGjDNRMr81TJcjv55Z5T8o=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3972604619956476" crossorigin=anonymous></script><script 
type=application/ld+json>{"@context":"http://schema.org","@type":"Person","name":"Eric X. Liu","url":"https:\/\/ericxliu.me\/","description":"Software \u0026 Performance Engineer at Google","sameAs":["https:\/\/www.linkedin.com\/in\/eric-x-liu-46648b93\/","https:\/\/git.ericxliu.me\/eric"]}</script><script type=application/ld+json>{"@context":"http://schema.org","@type":"BlogPosting","headline":"Setting Up Jellyfin SSO with Authentik: Surviving the Beta","genre":"Blog","wordcount":"516","url":"https:\/\/ericxliu.me\/posts\/jellyfin-sso-with-authentik\/","datePublished":"2025-11-15T00:00:00\u002b00:00","dateModified":"2025-12-28T21:21:42\u002b00:00","description":"\u003cp\u003eI recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren\u0026rsquo;t immediately obvious.\u003c\/p\u003e\n\u003ch2 id=\u0022the-setup\u0022\u003e\n The Setup\n \u003ca class=\u0022heading-link\u0022 href=\u0022#the-setup\u0022\u003e\n \u003ci class=\u0022fa-solid fa-link\u0022 aria-hidden=\u0022true\u0022 title=\u0022Link to heading\u0022\u003e\u003c\/i\u003e\n \u003cspan class=\u0022sr-only\u0022\u003eLink to heading\u003c\/span\u003e\n \u003c\/a\u003e\n\u003c\/h2\u003e\n\u003cp\u003eThe configuration is best handled via API (curl) rather than the UI, as it ensures all fields are correctly typed and persistent.\u003c\/p\u003e","author":{"@type":"Person","name":"Eric X. Liu"}}</script></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=https://ericxliu.me/>Eric X. Liu's Personal Page
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/posts/>Posts</a></li><li class=navigation-item><a class=navigation-link href=https://chat.ericxliu.me>Chat</a></li><li class=navigation-item><a class=navigation-link href=https://git.ericxliu.me/user/oauth2/Authenitk>Git</a></li><li class=navigation-item><a class=navigation-link href=https://coder.ericxliu.me/api/v2/users/oidc/callback>Coder</a></li><li class=navigation-item><a class=navigation-link href=/about/>About</a></li><li class=navigation-item><a class=navigation-link href=/>|</a></li><li class=navigation-item><a class=navigation-link href=https://sso.ericxliu.me>Sign in</a></li></ul></section></nav><div class=content><section class="container post"><article><header><div class=post-title><h1 class=title><a class=title-link href=https://ericxliu.me/posts/jellyfin-sso-with-authentik/>Setting Up Jellyfin SSO with Authentik: Surviving the Beta</a></h1></div><div class=post-meta><div class=date><span class=posted-on><i class="fa-solid fa-calendar" aria-hidden=true></i>
<time datetime=2025-11-15T00:00:00Z>November 15, 2025
</time></span><span class=reading-time><i class="fa-solid fa-clock" aria-hidden=true></i>
3-minute read</span></div></div></header><div class=post-content><p>I recently integrated Jellyfin with Authentik for Single Sign-On (SSO). While the plugin works, it is still very much in an early development phase. The logging is often sparse or cryptic, and the feedback loop can be frustrating. Here is a guide focused on the obscure errors you might encounter and the simple fixes that aren’t immediately obvious.</p><h2 id=the-setup>The Setup
<a class=heading-link href=#the-setup><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>The configuration is best handled via API (curl) rather than the UI, as it ensures all fields are correctly typed and persistent.</p><h3 id=1-authentik-terraform>1. Authentik (Terraform)
<a class=heading-link href=#1-authentik-terraform><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>Let Authentik manage the secrets. Don’t hardcode them.</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-hcl data-lang=hcl><span style=display:flex><span><span style=color:#ff7b72>resource</span> <span style=color:#a5d6ff>"authentik_provider_oauth2" "jellyfin"</span> {
</span></span><span style=display:flex><span> name <span style=color:#ff7b72;font-weight:700>=</span> <span style=color:#a5d6ff>"Jellyfin"</span>
</span></span><span style=display:flex><span> client_id <span style=color:#ff7b72;font-weight:700>=</span> <span style=color:#a5d6ff>"jellyfin-ericxliu-me"</span><span style=color:#8b949e;font-style:italic>
</span></span></span><span style=display:flex><span><span style=color:#8b949e;font-style:italic> # client_secret omitted -> auto-generated
</span></span></span><span style=display:flex><span> property_mappings <span style=color:#ff7b72;font-weight:700>=</span> [
</span></span><span style=display:flex><span> <span style=color:#ff7b72>authentik_scope_mapping</span>.<span style=color:#ff7b72>openid</span>.<span style=color:#ff7b72>id</span>,
</span></span><span style=display:flex><span> <span style=color:#ff7b72>authentik_scope_mapping</span>.<span style=color:#ff7b72>profile</span>.<span style=color:#ff7b72>id</span>,
</span></span><span style=display:flex><span> <span style=color:#ff7b72>authentik_scope_mapping</span>.<span style=color:#ff7b72>email</span>.<span style=color:#ff7b72>id</span>,
</span></span><span style=display:flex><span> <span style=color:#ff7b72>authentik_scope_mapping</span>.<span style=color:#ff7b72>groups</span>.<span style=color:#ff7b72>id</span>
</span></span><span style=display:flex><span> ]<span style=color:#8b949e;font-style:italic>
</span></span></span><span style=display:flex><span><span style=color:#8b949e;font-style:italic> # ...
</span></span></span><span style=display:flex><span>}
</span></span></code></pre></div><h3 id=2-jellyfin-plugin-bashcurl>2. Jellyfin Plugin (Bash/Curl)
<a class=heading-link href=#2-jellyfin-plugin-bashcurl><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-bash data-lang=bash><span style=display:flex><span><span style=color:#8b949e;font-style:italic># ... (retrieve secret from terraform) ...</span>
</span></span><span style=display:flex><span>curl -X POST <span style=color:#a5d6ff>"https://jellyfin.ericxliu.me/SSO/OID/Add/authentik"</span> ... -d <span style=color:#a5d6ff>'{
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> "OidClientId": "jellyfin-ericxliu-me",
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> "OidSecret": "'</span><span style=color:#a5d6ff>"</span><span style=color:#a5d6ff>${</span><span style=color:#79c0ff>SECRET</span><span style=color:#a5d6ff>}</span><span style=color:#a5d6ff>"</span><span style=color:#a5d6ff>'",
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> "OidScopes": ["openid", "profile", "email", "groups"],
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> "SchemeOverride": "https",
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> "RoleClaim": "groups"
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> }'</span>
</span></span></code></pre></div><h2 id=obscure-errors--fixes>Obscure Errors & Fixes
<a class=heading-link href=#obscure-errors--fixes><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>Because the plugin is still maturing, it doesn’t always handle configuration errors gracefully. Here are the two main “cryptic” failures I encountered.</p><h3 id=1-the-value-cannot-be-null-crash>1. The “Value cannot be null” Crash
<a class=heading-link href=#1-the-value-cannot-be-null-crash><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p><strong>The Symptom</strong>:
You attempt to start the SSO flow and get a generic 500 error. The Jellyfin logs show a C# exception:</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-fallback data-lang=fallback><span style=display:flex><span>System.ArgumentNullException: Value cannot be null. (Parameter 'source')
</span></span><span style=display:flex><span> at System.Linq.Enumerable.Prepend[TSource](IEnumerable`1 source, TSource element)
</span></span><span style=display:flex><span> at Jellyfin.Plugin.SSO.Api.SSOController.OidChallenge(...)
</span></span></code></pre></div><p><strong>The Reality</strong>:
This looks like a deep internal failure, but it’s actually a simple configuration miss: the plugin attempts to prepend “openid profile” to your configured scopes without first checking whether the scopes array exists.
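</p><p>Concretely, a hypothetical payload like this one (with the scopes array nulled out, or missing entirely) triggers the exception above rather than a clean validation error:</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-json data-lang=json>{
  "OidClientId": "jellyfin-ericxliu-me",
  "OidScopes": null
}
</code></pre></div><p>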
<strong>The Fix</strong>:
You <strong>must</strong> explicitly provide <code>"OidScopes"</code> in your JSON configuration. It cannot be null or omitted.</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-json data-lang=json><span style=display:flex><span><span style=color:#a5d6ff>"OidScopes"</span><span style=color:#f85149>:</span> [<span style=color:#a5d6ff>"openid"</span>, <span style=color:#a5d6ff>"profile"</span>, <span style=color:#a5d6ff>"email"</span>, <span style=color:#a5d6ff>"groups"</span>]
</span></span></code></pre></div><h3 id=2-the-httphttps-mismatch-redirect-loop>2. The HTTP/HTTPS Mismatch (Redirect Loop)
<a class=heading-link href=#2-the-httphttps-mismatch-redirect-loop><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p><strong>The Symptom</strong>:
Authentik rejects the authorization request with “Redirect URI mismatch”, or the browser enters a redirect loop.
<strong>The Reality</strong>:
Jellyfin often sits behind a reverse proxy (Ingress/Traefik) that terminates TLS. Use your browser’s developer tools to inspect the network requests: you will likely see the <code>redirect_uri</code> parameter encoded as <code>http://jellyfin...</code> instead of <code>https://</code>.
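</p><p>A minimal sketch of that diagnosis, assuming you have copied the <code>Location</code> header from the browser’s network tab (the header value below is hypothetical, as is the <code>/sso/OID/redirect/&lt;provider&gt;</code> callback path):</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-bash data-lang=bash># Hypothetical Location header captured from the browser:
LOCATION='https://sso.ericxliu.me/application/o/authorize/?client_id=jellyfin-ericxliu-me&redirect_uri=http%3A%2F%2Fjellyfin.ericxliu.me%2Fsso%2FOID%2Fredirect%2Fauthentik'
# Extract and URL-decode the redirect_uri parameter; the scheme is the tell:
printf '%s\n' "$LOCATION" | grep -o 'redirect_uri=[^&]*' | sed 's/%3A/:/g; s/%2F/\//g'
# -> redirect_uri=http://jellyfin.ericxliu.me/sso/OID/redirect/authentik
</code></pre></div><p>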
<strong>The Fix</strong>:
Do not rely on header forwarding magic. Force the scheme in the plugin configuration:</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-json data-lang=json><span style=display:flex><span><span style=color:#a5d6ff>"SchemeOverride"</span><span style=color:#f85149>:</span> <span style=color:#a5d6ff>"https"</span>
</span></span></code></pre></div><h3 id=3-case-sensitivity-in-json>3. Case Sensitivity in JSON
<a class=heading-link href=#3-case-sensitivity-in-json><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p><strong>The Symptom</strong>: Configuration seems to be ignored or fields remain empty after a POST.
<strong>The Reality</strong>: The plugin’s API controller treats configuration keys as case-sensitive in some versions/contexts.
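</p><p>As a sketch, the key spellings that match the C# DTOs look like this (the endpoint and role values are placeholders, not the real configuration):</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-json data-lang=json>{
  "OidEndpoint": "https://sso.example.com/application/o/jellyfin/",
  "OidClientId": "jellyfin-ericxliu-me",
  "AdminRoles": ["jellyfin-admins"]
}
</code></pre></div><p>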
<strong>The Fix</strong>: Stick to PascalCase for the keys (<code>OidEndpoint</code>, <code>AdminRoles</code>) as seen in the C# DTOs, rather than camelCase (<code>oidEndpoint</code>), unless the specific version documentation explicitly states otherwise. When in doubt, checking the source code (<code>SSOController.cs</code>) is often faster than trusting the README.</p><h2 id=summary>Summary
<a class=heading-link href=#summary><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>When debugging Jellyfin SSO, don’t trust the UI to tell you what’s wrong.</p><ol><li><strong>Check the logs</strong> (<code>kubectl logs</code>) for C# stack traces.</li><li><strong>Sanitize your JSON</strong> inputs (arrays can’t be null).</li><li><strong>Inspect the URL parameters</strong> in your browser to see what Redirect URI is actually being generated.</li></ol><h3 id=references>References
<a class=heading-link href=#references><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><ul><li>Jellyfin SSO Plugin Repository: <code>https://github.com/9p4/jellyfin-plugin-sso</code></li><li>Authentik Documentation: <code>https://goauthentik.io/docs/providers/oauth2/</code></li><li>Jellyfin API Documentation: <code>https://api.jellyfin.org/</code></li></ul></div><footer><div id=disqus_thread></div><script>window.disqus_config=function(){},function(){if(["localhost","127.0.0.1"].indexOf(window.location.hostname)!=-1){document.getElementById("disqus_thread").innerHTML="Disqus comments not available by default when the website is previewed locally.";return}var t=document,e=t.createElement("script");e.async=!0,e.src="//ericxliu-me.disqus.com/embed.js",e.setAttribute("data-timestamp",+new Date),(t.head||t.body).appendChild(e)}(),document.addEventListener("themeChanged",function(){document.readyState=="complete"&&DISQUS.reset({reload:!0,config:disqus_config})})</script></footer></article><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css integrity=sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0 crossorigin=anonymous><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js integrity=sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4 crossorigin=anonymous></script><script defer src=https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js integrity=sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05 crossorigin=anonymous onload='renderMathInElement(document.body,{delimiters:[{left:"$$",right:"$$",display:!0},{left:"$",right:"$",display:!1},{left:"\\(",right:"\\)",display:!1},{left:"\\[",right:"\\]",display:!0}]})'></script></section></div><footer class=footer><section class=container>©
2016 -
2026
Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/6100dca">[6100dca]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>