eric
2026-01-08 18:14:12 +00:00
parent 9c66ed1b1b
commit 4355096bdc
35 changed files with 54 additions and 54 deletions


@@ -8,7 +8,7 @@
NVIDIA’s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there’s a catch—one that reveals a fundamental tension in modern edge AI hardware design.
After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device’s computational muscle sits largely idle during single-stream LLM inference. The bottleneck isn’t computation—it’s memory bandwidth. This isn’t just a quirk of one device; it’s a fundamental characteristic of single-user, autoregressive token generation on edge hardware—a reality that shapes how we should approach local LLM deployment.
- <meta property="article:modified_time" content="2025-10-04T20:41:50+00:00">
+ <meta property="article:modified_time" content="2026-01-08T18:13:13+00:00">

Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)
October 4, 2025
@@ -25,17 +25,17 @@ After running 66 inference tests across seven different language models ranging
The Testing Process

Each model faced 10-12 prompts of varying complexity—from simple arithmetic to technical explanations about LLMs themselves. All tests ran with batch size = 1, simulating a single user interacting with a local chatbot—the typical edge deployment scenario. Out of 84 planned tests, 66 completed successfully (78.6% success rate). The failures? Mostly out-of-memory crashes on larger models and occasional inference-engine instability.
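The post describes the harness only at a high level, so here is a minimal sketch of what a single-stream run can look like, assuming an OpenAI-compatible streaming endpoint (vLLM serves one natively, and Ollama exposes a compatible /v1 route). The URL and model name are placeholders, and counting one generated token per streamed chunk is an approximation, not necessarily how the original benchmark worked.

```python
import json
import time

import requests


def benchmark_stream(base_url: str, model: str, prompt: str, max_tokens: int = 256):
    """Single-stream (batch size = 1) run against an OpenAI-compatible
    /v1/completions endpoint. Returns time-to-first-token and tokens/second.
    Token count is approximated as one token per streamed chunk."""
    t_start = time.perf_counter()
    t_first = None
    n_tokens = 0

    resp = requests.post(
        f"{base_url}/v1/completions",
        json={"model": model, "prompt": prompt,
              "max_tokens": max_tokens, "stream": True},
        stream=True,
        timeout=300,
    )
    resp.raise_for_status()

    for line in resp.iter_lines():
        # Server-sent events: each payload line starts with "data: "
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        if chunk["choices"][0].get("text"):
            if t_first is None:
                t_first = time.perf_counter()
            n_tokens += 1  # rough: one chunk ~= one token

    t_end = time.perf_counter()
    ttft = (t_first or t_end) - t_start
    tps = n_tokens / (t_end - t_first) if t_first and n_tokens else 0.0
    return ttft, tps


# Example usage (placeholder URL and model):
# ttft, tps = benchmark_stream("http://localhost:8000",
#                              "Qwen/Qwen2.5-0.5B-Instruct",
#                              "Explain what a KV cache is in two sentences.")
# print(f"TTFT {ttft:.2f}s, {tps:.1f} tok/s")
```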
Understanding the Limits: Roofline Analysis

To understand where performance hits its ceiling, I applied roofline analysis—a method that reveals whether a workload is compute-bound (limited by processing power) or memory-bound (limited by data transfer speed). For each model, I calculated:

- FLOPs per token: approximately 2 × total_parameters (accounting for matrix multiplications in the forward pass)
- Bytes per token: model_size × 1.1 (including 10% overhead for activations and KV cache)
- Operational Intensity (OI): FLOPs per token / Bytes per token
- Theoretical performance: min(compute_limit, bandwidth_limit)

The roofline model works by comparing a workload’s operational intensity (how many calculations you do per byte of data moved) against the device’s balance point. If your operational intensity is too low, you’re bottlenecked by memory bandwidth—and as we’ll see, that’s exactly what happens with LLM inference.
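To make the arithmetic concrete, here is a small sketch of that calculation. The hardware constants are my assumptions rather than figures from the post: roughly 68 GB/s of LPDDR5 bandwidth for the Orin Nano 8GB and the advertised 40 TOPS as the compute roof; the model size in the example is also illustrative.

```python
# Roofline sketch for single-stream decoding (one token at a time).
# Hardware constants are assumptions: ~68 GB/s memory bandwidth and
# 40 TOPS as the compute roof for the Jetson Orin Nano 8GB.
BANDWIDTH = 68e9   # bytes/s
COMPUTE = 40e12    # ops/s


def roofline(params: float, model_bytes: float):
    """Estimate per-token limits for a model with `params` parameters whose
    weights occupy `model_bytes` in memory."""
    flops_per_token = 2 * params               # ~2 FLOPs per parameter per token
    bytes_per_token = model_bytes * 1.1        # +10% for activations / KV cache
    oi = flops_per_token / bytes_per_token     # operational intensity (FLOPs/byte)

    compute_limit = COMPUTE / flops_per_token      # tokens/s if compute-bound
    bandwidth_limit = BANDWIDTH / bytes_per_token  # tokens/s if memory-bound
    bound = "memory" if oi < COMPUTE / BANDWIDTH else "compute"
    return oi, min(compute_limit, bandwidth_limit), bound


# Example: a ~0.5B-parameter model quantized to roughly 300 MB (illustrative sizes).
oi, ceiling, bound = roofline(params=0.5e9, model_bytes=300e6)
print(f"OI={oi:.2f} FLOPs/byte, ceiling={ceiling:.0f} tok/s, {bound}-bound")
# The device's balance point is COMPUTE / BANDWIDTH ~ 588 FLOPs/byte, so any
# workload with OI in the low single digits is firmly memory-bound.
```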
- <img src=/images/benchmarking-llms-on-jetson-orin-nano/16d64bdc9cf14b05b7c40c4718b8091b.png alt="S3 File">
+ <img src="http://localhost:4998/attachments/image-79378d40267258c0d8968238cc62bd197dc894fa.png?client=default&amp;bucket=obsidian" alt="S3 File">

The Results: Speed and Efficiency
What Actually Runs Fast

Here’s how the models ranked by token generation speed:

| Rank | Model | Backend | Avg Speed (t/s) | Std Dev | Success Rate |
|------|-------|---------|-----------------|---------|--------------|
| 1 | qwen3:0.6b | Ollama | 38.84 | 1.42 | 100% |
| 2 | qwen2.5:0.5b | Ollama | 35.24 | 2.72 | 100% |
| 3 | gemma3:1b | Ollama | 26.33 | 2.56 | 100% |
| 4 | Qwen/Qwen2.5-0.5B-Instruct | vLLM | 15.18 | 2.15 | 100% |
| 5 | Qwen/Qwen3-0.6B-FP8 | vLLM | 12.81 | 0.36 | 100% |
| 6 | gemma3n:e2b | Ollama | 8.98 | 1.22 | 100% |
| 7 | google/gemma-3-1b-it | vLLM | 4.59 | 1.52 | 100% |

The standout finding: quantized sub-1B models hit 25-40 tokens/second, with Ollama consistently outperforming vLLM by 2-6× thanks to aggressive quantization and edge-optimized execution. These numbers align well with independent benchmarks from NVIDIA’s Jetson AI Lab (Llama 3.2 3B at 27.7 t/s, SmolLM2 at 41 t/s), confirming this is typical performance for the hardware class.
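For the Ollama-backed rows, per-request speed can also be read from the timing counters that Ollama’s /api/generate response includes, rather than estimated from stream chunks. A brief sketch follows; whether the benchmark used these exact counters is my assumption, and the prompt is illustrative.

```python
import requests

# Ask a local Ollama server (default port 11434) for a non-streamed completion
# and derive tokens/second from the timing counters in its response.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen3:0.6b",
          "prompt": "Explain KV caching briefly.",
          "stream": False},
    timeout=300,
).json()

eval_count = resp["eval_count"]          # tokens generated
eval_ns = resp["eval_duration"]          # generation time, in nanoseconds
prompt_ns = resp.get("prompt_eval_duration", 0)

# Prompt processing time is a rough proxy for first-token latency
# (it excludes model load time).
print(f"prompt processing ~ {prompt_ns / 1e9:.2f}s")
print(f"decode speed: {eval_count / (eval_ns / 1e9):.1f} tok/s")
```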
- <img src=/images/benchmarking-llms-on-jetson-orin-nano/ee04876d75d247f9b27a647462555777.png alt="S3 File">
+ <img src="http://localhost:4998/attachments/image-7913a54157c2f4b8d0b7f961640a9c359b2d2a4f.png?client=default&amp;bucket=obsidian" alt="S3 File">

Responsiveness: First Token Latency
The time to generate the first output token—a critical metric for interactive applications—varied significantly:

- qwen3:0.6b (Ollama): 0.522 seconds
- gemma3:1b (Ollama): 1.000 seconds
- qwen2.5:0.5b (Ollama): 1.415 seconds
- gemma3n:e2b (Ollama): 1.998 seconds

Smaller, quantized models get to that first token faster—exactly what you want for a chatbot or interactive assistant where perceived responsiveness matters as much as raw throughput.

The Memory Bottleneck Revealed
When I compared actual performance against theoretical limits, the results were striking:

| Model | Theoretical (t/s) | Actual (t/s) | Efficiency | Bottleneck | OI (FLOPs/byte) |
|-------|-------------------|--------------|------------|------------|-----------------|
| gemma3:1b | 109.90 | 26.33 | 24.0% | Memory | 3.23 |
| qwen3:0.6b | 103.03 | 38.84 | 37.7% | Memory | 1.82 |
| qwen2.5:0.5b | 219.80 | 35.24 | 16.0% | Memory | 3.23 |
| gemma3n:e2b | 54.95 | 8.98 | 16.3% | Memory | 3.23 |
| google/gemma-3-1b-it | 30.91 | 4.59 | 14.9% | Memory | 0.91 |
| Qwen/Qwen3-0.6B-FP8 | 103.03 | 12.81 | 12.4% | Memory | 1.82 |
| Qwen/Qwen2.5-0.5B-Instruct | 61.82 | 15.18 | 24.6% | Memory | 0.91 |

Every single model is memory-bound in this single-stream inference scenario. Average hardware efficiency sits at just 20.8%—meaning the computational units spend most of their time waiting for data rather than crunching numbers. That advertised 40 TOPS? Largely untapped when generating one token at a time for a single user.
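As a rough sanity check on those numbers, using the constants assumed in the roofline sketch above: a 40 TOPS compute roof over roughly 68 GB/s of memory bandwidth puts the device’s balance point near 40e12 / 68e9 ≈ 588 FLOPs per byte. The models here have operational intensities between about 0.9 and 3.2 FLOPs per byte, two orders of magnitude below that balance point, so the bandwidth roof rather than the compute roof sets the ceiling in every row of the table.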
- <img src=/images/benchmarking-llms-on-jetson-orin-nano/ee04876d75d247f9b27a647462555777.png alt="S3 File">
+ <img src="http://localhost:4998/attachments/image-7913a54157c2f4b8d0b7f961640a9c359b2d2a4f.png?client=default&amp;bucket=obsidian" alt="S3 File">

What This Actually Means
Why Memory Bandwidth Dominates (in Single-Stream Inference)
@@ -62,4 +62,4 @@ After running 66 inference tests across seven different language models ranging
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/3b1396d">[3b1396d]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/f7528b3">[f7528b3]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>