commit c8d7b92351 (parent 7a88de8adc)
Author: eric
Date: 2025-10-04 17:44:24 +00:00

25 changed files with 30 additions and 30 deletions


@@ -8,7 +8,7 @@
NVIDIA’s Jetson Orin Nano promises impressive specs: 1024 CUDA cores, 32 Tensor Cores, and 40 TOPS of INT8 compute performance packed into a compact, power-efficient edge device. On paper, it looks like a capable platform for running Large Language Models locally. But there’s a catch—one that reveals a fundamental tension in modern edge AI hardware design.
After running 66 inference tests across seven different language models ranging from 0.5B to 5.4B parameters, I discovered something counterintuitive: the device's computational muscle sits largely idle during LLM inference. The bottleneck isn't computation—it's memory bandwidth. This isn't just a quirk of one device; it's a reality that affects how we should think about deploying LLMs at the edge.

-   <meta property="article:modified_time" content="2025-10-04T05:52:46+00:00">
+   <meta property="article:modified_time" content="2025-10-04T17:44:11+00:00">

# Why Your Jetson Orin Nano's 40 TOPS Goes Unused (And What That Means for Edge AI)

October 4, 2025
@@ -21,7 +21,7 @@ After running 66 inference tests across seven different language models ranging
## Testing Methodology

### The Models

I tested seven models ranging from 0.5B to 5.4B parameters—essentially the entire practical deployment range for this hardware. The selection covered two inference backends (Ollama and vLLM) and various quantization strategies:

**Ollama-served models (with quantization):**

- Gemma 3 1B (Q4_K_M, 815MB)
-   - Gemma 3n E2B (bfloat16, 11GB, 5.44B total params, 2B effective)
+   - Gemma 3n E2B (Q4_K_M, 3.5GB, 5.44B total params, 2B effective)
- Qwen 2.5 0.5B (Q4_K_M, 350MB)
- Qwen 3 0.6B (FP8, 600MB)

**vLLM-served models (minimal/no quantization):**

- google/gemma-3-1b-it (FP16, 2GB)
- Qwen/Qwen2.5-0.5B-Instruct (FP16, 1GB)
- Qwen/Qwen3-0.6B-FP8 (FP8, 600MB)
### The Testing Process

Each model faced 10-12 prompts of varying complexity—from simple arithmetic to technical explanations about LLMs themselves. Out of 84 planned tests, 66 completed successfully (78.6% success rate). The failures? Mostly out-of-memory crashes on larger models and occasional inference engine instability.
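The harness itself isn't shown in this excerpt. As a rough illustration only, a minimal measurement loop for the Ollama-served models could look like the sketch below, assuming Ollama's streaming `/api/generate` endpoint and the `eval_count`/`eval_duration` statistics it reports in its final chunk; the host, prompt, and timeout are placeholders.

```python
import json
import time
import requests  # assumes a local Ollama server on the default port

def run_one(model: str, prompt: str, host: str = "http://localhost:11434"):
    """Time first-token latency and decode throughput for a single prompt."""
    start = time.perf_counter()
    first_token_s = None
    with requests.post(f"{host}/api/generate",
                       json={"model": model, "prompt": prompt, "stream": True},
                       stream=True, timeout=300) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # First chunk carrying generated text gives the time-to-first-token.
            if first_token_s is None and chunk.get("response"):
                first_token_s = time.perf_counter() - start
            if chunk.get("done"):
                # Final chunk reports eval_count tokens generated in eval_duration ns.
                tps = chunk["eval_count"] / (chunk["eval_duration"] / 1e9)
                return {"ttft_s": first_token_s, "tokens_per_s": tps}

print(run_one("qwen3:0.6b", "Explain KV caches in two sentences."))
```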
### Understanding the Limits: Roofline Analysis
@@ -34,7 +34,7 @@ After running 66 inference tests across seven different language models ranging
### Responsiveness: First Token Latency

The time to generate the first output token—a critical metric for interactive applications—varied significantly:

- qwen3:0.6b (Ollama): 0.522 seconds
- gemma3:1b (Ollama): 1.000 seconds
- qwen2.5:0.5b (Ollama): 1.415 seconds
- gemma3n:e2b (Ollama): 1.998 seconds

Smaller, quantized models get to that first token faster—exactly what you want for a chatbot or interactive assistant where perceived responsiveness matters as much as raw throughput.

### The Memory Bottleneck Revealed
When I compared actual performance against theoretical limits, the results were striking:

| Model | Theoretical (t/s) | Actual (t/s) | Efficiency | Bottleneck | OI (FLOPs/byte) |
|---|---|---|---|---|---|
| gemma3:1b | 109.90 | 26.33 | 24.0% | Memory | 3.23 |
| qwen3:0.6b | 103.03 | 38.84 | 37.7% | Memory | 1.82 |
| qwen2.5:0.5b | 219.80 | 35.24 | 16.0% | Memory | 3.23 |
-   | gemma3n:e2b | 15.45 | 8.98 | 58.1% | Memory | 0.91 |
+   | gemma3n:e2b | 54.95 | 8.98 | 16.3% | Memory | 3.23 |
| google/gemma-3-1b-it | 30.91 | 4.59 | 14.9% | Memory | 0.91 |
| Qwen/Qwen3-0.6B-FP8 | 103.03 | 12.81 | 12.4% | Memory | 1.82 |
| Qwen/Qwen2.5-0.5B-Instruct | 61.82 | 15.18 | 24.6% | Memory | 0.91 |

-   Every single model is memory-bound. Average hardware efficiency sits at just 26.8%—meaning the computational units spend most of their time waiting for data rather than crunching numbers. That advertised 40 TOPS? Largely untapped.
+   Every single model is memory-bound. Average hardware efficiency sits at just 20.8%—meaning the computational units spend most of their time waiting for data rather than crunching numbers. That advertised 40 TOPS? Largely untapped.
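The Efficiency column follows from a memory-bound roofline: the throughput ceiling is memory bandwidth divided by the bytes streamed per generated token, and efficiency is measured throughput over that ceiling. A minimal sketch of that arithmetic, using the 68 GB/s figure quoted in this post and the gemma3:1b row as an example (illustrative, not the actual analysis script):

```python
# Roofline arithmetic behind the table above (a sketch, not the analysis code).
MEM_BANDWIDTH = 68e9   # bytes/s: Jetson Orin Nano memory bandwidth quoted in the post

def theoretical_tps(bytes_per_token: float) -> float:
    """Memory-bound ceiling: decode speed if every generated token has to
    stream this many bytes of weights/KV cache from DRAM."""
    return MEM_BANDWIDTH / bytes_per_token

def efficiency(actual_tps: float, ceiling_tps: float) -> float:
    """Fraction of the memory-bound ceiling actually achieved."""
    return actual_tps / ceiling_tps

# Example row: gemma3:1b (Q4_K_M), theoretical 109.90 t/s, measured 26.33 t/s.
print(f"gemma3:1b efficiency: {efficiency(26.33, 109.90):.1%}")   # ~24.0%
```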
![S3 File](/images/benchmarking-llms-on-jetson-orin-nano/ee04876d75d247f9b27a647462555777.png)

## What This Actually Means

### Why Memory Bandwidth Dominates
@@ -42,13 +42,13 @@ After running 66 inference tests across seven different language models ranging
The roofline numbers tell a clear story: operational intensity ranges from 0.91 to 3.23 FLOPs/byte across all tested models. To actually saturate those 1024 CUDA cores and hit compute-bound operation, you'd need an operational intensity around 147 FLOPs/byte at the device's 68 GB/s memory bandwidth.

In practice, for a model to actually become compute-bound on this device, it would need an operational intensity exceeding:

```
OI_threshold = Peak_Compute / Memory_Bandwidth
             = (40 × 10^12 ops/s) / (68 × 10^9 bytes/s)
             = 588 FLOPs/byte
```
Current LLM architectures fall 100-600× short of this threshold during autoregressive decoding. The compute units are idle most of the time, simply waiting for model weights and activations to arrive from memory.

-   Interestingly, the largest model tested—gemma3n:e2b at 11GB and 5.44B parameters—achieved the highest efficiency at 58.1%. This makes sense: its massive 4.4 GB/token memory requirement means it's saturating the memory bandwidth, so actual performance approaches the theoretical ceiling. The model's Mixture-of-Experts architecture helps too, since it only activates a subset of parameters per token, reducing memory movement while maintaining model capacity.
+   The largest model tested—gemma3n:e2b at 3.5GB quantized (5.44B total parameters, 2B effective)—shows only 16.3% efficiency, similar to other quantized models. Despite being the largest model, Q4_K_M quantization keeps its memory footprint manageable, resulting in similar operational intensity (3.23 FLOPs/byte) to the other INT4-quantized models. Its MatFormer architecture with selective parameter activation (only 2B of 5.44B params active per token) actually helps reduce memory traffic, though this benefit is partially offset by the overhead of routing logic.
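As a rough back-of-envelope, assuming only the ~2B active parameters stream per decoded token and that Q4_K_M averages about 4.5 bits per weight, the implied per-token traffic and throughput ceiling land in the same ballpark as the 54.95 t/s theoretical figure in the table, which is why selective activation helps despite the 5.44B total parameter count:

```python
# Back-of-envelope for gemma3n:e2b under Q4_K_M. Assumptions: only the ~2B
# "effective" parameters are read per decoded token, and Q4_K_M averages
# roughly 4.5 bits/weight. Illustrative, not exact.
MEM_BANDWIDTH = 68e9          # bytes/s
active_params = 2e9           # effective parameters per token
bits_per_weight = 4.5         # approximate Q4_K_M average

bytes_per_token = active_params * bits_per_weight / 8   # ≈ 1.1 GB per token
ceiling_tps = MEM_BANDWIDTH / bytes_per_token            # ≈ 60 t/s

print(f"bytes/token ≈ {bytes_per_token / 1e9:.2f} GB")
print(f"memory-bound ceiling ≈ {ceiling_tps:.0f} t/s (table lists 54.95)")
```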
### What This Means for Deployment
The performance gap between Ollama and vLLM (2.3-5.7×) tells us something important about optimization priorities for edge devices:

- **Qwen 2.5 0.5B:** Ollama (Q4_K_M, 350MB) at 35.24 t/s vs vLLM (FP16, 1GB) at 15.18 t/s—2.32× faster
- **Qwen 3 0.6B:** Ollama (FP8) at 38.84 t/s vs vLLM (FP8) at 12.81 t/s—3.03× faster despite identical quantization
- **Gemma 3 1B:** Ollama (Q4_K_M, 815MB) at 26.33 t/s vs vLLM (FP16, 2GB) at 4.59 t/s—5.74× faster

Quantization delivers near-linear performance gains by directly attacking the memory bandwidth bottleneck. Q4_K_M quantization (4.5 bits/parameter) hits a sweet spot between model quality and speed. Going lower to INT2 might help further, but you'll need to carefully evaluate output quality.

The real insight: Ollama's edge-first design philosophy (GGUF format, streamlined execution, optimized kernels from llama.cpp) is fundamentally better aligned with single-stream, memory-constrained edge scenarios. vLLM's datacenter features—continuous batching, PagedAttention, tensor parallelism—add overhead without providing benefits on unified memory architectures serving single users.

**What you should actually do**: Stick with Ollama or TensorRT-LLM using Q4_K_M/INT4 quantized models in GGUF format. Target the 0.5-1B parameter range (under 3GB) to leave headroom for KV cache. Focus your optimization efforts on memory access patterns and bandwidth reduction. Watch for emerging techniques like INT4 AWQ, sparse attention, and quantized KV caches.
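A quick sanity check on those gaps, assuming decode throughput scales with the inverse of bytes per weight when memory-bound: going from FP16 (16 bits) to Q4_K_M (roughly 4.5 bits) cuts the bytes streamed per token by about 3.6×, so that is roughly the speedup ceiling attributable to quantization alone; the 5.74× gap on Gemma 3 1B therefore also reflects backend differences, consistent with the design-philosophy point above.

```python
# Rough speedup ceiling from quantization alone, assuming memory-bound decoding
# where throughput scales with 1 / (bytes per weight). Assumed bit widths:
# FP16 = 16 bits/weight, Q4_K_M ≈ 4.5 bits/weight on average.
fp16_bits = 16.0
q4_k_m_bits = 4.5

quant_speedup_ceiling = fp16_bits / q4_k_m_bits          # ≈ 3.6×

observed = {"Qwen 2.5 0.5B": 2.32, "Gemma 3 1B": 5.74}   # Ollama vs vLLM, from the post
for model, gap in observed.items():
    print(f"{model}: observed {gap:.2f}× vs ~{quant_speedup_ceiling:.1f}× "
          f"expected from weight size alone")
```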
### Room for Improvement
-   The 26.8% average efficiency might sound terrible, but it's actually typical for edge AI devices. Datacenter GPUs hit 60-80% on optimized workloads, while edge devices commonly land in the 30-50% range due to architectural tradeoffs.
+   The 20.8% average efficiency might sound terrible, but it's actually typical for edge AI devices. Datacenter GPUs hit 60-80% on optimized workloads, while edge devices commonly land in the 15-40% range due to architectural tradeoffs and memory bandwidth constraints.

Three factors explain the gap:

1. **Architecture**: Unified memory sacrifices bandwidth for integration simplicity. The 4MB L2 cache and 7-15W TDP limit further constrain performance.
2. **Software maturity**: Edge inference frameworks lag behind their datacenter counterparts in optimization.
3. **Runtime overhead**: Quantization/dequantization operations, Python abstractions, and non-optimized kernels all add up.

-   The gemma3n:e2b model proving that 58.1% is achievable suggests smaller models could see 2-3× speedups through better software. But fundamental performance leaps will require hardware changes—specifically, prioritizing memory bandwidth (200+ GB/s) over raw compute capability in future edge AI chips.
+   The consistent 16-24% efficiency across most models suggests there's room for 2-3× speedups through better software optimization—particularly in memory access patterns and kernel implementations. But fundamental performance leaps will require hardware changes—specifically, prioritizing memory bandwidth (200+ GB/s) over raw compute capability in future edge AI chips.

## Where to Go From Here
### Software Optimizations Worth Pursuing
@@ -62,4 +62,4 @@ After running 66 inference tests across seven different language models ranging
2016 - 2025 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/0e4b419">[0e4b419]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/85e0d05">[85e0d05]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>