eric
2026-01-03 06:28:22 +00:00
parent 346f1f1450
commit 41ec0626e2
29 changed files with 32 additions and 32 deletions

@@ -60,6 +60,6 @@ After running 66 inference tests across seven different language models ranging
 [unchanged context: minified HTML for the post's References section, the Disqus comment embed script, and the KaTeX stylesheet/script includes]
 ©
 2016 -
-2025
+2026
 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/f1178d3">[f1178d3]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>