diff --git a/404.html b/404.html
index 75f0f1d..354a8ac 100644
--- a/404.html
+++ b/404.html
@@ -2,6 +2,6 @@

404

Page Not Found

Sorry, this page does not exist.
You can head back to the homepage.

\ No newline at end of file
diff --git a/about/index.html b/about/index.html
index c7c261b..527e02d 100644
--- a/about/index.html
+++ b/about/index.html
@@ -11,6 +11,6 @@ My work focuses on Infrastructure Performance and Customer Engineering, specific
Link to heading

I’m a tinkerer at heart, whether digital or physical:

Welcome to my corner of the internet.

\ No newline at end of file
diff --git a/authors/index.html b/authors/index.html
index 5a81e24..8ee776c 100644
--- a/authors/index.html
+++ b/authors/index.html
@@ -2,6 +2,6 @@

Authors

\ No newline at end of file
diff --git a/categories/index.html b/categories/index.html
index 30d47b1..988c2b6 100644
--- a/categories/index.html
+++ b/categories/index.html
@@ -2,6 +2,6 @@

Categories

\ No newline at end of file
diff --git a/index.html b/index.html
index b7459cd..5184207 100644
--- a/index.html
+++ b/index.html
@@ -1,8 +1,8 @@
-Eric X. Liu's Personal Page
avatar

Eric X. Liu

  • +
\ No newline at end of file
diff --git a/posts/benchmarking-llms-on-jetson-orin-nano/index.html b/posts/benchmarking-llms-on-jetson-orin-nano/index.html
index 9c4b478..a45ecdd 100644
--- a/posts/benchmarking-llms-on-jetson-orin-nano/index.html
+++ b/posts/benchmarking-llms-on-jetson-orin-nano/index.html
@@ -60,6 +60,6 @@ After running 66 inference tests across seven different language models ranging
Link to heading
  1. Williams, S., Waterman, A., & Patterson, D. (2009). “Roofline: An Insightful Visual Performance Model for Multicore Architectures.” Communications of the ACM, 52(4), 65-76.

  2. NVIDIA Corporation. (2024). “Jetson Orin Nano Developer Kit Technical Specifications.” https://developer.nvidia.com/embedded/jetson-orin-nano-developer-kit

  3. “Jetson AI Lab Benchmarks.” NVIDIA Jetson AI Lab. https://www.jetson-ai-lab.com/benchmarks.html

  4. Gerganov, G., et al. (2023). “GGML - AI at the edge.” GitHub. https://github.com/ggerganov/ggml

  5. Kwon, W., et al. (2023). “Efficient Memory Management for Large Language Model Serving with PagedAttention.” Proceedings of SOSP 2023.

  6. Team, G., Mesnard, T., et al. (2025). “Gemma 3: Technical Report.” arXiv preprint arXiv:2503.19786v1. https://arxiv.org/html/2503.19786v1

  7. Yang, A., et al. (2025). “Qwen3 Technical Report.” arXiv preprint arXiv:2505.09388. https://arxiv.org/pdf/2505.09388

  8. DeepSeek-AI. (2025). “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv preprint arXiv:2501.12948v1. https://arxiv.org/html/2501.12948v1

  9. “Running LLMs with TensorRT-LLM on NVIDIA Jetson Orin Nano Super.” Collabnix. https://collabnix.com/running-llms-with-tensorrt-llm-on-nvidia-jetson-orin-nano-super/

  10. Pope, R., et al. (2022). “Efficiently Scaling Transformer Inference.” Proceedings of MLSys 2022.

  11. Frantar, E., et al. (2023). “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.” Proceedings of ICLR 2023.

  12. Dettmers, T., et al. (2023). “QLoRA: Efficient Finetuning of Quantized LLMs.” Proceedings of NeurIPS 2023.

  13. Lin, J., et al. (2023). “AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.” arXiv preprint arXiv:2306.00978.

\ No newline at end of file
diff --git a/posts/breville-barista-pro-maintenance/index.html b/posts/breville-barista-pro-maintenance/index.html
index 68973e2..3d5750d 100644
--- a/posts/breville-barista-pro-maintenance/index.html
+++ b/posts/breville-barista-pro-maintenance/index.html
@@ -23,6 +23,6 @@ Understanding the Two Primary Maintenance Cycles Link to heading The Breville Ba
Link to heading

In addition to the automated cycles, regular manual cleaning is essential for machine health.

Daily Tasks:

Weekly Tasks:

Periodic Tasks (Every 2-3 Months):

By adhering to this comprehensive maintenance schedule, you can ensure your Breville Barista Pro operates at peak performance and consistently produces high-quality espresso.


Reference:

\ No newline at end of file
diff --git a/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html b/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html
index bbc0dd5..c84ef35 100644
--- a/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html
+++ b/posts/espresso-theory-application-a-guide-for-the-breville-barista-pro/index.html
@@ -18,6 +18,6 @@ Our overarching philosophy is simple: isolate and change only one variable at a
Link to heading

This systematic process will get you to a delicious shot from your Breville Barista Pro efficiently:

  1. Set Your Constants:
  2. Make an Initial Grind:
  3. Pull the First Shot:
  4. Taste and Diagnose:
  5. Make ONE Adjustment - THE GRIND SIZE:
  6. Re-adjust and Repeat:

Happy brewing! With patience and this systematic approach, you’ll be pulling consistently delicious espresso shots from your Breville Barista Pro in no time.

\ No newline at end of file
diff --git a/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html b/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html
index d9b691c..10ca08f 100644
--- a/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html
+++ b/posts/flashing-jetson-orin-nano-in-virtualized-environments/index.html
@@ -166,6 +166,6 @@ Flashing NVIDIA Jetson devices remotely presents unique challenges when the host
Link to heading
\ No newline at end of file
diff --git a/posts/how-rvq-teaches-llms-to-see-and-hear/index.html b/posts/how-rvq-teaches-llms-to-see-and-hear/index.html
index 9fa753b..60d975d 100644
--- a/posts/how-rvq-teaches-llms-to-see-and-hear/index.html
+++ b/posts/how-rvq-teaches-llms-to-see-and-hear/index.html
@@ -16,6 +16,6 @@ The answer lies in creating a universal language—a bridge between the continuo
Link to heading

Once we have a contrastively-trained VQ-AE, we can use its output to give LLMs the ability to see and hear. There are two primary strategies for this.

Path 1: The Tokenizer Approach - Teaching the LLM a New Language

This path treats the RVQ IDs as a new vocabulary. It’s a two-stage process ideal for high-fidelity content generation.

  1. Create a Neural Codec: The trained VQ-AE serves as a powerful “codec.” You can take any piece of media (e.g., a song) and use the codec to compress it into a sequence of discrete RVQ tokens (e.g., [8, 5, 4, 1, 8, 5, 9, 2, ...]).
  2. Train a Generative LLM: A new Transformer model is trained auto-regressively on a massive dataset of these media-derived tokens. Its sole purpose is to learn the patterns and predict the next token in a sequence.

Use Case: This is the architecture behind models like Meta’s MusicGen. A user provides a text prompt, which conditions the Transformer to generate a new sequence of RVQ tokens. These tokens are then fed to the VQ-AE’s decoder to synthesize the final audio waveform.
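To make the two stages concrete, here is a minimal PyTorch sketch of the tokenizer approach. The codec step is faked with random IDs, and NextTokenLM is an illustrative toy model (positional encodings omitted), not MusicGen's actual architecture; a real system would take the IDs from a trained RVQ codec and condition generation on a text prompt.

```python
import torch
import torch.nn as nn

class NextTokenLM(nn.Module):
    """Toy autoregressive Transformer over RVQ token IDs (illustrative only)."""
    def __init__(self, vocab_size=1024, dim=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, token_ids):                          # (batch, seq) int64
        seq_len = token_ids.size(1)
        causal = torch.triu(                                # mask out future positions
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        h = self.backbone(self.embed(token_ids), mask=causal)
        return self.head(h)                                 # next-token logits

# Stage 1: a trained VQ-AE codec turns raw media into discrete RVQ IDs.
# Faked here with random IDs; a real codec's encoder would produce them.
rvq_tokens = torch.randint(0, 1024, (8, 128))

# Stage 2: train the LM to predict token t+1 from tokens 0..t.
model = NextTokenLM()
logits = model(rvq_tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), rvq_tokens[:, 1:].reshape(-1)
)
loss.backward()
# At generation time, sampled IDs go back through the VQ-AE decoder to
# synthesize the final waveform (or image).
```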

Path 2: The Adapter Approach - Translating for a Language Expert

This path is used to augment a powerful, pre-trained, text-only LLM without the astronomical cost of retraining it.

  1. Freeze the LLM: A massive, pre-trained LLM (like LLaMA) is frozen. Its deep language understanding is preserved.
  2. Use the Pre-Quantized Embedding: Instead of using the discrete RVQ tokens, we take the rich, continuous embedding vector produced by our media encoder just before it enters the RVQ module.
  3. Train a Small Adapter: A small, lightweight projection layer (or “adapter”) is trained. Its only job is to translate the media embedding into a vector that has the same format and structure as the LLM’s own word embeddings. It learns to map visual concepts to their corresponding “word” concepts in the LLM’s latent space.

Use Case: This is the principle behind models like DeepMind’s Flamingo. To answer a question about an image, the image is passed through the media encoder and adapter. The resulting “vision-as-a-word” vector is inserted into the prompt sequence alongside the text tokens. The frozen LLM can now “reason” about the visual input because it has been translated into a format it already understands.
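A rough sketch of the adapter idea, assuming made-up dimensions, made-up token IDs, and a stand-in embedding table for the frozen LLM (none of these names are Flamingo's real components): the only trainable piece is the small projection that maps the media embedding into the LLM's embedding space.

```python
import torch
import torch.nn as nn

llm_dim, media_dim = 4096, 1024                  # hypothetical sizes

# Stand-in for the frozen LLM's token embedding table; nothing here trains.
llm_token_embed = nn.Embedding(32000, llm_dim)
for p in llm_token_embed.parameters():
    p.requires_grad = False

# Small adapter: media-encoder embedding -> "word-like" vector in LLM space.
adapter = nn.Sequential(
    nn.Linear(media_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

# Continuous embedding taken from the media encoder *before* the RVQ module.
image_embedding = torch.randn(1, media_dim)

# Build the prompt: [text tokens][vision-as-a-word vector][more text tokens].
prefix_ids = torch.tensor([[1, 523, 991]])       # hypothetical IDs for "What is in"
suffix_ids = torch.tensor([[419, 2973]])         # hypothetical IDs for "this image?"
prompt_embeds = torch.cat(
    [
        llm_token_embed(prefix_ids),             # (1, 3, llm_dim)
        adapter(image_embedding).unsqueeze(1),   # (1, 1, llm_dim) translated visual token
        llm_token_embed(suffix_ids),             # (1, 2, llm_dim)
    ],
    dim=1,
)
# prompt_embeds would be fed to the frozen LLM through its input-embedding path;
# gradients flow only into the adapter.
print(prompt_embeds.shape)                       # torch.Size([1, 6, 4096])
```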

\ No newline at end of file
diff --git a/posts/index.html b/posts/index.html
index 0171582..af6b9a4 100644
--- a/posts/index.html
+++ b/posts/index.html
@@ -12,6 +12,6 @@ UniFi VLAN Migration to Zone-Based Architecture
  • August 19, 2025 Quantization in LLMs
  •
\ No newline at end of file
diff --git a/posts/jellyfin-sso-with-authentik/index.html b/posts/jellyfin-sso-with-authentik/index.html
index 2a0c2e0..637b6bb 100644
--- a/posts/jellyfin-sso-with-authentik/index.html
+++ b/posts/jellyfin-sso-with-authentik/index.html
@@ -69,6 +69,6 @@ Do not rely on header forwarding magic. Force the scheme in the plugin configura
Link to heading
\ No newline at end of file
diff --git a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html
index 0cfc88f..bb04d95 100644
--- a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html
+++ b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html
@@ -42,6 +42,6 @@ Sparse MoE models, despite only activating a few experts per token, possess a ve
MoE models offer significant advantages in terms of model capacity and computational efficiency, but their unique sparse activation pattern introduces challenges in training and fine-tuning. Overcoming non-differentiability in routing and ensuring balanced expert utilization are crucial for effective pre-training. During fine-tuning, managing the model’s vast parameter count to prevent overfitting on smaller datasets requires either strategic parameter freezing or access to very large and diverse fine-tuning data. The Top-K routing mechanism, as illustrated in the provided image, is a core component in many modern Mixture-of-Experts (MoE) models. It involves selecting a fixed number (K) of experts for each input based on relevance scores.


    Traditional Top-K (Deterministic Selection):

    Alternative: Sampling from Softmax (Probabilistic Selection):

    Key Takeaway: While deterministic Top-K is simpler and directly picks the “highest-scoring” experts, sampling from the softmax offers a more robust training dynamic by ensuring that all experts receive training data, thereby preventing some experts from becoming unused (“dead experts”).
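    A minimal sketch of the two selection rules over made-up router logits (shapes and function names are illustrative): deterministic Top-K always returns the same experts for the same scores, while sampling without replacement from the softmax occasionally routes tokens to lower-scoring experts so they keep receiving gradient.

```python
import torch

def route_topk(router_logits: torch.Tensor, k: int = 2):
    """Deterministic Top-K: always pick the K highest-scoring experts."""
    gate_probs = router_logits.softmax(dim=-1)
    topk_probs, topk_idx = gate_probs.topk(k, dim=-1)
    # Renormalize so the selected experts' weights sum to 1 per token.
    weights = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_idx, weights

def route_sampled(router_logits: torch.Tensor, k: int = 2):
    """Probabilistic alternative: sample K experts (without replacement)
    from the softmax distribution, so low-scoring experts still see traffic."""
    gate_probs = router_logits.softmax(dim=-1)
    sampled_idx = torch.multinomial(gate_probs, k, replacement=False)
    weights = gate_probs.gather(-1, sampled_idx)
    weights = weights / weights.sum(dim=-1, keepdim=True)
    return sampled_idx, weights

# 4 tokens routed over 8 experts (random logits purely for illustration).
logits = torch.randn(4, 8)
print(route_topk(logits))     # same experts every time for the same logits
print(route_sampled(logits))  # varies run to run, spreading load across experts
```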


\ No newline at end of file
diff --git a/posts/open-webui-openai-websearch/index.html b/posts/open-webui-openai-websearch/index.html
index dbb6213..81556e9 100644
--- a/posts/open-webui-openai-websearch/index.html
+++ b/posts/open-webui-openai-websearch/index.html
@@ -84,6 +84,6 @@ This post documents the final setup, the hotfix script that keeps LiteLLM honest
Link to heading
\ No newline at end of file
diff --git a/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html b/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html
index 218c70f..131b8bb 100644
--- a/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html
+++ b/posts/openwrt-mwan3-wireguard-endpoint-exclusion/index.html
@@ -96,6 +96,6 @@ When using WireGuard together with MWAN3 on OpenWrt, the tunnel can fail to esta
Link to heading
\ No newline at end of file
diff --git a/posts/page/2/index.html b/posts/page/2/index.html
index 5544597..7dfb7f5 100644
--- a/posts/page/2/index.html
+++ b/posts/page/2/index.html
@@ -12,6 +12,6 @@ Transformer's Core Mechanics
  • October 26, 2020 Some useful files
  •
\ No newline at end of file
diff --git a/posts/ppo-for-language-models/index.html b/posts/ppo-for-language-models/index.html
index 0318805..d8d6738 100644
--- a/posts/ppo-for-language-models/index.html
+++ b/posts/ppo-for-language-models/index.html
@@ -23,6 +23,6 @@ where δ_t = r_t + γV(s_{t+1}) - V(s_t)