From a977deebd1a7387c2f31a154093c25b752bd0b3a Mon Sep 17 00:00:00 2001
From: eric
Date: Sun, 3 Aug 2025 03:30:50 +0000
Subject: [PATCH] deploy: 9c5d4a2102bc7a06536fbdcec04c7476e3a60edd

---
 404.html | 2 +-
 about/index.html | 2 +-
 categories/index.html | 2 +-
 index.html | 2 +-
 index.xml | 8 ++++----
 posts/a-deep-dive-into-ppo-for-language-models/index.html | 6 +++---
 posts/index.html | 8 ++++----
 posts/index.xml | 8 ++++----
 .../index.html | 6 +++---
 .../index.html | 6 +++---
 posts/useful/index.html | 2 +-
 sitemap.xml | 2 +-
 tags/index.html | 2 +-
 13 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/404.html b/404.html
index 6413795..21cfb31 100644
--- a/404.html
+++ b/404.html
@@ -4,4 +4,4 @@
 2016 - 2025 Eric X. Liu
-[23b9adc]
\ No newline at end of file
+[9c5d4a2]
\ No newline at end of file
diff --git a/about/index.html b/about/index.html
index adb13a5..a04d393 100644
--- a/about/index.html
+++ b/about/index.html
@@ -4,4 +4,4 @@
 2016 - 2025 Eric X. Liu
-[23b9adc]
\ No newline at end of file
+[9c5d4a2]
\ No newline at end of file
diff --git a/categories/index.html b/categories/index.html
index 927307e..a4e6ef5 100644
--- a/categories/index.html
+++ b/categories/index.html
@@ -4,4 +4,4 @@
 2016 - 2025 Eric X. Liu
-[23b9adc]
\ No newline at end of file
+[9c5d4a2]
\ No newline at end of file
diff --git a/index.html b/index.html
index 1ce611b..da1cff5 100644
--- a/index.html
+++ b/index.html
@@ -4,4 +4,4 @@
 2016 - 2025 Eric X. Liu
-[23b9adc]
\ No newline at end of file
+[9c5d4a2]
\ No newline at end of file
diff --git a/index.xml b/index.xml
index aa542f8..6fdfcbd 100644
--- a/index.xml
+++ b/index.xml
@@ -1,5 +1,6 @@
-Eric X. Liu's Personal Page/Recent content on Eric X. 
Liu's Personal PageHugoenSun, 03 Aug 2025 03:19:53 +0000A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sun, 03 Aug 2025 03:19:06 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> -<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.</p>Mixture-of-Experts (MoE) Models Challenges & Solutions in Practice/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/Sun, 03 Aug 2025 03:19:06 +0000/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/<p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &ldquo;experts&rdquo;) to specialize in different types of inputs. A &ldquo;gating network&rdquo; or &ldquo;router&rdquo; learns to dispatch each input (or &ldquo;token&rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.</p> +Eric X. Liu's Personal Page/Recent content on Eric X. 
Liu's Personal PageHugoenSun, 03 Aug 2025 03:29:23 +0000T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 03:29:14 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> +<p>But to truly understand the field, we must look at the pivotal models that explored different paths. Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p>A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sat, 02 Aug 2025 00:00:00 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> +<p>You may have seen diagrams like the one below, which outlines the RLHF training process. 
It can look intimidating, with a web of interconnected models, losses, and data flows.</p>Mixture-of-Experts (MoE) Models Challenges & Solutions in Practice/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/Wed, 02 Jul 2025 00:00:00 +0000/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/<p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &ldquo;experts&rdquo;) to specialize in different types of inputs. A &ldquo;gating network&rdquo; or &ldquo;router&rdquo; learns to dispatch each input (or &ldquo;token&rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.</p> <h3 id="1-challenge-non-differentiability-of-routing-functions"> 1. Challenge: Non-Differentiability of Routing Functions <a class="heading-link" href="#1-challenge-non-differentiability-of-routing-functions"> @@ -8,8 +9,7 @@ </a> </h3> <p><strong>The Problem:</strong> -Many routing mechanisms, especially &ldquo;Top-K routing,&rdquo; involve a discrete, hard selection process. A common function is <code>KeepTopK(v, k)</code>, which selects the top <code>k</code> scoring elements from a vector <code>v</code> and sets others to $-\infty$ or $0$.</p>T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 03:19:06 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> -<p>But to truly understand the field, we must look at the pivotal models that explored different paths. 
Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p>Some useful files/posts/useful/Mon, 26 Oct 2020 04:14:43 +0000/posts/useful/<ul> +Many routing mechanisms, especially &ldquo;Top-K routing,&rdquo; involve a discrete, hard selection process. A common function is <code>KeepTopK(v, k)</code>, which selects the top <code>k</code> scoring elements from a vector <code>v</code> and sets others to $-\infty$ or $0$.</p>Some useful files/posts/useful/Mon, 26 Oct 2020 04:14:43 +0000/posts/useful/<ul> <li><a href="https://ericxliu.me/rootCA.pem" class="external-link" target="_blank" rel="noopener">rootCA.pem</a></li> <li><a href="https://ericxliu.me/vpnclient.ovpn" class="external-link" target="_blank" rel="noopener">vpnclient.ovpn</a></li> </ul>About/about/Fri, 01 Jun 2018 07:13:52 +0000/about/ \ No newline at end of file diff --git a/posts/a-deep-dive-into-ppo-for-language-models/index.html b/posts/a-deep-dive-into-ppo-for-language-models/index.html index f3f2059..6475cd9 100644 --- a/posts/a-deep-dive-into-ppo-for-language-models/index.html +++ b/posts/a-deep-dive-into-ppo-for-language-models/index.html @@ -1,10 +1,10 @@ A Deep Dive into PPO for Language Models · Eric X. Liu's Personal Page
\ No newline at end of file
diff --git a/posts/index.html b/posts/index.html
index ce91bcb..940f3bd 100644
--- a/posts/index.html
+++ b/posts/index.html
@@ -1,11 +1,11 @@
 Posts · Eric X. Liu's Personal Page
\ No newline at end of file
+[9c5d4a2]
\ No newline at end of file
diff --git a/posts/index.xml b/posts/index.xml
index b1b8350..7facfb4 100644
--- a/posts/index.xml
+++ b/posts/index.xml
@@ -1,5 +1,6 @@
-Posts on Eric X. Liu's Personal Page/posts/Recent content in Posts on Eric X. 
Liu's Personal PageHugoenSun, 03 Aug 2025 03:29:23 +0000T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 03:29:14 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> +<p>But to truly understand the field, we must look at the pivotal models that explored different paths. Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p>A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sat, 02 Aug 2025 00:00:00 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> +<p>You may have seen diagrams like the one below, which outlines the RLHF training process. 
It can look intimidating, with a web of interconnected models, losses, and data flows.</p>Mixture-of-Experts (MoE) Models Challenges & Solutions in Practice/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/Wed, 02 Jul 2025 00:00:00 +0000/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/<p>Mixture-of-Experts (MoEs) are neural network architectures that allow different parts of the model (called &ldquo;experts&rdquo;) to specialize in different types of inputs. A &ldquo;gating network&rdquo; or &ldquo;router&rdquo; learns to dispatch each input (or &ldquo;token&rdquo;) to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.</p> <h3 id="1-challenge-non-differentiability-of-routing-functions"> 1. Challenge: Non-Differentiability of Routing Functions <a class="heading-link" href="#1-challenge-non-differentiability-of-routing-functions"> @@ -8,8 +9,7 @@ </a> </h3> <p><strong>The Problem:</strong> -Many routing mechanisms, especially &ldquo;Top-K routing,&rdquo; involve a discrete, hard selection process. A common function is <code>KeepTopK(v, k)</code>, which selects the top <code>k</code> scoring elements from a vector <code>v</code> and sets others to $-\infty$ or $0$.</p>T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 03:19:06 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> -<p>But to truly understand the field, we must look at the pivotal models that explored different paths. 
Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p>Some useful files/posts/useful/Mon, 26 Oct 2020 04:14:43 +0000/posts/useful/<ul> +Many routing mechanisms, especially &ldquo;Top-K routing,&rdquo; involve a discrete, hard selection process. A common function is <code>KeepTopK(v, k)</code>, which selects the top <code>k</code> scoring elements from a vector <code>v</code> and sets others to $-\infty$ or $0$.</p>Some useful files/posts/useful/Mon, 26 Oct 2020 04:14:43 +0000/posts/useful/<ul> <li><a href="https://ericxliu.me/rootCA.pem" class="external-link" target="_blank" rel="noopener">rootCA.pem</a></li> <li><a href="https://ericxliu.me/vpnclient.ovpn" class="external-link" target="_blank" rel="noopener">vpnclient.ovpn</a></li> </ul> \ No newline at end of file diff --git a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html index 84e3875..bd1122b 100644 --- a/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html +++ b/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/index.html @@ -9,10 +9,10 @@ The Problem: Many routing mechanisms, especially “Top-K routing,” involve a discrete, hard selection process. A common function is KeepTopK(v, k), which selects the top k scoring elements from a vector v and sets others to $-\infty$ or $0$.">
\ No newline at end of file
diff --git a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html
index c98d339..640db24 100644
--- a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html
+++ b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html
@@ -1,10 +1,10 @@
 T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive · Eric X. Liu's Personal Page
\ No newline at end of file
diff --git a/posts/useful/index.html b/posts/useful/index.html
index 0619119..ac0965e 100644
--- a/posts/useful/index.html
+++ b/posts/useful/index.html
@@ -10,4 +10,4 @@
 One-minute read
  • [23b9adc]
\ No newline at end of file
+[9c5d4a2]
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index 3960ad6..aef9a75 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -1 +1 @@
-/posts/a-deep-dive-into-ppo-for-language-models/2025-08-03T03:19:53+00:00weekly0.5/2025-08-03T03:19:53+00:00weekly0.5/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/2025-08-03T03:19:53+00:00weekly0.5/posts/2025-08-03T03:19:53+00:00weekly0.5/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T03:19:53+00:00weekly0.5/posts/useful/2020-10-26T04:47:36+00:00weekly0.5/about/2020-06-16T23:30:17-07:00weekly0.5/categories/weekly0.5/tags/weekly0.5
\ No newline at end of file
+/2025-08-03T03:29:23+00:00weekly0.5/posts/2025-08-03T03:29:23+00:00weekly0.5/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T03:29:23+00:00weekly0.5/posts/a-deep-dive-into-ppo-for-language-models/2025-08-03T03:28:39+00:00weekly0.5/posts/mixture-of-experts-moe-models-challenges-solutions-in-practice/2025-08-03T03:28:39+00:00weekly0.5/posts/useful/2020-10-26T04:47:36+00:00weekly0.5/about/2020-06-16T23:30:17-07:00weekly0.5/categories/weekly0.5/tags/weekly0.5
\ No newline at end of file
diff --git a/tags/index.html b/tags/index.html
index 93f924f..d6a8dcd 100644
--- a/tags/index.html
+++ b/tags/index.html
@@ -4,4 +4,4 @@
 2016 - 2025 Eric X. Liu
-[23b9adc]
\ No newline at end of file
+[9c5d4a2]
\ No newline at end of file