From 144a1b16928c5e3c68d016f48fa38904a81e04ce Mon Sep 17 00:00:00 2001 From: eric Date: Sun, 3 Aug 2025 02:54:11 +0000 Subject: [PATCH] deploy: 2a163cf7fe6ec5fceb8eab061f4003afef2a59d5 --- 404.html | 2 +- about/index.html | 2 +- categories/index.html | 2 +- index.html | 2 +- index.xml | 4 ++-- posts/a-deep-dive-into-ppo-for-language-models/index.html | 6 +++--- posts/index.html | 2 +- posts/index.xml | 4 ++-- .../index.html | 6 +++--- posts/useful/index.html | 2 +- sitemap.xml | 2 +- tags/index.html | 2 +- 12 files changed, 18 insertions(+), 18 deletions(-) diff --git a/404.html b/404.html index 3dbc046..1c3310f 100644 --- a/404.html +++ b/404.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[cbccd87] \ No newline at end of file +[2a163cf] \ No newline at end of file diff --git a/about/index.html b/about/index.html index 851501f..6efa9c8 100644 --- a/about/index.html +++ b/about/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[cbccd87] \ No newline at end of file +[2a163cf] \ No newline at end of file diff --git a/categories/index.html b/categories/index.html index daea4c6..dd3ac2a 100644 --- a/categories/index.html +++ b/categories/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[cbccd87] \ No newline at end of file +[2a163cf] \ No newline at end of file diff --git a/index.html b/index.html index a714187..a1906e9 100644 --- a/index.html +++ b/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[cbccd87] \ No newline at end of file +[2a163cf] \ No newline at end of file diff --git a/index.xml b/index.xml index dd1781d..f1c93bb 100644 --- a/index.xml +++ b/index.xml @@ -1,5 +1,5 @@ -Eric X. Liu's Personal Page/Recent content on Eric X. Liu's Personal PageHugoenSun, 03 Aug 2025 02:45:50 +0000A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sun, 03 Aug 2025 02:45:11 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> -<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.</p>T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 02:45:11 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> +Eric X. Liu's Personal Page/Recent content on Eric X. Liu's Personal PageHugoenSun, 03 Aug 2025 02:53:37 +0000A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sun, 03 Aug 2025 02:53:33 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> +<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.</p>T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 02:53:33 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> <p>But to truly understand the field, we must look at the pivotal models that explored different paths. Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p>Some useful files/posts/useful/Mon, 26 Oct 2020 04:14:43 +0000/posts/useful/<ul> <li><a href="https://ericxliu.me/rootCA.pem" class="external-link" target="_blank" rel="noopener">rootCA.pem</a></li> <li><a href="https://ericxliu.me/vpnclient.ovpn" class="external-link" target="_blank" rel="noopener">vpnclient.ovpn</a></li> diff --git a/posts/a-deep-dive-into-ppo-for-language-models/index.html b/posts/a-deep-dive-into-ppo-for-language-models/index.html index f7395da..e8a8f21 100644 --- a/posts/a-deep-dive-into-ppo-for-language-models/index.html +++ b/posts/a-deep-dive-into-ppo-for-language-models/index.html @@ -1,10 +1,10 @@ A Deep Dive into PPO for Language Models · Eric X. Liu's Personal Page
\ No newline at end of file diff --git a/posts/index.html b/posts/index.html index da1d000..71608f6 100644 --- a/posts/index.html +++ b/posts/index.html @@ -7,4 +7,4 @@ 2016 - 2025 Eric X. Liu -[cbccd87] \ No newline at end of file +[2a163cf] \ No newline at end of file diff --git a/posts/index.xml b/posts/index.xml index a581b77..1a121c9 100644 --- a/posts/index.xml +++ b/posts/index.xml @@ -1,5 +1,5 @@ -Posts on Eric X. Liu's Personal Page/posts/Recent content in Posts on Eric X. Liu's Personal PageHugoenSun, 03 Aug 2025 02:45:50 +0000A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sun, 03 Aug 2025 02:45:11 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> -<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.</p>T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 02:45:11 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> +Posts on Eric X. Liu's Personal Page/posts/Recent content in Posts on Eric X. Liu's Personal PageHugoenSun, 03 Aug 2025 02:53:37 +0000A Deep Dive into PPO for Language Models/posts/a-deep-dive-into-ppo-for-language-models/Sun, 03 Aug 2025 02:53:33 +0000/posts/a-deep-dive-into-ppo-for-language-models/<p>Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don&rsquo;t inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).</p> +<p>You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.</p>T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/Sun, 03 Aug 2025 02:53:33 +0000/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/<p>In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the &ldquo;decoder-only&rdquo; model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.</p> <p>But to truly understand the field, we must look at the pivotal models that explored different paths. Google&rsquo;s T5, or <strong>Text-to-Text Transfer Transformer</strong>, stands out as one of the most influential. It didn&rsquo;t just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.</p>Some useful files/posts/useful/Mon, 26 Oct 2020 04:14:43 +0000/posts/useful/<ul> <li><a href="https://ericxliu.me/rootCA.pem" class="external-link" target="_blank" rel="noopener">rootCA.pem</a></li> <li><a href="https://ericxliu.me/vpnclient.ovpn" class="external-link" target="_blank" rel="noopener">vpnclient.ovpn</a></li> diff --git a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html index 5c83756..6c4015f 100644 --- a/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html +++ b/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/index.html @@ -1,10 +1,10 @@ T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive · Eric X. Liu's Personal Page
\ No newline at end of file diff --git a/posts/useful/index.html b/posts/useful/index.html index 7b8d6f0..25e574e 100644 --- a/posts/useful/index.html +++ b/posts/useful/index.html @@ -10,4 +10,4 @@ One-minute read
  • [cbccd87] \ No newline at end of file +[2a163cf] \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 64f2e2b..3797f4d 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -1 +1 @@ -/posts/a-deep-dive-into-ppo-for-language-models/2025-08-03T02:45:50+00:00weekly0.5/2025-08-03T02:45:50+00:00weekly0.5/posts/2025-08-03T02:45:50+00:00weekly0.5/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T02:45:50+00:00weekly0.5/posts/useful/2020-10-26T04:47:36+00:00weekly0.5/about/2020-06-16T23:30:17-07:00weekly0.5/categories/weekly0.5/tags/weekly0.5 \ No newline at end of file +/posts/a-deep-dive-into-ppo-for-language-models/2025-08-03T02:53:37+00:00weekly0.5/2025-08-03T02:53:37+00:00weekly0.5/posts/2025-08-03T02:53:37+00:00weekly0.5/posts/t5-the-transformer-that-zigged-when-others-zagged-an-architectural-deep-dive/2025-08-03T02:53:37+00:00weekly0.5/posts/useful/2020-10-26T04:47:36+00:00weekly0.5/about/2020-06-16T23:30:17-07:00weekly0.5/categories/weekly0.5/tags/weekly0.5 \ No newline at end of file diff --git a/tags/index.html b/tags/index.html index b1c7875..32118c5 100644 --- a/tags/index.html +++ b/tags/index.html @@ -4,4 +4,4 @@ 2016 - 2025 Eric X. Liu -[cbccd87] \ No newline at end of file +[2a163cf] \ No newline at end of file