Compare commits: gitea-page...master (130 commits)
104	.cursorrules	Normal file
@@ -0,0 +1,104 @@
# Hugo Site Development Rules

## Project Overview
This is a Hugo static site using the hugo-coder theme with Obsidian markdown compatibility.

## Hugo Best Practices

### Content Creation
- **DO** place all content files in `content/` directory
- **DO** use front matter with `title`, `date`, and `draft` fields
- **DO** set `draft: false` for published content
- **DO** use lowercase filenames with hyphens (e.g., `my-post.md`)
- **DON'T** create content files outside the `content/` directory

### Markdown Usage
- **DO** use standard markdown syntax
- **DO** use `$$` for block math and `$` for inline math
- **DO** use `- [ ]` and `- [x]` for task lists
- **DO** use `==text==` for highlighting
- **DO** use footnotes with `[^1]` syntax
- **DON'T** use `$$$$` as special delimiters (not supported)
- **DON'T** rely on Obsidian-specific features like wiki-links `[[]]`

### Theme Customization
- **DO** override theme files by creating matching structure in `layouts/`
- **DO** place custom partials in `layouts/partials/`
- **DO** use `static/` for static assets (images, CSS, JS)
- **DON'T** modify files directly in `themes/` directory
- **DON'T** commit theme modifications

### Configuration
- **DO** use `config.toml` for site configuration
- **DO** test configuration changes locally before deploying
- **DO** enable features in `[markup.goldmark.extensions]` for Obsidian compatibility
- **DON'T** modify theme configuration files directly

### Development Workflow
- **DO** run `hugo server` for local development
- **DO** use `hugo --logLevel info` for detailed build output
- **DO** test builds with `hugo` before deployment
- **DON'T** commit the `public/` directory (build output)
- **DON'T** commit temporary Hugo binaries

### File Organization
```
├── content/          # All markdown content
│   ├── posts/        # Blog posts
│   └── about.md      # Static pages
├── layouts/          # Custom theme overrides
│   └── partials/     # Custom partial templates
├── static/           # Static assets
│   └── images/       # Image files
├── themes/           # Hugo themes (don't modify)
└── config.toml       # Site configuration
```

### Math and Special Content
- **DO** enable math with `math = true` in front matter or site config
- **DO** use KaTeX-compatible LaTeX syntax
- **DO** test math rendering after changes
- **DON'T** assume all LaTeX packages are available

### Performance
- **DO** optimize images before adding to `static/`
- **DO** use appropriate image formats (WebP, PNG, JPG)
- **DO** minimize custom CSS/JS
- **DON'T** add unnecessary JavaScript libraries

### SEO and Metadata
- **DO** include descriptive titles and descriptions
- **DO** use proper heading hierarchy (H1 -> H2 -> H3)
- **DO** add alt text to images
- **DON'T** duplicate titles across pages

### Common Pitfalls to Avoid
- **DON'T** use absolute paths in content (use relative paths)
- **DON'T** assume Obsidian plugins work in Hugo
- **DON'T** use Hugo-specific shortcodes without testing
- **DON'T** modify theme files without creating proper overrides
- **DON'T** forget to set `draft: false` for published content

### Git Workflow
- **DO** commit source files (content, config, layouts)
- **DO** use meaningful commit messages
- **DON'T** commit build artifacts (`public/`, temporary files)
- **DON'T** commit sensitive configuration (API keys, etc.)

### Testing Checklist
Before deployment, verify:
- [ ] All content renders correctly
- [ ] Math formulas display properly
- [ ] Images load correctly
- [ ] Links work (internal and external)
- [ ] Site builds without errors
- [ ] Mobile responsiveness
- [ ] Dark/light theme switching works

### Emergency Fixes
If site breaks:
1. Check `hugo --logLevel info` for build errors
2. Verify `config.toml` syntax
3. Check for missing front matter in content files
4. Ensure all required assets exist in `static/`
5. Test with `hugo server` locally first
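The front-matter rules above imply a minimal post skeleton; a hypothetical example (filename and field values are illustrative, not taken from the repository):

```markdown
---
title: "My New Post"
date: 2025-01-01
draft: false
math: true   # only needed if the post contains LaTeX
---

Post body in standard markdown...
```

Saved as `content/posts/my-new-post.md`, this satisfies the naming, location, and `draft: false` rules in one file.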
28	.drone.yml
@@ -1,28 +0,0 @@
kind: pipeline
name: default

steps:
- name: build
  image: plugins/hugo
  settings:
    hugo_version: 0.97.0
    extended: true
    minify: true
    pull: always
    url: ericxliu.me
    validate: false
    output: "./output"
    mtu: 1450
- name: git-push
  image: appleboy/drone-git-push:0.2.0-linux-amd64
  settings:
    branch: gitea-pages
    remote: "git@git.ericxliu.me:eric/ericxliu-me.git"
    force: true
    commit: true
    path: "./output"
    commit_message: "Drone build ${DRONE_COMMIT_SHA:0:7}"
    author_name: "Eric Liu"
    author_email: "eric@ericxliu.me"
    ssh_key:
      from_secret: ssh_key
44	.gitea/workflows/publish.yaml	Normal file
@@ -0,0 +1,44 @@
name: Hugo Publish CI

on:
  push:
    branches:
      - master

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          persist-credentials: false
          submodules: true  # Fetch Hugo themes (true OR recursive)
          fetch-depth: 0    # Fetch all history for .GitInfo and .Lastmod

      - name: Build site with Hugo
        uses: peaceiris/actions-hugo@v3
        with:
          hugo-version: 'latest'
          extended: true

      - name: Build
        run: hugo --minify

      - name: Replace [commit] with short commit hash and hyperlink
        run: |
          SHORT_COMMIT=$(git rev-parse --short HEAD)
          COMMIT_URL="https://git.ericxliu.me/eric/ericxliu-me/commit/$SHORT_COMMIT"
          find ./public -type f -exec sed -i "s|\[commit\]|<a href=\"$COMMIT_URL\">\[$SHORT_COMMIT\]</a>|g" {} +

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v4
        with:
          personal_token: ${{ secrets.GIT_PAGES_TOKEN }}
          publish_dir: ./public
          publish_branch: gitea-pages

      - name: Reload Kubernetes pods
        run: |
          curl -X DELETE https://k8s.local:6443/api/v1/namespaces/hugo/pods/ --header "Authorization: Bearer ${{ secrets.HUGO_K8S_TOKEN }}" --insecure
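The `[commit]` substitution step in the workflow above can be exercised outside CI; a minimal sketch (the temp directory and stand-in hash are assumptions, and GNU `sed -i` semantics are assumed):

```shell
# Simulate a built site containing the [commit] placeholder.
mkdir -p /tmp/public-demo
printf 'Built from [commit]\n' > /tmp/public-demo/index.html

# Stand-in for $(git rev-parse --short HEAD) in the workflow.
SHORT_COMMIT=abc1234
COMMIT_URL="https://git.ericxliu.me/eric/ericxliu-me/commit/$SHORT_COMMIT"

# Same find/sed invocation as the workflow step.
find /tmp/public-demo -type f -exec sed -i "s|\[commit\]|<a href=\"$COMMIT_URL\">\[$SHORT_COMMIT\]</a>|g" {} +
cat /tmp/public-demo/index.html
```

Using `|` as the `s` delimiter avoids escaping the slashes in the URL; the brackets still need escaping because `[` is special in the pattern.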
42	.gitignore	vendored
@@ -1 +1,41 @@
_gen/
# Generated files by hugo
/public/
/resources/_gen/
/assets/jsconfig.json
hugo_stats.json

# Executable may be added to repository
hugo.exe
hugo.darwin
hugo.linux

# Temporary lock file while building
/.hugo_build.lock

# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
@@ -1,37 +0,0 @@
# This file is a template, and might need editing before it works on your project.
# To contribute improvements to CI/CD templates, please follow the Development guide at:
# https://docs.gitlab.com/ee/development/cicd/templates.html
# This specific template is located at:
# https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Pages/HTML.gitlab-ci.yml

# Full project: https://gitlab.com/pages/plain-html

variables:
  GIT_SUBMODULE_STRATEGY: recursive

build-stage:
  stage: build
  image: monachus/hugo:latest
  script:
    - hugo
    - ls
  artifacts:
    paths:
      - public

deploy-stage:
  stage: deploy
  image: minio/mc:latest
  script:
    - ls
    - mkdir .public
    - cp -r public/* .public
    - mc alias set minio http://minio.diskstation.local:80 WjaYWk3uthUlotbT Hc3fff7v69nZ6XvcXXpOZ3JJMzcmGc6A
    - mc cp -r .public/ minio/eric-personal
  artifacts:
    paths:
      - .public
  dependencies:
    - build-stage
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
3	.gitmodules	vendored
@@ -1,3 +1,6 @@
[submodule "themes/hugo-coder"]
	path = themes/hugo-coder
	url = https://github.com/luizdepra/hugo-coder
[submodule "themes/hugo-cloak-email"]
	path = themes/hugo-cloak-email
	url = https://github.com/martignoni/hugo-cloak-email

@@ -0,0 +1,2 @@
Pasted image 20250730232756.png|64bfdb4b-678e-4bfc-8b62-0c05c243f6a9.png
Pasted image 20250816140700.png|.png
@@ -0,0 +1 @@
Pasted image 20250819211718.png|.png

1	.image_mappings/ppo-for-language-models.txt	Normal file
@@ -0,0 +1 @@
Pasted image 20250816140700.png|.png

1	.image_mappings/transformer-s-core-mechanics.txt	Normal file
@@ -0,0 +1 @@
Pasted image 20250819211718.png|.png
37	README.md	Normal file
@@ -0,0 +1,37 @@
# 🌟 ericxliu.me

Welcome to the repository for my personal website! 🚀

## 🛠️ Built With

This website is built using:
- [Hugo](https://gohugo.io/) - A fast and modern static site generator
- [Hugo Coder](https://github.com/luizdepra/hugo-coder/) - A minimalist and elegant Hugo theme

## 🌐 Website

Visit my website at [ericxliu.me](https://ericxliu.me)

## 🚀 Features

- 📱 Responsive design
- 🎨 Clean and minimalist layout
- 📝 Blog section for articles and thoughts
- 👨‍💻 Portfolio showcase
- 📬 Contact information

## 🛠️ Local Development

To run this website locally:

1. Clone this repository
2. Install Hugo (extended version)
3. Navigate to the project directory
4. Run `hugo server -D`
5. Open your browser and visit `http://localhost:1313`

## 📄 License

This project is open source and available under the [MIT License](LICENSE).

Thank you for visiting my website repository! 😊
165	config.toml
@@ -1,58 +1,157 @@
title = "Eric's Personal Page"

theme = "hugo-coder"

title = "Eric X. Liu's Personal Page"
theme = ["hugo-cloak-email", "hugo-coder"]
languageCode = "en"
defaultcontentlanguage = "en"

paginate = 20
canonifyurls = true

pygmentsstyle = "b2"
pygmentsstyle = "bw"
pygmentscodefences = true
pygmentscodefencesguesssyntax = true
enableEmoji = true
enableTwemoji = true
enableGitInfo = true
enableRobotsTXT = true

# Disqus comments configuration
[services]
  [services.disqus]
    shortname = "ericxliu-me"

# Goldmark configuration for Obsidian compatibility
[markup]
  defaultMarkdownHandler = "goldmark"

  [markup.goldmark]
    [markup.goldmark.extensions]
      # Enable definition lists (useful for Obsidian-style definitions)
      definitionList = true
      # Enable footnotes (common in Obsidian)
      footnote = true
      # Enable linkification
      linkify = true
      # Enable strikethrough
      strikethrough = true
      # Enable tables
      table = true
      # Enable task lists (checkboxes)
      taskList = true
      # Enable typographer for better typography
      [markup.goldmark.extensions.typographer]
        disable = false
      # Enable math via passthrough for LaTeX
      [markup.goldmark.extensions.passthrough]
        enable = true
        [markup.goldmark.extensions.passthrough.delimiters]
          # Block math delimiters
          block = [["$$", "$$"], ["\\[", "\\]"]]
          # Inline math delimiters
          inline = [["$", "$"], ["\\(", "\\)"]]
      # Enable extra extensions for better compatibility
      [markup.goldmark.extensions.extras]
        [markup.goldmark.extensions.extras.subscript]
          enable = true
        [markup.goldmark.extensions.extras.superscript]
          enable = true
        [markup.goldmark.extensions.extras.mark]
          enable = true
        [markup.goldmark.extensions.extras.insert]
          enable = true
        [markup.goldmark.extensions.extras.delete]
          enable = true

    [markup.goldmark.parser]
      # Enable attributes for better styling
      [markup.goldmark.parser.attribute]
        block = true
        title = true
      # Auto-generate heading IDs
      autoHeadingID = true
      autoHeadingIDType = "github"
      # Don't wrap standalone images in paragraphs (better for Obsidian compatibility)
      wrapStandAloneImageWithinParagraph = false

    [markup.goldmark.renderer]
      # Allow unsafe HTML (needed for some Obsidian features)
      unsafe = true

  [markup.highlight]
    style = "github-dark"

# Table of contents configuration (compatible with Obsidian heading structure)
[markup.tableOfContents]
  startLevel = 1
  endLevel = 6
  ordered = false

[params] # theme parameters
author = "Eric Liu"
info = "Platform Software & Performance Engineer @Google"
description = "Eric Liu's personal website"
keywords = "blog,developer,personal"
author = "Eric X. Liu"
info = "Software & Performance Engineer @Google"
description = "Eric X. Liu - Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities."
keywords = "software engineer, performance engineering, Google engineer, tech blog, software development, performance optimization, Eric Liu, engineering blog, mountain biking, Jeep enthusiast, overlanding, camping, outdoor adventures"
avatarurl = "images/gravatar.png"

# whether you want to hide copyright and credits in the footer
hideFooter = false
hideCredits = true
hideCopyright = false

since = 2016
rtl = false

colorscheme = "light"
commit = "https://git.ericxliu.me/eric/ericxliu-me/commit"
colorscheme = "auto"
hideColorSchemeToggle = false

# Series see also post count
maxSeeAlsoItems = 5

# Enable Twemoji
enableTwemoji = true

# Custom CSS
custom_css = []

# Custom JS
custom_js = []

# Enable math rendering (for LaTeX support including $$$$ blocks)
math = true

# Add new SEO-related parameters
[params.seo]
  # Enable OpenGraph for better social media sharing
  opengraph = true
  # Enable Twitter Cards
  twitter_cards = true
  # Your Twitter handle (optional)
  # twitter_handle = "@yourtwitterhandle"
  # Default image for social sharing
  default_image = "images/gravatar.png"
  # Site name for social sharing
  site_name = "Eric X. Liu's Personal Page"

# Add structured data for Google Search
[params.schema]
  type = "Person"
  name = "Eric X. Liu"
  description = "Software & Performance Engineer at Google"
  sameAs = [
    "https://www.linkedin.com/in/eric-x-liu-46648b93/",
    "https://git.ericxliu.me/eric"
  ]

# Add sitemap configuration
[sitemap]
  changefreq = "weekly"
  filename = "sitemap.xml"
  priority = 0.5

# Social links
[[params.social]]
  name = "Git"
  icon = "fab fa-gitlab"
  icon = "fa-brands fa-git fa-2x"
  weight = 1
  url = "https://git.ericxliu.me/eric"
[[params.social]]
  name = "linkedin"
  icon = "fab fa-linkedin"
  icon = "fa-brands fa-linkedin fa-2x"
  weight = 2
  url = "https://www.linkedin.com/in/eric-liu-46648b93/"
  url = "https://www.linkedin.com/in/eric-x-liu-46648b93/"
[[params.social]]
  name = "Personal email"
  icon = "fas fa-envelope-square"
  icon = "fa fa-envelope fa-2x"
  weight = 3

# Menu links
@@ -64,21 +163,25 @@ pygmentscodefencesguesssyntax = true
  weight = 1
  url = "/posts/"
[[languages.en.menu.main]]
  name = "Gitlab"
  name = "Chat"
  weight = 2
  url = "https://git.ericxliu.me"
  url = "https://chat.ericxliu.me"
[[languages.en.menu.main]]
  name = "Notebook"
  name = "Git"
  weight = 3
  url = "https://hub.ericxliu.me"
  url = "https://git.ericxliu.me/user/oauth2/Authenitk"
[[languages.en.menu.main]]
  name = "Go"
  name = "Coder"
  weight = 4
  url = "https://go.ericxliu.me/server"
  url = "https://coder.ericxliu.me/api/v2/users/oidc/callback"
[[languages.en.menu.main]]
  name = "|"
  weight = 10
[[languages.en.menu.main]]
  name = "Sign in"
  weight = 11
  url = "https://auth.ericxliu.me"
  url = "https://sso.ericxliu.me"

# Cloudflare Web Analytics configuration
[params.cloudflare]
  token = "987638e636ce4dbb932d038af74c17d1"
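With the passthrough delimiters configured above, content files can mix inline and block LaTeX; an illustrative snippet (the formula itself is just an example, not from the site):

```markdown
Inline math: the area of a circle is $A = \pi r^2$.

Block math:

$$
\int_0^1 x^2 \, dx = \frac{1}{3}
$$
```

Both forms pass through Goldmark untouched and are rendered client-side (the site enables `math = true` for KaTeX-compatible rendering).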
94	content/posts/breville-barista-pro-maintenance.md	Normal file
@@ -0,0 +1,94 @@
---
title: "Breville Barista Pro Maintenance"
date: 2025-08-16
draft: false
---

Proper maintenance is critical for the longevity and performance of a Breville Barista Pro espresso machine. Consistent cleaning not only ensures the machine functions correctly but also directly impacts the quality of the espresso produced. This guide provides a detailed, technical breakdown of the essential maintenance routines, from automated cycles to daily upkeep.

#### **Understanding the Two Primary Maintenance Cycles**

The Breville Barista Pro has two distinct, automated maintenance procedures: the **Cleaning (Flush) Cycle** and the **Descale Cycle**. It is important to understand that these are not interchangeable, as they address different types of buildup within the machine.

* **Cleaning Cycle (Flush):** This process is designed to remove coffee oils and granulated residue from the group head, shower screen, and portafilter system.
* **Descale Cycle:** This process targets the internal components of the machine, such as the thermocoil and water lines, to remove mineral and limescale deposits from water.

#### **Procedure 1: The Cleaning (Flush) Cycle**

The machine will indicate when a cleaning cycle is needed by displaying a "FLUSH" alert on the LCD screen. This typically occurs after approximately 200 extractions.

**Required Materials:**
* 1-Cup filter basket
* Grey silicone cleaning disc (provided with the machine)
* One cleaning tablet

**Step-by-Step Instructions:**
1. Insert the 1-cup filter basket into the portafilter.
2. Place the grey silicone cleaning disc inside the basket.
3. Position one cleaning tablet in the center of the disc.
4. Lock the portafilter firmly into the group head.
5. Ensure the drip tray is empty and the water tank is filled.
6. Press the 'MENU' button and use the 'Grind Amount' dial to navigate to the 'FLUSH' option. Press the dial to select it.
7. The '1 CUP' button will illuminate. Press it to initiate the cycle.
8. The cleaning process will last approximately five minutes, with the machine backflushing water under pressure. The remaining time will be displayed on the screen.
9. Upon completion, the machine will beep and return to its ready state.
10. Remove the portafilter and discard the water and dissolved tablet residue. Thoroughly rinse the portafilter, cleaning disc, and filter basket.
11. Re-insert the portafilter (without the disc or tablet) and run a shot of hot water through the group head to rinse any remaining cleaning solution.

#### **Procedure 2: The Descale Cycle**

The machine will alert you when descaling is required. The frequency depends on water hardness and usage but is generally recommended every 2-3 months.

**Required Materials:**
* Breville-recommended descaling solution
* A large container (minimum 2-liter capacity)

**Step-by-Step Instructions:**

**Part A: Preparation**
1. Empty the drip tray and re-insert it.
2. Remove the water filter from the water tank.
3. Pour the descaling solution into the empty water tank and add fresh water up to the indicated "DESCALE" line.
4. Place a large container under the group head, hot water outlet, and steam wand.

**Part B: The Descaling Process**
1. Turn the machine on and press the 'MENU' button. Navigate to the 'DESCALE' option and select it by pressing the dial.
2. Press the illuminated '1 CUP' button to begin.
3. The cycle proceeds in three stages. You must manually advance through them using the steam dial based on the LCD prompts:
   * **Group Head (d3):** The machine descales the coffee brewing components.
   * **Hot Water (d2):** After a beep, the LCD shows "d2". Turn the steam dial to the hot water position.
   * **Steam (d1):** After another beep, the display reads "d1". Turn the dial to the steam position.

**Part C: The Rinse Cycle**
1. Once the descaling solution is expended, the machine will beep and prompt for a rinse cycle ("r").
2. Empty the large container and rinse the water tank thoroughly.
3. Fill the water tank with fresh, cold water to the MAX line and re-insert it.
4. Place the empty container back under the outlets and press the '1 CUP' button.
5. The rinse cycle will mirror the descaling process, prompting you to engage the group head ("r3"), hot water ("r2"), and steam wand ("r1") in sequence.
6. After the rinse is complete, the machine will exit the maintenance mode and return to its ready state.

#### **Routine and Preventative Maintenance Schedule**

In addition to the automated cycles, regular manual cleaning is essential for machine health.

**Daily Tasks:**
* **Purge Group Head:** After the final use of the day, run hot water through the group head (without the portafilter) to clear grounds.
* **Clean Portafilter & Baskets:** Do not let used coffee grounds sit in the portafilter. Rinse with hot water after every use.
* **Clean Steam Wand:** Immediately after texturing milk, wipe the wand with a damp cloth and purge steam for 2-3 seconds to clear internal passages.
* **Empty Drip Tray:** Empty and rinse the drip tray regularly.

**Weekly Tasks:**
* **Soak Components:** Remove the filter basket from the portafilter. Soak both components in a solution of hot water and a cleaning tablet (or specific espresso cleaner) for 20-30 minutes to dissolve accumulated coffee oils. Rinse thoroughly.
* **Clean Grinder:** Empty the bean hopper. Run the grinder to clear any remaining beans, then use a brush and/or vacuum to clean out fines and oil residue from the burrs and chute.

**Periodic Tasks (Every 2-3 Months):**
* **Replace Water Filter:** The water filter located inside the water tank should be replaced every 3 months. This reduces the rate of scale buildup.
* **Inspect Shower Screen:** Use a brush to gently scrub the shower screen inside the group head to remove any stubborn coffee grounds.

By adhering to this comprehensive maintenance schedule, you can ensure your Breville Barista Pro operates at peak performance and consistently produces high-quality espresso.

***

**Reference:**
* Breville Barista Pro Instruction Manual and official manufacturer guidelines.
@@ -0,0 +1,129 @@
---
title: "Mastering Your Breville Barista Pro: The Ultimate Guide to Dialing In Espresso"
date: 2025-05-01
draft: false
---

Are you ready to transform your home espresso game from good to genuinely great? The Breville Barista Pro is a fantastic machine, but unlocking its full potential requires understanding a few key principles. This guide will walk you through the systematic process of dialing in your espresso, ensuring every shot is delicious and repeatable.

Our overarching philosophy is simple: **isolate and change only one variable at a time.** While numbers are crucial, your palate is the ultimate judge. Dose, ratio, and time are interconnected, but your **grind size** is your most powerful lever.

Let's dive in!

---

### **Part 1: The Foundation — Dose (The Weight of Dry Coffee)**

Your dose is the bedrock of your espresso. It's the weight of your ground coffee, and it should be the first variable you set and then keep **constant** during the initial dialing-in process.

**Why Dose Matters:**

* **Basket Size is Key:** Your portafilter basket dictates your ideal dose. Too little coffee (under-dosing) creates excessive "headspace," leading to soupy extractions. Too much (over-dosing) causes the coffee puck to touch the shower screen, preventing even water flow and causing channeling.
* **Extraction "Work":** A higher dose means more coffee mass, requiring more "work" (a finer grind, more water) to extract properly.
* **Coffee Type:**
  * **Light Roasts:** Denser and harder to extract. Consider a **slightly lower dose**.
  * **Dark Roasts:** More brittle and soluble. You can often use a **slightly higher dose**.

**Application for Your Breville Barista Pro (54mm Portafilter):**

* **Your Starting Point:** Always begin with **18 grams**. Use a scale for accuracy!
* **Adjusting for Roast:** For light roasts, if you're struggling, drop to 17g. For dark roasts, you can try 19g.
* **Golden Rule:** Once you choose your starting dose (e.g., 18g), **do not change it** until you've dialed in your grind size.

---

### **Part 2: Defining the Drink — Brew Ratio (Dose vs. Yield)**

The brew ratio defines the relationship between your dry coffee dose and the weight of your liquid espresso yield. Always measure by **weight (grams)**, not volume (mL), as crema can be inconsistent.

**Understanding Ratios:**

* **Ristretto (1:1 – 1:1.5):** E.g., 18g in → 18g to 27g out. Strong, textured, less extracted.
* **Espresso (Normale) (1:1.5 – 1:2.5):** E.g., 18g in → 27g to 45g out. The standard, balanced shot.
* **Lungo (1:2.5+):** E.g., 18g in → 45g+ out. Weaker, less textured, more extracted.

**The Fundamental Trade-Off:**

* **Longer Ratio (more water):** Higher extraction, but lower strength (more diluted).
* **Shorter Ratio (less water):** Lower extraction, but higher strength (more concentrated).

**Application for Your Breville Barista Pro:**

* **Recommended Starting Ratio:** A **1:2 ratio** is the perfect place to begin.
* **Practical Numbers:** With your 18g dose, your target yield is **36 grams** of liquid espresso.
* **Execution:** Place your cup on a scale and use the manual brew function to stop the shot precisely when the scale reads 36g.

---
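The dose-to-yield arithmetic above reduces to a single multiplication; a throwaway sketch using the guide's own numbers (variable names are made up):

```shell
DOSE_G=18   # dry coffee in, grams
RATIO=2     # the recommended 1:2 starting ratio
YIELD_G=$((DOSE_G * RATIO))
echo "Stop the shot when the scale reads ${YIELD_G} g"   # 36 g for an 18 g dose
```

The same formula gives the ristretto and lungo targets by swapping the ratio (e.g., 1.5 or 2.5 times the dose).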
### **Part 3: The Diagnostic Tool — Brew Time**

Brew time is not something you set directly; it's the **result** of how much resistance your coffee puck provides against the machine's water pressure. Think of it as a **diagnostic tool**.

**The 25-30 Second Guideline:**

This is a benchmark. If your 1:2 ratio shot falls within this time, your grind size is likely in the correct range for a balanced extraction.

* **Too Fast (<25s):** Indicates under-extraction (often tastes sour).
* **Too Slow (>30s):** Indicates over-extraction (often tastes bitter).

**Taste is King:** Remember, if a shot tastes fantastic at 32 seconds, it's a great shot! The time simply becomes part of your successful recipe for that specific coffee.

**Application for Your Breville Barista Pro:**

* **Pre-infusion:** The Barista Pro's low-pressure pre-infusion is **part of your total brew time**. Its purpose is to saturate the puck evenly to prevent channeling. Keep it consistent for every shot while dialing in.

---
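The 25-30 second guideline can be phrased as a tiny decision rule; a hedged sketch (the thresholds are the guide's benchmarks, and the function name is invented for illustration):

```shell
# Classify a shot by its total brew time, per the 25-30 s benchmark above.
diagnose_shot() {
  t=$1   # total brew time in seconds, including pre-infusion
  if [ "$t" -lt 25 ]; then
    echo "fast: likely under-extracted (sour) - grind finer"
  elif [ "$t" -gt 30 ]; then
    echo "slow: likely over-extracted (bitter) - grind coarser"
  else
    echo "in range: judge by taste"
  fi
}

diagnose_shot 22
diagnose_shot 28
diagnose_shot 34
```

As the guide stresses, this is only a heuristic: a 32-second shot that tastes great is still a great shot.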
### **Part 4: The Primary Control — Grind Setting**

This is where the magic (and sometimes frustration) happens. Grind size is your main tool for controlling the resistance of the coffee puck, which directly dictates your brew time.

**The Dual Impact of Grinding Finer:**

1. **Increases surface area:** Allows for more efficient flavor extraction.
2. **Increases resistance:** Slows down water flow and increases contact time.

**The Risk of Grinding Too Fine (Channeling):**

If the grind is too fine, the puck becomes so dense that high-pressure water can't flow evenly. Instead, it "breaks" the puck and punches an easy path (a channel) through a weak spot. This results in a disastrous shot that is simultaneously:

* **Under-extracted:** Most of the coffee is bypassed.
* **Over-extracted:** The water that does flow blasts through the channel, extracting harsh, bitter compounds.
* **The Taste:** A channeled shot tastes hollow, weak, sour, *and* bitter all at once.

**The Goal:** You want to **grind as fine as you possibly can *without* causing significant channeling**. This is the sweet spot for maximizing surface area and resistance for high, even extraction.

**Grind Retention (Purging):** Most grinders retain some old grounds. When you change your grind setting, always purge a few grams of coffee to ensure your dose is entirely at the new setting.

**Application for Your Breville Barista Pro:**

* **Grinder Mechanism:** The "Grind Amount" dial controls the **TIME** the grinder runs, not the weight. When you adjust the fineness, you **must** re-adjust the grind time to ensure you are still getting your target 18g dose.
* **Tackling Channeling:** The Barista Pro is prone to channeling. To fight this, focus on excellent **puck prep**: use a WDT (Weiss Distribution Technique) tool to break up clumps and evenly distribute the grounds before tamping levelly.

---

### **The Complete Dialing-In Workflow**

This systematic process will get you to a delicious shot from your Breville Barista Pro efficiently:

1. **Set Your Constants:**
|
||||
* **Dose:** **18g**.
|
||||
* **Ratio:** **1:2** (meaning a **Yield** of **36g**).
|
||||
* **Pre-infusion:** Use a consistent method (e.g., manual 8-second hold).
|
||||
2. **Make an Initial Grind:**
|
||||
* Set the grinder to a starting point of **15**.
|
||||
* Adjust the grind **time** until the grinder dispenses exactly 18g.
|
||||
3. **Pull the First Shot:**
|
||||
* Brew manually, stopping at **36g** of liquid in the cup. Note the **total brew time**.
|
||||
4. **Taste and Diagnose:**
|
||||
* **Fast & Sour? (<25s):** Grind is too coarse.
|
||||
* **Slow & Bitter? (>32s):** Grind is too fine.
|
||||
5. **Make ONE Adjustment - THE GRIND SIZE:**
|
||||
* If fast/sour, adjust the grind **finer** (e.g., from 15 down to 13).
|
||||
* If slow/bitter, adjust the grind **coarser** (e.g., from 15 up to 17).
|
||||
6. **Re-adjust and Repeat:**
|
||||
* After changing the grind setting, **purge** a small amount of coffee.
|
||||
* Re-weigh your next dose and **adjust the grind time** to get back to exactly 18g.
|
||||
* Pull another 36g shot. Repeat this process until your shot tastes balanced and the time falls roughly between **25-32 seconds**.
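The taste-and-diagnose step can be summarized as a tiny decision helper (a sketch; the two-step adjustment size is illustrative, and lower numbers are assumed to mean finer, as on the Barista Pro):

```python
def diagnose_shot(brew_time_s: float, grind_setting: int) -> tuple[str, int]:
    """Map a 1:2-ratio shot's brew time to the next grind adjustment.

    Thresholds follow the 25-32 second window used in the workflow above.
    """
    if brew_time_s < 25:
        # Fast and sour: under-extracted, so grind finer (lower number).
        return "grind finer", grind_setting - 2
    if brew_time_s > 32:
        # Slow and bitter: over-extracted, so grind coarser (higher number).
        return "grind coarser", grind_setting + 2
    return "dialed in", grind_setting
```

For example, a 21-second shot at setting 15 maps to trying setting 13 next.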
Happy brewing! With patience and this systematic approach, you'll be pulling consistently delicious espresso shots from your Breville Barista Pro in no time.
content/posts/how-rvq-teaches-llms-to-see-and-hear.md
---
title: "Beyond Words: How RVQ Teaches LLMs to See and Hear"
date: 2025-08-07
draft: false
---

Large Language Models (LLMs) are masters of text, but the world is not made of text alone. It’s a symphony of sights, sounds, and experiences. The ultimate goal for AI is to understand this rich, multi-modal world as we do. But how do you teach a model that thinks in words to understand a picture of a sunset or the melody of a song?

The answer lies in creating a universal language—a bridge between the continuous, messy world of pixels and audio waves and the discrete, structured world of language tokens. One of the most elegant and powerful tools for building this bridge is **Residual Vector Quantization (RVQ)**.

This article dives deep into RVQ, exploring how it turns raw data into meaningful semantic IDs and how these IDs, in turn, unlock multi-modal understanding in LLMs.

#### **What is Residual Vector Quantization? The Art of Smart Compression**

At its core, Vector Quantization (VQ) is a compression technique. It maps a high-dimensional vector (like a data embedding) to the single closest vector in a predefined dictionary, called a **codebook**. You then only need to store the index of that chosen vector. The problem? To represent complex data accurately, you'd need a codebook with an astronomical number of entries, which is computationally impossible.

This is where **Residual** Vector Quantization shines. Instead of one giant codebook, RVQ uses a series of smaller codebooks in stages.

1. **Stage 1 (Coarse Quantization):** The input vector is quantized by the first codebook. This finds the broadest, most general category for the data.
2. **Calculate the Residual:** The system calculates the error, or "residual," between the original vector and its quantized version from Stage 1. This residual vector represents the information that was lost in the first coarse approximation.
3. **Stage 2 (Refinement):** This residual vector is then quantized by the *second* codebook. This stage doesn't re-evaluate the whole vector; it focuses only on correcting the error from the previous stage.
4. **Iterate:** This process repeats for several stages, with each subsequent codebook quantizing the residual error from the previous one, adding a finer and finer layer of detail.

The final compressed representation is simply the sequence of indices from each codebook: for example, an ID like `[8, 5, 4, 1]`. The magic of this approach is that it creates a **hierarchical ID**. The first digit `[8]` might represent "Sports," the next `[5]` refines it to "Court Sports," `[4]` to "Beach Volleyball," and the final `[1]` distinguishes a specific match. Videos with similar content will naturally share a longer prefix in their Semantic ID.
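The staged encode/decode loop is short enough to sketch directly. A minimal numpy version, with small random codebooks standing in for learned ones:

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage encodes the previous residual."""
    residual = x.astype(float).copy()
    indices = []
    for cb in codebooks:  # cb has shape (codebook_size, dim)
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))          # nearest codeword to the residual
        indices.append(idx)
        residual = residual - cb[idx]        # what this stage failed to capture
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the chosen vector from each codebook."""
    return sum(cb[i] for i, cb in zip(indices, codebooks))
```

Decoding a prefix of the indices gives a coarse reconstruction; each additional stage refines it, which is exactly the hierarchical behavior described above.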
#### **Learning What Matters: The Trainable VQ-Autoencoder**

A key insight is that RVQ is not a fixed algorithm but a **trainable neural network component**. Its codebooks are not predefined; they are learned. This learning happens within a **Vector-Quantized Autoencoder (VQ-AE)** architecture.

1. **Encoder:** A powerful neural network (e.g., a Transformer or CNN) takes the raw data (like video frames and audio) and converts it into a continuous semantic embedding.
2. **RVQ Bottleneck:** This embedding is fed into the RVQ module, which quantizes it into the sequence of discrete IDs.
3. **Decoder:** The decoder takes these discrete IDs, looks up the corresponding codebook vectors, sums them up to get a reconstructed embedding, and attempts to rebuild the original video/audio.

The entire system is trained end-to-end. The **reconstruction loss** (the difference between the original and reconstructed data) is used to update the parameters of the Encoder, the Decoder, and, most importantly, **the codebook vectors within the RVQ module**. Initially random, the codebook vectors are gradually pushed to become meaningful "anchors" for the core concepts present in the training data.

#### **From Implicit to Explicit: Controlling Semantics with Contrastive Learning**

A standard VQ-AE learns implicit semantics. It gets good at reconstruction, but we can't control *what* concepts it learns. To make the Semantic IDs truly meaningful and aligned with human language, we introduce **contrastive learning**.

The architecture is enhanced with a parallel text encoder (like BERT's or CLIP's). The model is then trained with a joint loss function:

`L_total = L_reconstruction + λ * L_contrastive`

* **Reconstruction Loss** ensures the RVQ codes contain enough information to rebuild the input.
* **Contrastive Loss** forces the media embedding (from the video/audio encoder) to be mathematically "close" to the text embedding of its description, and "far" from the embeddings of unrelated text descriptions.

This dual goal forces the model to organize its embedding space according to the semantics of human language. The codebook vectors now learn to represent concepts that are not just useful for reconstruction, but are also tied to explicit textual descriptions.
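One common concrete form of the contrastive term is a symmetric batch objective in the style of InfoNCE/CLIP. A numpy sketch (the temperature value is illustrative; the matched media/text pairs sit on the diagonal of the similarity matrix):

```python
import numpy as np

def contrastive_loss(media_emb, text_emb, temperature=0.07):
    """InfoNCE over a batch: the i-th media row should match the i-th text row."""
    # L2-normalize so the dot product is cosine similarity.
    m = media_emb / np.linalg.norm(media_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = m @ t.T / temperature               # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matched pairs) as the target class.
    return -np.mean(np.diag(log_probs))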
#### **Integrating with LLMs: Two Powerful Paths to Multi-Modality**

Once we have a contrastively-trained VQ-AE, we can use its output to give LLMs the ability to see and hear. There are two primary strategies for this.

**Path 1: The Tokenizer Approach - Teaching the LLM a New Language**

This path treats the RVQ IDs as a new vocabulary. It’s a two-stage process ideal for high-fidelity content generation.

1. **Create a Neural Codec:** The trained VQ-AE serves as a powerful "codec." You can take any piece of media (e.g., a song) and use the codec to compress it into a sequence of discrete RVQ tokens (e.g., `[8, 5, 4, 1, 8, 5, 9, 2, ...]`).
2. **Train a Generative LLM:** A new Transformer model is trained auto-regressively on a massive dataset of these media-derived tokens. Its sole purpose is to learn the patterns and predict the next token in a sequence.

**Use Case:** This is the architecture behind models like Meta's MusicGen. A user provides a text prompt, which conditions the Transformer to generate a new sequence of RVQ tokens. These tokens are then fed to the VQ-AE's decoder to synthesize the final audio waveform.

**Path 2: The Adapter Approach - Translating for a Language Expert**

This path is used to augment a powerful, pre-trained, text-only LLM without the astronomical cost of retraining it.

1. **Freeze the LLM:** A massive, pre-trained LLM (like LLaMA) is frozen. Its deep language understanding is preserved.
2. **Use the Pre-Quantized Embedding:** Instead of using the discrete RVQ tokens, we take the rich, continuous embedding vector produced by our media encoder *just before* it enters the RVQ module.
3. **Train a Small Adapter:** A small, lightweight projection layer (or "adapter") is trained. Its only job is to translate the media embedding into a vector that has the same format and structure as the LLM's own word embeddings. It learns to map visual concepts to their corresponding "word" concepts in the LLM's latent space.

**Use Case:** This is the principle behind models like Google's Flamingo. To answer a question about an image, the image is passed through the media encoder and adapter. The resulting "vision-as-a-word" vector is inserted into the prompt sequence alongside the text tokens. The frozen LLM can now "reason" about the visual input because it has been translated into a format it already understands.
---
title: "Mixture-of-Experts (MoE) Models: Challenges & Solutions in Practice"
date: 2025-07-02
draft: false
---

Mixture-of-Experts (MoE) models are neural network architectures that allow different parts of the model (called "experts") to specialize in different types of inputs. A "gating network" or "router" learns to dispatch each input (or "token") to a subset of these experts. While powerful for scaling models, MoEs introduce several practical challenges.

### 1. Challenge: Non-Differentiability of Routing Functions

**The Problem:**
Many routing mechanisms, especially "Top-K routing," involve a discrete, hard selection process. A common function is `KeepTopK(v, k)`, which selects the top `k` scoring elements from a vector `v` and sets others to $-\infty$ or $0$.

$$
KeepTopK(v, k)_i = \begin{cases} v_i & \text{if } v_i \text{ is in the top } k \text{ elements of } v \\ -\infty & \text{otherwise.} \end{cases}
$$

This function is **not differentiable**. Its gradient is zero almost everywhere and undefined at the threshold points, making it impossible to directly train the gating network's parameters (e.g., $W_g$) using standard gradient descent.

**Solutions (Stochastic Approximations):**
To enable end-to-end training, non-differentiable routing decisions must be approximated with differentiable or stochastic methods.

* **Stochastic Scoring (e.g., Shazeer et al. 2017):**
  The expert score $H(x)_i = (x \cdot W_g)_i + \text{StandardNormal}() \cdot \text{Softplus}((x \cdot W_{noise})_i)$ introduces Gaussian noise. This makes the scores themselves stochastic, which can be leveraged with other methods.

* **Gumbel-Softmax Trick (or Concrete Distribution):**
  This method allows for differentiable sampling from categorical distributions. Instead of directly picking the top-k, Gumbel noise is added to the scores, and a Softmax (with a temperature parameter) is applied. This provides a continuous, differentiable approximation of a discrete choice, allowing gradients to flow back.

* **REINFORCE (Score Function Estimator):**
  This is a policy gradient method from reinforcement learning. The routing decision is treated as an action, and the gating network's parameters are updated based on the "reward" (e.g., the model's performance). Gradients are estimated by sampling routing choices and weighting them by their outcomes.

* **Straight-Through Estimator (STE):**
  A simpler approximation where, during the backward pass, gradients are treated as if the non-differentiable operation were an identity function or a simple smooth function.

* **Softmax after TopK (e.g., Mixtral, DBRX, DeepSeek v3):**
  Instead of `Softmax(KeepTopK(...))`, some models apply a Softmax *only to the scores of the selected TopK experts*, and then assign $0$ to the rest. This provides differentiable weights for the selected experts while still enforcing sparsity.
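The "softmax after TopK" variant is easy to state in code. A numpy sketch, where `scores` stands in for the gating network's output $x \cdot W_g$ for a batch of tokens:

```python
import numpy as np

def topk_softmax_gates(scores, k):
    """Keep the top-k experts per token; softmax over only those scores.

    Unselected experts get a hard 0, so the layer stays sparse while the
    selected experts receive differentiable, normalized weights.
    """
    gates = np.zeros_like(scores, dtype=float)
    for t, row in enumerate(scores):
        top = np.argsort(row)[-k:]                # indices of the k best experts
        w = np.exp(row[top] - row[top].max())     # stable softmax over the top-k
        gates[t, top] = w / w.sum()
    return gates
```

Each row of the result sums to 1 over exactly `k` nonzero entries; gradients flow through those entries while the zeros enforce sparsity.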
### 2. Challenge: Uneven Expert Utilization (Balancing Loss)

**The Problem:**
Left unchecked, the gating network might learn to heavily favor a few experts, leaving others underutilized. This leads to:
* **System Inefficiency:** Overloaded experts become bottlenecks, while underutilized experts waste computational resources.
* **Suboptimal Learning:** Experts might not specialize effectively if they don't receive diverse data.

**Solution: Heuristic Balancing Losses (e.g., from Switch Transformer, Fedus et al. 2022)**
An auxiliary loss is added to the total model loss during training to encourage more even expert usage.

$$ \text{loss}_{\text{auxiliary}} = \alpha \cdot N \cdot \sum_{i=1}^{N} f_i \cdot P_i $$

Where:
* $\alpha$: A hyperparameter controlling the strength of the auxiliary loss.
* $N$: Total number of experts.
* $f_i$: The **fraction of tokens *actually dispatched* to expert $i$** in the current batch $B$ of $T$ tokens.
  $$ f_i = \frac{1}{T} \sum_{x \in B} \mathbf{1}\{\text{argmax } p(x) = i\} $$
  ($p(x)$ here refers to the output of the gating network, which could be $s_{i,t}$ in the DeepSeek/classic router. The $\text{argmax}$ means it counts hard assignments to expert $i$.)
* $P_i$: The **fraction of the router *probability mass* allocated to expert $i$** in the current batch $B$.
  $$ P_i = \frac{1}{T} \sum_{x \in B} p_i(x) $$
  ($p_i(x)$ is the learned probability (or soft score) from the gating network for token $x$ and expert $i$.)

**How it works:**
The sum $\sum_i f_i \cdot P_i$ is minimized when expert usage is uniform (each $f_i = P_i = 1/N$), so the loss pushes the router toward balanced dispatch. If an expert $i$ is overused (high $f_i$ and $P_i$), its term in the sum contributes significantly to the loss. The derivative with respect to $p_i(x)$ reveals that "more frequent use = stronger downweighting," meaning the gating network is penalized for sending too much traffic to an already busy expert.
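Both quantities fall straight out of a batch of router probabilities. A sketch (the `alpha` value is illustrative; `probs` is the per-token softmax output of the gate):

```python
import numpy as np

def load_balancing_loss(probs, alpha=0.01):
    """Switch-Transformer-style auxiliary loss: alpha * N * sum_i f_i * P_i.

    probs: (T, N) router probabilities for T tokens over N experts.
    """
    T, N = probs.shape
    hard = np.argmax(probs, axis=1)              # hard top-1 assignment per token
    f = np.bincount(hard, minlength=N) / T       # fraction of tokens per expert
    P = probs.mean(axis=0)                       # mean probability mass per expert
    return alpha * N * float(np.sum(f * P))
```

With `alpha=1`, a perfectly balanced router scores 1.0 (the minimum of $N \sum_i f_i P_i$), while a router that sends every token to one expert with certainty scores $N$, so collapse is penalized in proportion to how lopsided the traffic is.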
**Relationship to Gating Network:**
* **$p_i(x)$ (or $s_{i,t}$):** This is the output of the **learned gating network** (e.g., from a linear layer followed by Softmax). The gating network's parameters are updated via gradient descent, influenced by this auxiliary loss.
* **$P_i$:** This is *calculated* from the outputs of the learned gating network for the current batch. It's not a pre-defined value.

**Limitation ("Second Best" Scenario):**
Even with this loss, an expert can remain imbalanced if it's consistently the "second best" option (high $P_i$) but never the *absolute top choice* that gets counted in $f_i$ (especially if $K=1$). This is because $f_i$ strictly counts hard assignments based on `argmax`. This limitation highlights why "soft" routing or "softmax after TopK" approaches can be more effective for truly even distribution.

### 3. Challenge: Overfitting during Fine-tuning

**The Problem:**
Sparse MoE models, despite only activating a few experts per token, possess a very large total number of parameters. When fine-tuning these models on **smaller datasets**, they are highly prone to **overfitting**. The model's vast capacity allows it to memorize the limited fine-tuning data, leading to poor generalization performance on unseen validation data. This is evident when training loss continues to decrease, but validation loss stagnates or increases.

**Solutions:**

* **Zoph et al. Solution – Fine-tune non-MoE MLPs:**
  * This strategy involves freezing a portion of the MoE model's parameters during fine-tuning, specifically the large expert weights.
  * Instead, only the "non-MoE" parameters (e.g., attention layers, adapter layers, or the gating network itself) are updated.
  * This reduces the effective number of trainable parameters during fine-tuning, thereby mitigating the risk of overfitting on small datasets. It assumes the experts are already well pre-trained for general tasks.

* **DeepSeek Solution – Use Lots of Data (1.4M SFT):**
  * This approach tackles the problem by providing the model with a very large and diverse dataset for Supervised Fine-Tuning (SFT).
  * With abundant data (e.g., 1.4 million examples covering a wide range of tasks and languages), the model's large capacity can be effectively utilized for specialized learning rather than memorization. The diversity and volume of data prevent individual experts from overfitting to specific examples.

**Conclusion:**
MoE models offer significant advantages in terms of model capacity and computational efficiency, but their unique sparse activation pattern introduces challenges in training and fine-tuning. Overcoming non-differentiability in routing and ensuring balanced expert utilization are crucial for effective pre-training. During fine-tuning, managing the model's vast parameter count to prevent overfitting on smaller datasets requires either strategic parameter freezing or access to very large and diverse fine-tuning data.
The **Top-K routing** mechanism is a core component in many modern Mixture-of-Experts (MoE) models. It involves selecting a fixed number (`K`) of experts for each input based on relevance scores.
---

**Traditional Top-K (Deterministic Selection):**

* **How it works:**
  1. Calculate relevance scores (`s_{i,t}`) for each expert `i` and input `t`.
  2. Identify the `K` experts with the highest scores.
  3. Experts *within* the Top-K are assigned their scores (`g_{i,t} = s_{i,t}`).
  4. Experts *outside* the Top-K are assigned a score of `0` (`g_{i,t} = 0`).
  5. The output is a weighted sum of the selected experts' outputs.
* **Pros:** Predictable, deterministic, selects the "best" experts based on current scores.
* **Cons:** Can lead to expert imbalance, where a few popular experts are always chosen, starving others of training.

**Alternative: Sampling from Softmax (Probabilistic Selection):**

* **How it works:**
  1. Calculate relevance scores (`s_{i,t}`), which are treated as probabilities (after softmax).
  2. **Randomly sample** `K` unique expert indices from the distribution defined by these probabilities.
  3. Selected experts contribute; unselected experts do not.
* **Why it's suggested:**
  * **Load Balancing:** Prevents expert collapse by ensuring all experts get a chance to be selected, even those with slightly lower scores. This promotes more even training across the entire expert pool.
  * **Diversity & Exploration:** Introduces randomness, potentially leading to better generalization and robustness by exploring different expert combinations.
* **Pros:** Better load balancing, prevents expert starvation, encourages exploration.
* **Cons:** Stochastic (non-deterministic routing), can make debugging harder, might not pick the absolute "best" expert in a single instance (but better for long-term training).

**Key Takeaway:** While deterministic Top-K is simpler and directly picks the "highest-scoring" experts, sampling from the softmax offers a more robust training dynamic by ensuring that all experts receive training data, thereby preventing some experts from becoming unused ("dead experts").
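The two selection rules above can be contrasted in a few lines (a sketch; scores for a single token, with sampling drawing `K` distinct experts weighted by softmax probabilities):

```python
import numpy as np

def select_topk(scores, k):
    """Deterministic: the k highest-scoring experts, same set every call."""
    return set(np.argsort(scores)[-k:].tolist())

def select_sampled(scores, k, rng):
    """Probabilistic: sample k distinct experts, weighted by softmax(scores)."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return set(rng.choice(len(scores), size=k, replace=False, p=p).tolist())
```

Run `select_sampled` across many batches and even low-scoring experts are occasionally routed to, which is precisely the load-balancing behavior deterministic Top-K lacks.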
---

content/posts/ppo-for-language-models.md
---
title: "A Deep Dive into PPO for Language Models"
date: 2025-08-02
draft: false
---

Large Language Models (LLMs) have demonstrated astonishing capabilities, but out-of-the-box, they are simply powerful text predictors. They don't inherently understand what makes a response helpful, harmless, or aligned with human values. The technique that has proven most effective at bridging this gap is Reinforcement Learning from Human Feedback (RLHF), and at its heart lies a powerful algorithm: Proximal Policy Optimization (PPO).

You may have seen diagrams like the one below, which outlines the RLHF training process. It can look intimidating, with a web of interconnected models, losses, and data flows.



This post will decode that diagram, piece by piece. We'll explore the "why" behind each component, moving from high-level concepts to the deep technical reasoning that makes this process work.

### Translating RL to a Conversation

The first step is to understand how the traditional language of reinforcement learning maps to the world of text generation.

* **State (`s_t`)**: In a chat setting, the "state" is the context of the conversation so far. It's the initial prompt (`x`) plus all the text the model has generated up to the current moment (`y₁, ..., y_{t-1}`).
* **Action (`a_t`)**: The "action" is the model's decision at each step. For an LLM, this means generating the very next token (`y_t`). A full response is a sequence of these actions.
* **Reward (`r`)**: The "reward" is a numeric score that tells the model how good its full response (`y`) was. This score comes from a separate **Reward Model**, which has been trained on a large dataset of human preference comparisons (e.g., humans rating which of two responses is better). This reward is often only awarded at the end of the entire generated sequence.

Let's make this concrete. If a user provides the prompt **(x)**: *"The best thing about AI is"*, and the model generates the response **(y)**: *"its potential to solve problems."*, here is how it's broken down for training:

* **State 1**: "The best thing about AI is"
* **Action 1**: "its"
* **State 2**: "The best thing about AI is its"
* **Action 2**: " potential"
* **State 3**: "The best thing about AI is its potential"
* **Action 3**: " to"
* ...and so on for every generated token.

This breakdown transforms a single prompt-response pair into a rich trajectory of state-action pairs, which becomes the raw data for our learning algorithm.
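The expansion is purely mechanical. A sketch that builds the trajectory from a tokenized response (tokens keep their leading spaces, as a real tokenizer's would):

```python
def build_trajectory(prompt: str, response_tokens: list[str]):
    """Expand one prompt/response pair into per-token (state, action) steps."""
    steps = []
    state = prompt
    for token in response_tokens:
        steps.append((state, token))  # the action taken in this state
        state = state + token         # the next state includes the new token
    return steps
```

Calling `build_trajectory("The best thing about AI is", [" its", " potential", " to"])` yields one (state, action) pair per generated token, matching the breakdown above.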
### The Cast of Models: An Actor-Critic Ensemble

The PPO process doesn't rely on a single model but an ensemble where each member has a distinct role.

1. **The Actor (Policy LM)**: This is the star of the show—the LLM we are actively fine-tuning. Its role is to take a state (the current text) and decide on an action (the next token). We refer to its decision-making process as its "policy" (`π`).
2. **The Critic (Value Model)**: This is the Actor's coach. The Critic doesn't generate text. Instead, it observes a state and estimates the *potential future reward* the Actor is likely to receive from that point onward. This estimate is called the "value" (`V(s_t)`). The Critic's feedback helps the Actor understand whether it's in a promising or a dead-end situation, which is a much more immediate learning signal than waiting for the final reward.
3. **The Reward Model**: This is the ultimate judge. As mentioned, it's a separate model trained on human preference data that provides the final score for a complete generation. Its judgment is treated as the ground truth for training both the Actor and the Critic.

### The Challenge of Credit Assignment: Generalized Advantage Estimation (GAE)

A key problem in RL is assigning credit. If a 20-token response gets a high reward, was it because of the first token, the last one, or all of them? The Critic helps solve this. By comparing the reward at each step with the Critic's value estimate, we can calculate the **Advantage (`Â`)**.

A simple advantage calculation might be: `Advantage = reward + Value_of_next_state - Value_of_current_state`.

However, this can be noisy. PPO uses a more sophisticated technique called **Generalized Advantage Estimation (GAE)**. The formula looks complex, but the idea is intuitive:

`Â(s_t, a_t) = Σ_l (γλ)^l * δ_{t+l}`
where `δ_t = r_t + γV(s_{t+1}) - V(s_t)`

* **γ (gamma)** is a discount factor (e.g., 0.99), which values immediate rewards slightly more than distant ones.
* **λ (lambda)** is a smoothing parameter that balances the trade-off between bias and variance. It creates a weighted average of advantages over multiple future time steps.

In essence, GAE provides a more stable and accurate estimate of how much better a specific action was compared to the policy's average behavior in that state.
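The sum has a simple backward recursion. A sketch, assuming a per-step reward array and critic values with a bootstrap entry for the final state:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation.

    rewards: (T,) per-step rewards (for RLHF, often zeros except the last step).
    values:  (T+1,) critic estimates V(s_0..s_T); values[-1] bootstraps the end.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error δ_t
        running = delta + gamma * lam * running                 # (γλ)-weighted sum
        adv[t] = running
    return adv
```

Setting `lam=0` recovers the simple one-step TD advantage from above, while `lam=1` sums all future TD errors; values in between trade bias against variance.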
### The Heart of PPO: The Quest for Stable Updates

Now we arrive at the core innovation of PPO. We want to update our Actor model to take actions with higher advantages. The naive way to do this is to re-weight our training objective by an **importance sampling ratio**: `(π_new / π_old)`. This corrects for the fact that the data we are learning from was generated by a slightly older version of our policy.

However, this ratio is incredibly dangerous. If the new policy becomes very different from the old one, the ratio can explode, leading to massive, unstable gradient updates that destroy the model.

PPO solves this with its signature **Clipped Surrogate Objective**. The PPO loss function is:

`L_CLIP(θ) = Ê_t [ min( r_t(θ)Â_t, clip(r_t(θ), 1 - ε, 1 + ε)Â_t ) ]`

Let's translate this from math to English:
* `r_t(θ)` is the probability ratio `π_new(a_t|s_t) / π_old(a_t|s_t)`.
* The goal is to increase the objective by an amount proportional to the advantage `Â_t`.
* **The `clip` function is the crucial safeguard.** It forbids the probability ratio from moving outside a small window (e.g., `[0.8, 1.2]`).

This means the algorithm says: "Let's update our policy to favor this good action. But if the required update would change the policy too drastically from the old one, we'll 'clip' the update to a more modest size." This creates a "trust region," ensuring stable, incremental improvements.
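The clipped objective is compact in code. A sketch over a batch of token-level log-probabilities, with the sign flipped so it reads as a loss to minimize:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective, negated for gradient descent."""
    ratio = np.exp(logp_new - logp_old)        # r_t = pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # min() keeps the pessimistic bound: moving the ratio outside the
    # [1-eps, 1+eps] window earns no extra credit.
    return -np.mean(np.minimum(unclipped, clipped))
```

With `eps=0.2` the window is exactly the `[0.8, 1.2]` example above: even if the new policy makes a good action far more likely, the objective is capped at `1.2 * Â_t`.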
### Avoiding Amnesia: The Pretraining Loss

There's one final problem. If we only optimize for the PPO loss, the model might learn to "hack" the reward model by generating repetitive or nonsensical text that gets a high score. In doing so, it could suffer from **catastrophic forgetting**, losing its fundamental grasp of grammar and facts.

To prevent this, we introduce a second loss term. As seen in the diagram, we mix in data from the original **Pretraining Data** (or the dataset used for Supervised Fine-Tuning). We calculate a standard next-token prediction loss (`LM Loss`) on this high-quality data.

The final loss for the Actor is a combination of both objectives:

**Total Loss = Loss_PPO + `λ_ptx` * Loss_LM**

This brilliantly balances two goals:
1. The `Loss_PPO` pushes the model towards behaviors that align with human preferences.
2. The `Loss_LM` acts as a regularizer, pulling the model back towards its core language capabilities and preventing it from drifting into gibberish.

### The Full Training Loop

Now, we can assemble the entire process into a clear, iterative loop:

1. **Collect**: The current Actor policy `π_k` generates responses to a batch of prompts. These experiences—`(state, action, probability, reward, value)`—are stored in an **Experience Buffer**.
2. **Calculate**: Once the buffer is full, we use the collected data to compute the advantage estimates `Â_t` for every single token-generation step.
3. **Optimize**: For a few epochs, we repeatedly sample mini-batches from the buffer and update the Actor and Critic models. The Actor is updated using the combined `PPO-clip Loss` and `LM Loss`. The Critic is updated to improve its value predictions.
4. **Flush and Repeat**: After the optimization phase, the entire experience buffer is discarded. The data is now "stale" because our policy has changed. The newly updated policy `π_{k+1}` becomes the new Actor, and we return to step 1 to collect fresh data.

This cycle of collection and optimization allows the language model to gradually and safely steer its behavior towards human-defined goals, creating the helpful and aligned AI assistants we interact with today.

***

**References:**

1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). *Proximal Policy Optimization Algorithms*. arXiv preprint arXiv:1707.06347.
2. Schulman, J., Moritz, P., Levine, S., Jordan, M., & Abbeel, P. (2015). *High-Dimensional Continuous Control Using Generalized Advantage Estimation*. arXiv preprint arXiv:1506.02438.
3. Ouyang, L., et al. (2022). *Training language models to follow instructions with human feedback*. Advances in Neural Information Processing Systems 35.
content/posts/quantization-in-llms.md
---
title: "Quantization in LLMs"
date: 2025-08-19
draft: false
---

The burgeoning scale of Large Language Models (LLMs) has necessitated a paradigm shift in their deployment, moving beyond full-precision floating-point arithmetic towards lower-precision representations. Quantization, the process of mapping a wide range of continuous values to a smaller, discrete set, has emerged as a critical technique to reduce model size, accelerate inference, and lower energy consumption. This article provides a technical overview of quantization theory, its application in modern LLMs, and the ongoing innovations in this domain.
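
To make the motivation concrete, here is a back-of-the-envelope sketch of how bit-width translates into weight-storage footprint; the 7B parameter count is just an illustrative model size:

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in decimal GB.
    Ignores activations, KV cache, and scale/metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

n = 7e9                        # an illustrative 7B-parameter model
fp16 = model_size_gb(n, 16)    # full half-precision weights
int4 = model_size_gb(n, 4)     # the same weights at 4 bits each
```

At 16 bits the weights alone need about 14 GB; at 4 bits, about 3.5 GB, which is the difference between needing a datacenter GPU and fitting on a consumer card.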

**The Fundamentals of Quantization**

At its core, quantization seeks to represent model weights and activations using fewer bits. Three primary approaches form the theoretical foundation:

1. **K-Means-based Quantization (Non-uniform):** This method clusters floating-point weights into a predefined number of groups. Each weight is then replaced by the centroid of its assigned cluster. While effective for storage compression by storing a small "codebook" of centroids and integer indices, its direct computational benefits during inference are limited unless specialized hardware for lookup tables is employed.

2. **Linear (Affine) Quantization:** The most prevalent form, linear quantization maps a floating-point range to a fixed integer range using a simple linear transformation: `r = S * (q - Z)`. Here, `r` is the real value, `q` is the quantized integer, `S` is the scale factor, and `Z` is the zero-point (offset). This approach directly enables integer arithmetic, which is significantly faster and more energy-efficient on modern hardware.

3. **Binary and Ternary Quantization (Extreme Low-Bit):** These push quantization to its limits by constraining weights and/or activations to only two (e.g., +1, -1) or three (e.g., +1, 0, -1) values. While offering maximal compression and enabling bitwise operations instead of multiplications, they often incur substantial accuracy degradation for complex LLMs. For instance, BinaryConnect enabled training deep neural networks with binary weights, showing near state-of-the-art results on image classification tasks. XNOR-Net further extended this by binarizing both weights and inputs, achieving significant speedups and memory savings. Ternary Weight Networks (TWNs) and Trained Ternary Quantization (TTQ) improve upon binary methods by introducing a zero value or learnable scaling factors, respectively, mitigating some accuracy loss.
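
A minimal pure-Python sketch of the affine scheme, with ternary quantization for contrast. The 8-bit width and the ternary threshold of 0.5·max|w| are illustrative choices, not prescribed by any particular paper:

```python
def affine_quantize(values, n_bits=8):
    """Asymmetric linear quantization: r ≈ S * (q - Z), q in [0, 2^n - 1]."""
    qmax = 2 ** n_bits - 1
    r_min, r_max = min(values), max(values)
    S = (r_max - r_min) / qmax        # scale factor
    Z = round(-r_min / S)             # zero-point: real 0.0 maps near q = Z
    q = [max(0, min(qmax, round(r / S + Z))) for r in values]
    return q, S, Z

def affine_dequantize(q, S, Z):
    return [S * (qi - Z) for qi in q]

def ternary_quantize(values, threshold_ratio=0.5):
    """Map each weight to {-1, 0, +1} using a magnitude threshold."""
    t = threshold_ratio * max(abs(v) for v in values)
    return [0 if abs(v) < t else (1 if v > 0 else -1) for v in values]

w = [-1.0, -0.2, 0.0, 0.4, 1.5]
q, S, Z = affine_quantize(w)
w_hat = affine_dequantize(q, S, Z)
# Affine round-trip error is bounded by half a quantization step (S / 2);
# ternary keeps only the sign of the largest-magnitude weights.
```

The affine round trip recovers every weight to within `S / 2`, while the ternary mapping collapses all but the largest-magnitude weights to zero, which is exactly the accuracy/compression trade-off described above.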

**Quantization Strategies: Bridging Accuracy and Efficiency**

The practical application of quantization involves distinct strategies:

1. **Post-Training Quantization (PTQ):** This approach applies quantization to an already trained, full-precision model without any further training or fine-tuning.
    * **Quantization Granularity:** The precision of quantization can vary across a model.
        * **Per-Tensor Quantization** applies a single scale and zero-point to an entire tensor.
        * **Per-Channel Quantization** assigns unique scale and zero-point parameters to each output channel of a layer, which is crucial for handling diverse value distributions.
        * **Group Quantization** provides an intermediate granularity, where scales and zero-points are applied to smaller groups of weights within a channel or layer. This balances fine-grained control with hardware efficiency.
    * **Dynamic Range Clipping (Calibration):** A critical aspect of PTQ is determining the optimal range (`r_min`, `r_max`) for quantization, especially for activations, which often exhibit outliers. Methods include:
        * **Min-Max:** Simply using the observed minimum and maximum values.
        * **Exponential Moving Averages (EMA):** Tracking ranges using a smoothed average during a calibration run.
        * **Kullback-Leibler (KL) Divergence Minimization:** Selecting clipping thresholds that minimize the information loss between the original and quantized distributions.
        * **Mean Square Error (MSE) Minimization:** Optimizing scale and zero-point parameters to minimize the reconstruction error. Adaptive rounding techniques, such as AdaRound, further refine this by optimizing rounding decisions for individual weights.

2. **Quantization-Aware Training (QAT):** This method integrates the quantization process directly into the training or fine-tuning loop. By simulating the effects of low-precision arithmetic during training, the model learns to be robust to quantization noise. The **Straight-Through Estimator (STE)** is commonly used to approximate gradients for the non-differentiable quantization operations, enabling backpropagation. QAT generally yields higher accuracy than PTQ, particularly for aggressive low-bit quantization.
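
The granularity idea and QAT's fake-quantization forward pass can both be sketched in a few lines of plain Python. The group size of 4 and the toy weight values are arbitrary, and the symmetric int4 scale uses min-max calibration for simplicity:

```python
def group_scales(weights, group_size, qmax=7):
    """Symmetric per-group scales for signed int4 (range [-8, 7])."""
    scales = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scales.append(max(abs(w) for w in group) / qmax)  # min-max calibration
    return scales

def fake_quant(w, scale, qmin=-8, qmax=7):
    """Quantize-dequantize in one step, as QAT's forward pass does.
    During training, the STE treats this op as identity in the backward pass."""
    q = max(qmin, min(qmax, round(w / scale)))
    return q * scale

weights = [0.1, -0.2, 0.05, 0.15,   # group 0: small values, small scale
           2.0, -1.5, 0.3, 0.7]     # group 1: contains larger values
scales = group_scales(weights, group_size=4)
recon = [fake_quant(w, scales[i // 4]) for i, w in enumerate(weights)]
```

Because each group gets its own scale, the small weights in group 0 are not forced onto the coarse grid dictated by group 1's outlier, so the total reconstruction error is lower than with a single per-tensor scale.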

**Emerging Techniques for Modern LLMs**

The scale and complexity of LLMs necessitate advanced quantization strategies:

1. **One-Shot Post-Training Quantization (e.g., GPTQ, AWQ):** These techniques aim to achieve near-QAT accuracy with PTQ's convenience, requiring only a small, unlabelled calibration dataset and no full retraining. GPTQ quantizes weights layer-by-layer by minimizing output MSE, leveraging Hessian-aware information. AWQ identifies and scales "important" weights based on activation magnitudes before quantization. These methods have been instrumental in enabling 4-bit LLM inference on consumer-grade hardware.

2. **Sparsity-Quantization Hybrid (e.g., SpQR):** These approaches combine model pruning (removing redundant connections) with quantization to achieve even greater compression. SpQR prunes weights and then quantizes the remaining non-zero weights, often with special handling for critical outlier weights.

3. **Quantization for Efficient Fine-tuning (e.g., QLoRA):** QLoRA quantizes the base LLM weights (e.g., to 4-bit) and freezes them, then fine-tunes only small, low-rank adapter modules in full precision. This drastically reduces the memory requirements for fine-tuning large models on limited hardware.

4. **Hardware-Optimized Quantization Formats:** Beyond bit-width, specialized floating-point formats and efficient kernels are being developed. MXFP4 (Microscaling FP4), NVIDIA's FP8 (E4M3/E5M2), and GGUF's K-quants are examples of block-wise floating-point formats and hierarchical quantization schemes optimized for high performance on modern accelerators like NVIDIA's Blackwell GPUs. These formats offer superior dynamic range compared to fixed-point integers at very low bit-widths.

**Multi-Level Scaling in Group Quantization: A Deeper Dive**

Modern group quantization approaches often employ multi-level scaling to achieve an optimal balance between precision and compression. Consider a generalized formula for reconstructing a real value `r` from a quantized value `q`:

`r = (q - z) * s_l0 * s_l1 * ...`

where `z` is the zero-point (often 0 for symmetric quantization), and `s_l0`, `s_l1` are scale factors at different hierarchical levels. The "Effective Bit Width" reflects the average number of bits per weight after accounting for both the quantized value and its associated scales.

Let's dissect a representative table of such schemes:

| Quantization Approach | Data Type (q) | L0 Group Size | L0 Scale Data Type | L1 Group Size | L1 Scale Data Type | Effective Bit Width |
| :-------------------- | :------------ | :-------------- | :----------------- | :-------------- | :----------------- | :------------------ |
| Per-Channel Quant | INT4 | Per Channel | FP16 | - | - | 4 |
| VSQ | INT4 | 16 | UINT4 | Per Channel | FP16 | 4 + 4/16 = 4.25 |
| MX4 | S1M2 | 2 | E1M0 | 16 | E8M0 | 3 + 1/2 + 8/16 = 4 |
| MX6 | S1M4 | 2 | E1M0 | 16 | E8M0 | 5 + 1/2 + 8/16 = 6 |
| MX9 | S1M7 | 2 | E1M0 | 16 | E8M0 | 8 + 1/2 + 8/16 = 9 |

* **Data Types Explanation:**
    * `INT4`: Standard 4-bit integer.
    * `UINT4`: 4-bit *unsigned* integer.
    * `FP16`: 16-bit floating-point number.
    * `S1M2`: A custom 3-bit floating-point-like format (1 sign bit, 2 mantissa bits), with its exponent effectively derived from shared scales.
    * `S1M4`: A custom 5-bit format (1 sign bit, 4 mantissa bits).
    * `S1M7`: A custom 8-bit format (1 sign bit, 7 mantissa bits).
    * `E1M0`: A custom 1-bit exponent-only floating-point scale (1 exponent bit, 0 mantissa bits).
    * `E8M0`: A custom 8-bit exponent-only floating-point scale (8 exponent bits, 0 mantissa bits).

* **Row-by-Row Analysis:**
    1. **Per-Channel Quant:** This represents a baseline. Each individual value (`q`) is stored as a 4-bit integer. A single 16-bit FP16 scale (`s_l0`) is applied *per channel*. Since a channel contains many weights, the overhead of the 16-bit scale is amortized, making the effective bit width approximately 4 bits per weight.
    2. **VSQ (Per-Vector Scaled Quantization):** This scheme introduces a two-level scaling hierarchy. The core quantized value (`q`) is a 4-bit integer. A finer-grained 4-bit unsigned integer scale (`s_l0` in `UINT4`) is applied to groups of 16 quantized values. A coarser 16-bit FP16 scale (`s_l1`) is applied per channel. The effective bit width is calculated as: (4 bits for `q`) + (4 bits for `s_l0` / 16 elements) = 4 + 0.25 = 4.25 bits/weight. The `FP16 s_l1` scale overhead per channel is negligible, hence not included in the fraction.
    3. **MX4 (Mixed-Precision with Microexponents, 4-bit effective):** This is a key example of specialized floating-point quantization. The base quantized value (`q`) uses a compact 3-bit `S1M2` format. A 1-bit `E1M0` scale (`s_l0`) is applied to very small groups of 2 `q` values. A coarser 8-bit `E8M0` scale (`s_l1`) is applied to groups of 16 `q` values. The effective bit width is: (3 bits for `q`) + (1 bit for `s_l0` / 2 elements) + (8 bits for `s_l1` / 16 elements) = 3 + 0.5 + 0.5 = 4 bits/weight. This allows for a wider dynamic range, typical of floating-point numbers, while maintaining a very low average bit-width.
    4. **MX6:** Similar to MX4, but uses a 5-bit `S1M4` format for `q`. The effective bit width becomes: 5 + 0.5 + 0.5 = 6 bits/weight, offering higher precision at the cost of a slight increase in size.
    5. **MX9:** Uses an 8-bit `S1M7` format for `q`. The effective bit width is: 8 + 0.5 + 0.5 = 9 bits/weight, providing near-INT8 precision while retaining the floating-point-like dynamic range benefits.

These multi-level, mixed-precision, floating-point quantization schemes represent a significant advancement, enabling LLMs to run efficiently on diverse hardware while maintaining high accuracy, especially for managing the ubiquitous outlier values in LLM activations and weights.
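
The effective-bit-width column can be reproduced mechanically. A short sketch, assuming (as the table does) that per-channel scales are amortized over enough weights to be negligible:

```python
def effective_bits(q_bits, levels):
    """Average bits/weight: the quantized value's bits plus each scale
    level's bits amortized over its group size. `levels` is a list of
    (scale_bits, group_size) pairs; per-channel scales are omitted."""
    return q_bits + sum(bits / size for bits, size in levels)

per_channel = effective_bits(4, [])            # 4 bits, scales amortized away
vsq = effective_bits(4, [(4, 16)])             # 4 + 4/16
mx4 = effective_bits(3, [(1, 2), (8, 16)])     # 3 + 1/2 + 8/16
mx6 = effective_bits(5, [(1, 2), (8, 16)])     # 5 + 1/2 + 8/16
mx9 = effective_bits(8, [(1, 2), (8, 16)])     # 8 + 1/2 + 8/16
```

Running this recovers exactly the table's last column: 4, 4.25, 4, 6, and 9 bits per weight.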

**Current Trends and Future Outlook**

The field of LLM quantization is characterized by rapid innovation.

* **Linear (Affine) Quantization** remains the foundational principle, with most advancements focusing on refining its application.
* **Per-channel** and especially **Group/Block-wise Quantization** are indispensable for LLMs due to their heterogeneous weight distributions.
* **Post-Training Quantization (PTQ)**, particularly advanced one-shot methods like GPTQ and AWQ, is highly relevant for efficient deployment of LLMs without the extensive resources required for QAT.
* **Quantization-Aware Training (QAT)** is the benchmark for achieving peak accuracy at very low bit-widths, particularly when PTQ falls short.
* **Mixed-Precision Quantization** is crucial for balancing accuracy and efficiency across the massive, varying layers of LLMs.
* **Hardware-optimized quantization formats** (like MXFP4, FP8) represent a significant step towards co-designing models and silicon for maximum performance.

Conversely, methods like pure K-means quantization (where computation requires fetching float centroids) and general-purpose binary/ternary quantization are less commonly adopted as primary strategies for high-accuracy LLM inference. They face greater accuracy challenges and lack the widespread hardware acceleration that optimized integer and block-floating-point operations enjoy. The trajectory indicates a continuous push for lower effective bit-widths, driven by clever scaling strategies, specialized data formats, and a hardware-aware approach to model optimization.

---

**References**

Courbariaux, M., Bengio, Y., & David, J. P. (2015). BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. *NeurIPS Proceedings*.

Dai, S., Venkatesan, R., Ren, H., Zimmer, B., Dally, W. J., & Khailany, B. (2021). VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. *arXiv preprint arXiv:2102.04503*.

Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. *European Conference on Computer Vision (ECCV)*.

Zhu, C., Han, S., Mao, H., & Dally, W. J. (2017). Trained Ternary Quantization. *International Conference on Learning Representations (ICLR)*.

Migacz, S. (2017). 8-bit Inference with TensorRT. *NVIDIA GTC Presentation*.

Krishnamoorthi, R. (2018). Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper. *arXiv preprint arXiv:1806.08342*.

Li, F., Liu, B., Wang, X., Zhang, B., & Yan, J. (2016). Ternary Weight Networks. *arXiv preprint arXiv:1605.04711*.

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2018). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*.

Nagel, M., van Baalen, T., Blankevoort, T., & Louizos, C. (2019). Data-Free Quantization Through Weight Equalization and Bias Correction. *Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)*.

Han, S., Mao, H
content/posts/secure-boot-dkms-and-mok-on-proxmox-debian.md
@@ -0,0 +1,111 @@
---
title: "Fixing GPU Operator Pods Stuck in Init: Secure Boot, DKMS, and MOK on Proxmox + Debian"
date: 2025-08-09
draft: false
---

I hit an issue where all GPU Operator pods on one node were stuck in Init after migrating from Legacy BIOS to UEFI. The common error was NVIDIA components waiting for “toolkit-ready,” while the toolkit init container looped with:
- nvidia-smi failed to communicate with the NVIDIA driver
- modprobe nvidia → “Key was rejected by service”

That message is the tell: Secure Boot is enabled and the kernel refuses to load modules not signed by a trusted key.

### Environment
- Proxmox VM (QEMU/KVM) 8.4.9
- Debian 12 (bookworm), kernel 6.1
- GPU: NVIDIA Tesla V100 (GV100GL)
- NVIDIA driver installed via Debian packages (nvidia-driver, nvidia-kernel-dkms)

### Root Cause
- Secure Boot enabled (verified with `mokutil --sb-state`)
- NVIDIA DKMS modules were built, but the signing key was not trusted by the UEFI shim/firmware
- VM booted via the fallback “UEFI QEMU HARDDISK” path (not shim), so MOK requests didn’t run; no MOK screen

### Strategy
Keep Secure Boot on; get modules trusted. That requires:
1) Ensure the VM boots via shim (so MOK can work)
2) Make sure DKMS signs modules with a MOK key/cert
3) Enroll that MOK into the firmware via shim’s MokManager

### Step 1 — Boot via shim and persist EFI variables
In Proxmox (VM stopped):
- BIOS: OVMF (UEFI)
- Add EFI Disk (stores OVMF VARS; required for MOK)
- Machine: q35
- Enable Secure Boot (option shows only with OVMF + EFI Disk)

Inside Debian:
- Ensure ESP is mounted at `/boot/efi`
- Install signed boot stack:
```bash
sudo apt install shim-signed grub-efi-amd64-signed efibootmgr mokutil
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian
sudo update-grub
```
- Create/verify a boot entry that points to shim:
```bash
sudo efibootmgr -c -d /dev/sda -p 15 -L "debian" -l '\EFI\debian\shimx64.efi'
sudo efibootmgr -o 0002,0001,0000 # make shim (0002) first
sudo efibootmgr -n 0002 # BootNext shim for the next reboot
```
Tip: If NVRAM resets or the fallback boot path is used, copy shim into the fallback location:
```bash
sudo mkdir -p /boot/efi/EFI/BOOT
sudo cp /boot/efi/EFI/debian/shimx64.efi /boot/efi/EFI/BOOT/BOOTX64.EFI
sudo cp /boot/efi/EFI/debian/{mmx64.efi,grubx64.efi} /boot/efi/EFI/BOOT/
```

### Step 2 — Make DKMS sign NVIDIA modules with a MOK
Debian already generated a DKMS key at `/var/lib/dkms/mok.key`. Create an X.509 cert in DER format:
```bash
sudo openssl req -new -x509 \
  -key /var/lib/dkms/mok.key \
  -out /var/lib/dkms/mok.der \
  -outform DER \
  -subj "/CN=DKMS MOK/" \
  -days 36500
```
Enable DKMS signing:
```bash
sudo sed -i 's|^mok_signing_key=.*|mok_signing_key=/var/lib/dkms/mok.key|' /etc/dkms/framework.conf
sudo sed -i 's|^mok_certificate=.*|mok_certificate=/var/lib/dkms/mok.der|' /etc/dkms/framework.conf
```
Rebuild/install the modules (they get signed now):
```bash
sudo dkms build nvidia/$(modinfo -F version nvidia) -k $(uname -r) --force
sudo dkms install nvidia/$(modinfo -F version nvidia) -k $(uname -r) --force
```

### Step 3 — Enroll the MOK via shim (MokManager)
Queue the cert and set a longer prompt timeout:
```bash
sudo mokutil --revoke-import              # clear any stale pending request
sudo mokutil --import /var/lib/dkms/mok.der
sudo mokutil --timeout 30
sudo efibootmgr -n 0002 # ensure next boot goes through shim
```
Reboot to the VM console (not SSH). In the blue MOK UI:
- Enroll MOK → Continue → Yes → enter password → reboot

If arrow keys don’t work in Proxmox noVNC:
- Use SPICE (virt-viewer), or
- From the Proxmox host, send keys:
  - `qm sendkey <VMID> down`, `qm sendkey <VMID> ret`, `qm sendkey <VMID> esc`

### Verification
```bash
sudo mokutil --test-key /var/lib/dkms/mok.der # “already enrolled”
sudo modprobe nvidia
nvidia-smi
kubectl -n gpu-operator get pods -o wide
```
Once the module loads, GPU Operator pods on that node leave Init and become Ready.

### Key Insights
- “Key was rejected by service” during `modprobe nvidia` means Secure Boot rejected an untrusted module.
- Without shim in the boot path (or without a persistent EFI vars disk), `mokutil --import` won’t surface a MOK screen.
- DKMS will not sign modules unless configured; set `mok_signing_key` and `mok_certificate` in `/etc/dkms/framework.conf`.
- If you cannot or don’t want to use MOK, the pragmatic dev choice is to disable Secure Boot in OVMF. For production, prefer shim+MOK.

### References
- Proxmox Secure Boot setup (shim + MOK, EFI vars, DKMS): [Proxmox docs](https://pve.proxmox.com/wiki/Secure_Boot_Setup#Setup_instructions_for_shim_+_MOK_variant)
content/posts/supabase-deep-dive.md
@@ -0,0 +1,165 @@
---
title: "Supabase Deep Dive: It's Not Magic, It's Just Postgres"
date: 2025-08-03
draft: false
---

In the world of Backend-as-a-Service (BaaS), platforms are often treated as magic boxes. You push data in, you get data out, and you hope the magic inside scales. While this simplicity is powerful, it can obscure the underlying mechanics, leaving developers wondering what's really going on.

Supabase enters this space with a radically different philosophy: **transparency**. It provides the convenience of a BaaS, but it’s built on the world's most trusted relational database: PostgreSQL. The "magic" isn't a proprietary black box; it's a carefully assembled suite of open-source tools that enhance Postgres, not hide it.

This deep dive will deconstruct that suite. We will move beyond the basics to explore the architectural patterns, security models, and development workflows that allow you to build robust, scalable applications. We will cover:

* **The Supabase Blueprint:** A procedural guide to designing your application.
* **The Pillars of Supabase:** A detailed look at Auth, Storage, Functions, and Realtime.
* **Transactional Realtime:** How Supabase guarantees data consistency in a live environment.
* **Best Practices:** The practical knowledge you need before writing a single line of code.

### The Guiding Philosophy: Your Database as the Source of Truth

The most critical shift when adopting Supabase is to see your database as more than just a data store. It is your **single source of truth**. This means your database schema is responsible for:

* **Structure:** The tables and columns that define your data.
* **Relationships:** The foreign keys that link tables together.
* **Integrity:** The constraints (`NOT NULL`, `UNIQUE`) that ensure your data is always valid.
* **Security:** The access control rules that define who can do what.

By leveraging PostgreSQL's native power, you get **full ACID compliance** (Atomicity, Consistency, Isolation, Durability) out of the box. You don't need to worry about application-level code to prevent orphan records or inconsistent states; the database guarantees it for you.

### The Supabase Design Blueprint: A Procedural Guide

To build a scalable application, follow a structured design process that moves from abstract ideas to concrete implementation.

#### Phase 1: Conceptual Modeling (The Blueprint)
Before touching the Supabase dashboard, map out your application on paper.
1. **Identify the "Nouns":** These are your core data objects, which will become your database tables. For a project management app, they are `projects`, `tasks`, `users`, `comments`.
2. **Define the "Verbs":** These are the user actions. "A user *creates* a task." "A user *assigns* a task to another user." These actions will inform your security policies and APIs.
3. **Map Relationships:** How do the nouns connect? A `task` belongs to one `project`. A `user` can have many `tasks`. A `project` can have many `users` (a many-to-many relationship, requiring a `project_users` join table).

#### Phase 2: The Foundation (Schema & Migrations)
Translate your model into SQL. For any serious project, use the **Supabase CLI** to manage this process.
1. **Develop Locally:** Run a full Supabase stack on your machine with `supabase start`.
2. **Create Migration Files:** Write your `CREATE TABLE` statements in SQL files. Define columns, data types, and foreign key `REFERENCES` to enforce your relationships.
3. **Version Control:** Commit these migration files to Git. Your database schema is now version-controlled alongside your application code.
4. **Deploy:** Use `supabase db push` to apply your migrations to your live production database. This workflow is safe, repeatable, and professional.

#### Phase 3: The Security Layer (Row Level Security)
This is not an optional step. RLS is the cornerstone of Supabase security.
1. **Deny by Default:** For any table holding user data, immediately enable RLS. This blocks all access until you explicitly grant it.
    ```sql
    ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
    ```
2. **Write "Allow" Policies:** Create policies based on your user stories. Policies are SQL rules that the database enforces on every single query.
    ```sql
    -- Users can see tasks in projects they are a member of.
    CREATE POLICY "Allow read access to tasks in user's projects"
    ON tasks FOR SELECT
    USING (
      EXISTS (
        SELECT 1 FROM project_users
        WHERE project_users.project_id = tasks.project_id
        AND project_users.user_id = auth.uid()
      )
    );

    -- Users can only insert tasks for themselves.
    CREATE POLICY "Allow users to create their own tasks"
    ON tasks FOR INSERT
    WITH CHECK ( auth.uid() = tasks.assignee_id );
    ```

The `auth.uid()` function is a special Supabase utility that securely returns the ID of the logged-in user making the request.

#### Phase 4: The APIs (Data Access)
With your data structured and secured, you can now build the access points.
* **For Simple CRUD:** Use Supabase's auto-generated API. It's convenient, respects all your RLS policies, and is perfect for simple reads and writes on a single table.
    ```javascript
    const { data, error } = await supabase.from('tasks').select('*');
    ```
* **For Complex Logic:** Use PostgreSQL Functions (RPC). Encapsulate complex `JOIN`s or multi-step transactions into a single, callable function. This reduces network chattiness and keeps your business logic secure on the server.
    ```sql
    -- A function to get a task and its project name in one call
    CREATE OR REPLACE FUNCTION get_task_with_project(task_id_input int)
    RETURNS TABLE (task_title text, project_name text) AS $$
    BEGIN
      RETURN QUERY
      SELECT tasks.title, projects.name
      FROM tasks
      JOIN projects ON tasks.project_id = projects.id
      WHERE tasks.id = task_id_input;
    END;
    $$ LANGUAGE plpgsql;
    ```
    ```javascript
    // Called simply from the frontend
    const { data, error } = await supabase.rpc('get_task_with_project', { task_id_input: 123 });
    ```

### A Tour of the Core Services

Beyond the database, Supabase provides a suite of essential tools.

#### Authentication
A complete user management system that integrates directly with your database. When a user signs up, a corresponding entry is created in the managed `auth.users` table, which you can then reference in your own tables.
```javascript
// Sign up a new user, or start a social login, with ease
const { data, error } = await supabase.auth.signUp({ email, password });
const { data: oauthData, error: oauthError } = await supabase.auth.signInWithOAuth({ provider: 'github' });
```

#### Storage
A simple, S3-compatible object store for managing files like user avatars or documents. It's integrated with Postgres and RLS, allowing you to write fine-grained access policies on files and folders (buckets).
```javascript
// Upload a user avatar to a public 'avatars' bucket
const { error } = await supabase.storage
  .from('avatars')
  .upload(`public/${userId}.png`, file);
```

#### Edge Functions vs. Database Functions
It's critical to know when to use which.
* **Database Functions (SQL):** For data-intensive logic *inside* your database.
* **Edge Functions (TypeScript/Deno):** For connecting to the outside world. Use them to call third-party APIs (like Stripe for payments) or run computations that are not well-suited for SQL. This is where you use your secret `service_role` key, as the function runs in a trusted server environment.

### The Realtime Engine: A Pub/Sub System for Postgres

Supabase's Realtime engine is a powerful feature for building live, interactive experiences.

#### How it Works: Logical Replication
It's not magic; it leverages a core PostgreSQL feature.
1. When you enable Realtime on a table, Supabase creates a **Publication** for it.
2. The Realtime server subscribes to this publication via a **Logical Replication Slot**.
3. When a transaction is **successfully committed** to your database, the change is written to Postgres's Write-Ahead Log (WAL).
4. The WAL change is then sent to the Realtime server through the replication slot.
5. The server converts this database event into a JSON payload and broadcasts it over a WebSocket to all subscribed clients.

#### Transactional Integrity
The most important guarantee of this system is its relationship with database transactions. An event is **only broadcast *after* a transaction is fully and successfully committed.** If a transaction is rolled back due to an error, the replication slot receives nothing, and no Realtime event is ever sent. This means you can trust that every Realtime message you receive corresponds to data that is permanently and consistently stored in your database.

#### Use Cases and Limitations
* **Use For:** Small, JSON-based messages like chat messages, live notifications, activity feeds, and presence indicators ("who's online"). Use the `broadcast` feature for ephemeral data like cursor positions that you don't need to save.
* **Do NOT Use For:** Large, continuous data streams. It is **not** a replacement for WebRTC for video/audio calls. The system is designed for small, infrequent payloads.

```javascript
const channel = supabase.channel('public:messages');

// Subscribe to new rows in the 'messages' table
channel
  .on(
    'postgres_changes',
    { event: 'INSERT', schema: 'public', table: 'messages' },
    (payload) => {
      console.log('New message received!', payload.new);
      // Update your UI here
    }
  )
  .subscribe();
```

### Final Words of Advice

* **Frontend Freedom:** Supabase is frontend-agnostic, but meta-frameworks like **Next.js** and **SvelteKit** offer a "golden path" with Auth Helpers that simplify server-side rendering and data fetching.
* **Embrace the CLI:** Use the Supabase CLI for a professional, safe, and repeatable development workflow. Don't manage your production schema by clicking in the UI.
* **Know Your Keys:** Use the public `anon` key in the browser. Guard the secret `service_role` key and only use it in secure server environments like Edge Functions.
* **Indexes Matter:** For fast queries on large tables, `CREATE INDEX` on frequently queried columns. Performance is not automatic.

By understanding these principles, you can leverage Supabase not as a simple BaaS, but as a powerful, transparent, and scalable platform for building next-generation applications on the solid foundation of PostgreSQL.
@@ -0,0 +1,81 @@
---
title: "An Architectural Deep Dive of T5"
date: 2025-06-01
draft: false
---

In the rapidly evolving landscape of Large Language Models, a few key architectures define the dominant paradigms. Today, the "decoder-only" model, popularized by the GPT series and its successors like LLaMA and Mistral, reigns supreme. These models are scaled to incredible sizes and excel at in-context learning.

But to truly understand the field, we must look at the pivotal models that explored different paths. Google's T5, or **Text-to-Text Transfer Transformer**, stands out as one of the most influential. It didn't just introduce a new model; it proposed a new philosophy. This article dives deep into the architecture of T5, how it fundamentally differs from modern LLMs, and the lasting legacy of its unique design choices.
|
||||
|
||||
### The Core Philosophy: Everything is a Text-to-Text Problem

The genius of T5 lies in its unifying framework. Instead of building different models or fine-tuning procedures for various NLP tasks, T5 reframes every task as a text-to-text problem. The model takes a string as input and generates a string as output, regardless of the underlying objective.

This is accomplished by adding a **task prefix** to the input. These prefixes are not conversational prompts like a GPT "system prompt"; they are learned triggers that the model is explicitly fine-tuned to recognize.

| Task | T5 Input | Expected T5 Output |
| :--- | :--- | :--- |
| Translation | `translate English to German: The cat is cute.` | `Die Katze ist süß.` |
| Summarization | `summarize: [A long news article...]` | `[A concise summary.]` |
| Classification | `cola sentence: The boys is walking.` | `unacceptable` |
| Similarity | `stsb sentence1: The car is red. sentence2: The auto is crimson.` | `4.8` |

This elegant approach turns even classification into a generation task, where the model learns to generate the text of the correct label.

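As a concrete illustration of this framing, here is a minimal, hypothetical helper (the function name and task keys are mine, not part of T5 or its tokenizer) that builds the exact input strings shown in the table:

```python
# Illustrative sketch: each NLP task becomes a single input string by
# prepending a task prefix. This is formatting only, not a model call;
# the prefix spellings follow the table above.

def to_text_to_text(task: str, **fields) -> str:
    """Build a T5-style input string for a few example tasks."""
    if task == "translate_en_de":
        return f"translate English to German: {fields['text']}"
    if task == "summarize":
        return f"summarize: {fields['text']}"
    if task == "cola":
        return f"cola sentence: {fields['sentence']}"
    if task == "stsb":
        return f"stsb sentence1: {fields['s1']} sentence2: {fields['s2']}"
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("translate_en_de", text="The cat is cute."))
# translate English to German: The cat is cute.
```

For classification, the model's *output* side is handled the same way: the target is simply the label text (`unacceptable`), or even a stringified number (`4.8`) for regression-style tasks.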
### The Engine: A Two-Window Encoder-Decoder Architecture

To execute this text-to-text mission, T5 uses the original Transformer's **encoder-decoder architecture**. This is the most significant point of divergence from modern decoder-only LLMs. The inference process works in two distinct stages:

#### Stage 1: The Encoder (The "Understanding" Window)

When T5 receives an input like `summarize: [article text]`, the entire string is fed into the **encoder**.

* **Bidirectional Context:** The encoder processes the input bidirectionally. Every token can see every other token in the input text simultaneously. This allows the model to build a deep, holistic understanding of the entire prompt and its context.
* **Static Representation:** The encoder's final output is not text. It's a set of numerical representations (hidden states) that encapsulates the meaning and intent of the input. This representation is generated once and remains static for the entire generation process.

#### Stage 2: The Decoder (The "Writing" Window)

The decoder is responsible for generating the output string token by token.

* **Autoregressive Generation:** It begins with a `start-of-sequence` token and generates the output one token at a time.
* **Cross-Attention:** At each step, the decoder does two things: it looks at the text it has generated so far (its own "decoder context"), and crucially, it uses a mechanism called **cross-attention** to look back at the static representation created by the encoder. This allows the decoder's generation to be guided by the encoder's complete understanding of the prompt.
* **Growing Context:** The decoder's context window grows with each token it generates until it produces an `end-of-sequence` token, signaling that the task is complete.

This two-window system is a powerful design, especially for tasks that require a full understanding of a source document before generating a new one (like translation or summarization).

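The two-stage flow above can be sketched in a few lines of numpy. This is a deliberately simplified, single-head cross-attention with no learned projections, just to show the key asymmetry: the encoder states stay fixed while the decoder context grows.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(dec_states, enc_states):
    """dec_states: [t_dec, d], enc_states: [t_enc, d] -> [t_dec, d].

    Decoder positions act as queries over the encoder's (static) output,
    which serves as both keys and values in this simplified sketch.
    """
    d = dec_states.shape[-1]
    scores = softmax(dec_states @ enc_states.T / np.sqrt(d))  # [t_dec, t_enc]
    return scores @ enc_states                                # [t_dec, d]

enc = np.random.randn(10, 64)  # encoder output: computed once, then frozen
dec = np.random.randn(3, 64)   # decoder context: grows one token per step
out = cross_attention(dec, enc)
print(out.shape)  # (3, 64)
```

In a real T5 layer the keys and values are learned projections of the encoder states and there are multiple heads, but the shape relationship is the same: the decoder reads from a `[t_enc, d]` memory that never changes during generation.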
### Architectural Divergence: T5 vs. The Modern LLM Playbook

Beyond its core architecture, T5 made several specific design choices that contrast with today's standards.

#### 1. Positional Embeddings: Relative (RPE) vs. Rotary (RoPE)

How a model knows the order of words is critical.

* **T5's Approach (RPE):** T5 uses a form of **Relative Positional Embedding**. Instead of adding a position signal to the word embeddings, it adds a learned bias directly to the attention scores based on the relative distance between tokens. It's a clever way to encode position that is independent of sequence length.
* **The Modern Standard (RoPE):** Most modern LLMs (LLaMA, PaLM, Mistral) use **Rotary Positional Embeddings**. As detailed in the CS336 slides, RoPE works by mathematically *rotating* the Query and Key vectors based on their absolute position. This method has proven exceptionally effective for long sequences and is considered the current state-of-the-art.

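A minimal sketch of the relative-bias idea, assuming a single shared bias table and simple distance clipping (the real T5 uses log-spaced distance buckets and a separate bias per head):

```python
import numpy as np

# Simplified T5-style relative position bias: a learned table indexed by
# the clipped relative offset (j - i) is added to the raw attention scores.
max_dist = 8
bias_table = np.random.randn(2 * max_dist + 1)  # one learned scalar per offset

def relative_bias(seq_len):
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    rel = np.clip(j - i, -max_dist, max_dist) + max_dist  # table index
    return bias_table[rel]                                # [seq_len, seq_len]

scores = np.random.randn(5, 5)       # Q @ K.T / sqrt(d_head)
biased = scores + relative_bias(5)   # position enters only via the bias
print(biased.shape)  # (5, 5)
```

Because the bias depends only on `j - i`, the same table applies at any absolute position, which is what makes the scheme length-independent.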
#### 2. The Feed-Forward Network: An Extreme Experiment

The Feed-Forward Network (FFN) inside each Transformer block typically has an inner dimension of 4 times the model's hidden dimension (`d_model`). The original T5 11B model took a radical departure from this rule.

* **T5 11B's Choice:** It used a small hidden dimension (`d_model = 1024`) but an astoundingly large FFN dimension (`d_ff = 65,536`), a **64-times multiplier**. The rationale was that modern accelerators (like Google's TPUs) are highly efficient at large, dense matrix multiplications.
* **The Modern Standard:** This experiment was not widely adopted. Later models, including T5's own successor **T5 v1.1**, reverted to the standard 4x multiplier (or ~2.66x when using GLU activations) for a better balance of parameters and performance.

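The scale of that choice is easy to check with back-of-the-envelope arithmetic (two projection matrices per FFN, biases ignored):

```python
# Per-layer FFN parameter count: an up-projection d_model -> d_ff and a
# down-projection d_ff -> d_model.

def ffn_params(d_model, d_ff):
    return 2 * d_model * d_ff

t5_11b  = ffn_params(1024, 65536)   # T5 11B's 64x multiplier
typical = ffn_params(1024, 4096)    # the standard 4x multiplier

print(t5_11b)             # 134217728 (~134M FFN parameters per layer)
print(t5_11b // typical)  # 16 (16x more FFN parameters at the same d_model)
```

At the same `d_model`, nearly all of T5 11B's capacity lives in these enormous FFN matrices rather than in attention.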
#### 3. Denoising: Span Corruption vs. Iterative Diffusion

While T5's pre-training is called "denoising," it's conceptually different from the denoising in modern diffusion models.

* **T5's Denoising:** This is **span corruption**. The model is shown a sentence with chunks of text masked out and learns to predict exactly what was removed in a single step. It's a fill-in-the-blanks task to learn rich language representations.
* **Diffusion Denoising:** This is a multi-step generative process. A clean text is gradually corrupted with noise, and the model learns to reverse this process step-by-step, allowing it to generate high-fidelity text from pure noise.

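A toy version of span corruption makes the input/target format concrete. The spans here are hard-coded for clarity; real pre-training samples them randomly (T5 replaces each masked span with a sentinel token like `<extra_id_0>`, and the target lists each sentinel followed by the text it replaced):

```python
# Minimal span-corruption sketch in the spirit of T5 pre-training.

def corrupt(tokens, spans):
    """spans: non-overlapping (start, end) index pairs to mask, in order."""
    inp, tgt = [], []
    cursor = 0
    for k, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        inp += tokens[cursor:s] + [sentinel]   # replace span with sentinel
        tgt += [sentinel] + tokens[s:e]        # target recovers the span
        cursor = e
    inp += tokens[cursor:]
    return " ".join(inp), " ".join(tgt)

toks = "Thank you for inviting me to your party last week".split()
inp, tgt = corrupt(toks, [(2, 4), (8, 9)])
print(inp)  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(tgt)  # <extra_id_0> for inviting <extra_id_1> last
```

Note that the whole target is produced in a single decoding pass, which is exactly what distinguishes this from the iterative refinement of diffusion.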
### Where T5 Was Ahead of its Time

Despite its differences, the "T5 v1.1" variant pioneered several techniques that are now standard practice in the most advanced LLMs:

* **RMSNorm:** It was one of the first major models to adopt Root Mean Square Normalization instead of LayerNorm, a choice now used by LLaMA, Mistral, and others for its efficiency and stability.
* **Pre-Normalization:** T5 applies the normalization layer *before* the attention and FFN blocks, a critical technique for enabling stable training of very deep networks.
* **No Bias Terms:** T5 v1.1 removed the bias parameters from its normalization and FFN layers, a small but important optimization for memory and stability that modern models follow.
* **Gated Activations (GeGLU):** While the original T5 used ReLU, T5 v1.1 adopted a Gated Linear Unit (GeGLU), presaging the move to GLU-family activations (like SwiGLU) that is now ubiquitous.

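For reference, RMSNorm is only a few lines. This numpy sketch follows the standard formulation (learned gain, no bias, and, unlike LayerNorm, no mean subtraction):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """Rescale each vector by its root-mean-square, then apply a gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

x = np.random.randn(4, 512)
y = rms_norm(x, gain=np.ones(512))
print(y.shape)  # (4, 512)
```

Skipping the mean subtraction and the bias removes work and parameters while, empirically, losing nothing in quality, which is why the LLaMA family kept it.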
### Conclusion: The Lasting Legacy

T5 represents a different evolutionary branch in the Transformer family tree. While the field has largely converged on the decoder-only architecture for its scalability in general-purpose models, T5's design remains a masterclass in purpose-built engineering.

Its text-to-text framework was revolutionary, its encoder-decoder structure is still a go-to for tasks like translation, and its refined T5 v1.1 architecture laid the groundwork for many of the stability and efficiency tricks we see in today's state-of-the-art models. T5 is more than just a model; it's a crucial case study in the architectural trade-offs that continue to shape the future of artificial intelligence.
content/posts/transformer-s-core-mechanics.md
---
title: "Transformer's Core Mechanics"
date: 2025-04-01
draft: false
---

The Transformer architecture is the bedrock of modern Large Language Models (LLMs). While its high-level success is widely known, a deeper understanding requires dissecting its core components. This article provides a detailed, technical breakdown of the fundamental concepts within a Transformer block, from the notion of "channels" to the intricate workings of the attention mechanism and its relationship with other advanced architectures like Mixture of Experts.

### 1. The "Channel": A Foundational View of `d_model`

In deep learning, a "channel" can be thought of as a feature dimension. While this term is common in Convolutional Neural Networks for images (e.g., Red, Green, Blue channels), in LLMs, the analogous concept is the model's primary embedding dimension, commonly referred to as `d_model`.

An input text is first tokenized, and each token is mapped to a vector of size `d_model` (e.g., 4096). Each of the 4096 dimensions in this vector can be considered a "channel," representing a different semantic or syntactic feature of the token.

As this data, represented by a tensor of shape `[batch_size, sequence_length, d_model]`, progresses through the layers of the Transformer, these channels are continuously transformed. However, a critical design choice is that the output dimension of every main sub-layer (like the attention block or the FFN block) is also `d_model`. This consistency is essential for enabling **residual connections**, where the input to a block is added to its output (`output = input + SubLayer(input)`). This technique is vital for training the extremely deep networks common today.

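The residual-connection constraint can be demonstrated with a stand-in sub-layer; any map that preserves the `d_model` channel count can be added back to its input (the sizes below are small illustrative stand-ins for real values like `d_model = 4096`):

```python
import numpy as np

batch, seq_len, d_model = 2, 8, 16
W = np.random.randn(d_model, d_model) * 0.02  # stand-in for a sub-layer

def sublayer(x):
    # [batch, seq_len, d_model] -> same shape: output stays in d_model
    return x @ W

x = np.random.randn(batch, seq_len, d_model)
out = x + sublayer(x)  # residual add requires matching channel count
print(out.shape)  # (2, 8, 16)
```

If a sub-layer changed the channel count, the addition would fail, which is precisely why every block projects back to `d_model` before the residual add.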
### 2. The Building Blocks: Dimensions of Key Layers

A Transformer layer is primarily composed of two sub-layers: a Multi-Head Attention block and a position-wise Feed-Forward Network (FFN). The parameters for these are stored in several key weight matrices. Understanding their dimensions is crucial.

Let's define our variables:

* `d_model`: The core embedding dimension.
* `d_ff`: The inner dimension of the FFN, typically `4 * d_model`.
* `h`: The number of attention heads.
* `d_head`: The dimension of each attention head, where `d_model = h * d_head`.

The dimensions of the weight matrices are as follows:

| Layer | Weight Matrix | Input Vector Shape | Output Vector Shape | **Weight Matrix Dimension** |
| --- | --- | --- | --- | --- |
| **Attention Projections** | | | | |
| Query | `W_Q` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| Key | `W_K` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| Value | `W_V` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| Output | `W_O` | `d_model` | `d_model` | **`[d_model, d_model]`** |
| **Feed-Forward Network** | | | | |
| Layer 1 (Up-projection) | `W_ff1` | `d_model` | `d_ff` | **`[d_model, d_ff]`** |
| Layer 2 (Down-projection) | `W_ff2` | `d_ff` | `d_model` | **`[d_ff, d_model]`** |

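A quick sanity check of the table, using illustrative values (`d_model = 4096`, `h = 32`, `d_ff = 4 * d_model`, biases ignored):

```python
d_model = 4096
h = 32
d_head = d_model // h   # 128
d_ff = 4 * d_model      # 16384

shapes = {
    "W_Q": (d_model, d_model),
    "W_K": (d_model, d_model),
    "W_V": (d_model, d_model),
    "W_O": (d_model, d_model),
    "W_ff1": (d_model, d_ff),
    "W_ff2": (d_ff, d_model),
}
params = sum(rows * cols for rows, cols in shapes.values())
print(d_head, params)  # 128 201326592 (~201M weights per layer)
```

Note how the FFN's two matrices (about 134M weights here) dominate the four attention projections (about 67M), a ratio that holds for most standard Transformers.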
### 3. Deconstructing Multi-Head Attention (MHA)

The core innovation of the Transformer is Multi-Head Attention. It allows the model to weigh the importance of different tokens in the sequence from multiple perspectives simultaneously.

![transformer-s-core-mechanics](/images/transformer-s-core-mechanics/.png)

#### 3.1. The "Why": Beyond a Single Attention

A single attention mechanism would force the model to average all types of linguistic relationships into one pattern. MHA avoids this by creating `h` parallel subspaces. Each "head" can specialize, with one head learning syntactic dependencies, another tracking semantic similarity, and so on. This creates a much richer representation.

#### 3.2. An Encoding/Decoding Analogy

A powerful way to conceptualize the attention calculation is as a two-stage process:

1. **Encoding Relationships:** The first part of the calculation, `softmax(Q @ K.T)`, can be seen as an encoding step. It does not use the actual "content" of the tokens (the `V` vectors). Instead, it uses the Queries and Keys to build a dynamic "relationship map" between tokens in the sequence. This map, a matrix of attention scores, answers the question: "For each token, how important is every other token right now?"
2. **Decoding via Information Retrieval:** The second part, `scores @ V`, acts as a decoding step. It uses the relationship map to retrieve and synthesize information. For each token, it creates a new vector by taking a weighted sum of all the `V` vectors in the sequence, using the scores as the precise mixing recipe. It decodes the relational structure into a new, context-aware representation.

#### 3.3. The "How": A Step-by-Step Flow

The MHA process is designed for maximum computational efficiency.

1. **Initial Projections:** The input vectors (shape `[seq_len, d_model]`) are multiplied by `W_Q`, `W_K`, and `W_V`. These matrices are all `[d_model, d_model]`, not to create one large query, but to **efficiently compute the vectors for all `h` heads at once**. The single large output vector is then reshaped into `h` separate vectors, each of size `d_head`.
2. **Attention Score Calculation:** For each head `i`, a score matrix is calculated: `scores_i = softmax( (Q_i @ K_i.T) / sqrt(d_head) )`. Note that `Q_i` and `K_i` have dimensions `[seq_len, d_head]`, so the resulting `scores_i` matrix has a dimension of **`[seq_len, seq_len]`**.
3. **Weighted Value Calculation:** The scores are used to create a weighted sum of the Value vectors for each head: `output_i = scores_i @ V_i`. Since `scores_i` is `[seq_len, seq_len]` and `V_i` is `[seq_len, d_head]`, the resulting `output_i` has a dimension of **`[seq_len, d_head]`**. This is the final output of a single head.
4. **Concatenation and Final Projection:** The outputs of all `h` heads are concatenated along the last dimension. This produces a single large matrix of shape `[seq_len, h * d_head]`, which is equivalent to `[seq_len, d_model]`. This matrix is then passed through the final output projection layer, `W_O` (shape `[d_model, d_model]`), to produce the attention block's final output. The `W_O` matrix learns the optimal way to mix the information from all the specialized heads into a single, unified representation.

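The four steps above can be stitched together in plain numpy. This sketch uses random weights and omits masking and batching, but the shapes follow the text exactly (`d_model = h * d_head`):

```python
import numpy as np

seq_len, d_model, h = 6, 32, 4
d_head = d_model // h
rng = np.random.default_rng(0)
W_Q, W_K, W_V, W_O = (rng.normal(size=(d_model, d_model)) for _ in range(4))
x = rng.normal(size=(seq_len, d_model))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# 1. One big projection each, then reshape into h heads of size d_head.
def split(proj):  # [seq_len, d_model] -> [h, seq_len, d_head]
    return proj.reshape(seq_len, h, d_head).transpose(1, 0, 2)

Q, K, V = split(x @ W_Q), split(x @ W_K), split(x @ W_V)

# 2. Per-head score matrices: [h, seq_len, seq_len].
scores = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))

# 3. Per-head weighted sums of values: [h, seq_len, d_head].
head_out = scores @ V

# 4. Concatenate heads back to [seq_len, d_model], then project with W_O.
out = head_out.transpose(1, 0, 2).reshape(seq_len, d_model) @ W_O
print(out.shape)  # (6, 32)
```

The reshape-and-transpose in step 1 is the whole trick: one `[d_model, d_model]` matmul computes all heads' queries at once, and the per-head view is just a different indexing of the same numbers.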
### 4. Optimizing Attention: GQA and MQA

During inference, storing the Key and Value vectors for all previous tokens (the KV Cache) is a major memory bottleneck. **Grouped-Query Attention (GQA)** and **Multi-Query Attention (MQA)** are architectural modifications that address this by allowing multiple Query heads to share the same Key and Value heads.

Let's use a concrete example, similar to Llama 2 7B:

* `d_model` = 4096
* `h` = 32 Q heads
* `d_head` = 128
* `g` = 8 KV head groups for GQA

The key insight is that only the dimensions of the `W_K` and `W_V` matrices change, which in turn reduces the size of the KV cache. The `W_Q` and `W_O` matrices remain `[4096, 4096]`.

| Attention Type | No. of Q Heads | No. of KV Heads | `W_K` & `W_V` Dimension | Relative KV Cache Size |
| --- | --- | --- | --- | --- |
| **MHA** (Multi-Head) | 32 | 32 | `[4096, 32*128]` = `[4096, 4096]` | 1x (Baseline) |
| **GQA** (Grouped) | 32 | 8 | `[4096, 8*128]` = `[4096, 1024]` | 1/4x |
| **MQA** (Multi-Query) | 32 | 1 | `[4096, 1*128]` = `[4096, 128]` | 1/32x |

GQA provides a robust balance, significantly reducing the memory and bandwidth requirements for the KV cache with negligible impact on model performance, making it a popular choice in modern LLMs.

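The cache-size column can be reproduced with simple arithmetic. The sketch below assumes fp16 storage (2 bytes per value) and 32 layers, in line with the Llama-2-7B-like numbers above:

```python
# KV cache bytes per generated token: 2 tensors (K and V), each with
# n_kv_heads * d_head values per layer, across all layers.

def kv_bytes_per_token(n_kv_heads, d_head=128, n_layers=32, bytes_per=2):
    return 2 * n_kv_heads * d_head * n_layers * bytes_per

mha = kv_bytes_per_token(32)  # 524288 bytes (512 KiB) per token
gqa = kv_bytes_per_token(8)   # 131072 bytes (128 KiB) per token
mqa = kv_bytes_per_token(1)   #  16384 bytes  (16 KiB) per token
print(mha // gqa, mha // mqa)  # 4 32
```

At a 4096-token context, the MHA cache is already ~2 GiB per sequence under these assumptions, which is why the 4x saving from GQA matters so much for serving throughput.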
### 5. MHA vs. Mixture of Experts (MoE): A Clarification

While both MHA and MoE use the concept of "experts," they are functionally and architecturally distinct.

* **MHA:** The "experts" are the **attention heads**. All heads are active for every token to build a rich representation within the attention layer. This is akin to a board meeting where every member analyzes and contributes to every decision.
* **MoE:** The "experts" are full **Feed-Forward Networks**. A routing network selects a small subset of these FFNs for each token. This is a scaling strategy to increase a model's parameter count for greater capacity while keeping the computational cost fixed. It replaces the standard FFN block, whereas MHA *is* the attention block.

By understanding these technical details, from the basic concept of a channel to the sophisticated interplay of heads and experts, one can build a more complete and accurate mental model of how LLMs truly operate.

---

### References

1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. *Advances in Neural Information Processing Systems*, 30.
2. Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. *arXiv preprint arXiv:1701.06538*.
3. Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebrón, F., & Sanghai, S. (2023). GQA: Training generalized multi-query transformer models from multi-head checkpoints. *arXiv preprint arXiv:2305.13245*.
externalLink = ""
series = []
+++

* [rootCA.pem](https://ericxliu.me/rootCA.pem)
* [vpnclient.ovpn](https://ericxliu.me/vpnclient.ovpn)
* [rootCA.pem](/rootCA.crt)

ericxliu-me.code-workspace

{
  "folders": [
    {
      "path": "."
    }
  ]
}

layouts/partials/footer.html

{{ if not .Site.Params.hideFooter | default false }}
<footer class="footer">
  <section class="container">
    {{ with .Site.Params.footerContent | safeHTML }}
    <p>{{ . }}</p>
    {{ end }}
    {{ if not .Site.Params.hideCopyright }}
      ©
      {{ if (and (.Site.Params.since) (lt .Site.Params.since now.Year)) }}
        {{ .Site.Params.since }} -
      {{ end }}
      {{ now.Year }}
      {{ with .Site.Params.author }} {{ . }} {{ end }}
    {{ end }}
    {{ if not .Site.Params.hideCredits }}
      {{ if not .Site.Params.hideCopyright }} · {{ end }}
      {{ i18n "powered_by" }} <a href="https://gohugo.io/">Hugo</a> & <a href="https://github.com/luizdepra/hugo-coder/">Coder</a>.
    {{ end }}
    [commit]
  </section>
</footer>
{{ end }}

layouts/robots.txt

User-agent: *

BIN static/images/a-deep-dive-into-ppo-for-language-models/.png
BIN static/images/ppo-for-language-models/.png
BIN static/images/transformer-s-core-mechanics/.png

static/rootCA.crt
-----BEGIN CERTIFICATE-----
MIIF2DCCA8CgAwIBAgIUMxAajDuiWUFtwePBQChCPyqvyIowDQYJKoZIhvcNAQEL
BQAwcjELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRowGAYDVQQKDBFlcmljeGxp
dS5tZSwgSW5jLjEXMBUGA1UEAwwOZXJpY3hsaXUubG9jYWwxITAfBgkqhkiG9w0B
CQEWEm1hc3RlckBlcmljeGxpdS5tZTAeFw0yNDAxMDgwMzA0NDFaFw0yNjAxMDgw
MzA0NDFaMHIxCzAJBgNVBAYTAlVTMQswCQYDVQQIDAJDQTEaMBgGA1UECgwRZXJp
Y3hsaXUubWUsIEluYy4xFzAVBgNVBAMMDmVyaWN4bGl1LmxvY2FsMSEwHwYJKoZI
hvcNAQkBFhJtYXN0ZXJAZXJpY3hsaXUubWUwggIiMA0GCSqGSIb3DQEBAQUAA4IC
DwAwggIKAoICAQDedDTBe0+qRV1r+kRvMZzFkensiKMpL4T9bRbAbNFfS8QufHp9
wJoMh5xW4XPJtqkYdYnnoefaZS9a9DMHjw1+f7lL0vzIfzSO5JWTZQSAsi0yeqDn
j1l8ShYrPZvQR+NUht9qAztbhIcBy3FFVOFFMZjZaYIwF1C3QBv5h2/yfgw0uad/
rOEw1G1Z/xlj7K+rvm59+vzduASfFY6NMG0PFzY1jRnWZ4diiqWJEM02EAevosbW
Xg1CFRkoe+s088QXl4WZLxpHvsiKdvKjaaKXrQieAYL2Kl3DOziN7P659q0Bk2tm
yp0B81QZV24mhg5WCuwrteiOJz51vck/T+hWDFKjPwa+GjGpqGiXjJMBfS/MyGMf
mdnPdcMeKQo2Mx4hpl/h116xFY60Tzto/PI4Kb4VBTKkN0hu7BLDSU4l8PkiSSAd
0E2Kzg4P9BQgvVc/BhoR7oKebf2TCeTVN+gC9HRsBdzBA3mtp60Qd9XBFAkbDqZq
nusA8KEG10az4cXaMIohAsRh9AVz4tHxTOq2dgw9AE8EEfQzgcMQl4hV4TkYFubC
t/gm16yEvsPBMFjptLu4S7mOpSdaJylOXVcMZ6PgeGAlrbuYunblYtdyKVyNVFeX
ca6RPAbDthWSqrbzigCvSeqhRpPmEq5p51BFGA+QK2b1Bj7dF0yiDO5zbwIDAQAB
o2YwZDAfBgNVHSMEGDAWgBQEK7HddEflCZ9DL9VEIBXzB9dQFTAOBgNVHQ8BAf8E
BAMCAgQwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNVHQ4EFgQUBCux3XRH5QmfQy/V
RCAV8wfXUBUwDQYJKoZIhvcNAQELBQADggIBAKF16Ps4AccXsNDRqQANF/kcNZ2y
SKB3cNsOfWxKfgppkl43z9cimgGGbNn0mVGjaOzXdXHEEQ0Uuv3tkvgQA2KraaTy
wLG5+RQKIVRaOgWufXbL76JV6mMf8v3o8/o5EL+uC/2KxpDH0N1BOJ0hJB2/hbra
kHPuYobj1SWtPeO5lRdZed05kdiAWH7e3/PmKgH13tZLnnzCHRC1YNkk2Cdhp082
XL5zUtDdbWAm6UgM4Reg4MKZMZzmYDn+1/wW6D5oO5ZXlJF2QqjqfTXn6fKJWM9d
JK3O5vx+LquAMu1G9gkqmTZntQQ3ZDGs9bMfWchgWPWN1ignJgmqnIgIbvdAHhdL
DOz3WE53vpcUY35TOs/YgIj81vAZuhuaYQZcTL4H34c3ShdVi6RY3Y+yPxM9MjRc
zqzEMg4KTnK7Es+t4Yep7vOQRo3WN1A+lXsRf+n2XBTCTwFOCury64AjMQn5H0yb
aZGvvf3UnIdUrJjPGjF9W/uIpy0TDpsKo/qizAdQ5c18p2ihVO8mHHZhJnIQW9er
p8M0m6/woalM94apYNdY6YAbsej5gNktx+z2ptdPNmE3k3OevDFqRNSLh29Rr2vM
CfO6MjR4Bkilw5A67jQFQnLF6Y9TYqW0HlEvdODNvO9aR5RSwaNTGJBcjynrsL3v
IG73ZMQl6utPkbKh
-----END CERTIFICATE-----
@@ -1,33 +0,0 @@
-----BEGIN CERTIFICATE-----
MIIFoDCCA4igAwIBAgIUJzlDGIEJdOQ0Shd1P0RJP5aangAwDQYJKoZIhvcNAQEL
BQAwYTELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMREwDwYDVQQHDAhTYW4gSm9z
ZTEUMBIGA1UECgwLZXJpY3hsaXUubWUxHDAaBgkqhkiG9w0BCQEWDV9AZXJpY3hs
aXUubWUwHhcNMjExMTIyMTgzMzQyWhcNMjQwOTExMTgzMzQyWjBhMQswCQYDVQQG
EwJVUzELMAkGA1UECAwCQ0ExETAPBgNVBAcMCFNhbiBKb3NlMRQwEgYDVQQKDAtl
cmljeGxpdS5tZTEcMBoGCSqGSIb3DQEJARYNX0BlcmljeGxpdS5tZTCCAiIwDQYJ
KoZIhvcNAQEBBQADggIPADCCAgoCggIBAN50NMF7T6pFXWv6RG8xnMWR6eyIoykv
hP1tFsBs0V9LxC58en3AmgyHnFbhc8m2qRh1ieeh59plL1r0MwePDX5/uUvS/Mh/
NI7klZNlBICyLTJ6oOePWXxKFis9m9BH41SG32oDO1uEhwHLcUVU4UUxmNlpgjAX
ULdAG/mHb/J+DDS5p3+s4TDUbVn/GWPsr6u+bn36/N24BJ8Vjo0wbQ8XNjWNGdZn
h2KKpYkQzTYQB6+ixtZeDUIVGSh76zTzxBeXhZkvGke+yIp28qNpopetCJ4BgvYq
XcM7OI3s/rn2rQGTa2bKnQHzVBlXbiaGDlYK7Cu16I4nPnW9yT9P6FYMUqM/Br4a
MamoaJeMkwF9L8zIYx+Z2c91wx4pCjYzHiGmX+HXXrEVjrRPO2j88jgpvhUFMqQ3
SG7sEsNJTiXw+SJJIB3QTYrODg/0FCC9Vz8GGhHugp5t/ZMJ5NU36AL0dGwF3MED
ea2nrRB31cEUCRsOpmqe6wDwoQbXRrPhxdowiiECxGH0BXPi0fFM6rZ2DD0ATwQR
9DOBwxCXiFXhORgW5sK3+CbXrIS+w8EwWOm0u7hLuY6lJ1onKU5dVwxno+B4YCWt
u5i6duVi13IpXI1UV5dxrpE8BsO2FZKqtvOKAK9J6qFGk+YSrmnnUEUYD5ArZvUG
Pt0XTKIM7nNvAgMBAAGjUDBOMB0GA1UdDgQWBBQEK7HddEflCZ9DL9VEIBXzB9dQ
FTAfBgNVHSMEGDAWgBQEK7HddEflCZ9DL9VEIBXzB9dQFTAMBgNVHRMEBTADAQH/
MA0GCSqGSIb3DQEBCwUAA4ICAQDKVGKjd1v6vecfNLZZ4+bqw4nwzzVwOdOWb2j+
zqPmYT/ZzCCxeiWLIaYtOQWXR4eSzULWYAGauecVlVYydbRbDC6LXp/1NrfQuNpp
6kd9JRGGdnNrW+0tEfJiXnEpOTwKncI1u6B0pvND8Gy6sxgjamyiKAh1vy0IZYJk
2T7PXxljqGxFZXZ5Ese/ogPn5KRGPkOmbW/BQXWC//3Qe39J6lxy2/HwfZ9pa+AQ
TxcJ/2OiDgBprMPJrHBiqvjoI9kp8vk3JhAQmbVM+8bpAIiiW8dPiEBDtROe/Wk5
UuiebFQNIebaIG+nEruUR28Df3Q52k6dY4MWLVNqB9lKKCqnbYtxDUIQrFCSHAEq
IdeOTEPjpkBr6UWwEunk32Mq6mdqmf5zBNaS64Wva43SLx+p/MIIacCYxOH7CHJX
r6XO/tR95cO4N3LdA/aJYpY0M35tFftFKI/AD5vEwshgYDw9QU1fu3Wljw3wYSVx
8YPKKwRkEBslEBmqf9YooDtGw3bLkQbJml0uMgxXOYI/VD95azvguq1lmcSdTTPu
f1GC0YnpQnXT6gPHNLoMhGiQUTlwHp2GKdaW0Xb9DEOLurzBZ9FIQsvrgclpJ49x
avp4Sgk3wLVue5iOKqlZL5fQIjckQEVR8vieKnZgGx6amVS9a5gB0GbAhkD06Y4p
M3O6VQ==
-----END CERTIFICATE-----
static/site.webmanifest
{
  "name": "Eric X. Liu's Personal Page",
  "short_name": "Eric Liu",
  "description": "Software & Performance Engineer at Google. Sharing insights about software engineering, performance optimization, tech industry experiences, mountain biking adventures, Jeep overlanding, and outdoor activities.",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#ffffff",
  "theme_color": "#000000",
  "icons": [
    {
      "src": "/images/gravatar.png",
      "sizes": "192x192",
      "type": "image/png",
      "purpose": "any maskable"
    }
  ],
  "categories": ["technology", "engineering", "blog"],
  "lang": "en",
  "orientation": "portrait-primary"
}
@@ -1,140 +0,0 @@
verb 3
nobind
dev tun
client
remote 24.4.129.175 1194 udp
fast-io
compress lzo
auth-nocache
remote-cert-tls server
<tls-crypt>
-----BEGIN OpenVPN Static key V1-----
d188baecfc63820df3a11c50aa887c4e7236ff8021049038aec03f4f2a46376b
aee8d80d06dbd812b84962937bed7003fdf64c264e9b7423925dbce4dd38b4e0
a3bdfe6e656550a63430338c0dd4bcd4c694221c7561fa9e6da3efd0334a57ee
5926acc05f768339b4712bf005d7eeb27f2da8dc8f4861b718b6683eb42869c0
e11a1ac6c36daea5c79d7e08830de1c6f0a55207bb39e9c0420db34b3a631975
5cfcef448f6664fde5d40e31e381503a6a724eebd7cfd76fe6d7108edc83b5ab
ea1e66af70837d15a9d8ba58c82018b4cd669deb2323ba60d7c7ea8a398483aa
2dec8aa6890dc2f60ff5be1a5c2a6a2fe95efa27f75c38735335e7f6f39b256e
-----END OpenVPN Static key V1-----
</tls-crypt>
<ca>
-----BEGIN CERTIFICATE-----
MIIFNTCCAx2gAwIBAgIJAJBOAeknPeLqMA0GCSqGSIb3DQEBCwUAMDExCzAJBgNV
BAYTAkdCMQ8wDQYDVQQIDAZMb25kb24xETAPBgNVBAoMCFdXVyBMdGQuMB4XDTE5
MDEwNjA3NTIxOVoXDTI5MDEwMzA3NTIxOVowMTELMAkGA1UEBhMCR0IxDzANBgNV
BAgMBkxvbmRvbjERMA8GA1UECgwIV1dXIEx0ZC4wggIiMA0GCSqGSIb3DQEBAQUA
A4ICDwAwggIKAoICAQDpq5CFMT1VWb2MeaHXi4FpCLDXwnzaS+3qGCa3COdNg2BD
tkQOPJNTgVhGn5XcfSDZnnVpXXrPDAqEDCUVVZj/2Mup/LseNr4miY+QcojyRETh
Ecq0FVqgRvW2zRxqWxEPpLyzZGOcwAcW2jGO8XWPsqN4wAWO3WlpYT3unVK833Cx
0wkdFIPbkEE1xKaJiskNYGgDuHu4tzGhOHSKOMzo7HvMaYsNgNChx/x4HYyTgkCf
4r5zo4+CnOqRZ58STiV9AsOgg6mR8m6h/E9GNWpU0VWKm8hnklP+TiMnW/AKby0B
hUKXFJMyhrNBOQXyj1LTqM5Q97+SNstOfqutKZgdD8mZcL4ec+DzelCH4Gyc15Yx
gII/z3YwBUw/SGh+diCtWY2eJAHDkDFMGgiidVSzeKjRAgCDGi5+SYymLzLDyQey
BpgbxunC2zHKsEhH1ZfOxyEOsW7UzgN6axQQ5DdzzKc1ke6OBl0YD1pRsoWEXudi
b8LlNNI4oOaMiW3gsptJGPCOvXBrMm7wuzLrXMMRD0bh969KBJ7YQjUVkrAOsGTq
DnqoXILa0ljsdazxe2Xk8GqrGAQ1XIvO7elbUlV/0nlAj4nzzx1m8f0n9nZ1aEZe
Mv46+si6K/DgdUyGqcxOw6iZ00Fj6ha4yx7HJjZHHwFBXqJEPtdXYJKYa1AmHwID
AQABo1AwTjAdBgNVHQ4EFgQUeBJp2fBea1UzyKirF1VYDsYddiwwHwYDVR0jBBgw
FoAUeBJp2fBea1UzyKirF1VYDsYddiwwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0B
AQsFAAOCAgEAfN7KjdqydCuSH6GCziu7Jc+SxLKLqMQc27/SBRWJ54JwDsfRglH3
zze9j0f/auLKNirbxQG8/CeJO7BtxsPHk2NfKnXUMyIfRH4jlSmuuy0YLH1N3F19
5GKGyt/ufc4a19l7M8ZseFMee8GXn6uHpVtN88GMKqQOu0AGnxv379ulI/RQ7iC2
wkFpkT8Anzwd+jxMi5iNYbsHGd1uCyzY1bbNORY/fdX7A27xNjLe2cJc68OUOJQe
XyfVlH2JyY+qEAXmv5gABafLFOsGmGHaQxZj4+zIdvDX6DGVIKCK7eixwVnKDwHm
b9yF4ivMWk5gaY0sjezD7bnN2vAN1zXvpmmSu2tc/kOzGXZKoGEUn/4j+tWvvhPn
wrTonT9soGmm7/LVyG/z950lylZV3XRw/0ZVQeCtQj+b+SjozNjTutzgWiAJ4njm
Jyaqrj6vHB6vOPySk6AYyu1qTaJsniHR62Hv6WG/eZQalcXJZ8BuwAgdpcgPwdVU
4IaKyiCjHg7dnrAwPURHfmlvosq+J+8PdD0O2L2aYUUtBS2TezgedSLXBYD4xZFa
85zsZMlEurHM9o93vfjihyMxUla46o6uNyl32ebaPvLxEj/MyGOwkzAWa0qxy74J
aQjWl+dWivXNFfE/yD/7yVF+X9YdlSFGCRyIfkUwy9hxLqkUdXeFgwE=
-----END CERTIFICATE-----
</ca>
<cert>
-----BEGIN CERTIFICATE-----
MIIE5TCCAs2gAwIBAgICEAEwDQYJKoZIhvcNAQELBQAwMTELMAkGA1UEBhMCR0Ix
DzANBgNVBAgMBkxvbmRvbjERMA8GA1UECgwIV1dXIEx0ZC4wHhcNMTkwMTA2MDc1
MzQ5WhcNMjkwMTAzMDc1MzQ5WjAUMRIwEAYDVQQDDAl2cG5jbGllbnQwggIiMA0G
CSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQCmpYnmgBet9aZInpN9DRi6EHjye699
GFCVuBc4zw8PoZieC/jt/hQ8jrwQC8KVU7g2nuAZNfX1wXJ0hLDKAZWSJhvlAeN1
/oA9oe/kikatUcijJnfipFJlhPJ8kru+UXH8ypwHoxbxd/2u+KDTjg3dJqNPGUak
2KlSxHDbS0OWEhBIdn4A2iD7HpDPBO805KfAWVtfQ6Pvy5XHHNm6s796x91hes3Q
ONY1TLKE+tHFMoTXn3C0a4/DU35Kj1JrZDdER+DrNmxGhInq2CXEGsyw8MwXYHLS
1H8jvuX8u7yS2QX2cjwzc4PcJFPjEPTtUqPb1Ob+xOYb0bGOEE64+7xNeA+Mk5XL
i08DJEFAU0hSCnHtz1y/JrXScKaHgtVzDm2TsXal9jO7ikVi3zhC6EcO0T4PFBX0
lACXUDpXv50WI3ftoIACtO+paxM+wuXCt9ZhC6BZbRBs9EtVUdBDXAsSp+yhb9Jr
GZzY+GIfmiPY027Cd0FYumspCHgBvUM4D9rCDVSiwr1DqNHqzlrjbwW3mCHLtmjE
qU5Jg9DBUB3J01AERUgBYE8O8BkmSkuKi0mwgi9LdQ4SbizqVreMip+kXFAwRMg8
Pw1h/cDUd9G1ZM+bZzHrjp4rdDHK8NAhDgJvxhGuhuwVFpA/LhvJ4tJBWjhJn7j5
IEHpNWA1xNp1IwIDAQABoyQwIjALBgNVHQ8EBAMCB4AwEwYDVR0lBAwwCgYIKwYB
BQUHAwIwDQYJKoZIhvcNAQELBQADggIBAFcqTpXcEe83shOnp+nOvGscMT0PwSNo
ojy5xR9UhHs3ijyJ3DeaCO4xh7V8PTzCTpg3NOs8+19/nAhSr+QBWQKwrQhQ7Uub
zv8AMXJ4tU1ZAyx0lX8FzUe/GsI8muqosK8F09jnTgGk05yCca9kDVzffGk1mivx
d6ANRdUkprZV1VPA/eKXBQYstbeYBitPql5anmh54fEvXt7S1SdPATXI/eTzaxtP
2KyPl7OZDA+mvS0qPFcY+MB2fjjdoyl74BShCJyI5sBCfN6WY6hNQ7meVWa6bCLQ
EgvrZqh0lkWhy3mKcTL8eZJeF2SoYHQCSY75gQM0gdCODHTvkJJknLVmtzHHTJL3
gVbdqFo/OiGGpD4XKpChNv//1kkRrwPBG4YDXu2/vsPoZKRVgjpQbNol+z+Ee1y3
MqGz09aGxC98KUuxrUYwT8fbVVyLm0Fu2O8u+Qz3s6dPkWqD94YGxh2pq9SD5aTl
/92LaIyqfMlWXj38yDUxjsENsTDtsSrx5cw6+BpB+VMmSuXlIYgE4khiEnYzCXbj
5rduGmz4t4rhZZaa3n3L+G0sCQUqmnYNAjEMYcKIZvkTI3GoW3s0FROeUL1zLir2
mdvWmQHTq39p/iBWmMTP/YofQPv8P1TWbKWaKalAf4+fLXHiTL7KHFw4YXXKE1iA
GI6Ngos0UzHR
-----END CERTIFICATE-----
</cert>
<key>
-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIJljBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQITIzpduKGeaYCAggA
MBQGCCqGSIb3DQMHBAjjV2SrWFzjJQSCCVBzWLc0eFF9x2OmoOddeslRnNd0DdiZ
eOjsK93BkhlSpBJVLJY7x6DD3JJokgDCFl/sWHjU4zn4C5UogaqhrAIYeFpjx4w4
4adU8bb3K4WOHAbQk5f+76HWr8LlCb4Ws2x0e8OlVwRBNJKAfumAoODE2ZZ0qitt
5FeBix1XecaSpFl8J4BoytFD1R1Pf1KHL/iZ9Vh0SGGqE+ElDPOd8+PfsIKy09ZD
/kMiiItnCF3NwG+s69GJJbGhIPdaas/yqVjtKOdQ8y7VBbrERzZ1mOVWN8zQj4cl
5whPtMBgSYgkM9UcDNaQbqn/q4yXjPF+mWuZ6EyD4yNwpbroHEna5SAbaMiIHUah
gXbSDrbFAMPqbpnpG5pUO9xM8YI8VYhlJU6MtzTjNwkmwyCzhV9WAEMBrMO3ZQGc
FmG9HchAdKO75K7bHaaAZbBvt2LMMg+cvMSFiojKvOKXrC3ntQzHCrGL0IPEmdYK
r1SIBD42zjsYPEY/MD0aV6eP/8DHbTMhF1oaQxxGLotv2+yVzoI+MTBaBFanQ+41
7LBSB8oK8uq11vVz4LIp4xC+uyH+qoKE59mPG6QoRKC5GSjIj6J9hbDOS0DcHILg
S0ebY2s2pSpVfcLZJslzSoI3ArNCs0fdkhZ2wD9/kX9BRAtZQSOsAcNVXc2sQsCj
zeIZ9V7HlNGdZnG2CAPI5RWe8RSzz1T+IF9rUUD43Hi0csQ6y3IFQEXJtmXArVXo
F6WoxqpXF3IdvLcnTDX1CK+h+QztSRysiRvWCPbISv84BIlx6OEVu+c0D82D+AUz
Wf6DRsXIzqFKly/MZNsYG7Sx0t1eHaKaw+SCsWLRdiFsdmL+LUAcqVsJNCshKp6H
Qlg3w0g9eU//qt2HnE0dx597PeSnyjRYSswt2R5dSaDh6x9KUeXc+kcTJTwxQ66c
gSopFZyoGOxHLGoCBZV1qGGKbUVnbX8hy3eunVRNVsgOBFhmMYy1kaWajOGIfVSz
jErclJpCJjuJDnK5L9ipLpQtb2VbVIgVbzwQ+p6AGBU39YO7R/ql4/DUyvo35mMx
X9tr8uGYRWxkbBJSKZ6FNG0jUI++7goT66vMWb9Sn3Xsczj1J9INMeY4OGwXGZ+3
VZrVsPMed0IJ4NIYJ0FRVhv7Y04aexJmvHqLUeRdJLk4l9kJNHoKJoleT8IUhThl
nqP76jFabL3jX9fUpDxPNNoMiz+en4L8bX3dnLlvo8xeLnUaqT63Y+CgRVyVZJSI
7PUZwGBWFHpuboLTYMgaQK6+UOp/rqtDFAkBXRD0ncSL2KYcy6I5IN9YDcYvTqvU
N5TsVjftGKCCZAFyCkVqVjfV3uKJAiK4LHJe9J94Aq3lWeaw//gg1UjWrXCRwKuZ
hO0kOEN7tw7YxOSOEzyQ3+j3TnWrToF/9QrCfY/+tOvwAVmLTD1e/dNTCt/SboaY
2FGSI0TmPRSewxCT2L9hBgM1wtDdgSofVAwxW9qK+/0JPZm+C6gGc2ipZNdDH4uN
+5j0zKZ98u7w6xRW23wCV+cnJ7IvtpXZvChFUnwnq9WanJakr/zsNsuhGpVOnD46
QOZzO3U8VwXwK1yMas0oN7XSTwf0vGZYBaCtKkF7PrLlVeOYaj50jQNXUvfLt+bQ
c64apwATE3JK2FcaV32m8UPz3bF04uuMIxBldH3Mvyp0X+MXaLERiuefUZwwppp5
yFYuy+z03asOYeQrG8LSsTGNOgJXPu1Or32GMHlil1s74uOodA5T8XHEmX4Fxuik
ok2itZL3yo/Sl73AC9yeSr7R9+Hf7SUTBt8AVeNPhmNDSi1AbyYhT1y0G7Dqwxvb
oG5ZQyKPlquf8a5Xzodq6lPdXJwi8ZLmAuBelAg6A4MJZzMMrhOzQXyiMHVNtwdk
c3LzES9bKEWgJR1CGR2RWrxUfqV6Y+uC+r5nPU/DZSOjJ0u7kWWvdQiySakxhQhO
qkT/+PeYcspzB2juDA5kq9s5Votyl05nHoM7L3UdRxzA4IhKawL9lRWu0Q/0Gn4C
axG2hUi61rxsx/epruIBz/01dlxw6xUpZBviArGxx/Z+QD0e5tKiuNCAOGQ5mm61
UtopO6vBq6oS4O/xO/xHhYlcLKJ2D3C8v1JTDQXIQ1OY6IaqIXRiaB4/bz9eCByV
8tEVW8/zs03M9zcxNuRL4tuBU0yhUCTCtGgQGUMnOYl4d1ZvZvUHb3oJqEI3AVOJ
/tTVni6P3V4TdaF17EyndLZbIz9mRp3Tai4lsZXbRpevKzQfkFvg3vRZKj5Fymp6
4SypXMKvgAQ/R6m9T8L0/rT90jf4GHhKhbSYXkJmAZQ/yz9eyjP9SFgHk3P0/z0x
pa8oc50PAumDpz73pLFwYGffZb7yAAb+Uv4bjBgw+UytIWsbWJBmGAZpVOxKCFEX
NGcflCfANB7FGv72a6fm3cf4IeqS3KGGQalIzOwwtIDWe2SVkp2LSx5JeFFCIjDP
dKynm1tczPfoL/tUzcoRqI61zVpb3pAzKmrsWnSgA5Zl+LPZq23g5QRjCNeu1xkh
GXMeXvQ0Q1VfLG4iw7j6zx52qiFy0HTQ8FK6cSA/nJN6/fE/2p+buAKxP5qjsFCP
+/QQB681rfKGrQV1yh8TKuJ04h5gdxF4sC6cliHdw8daA2y4rQorBjM7F1EF8VmZ
NcC1cEclv/E3QwOkBJsaom2rw7LkeOHLjqorGAf5eazO2AFZXVVG5yWrNyZWnaYf
LYrXCk/4yLSexVEgiC81uSQL2uhvkatrUdDi4zV9mMrHKR10w8LVEuXSkS8IK3h5
ln+HDc+rqUZG0ChHaF/GJ5VpQ6BLcMYNaoc75AuYU2rlSvMWnaR9UdiNVx3nrxld
/SvNn8K+lFiKCr0J0DiVDztCpGOq4k2JSlCr+C+YxvipRr+VZOzpxx4RvkRFKAq6
ix0demDcAk+YB6OZP3JAEy/yoiK/f61KiRpv0VVnHRFKyBv6MIyZmXkn5SesXF5C
aBAV1zRdnV4EHXZy3qKIdvDP/5qp/6WcNI4edkAwr9bl+BqMe+0dy6QcsU9dLeQa
OcpDZqHOxCXYTtiSIVM5WvSfPI5j6OdXCsrDU0VZOiiKegnGKNhz8Hn1aLZpGmoU
TkqhRGpXchHSXNsGwT9AWlSJCnEF1dT0OOJzYbIbcwLa3WcKXHADpgfLJJ/KXHDJ
buf/Epyjpi6dgg==
-----END ENCRYPTED PRIVATE KEY-----
</key>