---
title: "Deployment Lessons and My Take on Self-Hosting OpenClaw"
date: 2026-02-03
draft: false
---

Deploying autonomous agents like OpenClaw on a self-hosted Kubernetes cluster offers significantly more control and integration potential than cloud-hosted alternatives. However, moving from a standard SaaS model to running your own intelligence infrastructure introduces several deployment challenges.

Here are the practical lessons learned, organized by the layers of the agentic stack: Environment, Runtime, and Capabilities.

## Layer 1: The Environment - Breaking the Sandbox

To move beyond being a chatbot, an agent needs to be able to affect its world. Deep integration starts with networking.

Code-execution agents often need to spin up temporary servers for previews, React apps, or documentation sites. In a standard Kubernetes Pod, these dynamic ports (3000, 8080, and so on) are reachable only from inside the cluster; nothing exposes them to the outside world.

To securely expose these arbitrary ports, I deployed a lightweight Nginx sidecar alongside the main OpenClaw agent. This avoids the complexity and latency of dynamically updating Ingress resources.

The Nginx configuration handling the routing logic:

```nginx
server {
    listen 80;
    # Capture the numeric subdomain (e.g. 3000.agent.mydomain.com) as $port
    server_name ~^(?<port>\d+)\.agent\.mydomain\.com$;

    location / {
        # Containers in the same Pod share a network namespace,
        # so localhost reaches the agent's dev servers directly.
        proxy_pass http://localhost:$port;
        proxy_set_header Host $host;
    }
}
```

This configuration uses a regex-based server block to capture the port from the subdomain (e.g., 3000.agent.mydomain.com) and proxies traffic to that port on localhost. Since containers in the same Pod share a network namespace, localhost connectivity is seamless.
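For reference, the sidecar arrangement amounts to a two-container Pod. This is a sketch, not my exact manifest; the names, image tags, and ConfigMap are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: openclaw-agent
spec:
  containers:
    - name: agent
      image: openclaw/openclaw:latest   # illustrative tag
      # The agent binds dev servers to arbitrary localhost ports (3000, 8080, ...)
    - name: port-proxy
      image: nginx:alpine
      ports:
        - containerPort: 80             # the only port the Service/Ingress needs to know
      volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/conf.d  # holds the regex-based server block
  volumes:
    - name: nginx-conf
      configMap:
        name: agent-port-proxy-conf     # hypothetical ConfigMap holding the nginx config
```

A single wildcard DNS record (`*.agent.mydomain.com`) pointed at the proxy completes the path; no per-port Service or Ingress objects are needed.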

For this to work, the agent must be aware of its environment. I updated OpenClaw's system prompt to spell out the pattern: "If you start a server on port X, the external URL is https://X.agent.mydomain.com". This lets the agent hand back valid, clickable links for the applications it generates.
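The contract the prompt describes is just a string mapping. A minimal helper (hypothetical, not part of OpenClaw; the domain is this post's example zone) makes the convention concrete:

```typescript
// Maps a locally bound port to the externally reachable URL,
// mirroring the convention given to the agent in its system prompt.
function externalUrl(port: number, zone: string = "agent.mydomain.com"): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return `https://${port}.${zone}`;
}
```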

## Layer 2: The Runtime - Agility and Persistence

Once the agent has connectivity to the outside world, the next challenge is agility. Self-hosting often requires customizations that haven't yet been merged upstream; in my case, a custom OAuth flow for Google's internal APIs.

Instead of maintaining a forked Docker image, I used a Kubernetes ConfigMap to inject the necessary TypeScript plugin at runtime. The file is mounted directly into the container at /app/extensions/google-antigravity-auth/index.ts.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: openclaw-patch-antigravity
data:
  index.ts: |
    import { createHash, randomBytes } from "node:crypto";
    // ... custom OAuth implementation ...
    export default antigravityPlugin;
```

This approach allows for rapid iteration on patches without rebuilding container images for every change.
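On the consuming side, the Deployment mounts the ConfigMap over that single file; `subPath` keeps the rest of `/app/extensions` intact. This fragment is illustrative of the technique rather than my exact manifest:

```yaml
# Fragment of the agent Deployment's pod template (names are illustrative)
spec:
  containers:
    - name: agent
      volumeMounts:
        - name: antigravity-patch
          mountPath: /app/extensions/google-antigravity-auth/index.ts
          subPath: index.ts        # overlay just this one file, not a directory
  volumes:
    - name: antigravity-patch
      configMap:
        name: openclaw-patch-antigravity
```

One caveat: `subPath` mounts do not receive live ConfigMap updates, so the pod must be restarted (or rolled) to pick up a new version of the patch.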

However, two operational realities became clear during this process:

  1. Debugging is standard: When the agent fails (e.g., your custom patch throws an error), it behaves like any other application. Standard tools like kubectl logs and strace remain the most effective way to diagnose issues.
  2. Tooling needs persistence: Just as code needs injection, tools need durable storage. I had to explicitly mount a volume for Homebrew (.linuxbrew) so that tools installed by me or the agent didn't vanish on pod restart. Agents need long-term memory on their filesystem as much as in their context window.
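The Homebrew persistence boils down to one PersistentVolumeClaim and one mount at the standard Linuxbrew prefix. Sizes, names, and the storage class default are illustrative assumptions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-linuxbrew
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi    # illustrative size
---
# Pod template fragment: mount the claim at the Linuxbrew prefix
# volumeMounts:
#   - name: linuxbrew
#     mountPath: /home/linuxbrew/.linuxbrew
# volumes:
#   - name: linuxbrew
#     persistentVolumeClaim:
#       claimName: openclaw-linuxbrew
```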

## Layer 3: The Capabilities - Skills over Abstractions

With the infrastructure (Layer 1) and runtime (Layer 2) established, we move to the application logic: how the agent actually does work.

While the industry chases complex abstractions like the Model Context Protocol (MCP), I found that simple, text-based "Skills" offer a superior workflow. I recently created a Gitea skill simply by exposing the tea CLI documentation to the agent.

This approach aligns with the UNIX philosophy: small, simple tools that do one thing well. MCP servers often clutter the context window and impose significant development overhead. A well-structured "Skill"—essentially a localized knowledge base for a CLI—is cleaner and faster to implement. I predict that these lightweight Skills will eventually replace heavy MCP integrations for the majority of use cases.
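To make the idea concrete, here is what such a skill can look like. I am not claiming this matches OpenClaw's actual skill schema; treat it as a sketch of the pattern, with an illustrative Gitea URL:

```markdown
# Skill: gitea (tea CLI)

When the user asks about repositories, issues, or pull requests on the
self-hosted Gitea instance (https://gitea.mydomain.com, illustrative),
prefer the `tea` CLI over raw API calls.

Before using an unfamiliar subcommand, run `tea <subcommand> --help`
and follow its usage text. Never guess flags.
```

The whole "integration" is a few paragraphs of text in the agent's context, which is exactly why it is cheaper than standing up an MCP server.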

There is one current limitation: Gemini models lack specific post-training for these custom skills. The agent doesn't always intuitively know when to reach for a specific tool. Also, remember that granting the agent access to CLI tools like kubectl or tea (Gitea CLI) enables it to perform operations directly, transforming it from a text generator to a system operator. My agent can now open Pull Requests on my self-hosted Gitea instance, effectively becoming a contributor to its own config repo.

## The Payoff: Why This Complexity Matters

Why go through this trouble of sidecars, config patches, and custom skills?

My previous AI workflows relied on standard chatbots via interfaces like Open-WebUI. The friction in that model is the "all-or-nothing" generation. LLMs are stochastic; regenerating an entire file to change three lines is inefficient and risky.

The killer feature of OpenClaw and similar agentic tools (such as Cursor or Antigravity) is partial editing: the ability to iteratively improve a stable codebase or document without regenerating the entire file. That is the missing link for AI-assisted development. We need to treat code as a living document, not a chat response.

Combined with tools like Obsidian, which I already use as my second brain for persistent knowledge management, this model provides both the long-term memory and the granular control that complex projects demand.

## References

  1. OpenClaw Documentation: https://docs.openclaw.org
  2. Kubernetes Flux CD: https://fluxcd.io/
  3. Nginx Regex Server Names: https://nginx.org/en/docs/http/server_names.html