deploy: 45629c5408
<a class=heading-link href=#why-open-webui-broke><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><ol><li><strong>Wrong API surface.</strong> <code>/v1/chat/completions</code> still rejects <code>type: "web_search"</code> with <code>Invalid value: 'web_search'. Supported values are: 'function' and 'custom'.</code></li><li><strong>LiteLLM tooling gap.</strong> The OpenAI TypedDicts in <code>litellm/types/llms/openai.py</code> only allow <code>Literal["function"]</code>. Even if the backend call succeeded, streaming would crash when it saw a new tool type.</li><li><strong>Open WebUI assumptions.</strong> The UI eagerly parses every tool delta, so when LiteLLM streamed the raw <code>web_search_call</code> chunk, the UI tried to execute it, failed to parse the arguments, and aborted the chat.</li></ol><p>Fixing all three required touching both the proxy configuration and the LiteLLM transformation path.</p><h2 id=step-1--route-gpt5-through-the-responses-api>Step 1 – Route GPT‑5 Through the Responses API
<a class=heading-link href=#step-1--route-gpt5-through-the-responses-api><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>LiteLLM’s Responses bridge activates whenever the backend model name starts with <code>openai/responses/</code>. I added a dedicated alias, <code>gpt-5.2-search</code>, that hardcodes the Responses API plus web search metadata. Existing models (reasoning, embeddings, TTS) stay untouched.</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-yaml data-lang=yaml><span style=display:flex><span><span style=color:#8b949e;font-style:italic># proxy-config.yaml (sanitized)</span><span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#7ee787>model_list</span>:<span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span>- <span style=color:#7ee787>model_name</span>:<span style=color:#6e7681> </span><span style=color:#a5d6ff>gpt-5.2-search</span><span style=color:#6e7681>
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#7ee787>litellm_params</span>:<span style=color:#6e7681>
      # ...
</span></span></span><span style=display:flex><span><span style=color:#6e7681> </span><span style=color:#7ee787>country</span>:<span style=color:#6e7681> </span><span style=color:#a5d6ff>US</span><span style=color:#6e7681>
</span></span></span></code></pre></div><p>Any client (Open WebUI included) can now request <code>model: "gpt-5.2-search"</code> over the standard <code>/v1/chat/completions</code> endpoint, and LiteLLM handles the Responses API hop transparently.</p><h2 id=step-2--mask-web_search_call-chunks-inside-litellm>Step 2 – Mask <code>web_search_call</code> Chunks Inside LiteLLM
<a class=heading-link href=#step-2--mask-web_search_call-chunks-inside-litellm><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>Even with the right API, LiteLLM still needs to stream deltas Open WebUI can digest. My <a href=https://ericxliu.me/hotfix.py class=external-link target=_blank rel=noopener>hotfix.py</a> script copies the LiteLLM source into <code>/tmp/patch/litellm</code>, then rewrites two files. This script runs as part of the Helm release’s init hook so I can inject fixes directly into the container filesystem at pod start. That saves me from rebuilding and pushing new images every time LiteLLM upstream changes (or refuses a patch), which is critical while waiting for issue #13042 to land. I’ll try to upstream the fix, but this is admittedly hacky, so timelines are uncertain.</p><ol><li><strong><code>openai.py</code> TypedDicts</strong>: extend the tool chunk definitions to accept <code>Literal["web_search"]</code>.</li><li><strong><code>litellm_responses_transformation/transformation.py</code></strong>: intercept every streaming item and short-circuit anything with <code>type == "web_search_call"</code>, returning an empty assistant delta instead of a tool call.</li></ol><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python><span style=display:flex><span><span style=color:#8b949e;font-style:italic># Excerpt from hotfix.py</span>
</span></span><span style=display:flex><span>tool_call_chunk_original <span style=color:#ff7b72;font-weight:700>=</span> (
</span></span><span style=display:flex><span> <span style=color:#a5d6ff>'class ChatCompletionToolCallChunk(TypedDict): # result of /chat/completions call</span><span style=color:#79c0ff>\n</span><span style=color:#a5d6ff>'</span>
</span></span><span style=display:flex><span> <span style=color:#a5d6ff>' id: Optional[str]</span><span style=color:#79c0ff>\n</span><span style=color:#a5d6ff>'</span>
</span></span><span style=display:flex><span><span style=color:#ff7b72;font-weight:700>...</span>
</span></span><span style=display:flex><span><span style=color:#ff7b72>if</span> tool_call_chunk_original <span style=color:#ff7b72;font-weight:700>in</span> content:
</span></span><span style=display:flex><span> content <span style=color:#ff7b72;font-weight:700>=</span> content<span style=color:#ff7b72;font-weight:700>.</span>replace(tool_call_chunk_original, tool_call_chunk_patch, <span style=color:#a5d6ff>1</span>)
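# Hypothetical hardening (not in the original excerpt): fail the init hook
# loudly when upstream drift breaks the anchor text, so an unpatched pod
# never serves traffic.
else:
    raise SystemExit("hotfix: ChatCompletionToolCallChunk anchor not found")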
</span></span></code></pre></div><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python><span style=display:flex><span>added_block <span style=color:#ff7b72;font-weight:700>=</span> <span style=color:#a5d6ff>""" elif output_item.get("type") == "web_search_call":
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> # Mask the call: Open WebUI should never see tool metadata
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> action_payload = output_item.get("action")
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff> verbose_logger.debug(
            # ...
</span></span></span><span style=display:flex><span><span style=color:#a5d6ff>"""</span>
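
# Hypothetical splice (the anchor string is illustrative, not LiteLLM's exact
# source line): graft the masking branch in front of the existing handler in
# transformation.py, once, and only if it is not already present.
anchor = 'elif output_item.get("type") == "function_call":'
if anchor in content and added_block not in content:
    content = content.replace(anchor, added_block + "\n" + anchor, 1)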
</span></span></code></pre></div><p>These patches ensure LiteLLM never emits a <code>tool_calls</code> delta for <code>web_search</code>. Open WebUI only receives assistant text chunks, so it happily renders the model response and the inline citations the Responses API already provides.</p><h2 id=step-3--prove-it-with-curl-and-open-webui>Step 3 – Prove It with cURL (and Open WebUI)
<a class=heading-link href=#step-3--prove-it-with-curl-and-open-webui><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h2><p>I keep a simple smoke test (<code>litellm_smoke_test.sh</code>) that hits the public ingress with and without streaming. The only secrets are placeholders here, but the structure is the same.</p><div class=highlight><pre tabindex=0 style=color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-bash data-lang=bash><span style=display:flex><span><span style=color:#8b949e;font-weight:700;font-style:italic>#!/usr/bin/env bash
</span></span></span><span style=display:flex><span>set -euo pipefail
</span></span><span style=display:flex><span>
</span></span><span style=display:flex><span>echo <span style=color:#a5d6ff>"Testing non-streaming..."</span>
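# Illustrative call only: BASE_URL and LITELLM_KEY are placeholders for the
# ingress URL and a proxy key; the real script reads them from the cluster.
curl -sS "${BASE_URL:-https://litellm.example.com}/v1/chat/completions" \
  -H "Authorization: Bearer ${LITELLM_KEY:-sk-placeholder}" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.2-search", "stream": false, "messages": [{"role": "user", "content": "Give me one current headline."}]}'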
</span></span></code></pre></div>
2016 - 2026 Eric X. Liu
<a href="https://git.ericxliu.me/eric/ericxliu-me/commit/45629c5">[45629c5]</a></section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script><script defer src=https://static.cloudflareinsights.com/beacon.min.js data-cf-beacon='{"token": "987638e636ce4dbb932d038af74c17d1"}'></script></body></html>