fix(showcase): @ag-ui 0.0.55 currency, reasoning ports, google-adk demos #5348
Open
jpr5 wants to merge 37 commits into
(cherry picked from commit d22f7e6ab585fa77c31d7f6b567a296b3e815a29)
(cherry picked from commit befb301b086e87b1d3317f78301fac1232945240)
(cherry picked from commit 776e91445a7ca47f584f9ca75a22754b1302c91b)
(cherry picked from commit 4aa62ca567d8150c50309ca5290dcdc91f2f6c71)
(cherry picked from commit 30540f6c7ae1bef7b1ada449e9d0c83782b438d0)
(cherry picked from commit 9996a01c17432d2a5fc508884156ffad11269c4b)
(cherry picked from commit 6254bdad0b54a7930334129c715a1f18176fb094)
(cherry picked from commit aa9f7e5e99085383ce60859c9c37692211ba8279)
(cherry picked from commit 09c22e070ac60b936a4a91a0fd3f34879604de77)
(cherry picked from commit 4b27db5d781f68e1658955bcd23f667e63d400b3)
…i-crews) (cherry picked from commit 0286f413d2ceb52744c917eb9fe9fdc5f28011f2)
…oid) (cherry picked from commit 7c3d02d92e55d1e9a5cb89d182658620c0eed99f)
…g-ai)
Add a dedicated Spring/Java ReasoningController (/reasoning/) that
reimplements the ag2 reasoning_agent.py BEHAVIOR: it makes a direct
streaming chat-completions call, reads the native delta.reasoning_content
channel (with a <reasoning>...</reasoning> regex fallback), and emits
RUN_STARTED -> REASONING_MESSAGE_START/CONTENT/END -> TEXT_MESSAGE_* ->
RUN_FINISHED so the CopilotKit reasoning slot mounts
[data-testid="reasoning-block"].
Spring AI's ChatClient drops delta.reasoning_content and the AG-UI Java SDK
has no REASONING_MESSAGE_* event types (only THINKING_*, which @ag-ui/client
drops), so the controller manages its own SseEmitter and writes the
reasoning frames as raw JSON matching the @ag-ui/client 0.0.55 wire schema.
Header forwarding (x-aimock-context) rides the existing WebClientConfig
exchange filter.
Wire route reasoning-custom/-default (plus legacy aliases) to /reasoning/,
mirroring ag2's reasoningAgentNames, and bump @ag-ui/client ^0.0.43 ->
0.0.55 for REASONING_MESSAGE_* decode support.
(cherry picked from commit 4d183371c489013f0a7bcce1a447078164974aef)
(cherry picked from commit 9527ec763f9093d96fe614c79c00769ceb20a1a8)
…adk (parity) (cherry picked from commit 269cce835e9e92743b565ffd6ace91f73419c6dc)
…parity) (cherry picked from commit 6cf8fb2f3f7008be82fc2dd012456907b2572265)
…s aimock customEvents + ADK interrupt route) (cherry picked from commit aac16af8ee32172e6886a83bf66931c54091f180)
…ure ceiling
The currency bumps (@ag-ui/* → 0.0.55) retire 9 exact-pin FAIL lines, so
the validate-pins ratchet baseline drops 57 → 48 (new FAIL-set hash).
The W3 google-adk demo ports (hitl / gen-ui-interrupt / threadid) add
per-demo fixtures that reuse google-adk's standard prebuilt-probe pills,
raising the known-duplicate-match-key ceiling 288 → 290
(runtime-disambiguated by demo route, same documented pattern as prior
bumps).
Production Digest-Pinning Audit
All 28 services digest-pinned. Run 2026-06-09 17:29:48 PDT: 0 finding(s).
…langroid/spring-ai routes
The POST catch block in three integration runtime routes returned raw
error internals to HTTP clients via `{ error: err.message, stack: err.stack }`,
leaking internal paths, dependency versions, and stack traces, and used an
unsafe `as Error` cast. Ported the hardened ag2 reference shape: safe
`error instanceof Error ? error : new Error(String(error))` normalization,
structured server-side `console.error` (message + stack under a generated
`errorId`), and a client response of only `{ error: "internal runtime error", errorId }`.
Client-visible shape changes (POST /api/copilotkit catch block):
- showcase/integrations/crewai-crews/src/app/api/copilotkit/route.ts:
`{ error: err.message, stack: err.stack }` -> `{ error: "internal runtime error", errorId }`
- showcase/integrations/langroid/src/app/api/copilotkit/route.ts:
`{ error: err.message, stack: err.stack }` -> `{ error: "internal runtime error", errorId }`
- showcase/integrations/spring-ai/src/app/api/copilotkit/route.ts:
`{ error: err.message, stack: err.stack }` -> `{ error: "internal runtime error", errorId }`
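The three routes are TypeScript; purely as an illustration of the hardened shape described above, here is a minimal Python sketch of the same idea. The function name `safe_error_response`, the `[copilotkit]` log prefix, and the use of a UUID for `errorId` are assumptions made for this sketch, not the routes' actual code.

```python
import sys
import traceback
import uuid


def safe_error_response(error: object) -> dict:
    """Log full details server-side; return only a generic body to the client."""
    # Normalize instead of casting: non-exception values get wrapped.
    exc = error if isinstance(error, BaseException) else Exception(str(error))
    error_id = uuid.uuid4().hex  # hypothetical id format

    # Message and stack stay on the server, correlated by error_id.
    print(f"[copilotkit] errorId={error_id} {exc!r}", file=sys.stderr, flush=True)
    traceback.print_exception(type(exc), exc, exc.__traceback__, file=sys.stderr)

    # The client sees no message, stack, path, or version information.
    return {"error": "internal runtime error", "errorId": error_id}
```

An operator can then look up the `errorId` reported by a client in the server log to find the matching message and stack.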
…ms-agent-dotnet/python (was latest)
Both ms-agent-dotnet and ms-agent-python pinned the
@copilotkit/web-inspector > @copilotkit/core override to the "latest"
dist-tag, violating the showcase exact-pin discipline
(canonicalCopilotKitVersion=1.59.4). validate-pins.ts does not scan
override blocks, so this silently escaped the gate and broke reproducible
builds.
Changed all 4 occurrences to "1.59.4":
- ms-agent-dotnet/package.json: overrides["@copilotkit/web-inspector"]["@copilotkit/core"]
- ms-agent-dotnet/package.json: pnpm.overrides["@copilotkit/web-inspector>@copilotkit/core"]
- ms-agent-python/package.json: overrides["@copilotkit/web-inspector"]["@copilotkit/core"]
- ms-agent-python/package.json: pnpm.overrides["@copilotkit/web-inspector>@copilotkit/core"]
Lockfiles already resolved @copilotkit/core to 1.59.4 (the latest tag
pointed there at lock time) and do not persist the override map, so no
regeneration was needed.
…ng content
_extract_user_input in the ag2, crewai-crews, and langroid reasoning agents
documented a str return but passed AG-UI message content straight through.
Multimodal content can be a list of parts, which would flow unmodified into
the single caller in each file (_run_reasoning_agent ->
messages=[{"role": "user", "content": user_input}]) sent to the OpenAI
chat-completions API. Coerce: str passes through, a list joins its text
parts (dict or attr form), anything else falls back to str().
Callers (one per file):
- ag2/src/agents/reasoning_agent.py:104 -> chat message at :119
- crewai-crews/src/agents/reasoning_agent.py:108 -> chat message at :123
- langroid/src/agents/reasoning_agent.py:103 -> chat message at :118
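A minimal sketch of the coercion rule described above, assuming text parts carry their string under a `text` key or attribute and are joined with no separator (the commit message does not state the separator). The name `coerce_content` is hypothetical.

```python
from typing import Any


def coerce_content(content: Any) -> str:
    """Flatten AG-UI message content into a plain string."""
    if isinstance(content, str):
        return content  # str passes through unchanged
    if isinstance(content, list):
        texts = []
        for part in content:
            # Parts may be dicts ({"type": "text", "text": ...}) or objects with .text
            if isinstance(part, dict):
                text = part.get("text")
            else:
                text = getattr(part, "text", None)
            if isinstance(text, str):
                texts.append(text)  # non-text parts (e.g. images) are dropped
        return "".join(texts)  # separator is an assumption
    return str(content)  # anything else falls back to str()
```

With this in place, the value passed as `"content"` to the chat-completions call is always a string, regardless of which form the incoming message used.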
… and reasoning ports
…ic client errors, protocol-correct terminal events
The four reasoning implementations (ag2/crewai-crews/langroid Python +
spring-ai Java) had divergent, unsafe error paths. This aligns them to
the protocol semantics proven from the installed @ag-ui/client
verifyEvents state machine.
Protocol evidence (@ag-ui/client verifyEvents, function `L`, dist mjs):
- On RUN_ERROR the verifier sets the errored flag (`c=!0`). The guard at
the top of every subsequent event throws:
"Cannot send event type '<t>': The run has already errored with
'RUN_ERROR'. No further events can be sent."
=> RUN_ERROR is TERMINAL; a RUN_FINISHED (or anything) after it is a
protocol violation.
- RUN_FINISHED explicitly rejects open frames (checks the text-message
map `a.size`, tool-call map `o.size`, step map `u.size`). RUN_ERROR
does NOT run those checks, but the apply layer (`I`) otherwise leaves a
half-built REASONING/TEXT message in client state when a *_START has no
matching *_END. So the clean contract is: close any open frame with its
matching *_END, then emit RUN_ERROR as the sole terminal event.
Python x3 (ag2, crewai-crews, langroid — kept byte-identical except the
pre-existing FastAPI title literal):
- except Exception: log server-side via
`print(f"[reasoning] run failed: {exc!r}", file=sys.stderr, flush=True)`
+ `traceback.print_exc(file=sys.stderr)` (previously NO server log).
- emit a generic client message
`agent run failed: {type(exc).__name__} (see server logs)` instead of
the raw `str(exc)` (which can carry provider URLs / request details
into the SSE stream to the browser).
- track `reasoning_msg_id` / `text_msg_id`; close the open frame with its
matching *_END before RUN_ERROR. No RUN_FINISHED is emitted.
- `except asyncio.CancelledError: raise` is preserved.
Java (spring-ai ReasoningController.runReasoning catch):
- removed the RUN_FINISHED that followed RUN_ERROR (protocol violation
per the evidence above) — now matches the Python siblings (RUN_ERROR
only).
- track `reasoningMsgId` / `textMsgId`; close the open frame with its
matching *_END before RUN_ERROR.
- existing generic message + `log.error("Reasoning run failed", e)` kept.
Affected call sites: the three Python `_run_reasoning_agent` generators
and the Java `runReasoning` async task — the four terminal/catch blocks
that emit RUN_ERROR.
Verify: `python3 -m py_compile` clean on all three Python files; the
crewai-crews pytest suite (which import-mounts the reasoning sub-app)
passes 102/102. Java: no JDK/maven available locally; string/comment-
aware brace-balance check passes and the edit is javac-parseable.
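The error-path contract above can be sketched as a small generator. This is a simplified, synchronous illustration: it tracks only a text frame (the real agents also track `reasoning_msg_id`), and the function name, the `msg-1` id, and the exact event field names are assumptions for this sketch.

```python
import sys
import traceback
from typing import Callable, Iterable, Iterator, Optional


def run_with_terminal_error(chunks: Callable[[], Iterable[str]]) -> Iterator[dict]:
    """Yield event dicts; on failure, close the open frame, then RUN_ERROR only."""
    text_msg_id: Optional[str] = None  # id of the currently open text frame
    yield {"type": "RUN_STARTED"}
    try:
        for chunk in chunks():
            if text_msg_id is None:
                text_msg_id = "msg-1"  # hypothetical id
                yield {"type": "TEXT_MESSAGE_START", "messageId": text_msg_id}
            yield {"type": "TEXT_MESSAGE_CONTENT", "messageId": text_msg_id, "delta": chunk}
        if text_msg_id is not None:
            yield {"type": "TEXT_MESSAGE_END", "messageId": text_msg_id}
            text_msg_id = None
        yield {"type": "RUN_FINISHED"}
    except Exception as exc:
        # Full details go to the server log only.
        print(f"[reasoning] run failed: {exc!r}", file=sys.stderr, flush=True)
        traceback.print_exc(file=sys.stderr)
        # Close any half-built frame so client state is not left dangling.
        if text_msg_id is not None:
            yield {"type": "TEXT_MESSAGE_END", "messageId": text_msg_id}
        # RUN_ERROR is the sole terminal event; no RUN_FINISHED follows it.
        yield {
            "type": "RUN_ERROR",
            "message": f"agent run failed: {type(exc).__name__} (see server logs)",
        }
```

The client-facing message carries only the exception type name, so provider URLs or request details in `str(exc)` never reach the SSE stream.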
…hread hop (spring-ai)
ReasoningController.run dispatched runReasoning via
CompletableFuture.runAsync onto a pre-existing ForkJoinPool.commonPool()
worker. AimockHeaderContext is an InheritableThreadLocal, which only copies
the parent value at child-thread CREATION time; a pooled worker predates
the request, so it snapshots an empty map and the outbound
chat-completions WebClient filter reads no x-aimock-context. On the D6
verification path this yields aimock strict-mode 503.
Capture the headers on the request thread and re-establish them on the
worker via AimockHeaderContext.capture()/runWith(...), mirroring the
canonical PropagatingLocalAgent idiom. Added java.util.Map import.
Call-site enumeration:
- ReasoningController.run (~L129): the one runAsync dispatch site (FIXED).
- No other thread hops in ReasoningController: runReasoning runs entirely
  on the runAsync worker; stream.toIterable() blocks on that same worker;
  the WebClient exchange filter reads AimockHeaderContext.get() at exchange
  time on the now-bound worker. No further hazard.
- Sibling @RestControllers route async dispatch through AgUiService ->
  PropagatingLocalAgent, which already does capture/runWith; this
  controller was the lone bespoke runAsync that bypassed that path.
…of commonPool
ReasoningController.run() dispatched runReasoning() via
CompletableFuture.runAsync(Runnable) (call site ~line 129), which uses
ForkJoinPool.commonPool(). runReasoning() then blocks its worker thread for
the entire streaming chat-completions call at stream.toIterable()
(~line 189). With the common pool sized to the CPU count, concurrent
reasoning requests could exhaust it and starve unrelated parallel work in
the JVM.
Add a dedicated bounded executor (fixed pool of 4 named daemon threads) as
a controller field and pass it to the two-arg
CompletableFuture.runAsync(Runnable, Executor) at the run() call site so
blocking reasoning runs no longer occupy commonPool workers. Daemon threads
keep JVM shutdown clean; a @PreDestroy shuts the pool down on context
teardown. Behavior is otherwise identical (no reactive refactor).
…easoning stream
Per-chunk JSON parse errors in the streaming loop were swallowed at debug
level (off by default). A systematic format change would drop every chunk,
leaving fullText/nativeReasoning empty while the run "succeeded" with an
empty assistant turn and zero operator signal.
Track a parse-failure counter (and last error) in the loop. After the loop,
if no usable content was produced (empty fullText AND empty
nativeReasoning) AND parseFailures > 0, emit one log.warn with the failure
count and last error. The per-chunk debug line is unchanged; the success
and partial-parse paths are untouched (a stream that produced any content
stays as before).
Call site: streamReasoning(...) SSE chunk loop in ReasoningController, the
sole consumer of the chat-completions stream and the only producer of
fullText/nativeReasoning, so this is the only place the empty-output signal
can be surfaced.
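The controller is Java; as a rough Python analogue of the counter-then-warn pattern described above, consider the sketch below. It tracks only text content (the real loop also accumulates native reasoning), and `collect_stream`, the logger name, and the chunk layout are assumptions for illustration.

```python
import json
import logging
from typing import Iterable, Optional, Tuple

log = logging.getLogger("reasoning")


def collect_stream(lines: Iterable[str]) -> Tuple[str, int]:
    """Accumulate delta content; warn once if nothing usable was parsed."""
    full_text = ""
    parse_failures = 0
    last_error: Optional[Exception] = None
    for line in lines:
        try:
            chunk = json.loads(line)
            full_text += chunk["choices"][0]["delta"].get("content") or ""
        except (ValueError, KeyError, IndexError, TypeError, AttributeError) as exc:
            # Per-chunk failures stay at debug level, as before.
            parse_failures += 1
            last_error = exc
            log.debug("skipping unparseable chunk: %r", exc)
    # Only an empty result combined with failures is worth an operator signal.
    if not full_text and parse_failures > 0:
        log.warning(
            "stream produced no content; %d chunk(s) failed to parse, last error: %r",
            parse_failures,
            last_error,
        )
    return full_text, parse_failures
```

A stream that yields any content takes the same path as before, so a single bad chunk among good ones produces no warning.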
…thon (phantom dep)
…nts (parity with agno reference)
The four custom reasoning backends built the chat-completions request from
only the LAST user message, discarding all prior turns so follow-up
questions lost their context. The agno reference threads full history via
Agno's Agent; these now match.
Call sites switched from single-turn extraction to full-history mapping:
- showcase/integrations/ag2/src/agents/reasoning_agent.py
- showcase/integrations/crewai-crews/src/agents/reasoning_agent.py
- showcase/integrations/langroid/src/agents/reasoning_agent.py
`_extract_user_input` -> `_to_chat_messages` (+ `_coerce_content`):
system prompt first, then every prior user/assistant turn in order;
tool/system input messages skipped. The three files stay token-for-
token identical (only docstrings/fixture-name comments differ).
- showcase/integrations/spring-ai/.../ReasoningController.java
`extractUserInput` -> `buildRequestBody(List<BaseMessage>)`; same
shape, built on the request thread and passed to the async worker.
CRITICAL invariants preserved:
- A single user-message input yields EXACTLY `[{system}, {user: <text>}]`
(byte-equal to the old path) so aimock D6 fixtures replay unchanged.
- Empty / no-user-message input yields `[{system}, {user: ""}]` (an empty
user turn), matching prior behaviour.
Adds red-green coverage (crewai-crews/tests/python/test_reasoning_history.py)
pinning the single-turn byte-equality, multi-turn ordering, tool/system
skipping, empty-input fallback, and multimodal/None content coercion.
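The full-history mapping and its two invariants can be sketched as follows. This is a simplified illustration, not the committed `_to_chat_messages`: the prompt text is a placeholder, content coercion is reduced to "non-string becomes empty", and the handling of a history that has assistant turns but no user turn is an assumption.

```python
from typing import Any, Dict, List

SYSTEM_PROMPT = "You are a helpful assistant."  # placeholder, not the real prompt


def to_chat_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Map AG-UI input messages to a chat-completions message list."""
    chat = [{"role": "system", "content": SYSTEM_PROMPT}]  # system prompt first
    for msg in messages:
        role = msg.get("role")
        if role not in ("user", "assistant"):
            continue  # tool/system input messages are skipped
        content = msg.get("content")
        # Simplified coercion: the real helper also flattens multimodal parts.
        chat.append({"role": role, "content": content if isinstance(content, str) else ""})
    if not any(m["role"] == "user" for m in chat):
        # Empty / no-user-message input falls back to an empty user turn.
        chat.append({"role": "user", "content": ""})
    return chat
```

A single user message produces exactly `[system, user]`, which is the property that lets recorded fixtures keep matching.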
…se, generic RUN_ERROR, no RUN_FINISHED)
… warn, accurate comments
…ment accuracy + reasoning parity test
…ing docstring mount path
Summary
Combined landing of three browser-verified showcase work waves (37 commits, per-integration grouping preserved), plus a full multi-round code review with all mandatory fixes folded in.

Wave 1: @ag-ui currency (9/9 integrations). Bump `@ag-ui/*` frontend deps to exact `0.0.55` for: llamaindex, agno, claude-sdk-python, pydantic-ai, strands, claude-sdk-typescript, ms-agent-python, ms-agent-dotnet, google-adk. Exact pins land across 13 `package.json` files.

Wave 2: reasoning emission ports (4/4, ref #76). Port reasoning-message emission to ag2, crewai-crews, langroid, and spring-ai (each also bumped to `@ag-ui` 0.0.55). Adds per-integration `reasoning_agent` (Python) / `ReasoningController` (spring-ai Java) and wires the copilotkit route.

Wave 3: google-adk demo parity (3/4). Port `hitl`, `threadid-frontend-tool-roundtrip`, and `gen-ui-interrupt` demos to google-adk for parity with the gold reference, plus d6/aimock fixtures. The `interrupt-headless` demo is intentionally kept `not_supported` (needs aimock `customEvents` support + an ADK interrupt route; tracked as a known upstream gap).

Code review
A 5-round unbiased multi-agent CR loop ran against the combined diff and converged at zero mandatory findings; the bucket-(c) promotion audit came back clean. Key fixes folded into the branch:
- `errorId` correlation between client response and server logs (also applied to google-adk).
- … `RUN_ERROR`, and never emit `RUN_FINISHED` after `RUN_ERROR`; verified against `@ag-ui/client` `verifyEvents` semantics, with red-green tests.
- `x-aimock-context` propagation across spring-ai's async hop, so fixture replay stays correct through the thread boundary.
- `@copilotkit/shared` / `@ag-ui` deps that were imported but undeclared in claude-sdk-python and ms-agent-dotnet/python are now in their `package.json`.
- `@copilotkit/core` `"latest"` overrides pinned to `1.59.4`.
- `validate-pins` baseline drops 57 → 48 (currency bumps retire 9 exact-pin FAIL lines).

The final commit is the `style: auto-fix formatting` bot commit (formatting on the new Python test files).

Test plan
- `showcase/scripts` vitest: 1780/1780 pass.
- `validate-parity` MUST checks: 19/19 pass, 0 fail.
- `validate-pins` ratchet: exits 0 against the updated baseline (FAIL-set hash matched).
- `gen-ui-custom` cells verified live (google-adk + agno).
- `build-check`, Python unit tests (3.10 + 3.12), commitlint, format, oxlint, Validate Showcase, production-pinning lint.

Follow-ups
Consolidated follow-up ledger (bucket c/d items): https://www.notion.so/copilotkit/37b3aa38185281e5b871d0b907aaef71