fix(showcase): @ag-ui 0.0.55 currency, reasoning ports, google-adk demos #5348

Open
jpr5 wants to merge 37 commits into main from feat/showcase-fleet-agui-currency-reasoning

jpr5 commented Jun 9, 2026

Contributor

Summary

Combined landing of three browser-verified showcase work waves (37 commits, per-integration grouping preserved), plus a full multi-round code review with all mandatory fixes folded in.

Wave 1 — @ag-ui currency (9/9 integrations). Bump @ag-ui/* frontend deps to exact 0.0.55 for: llamaindex, agno, claude-sdk-python, pydantic-ai, strands, claude-sdk-typescript, ms-agent-python, ms-agent-dotnet, google-adk. Exact pins land across 13 package.json files.

Wave 2 — reasoning emission ports (4/4, ref #76). Port reasoning-message emission to ag2, crewai-crews, langroid, and spring-ai (each also bumped to @ag-ui 0.0.55). Adds per-integration reasoning_agent (Python) / ReasoningController (spring-ai Java) and wires the copilotkit route.

Wave 3 — google-adk demo parity (3/4). Port hitl, threadid-frontend-tool-roundtrip, and gen-ui-interrupt demos to google-adk for parity with the gold reference, plus d6/aimock fixtures. The interrupt-headless demo is intentionally kept not_supported (needs aimock customEvents support + an ADK interrupt route — tracked as a known upstream gap).

Code review

A 5-round unbiased multi-agent CR loop ran against the combined diff and converged at zero mandatory findings; the bucket-(c) promotion audit came back clean. Key fixes folded into the branch:

  • Client error-leak hardening in the crewai, langroid, and spring-ai reasoning routes — internal exception details no longer leak to the client, with errorId correlation between client response and server logs (also applied to google-adk).
  • Protocol-correct reasoning error paths — on failure the emitters now close any open frames, emit a generic RUN_ERROR, and never emit RUN_FINISHED after RUN_ERROR; verified against @ag-ui/client verifyEvents semantics, with red-green tests.
  • x-aimock-context propagation across spring-ai's async hop, so fixture replay stays correct through the thread boundary.
  • Bounded reasoning executor (no unbounded thread growth) and parse-failure observability (failures surface in logs instead of being swallowed).
  • Full multi-turn history threading in the reasoning agents — prior turns are now forwarded to the LLM, with single-turn byte-equality preserved so existing aimock fixtures stay valid; covered by tests.
  • Phantom deps declared: @copilotkit/shared and @ag-ui deps that were imported but undeclared in claude-sdk-python and ms-agent-dotnet/python are now listed in their package.json.
  • @copilotkit/core "latest" overrides pinned to 1.59.4.
  • Reasoning parity token-identity test across the 3 ported Python integrations (ag2/crewai-crews/langroid) so their reasoning streams can't silently drift.
  • Ratchets tightened/updated: shadow-collision ceiling ratcheted down 151 → 123 (actual count), duplicate-fixture ceiling 288 → 290 (new google-adk fixtures reuse standard prebuilt-probe pills, runtime-disambiguated by demo route), and the validate-pins baseline drops 57 → 48 (currency bumps retire 9 exact-pin FAIL lines).

The final commit is the bot-generated "style: auto-fix formatting" commit (formatting on the new Python test files).

Test plan

  • Per-integration browser verification (each touched integration's demos exercised end-to-end) with aimock fixtures added/updated.
  • crewai pytest — 113/113 pass.
  • showcase/scripts vitest — 1780/1780 pass.
  • validate-parity MUST checks — 19/19 pass, 0 fail.
  • validate-pins ratchet — exits 0 against the updated baseline (FAIL-set hash matched).
  • D6 gen-ui-custom cells verified live (google-adk + agno).
  • CI fully green — per-integration Depot Docker build-check, Python unit tests (3.10 + 3.12), commitlint, format, oxlint, Validate Showcase, production-pinning lint.

Follow-ups

Consolidated follow-up ledger (bucket c/d items): https://www.notion.so/copilotkit/37b3aa38185281e5b871d0b907aaef71

jpr5 added 18 commits June 9, 2026 11:30
(cherry picked from commit d22f7e6ab585fa77c31d7f6b567a296b3e815a29)
(cherry picked from commit befb301b086e87b1d3317f78301fac1232945240)
(cherry picked from commit 776e91445a7ca47f584f9ca75a22754b1302c91b)
(cherry picked from commit 4aa62ca567d8150c50309ca5290dcdc91f2f6c71)
(cherry picked from commit 30540f6c7ae1bef7b1ada449e9d0c83782b438d0)
(cherry picked from commit 9996a01c17432d2a5fc508884156ffad11269c4b)
(cherry picked from commit 6254bdad0b54a7930334129c715a1f18176fb094)
(cherry picked from commit aa9f7e5e99085383ce60859c9c37692211ba8279)
(cherry picked from commit 09c22e070ac60b936a4a91a0fd3f34879604de77)
(cherry picked from commit 4b27db5d781f68e1658955bcd23f667e63d400b3)
…i-crews)

(cherry picked from commit 0286f413d2ceb52744c917eb9fe9fdc5f28011f2)
…oid)

(cherry picked from commit 7c3d02d92e55d1e9a5cb89d182658620c0eed99f)
…g-ai)

Add a dedicated Spring/Java ReasoningController (/reasoning/) that
reimplements the ag2 reasoning_agent.py BEHAVIOR: it makes a direct
streaming chat-completions call, reads the native delta.reasoning_content
channel (with a <reasoning>...</reasoning> regex fallback), and emits
RUN_STARTED -> REASONING_MESSAGE_START/CONTENT/END -> TEXT_MESSAGE_*
-> RUN_FINISHED so the CopilotKit reasoning slot mounts
[data-testid="reasoning-block"].

Spring AI's ChatClient drops delta.reasoning_content and the AG-UI Java
SDK has no REASONING_MESSAGE_* event types (only THINKING_*, which
@ag-ui/client drops), so the controller manages its own SseEmitter and
writes the reasoning frames as raw JSON matching the @ag-ui/client 0.0.55
wire schema. Header forwarding (x-aimock-context) rides the existing
WebClientConfig exchange filter. Wire route reasoning-custom/-default (plus
legacy aliases) to /reasoning/, mirroring ag2's reasoningAgentNames, and
bump @ag-ui/client ^0.0.43 -> 0.0.55 for REASONING_MESSAGE_* decode support.

(cherry picked from commit 4d183371c489013f0a7bcce1a447078164974aef)
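
For reviewers who want the fallback spelled out: a minimal Python sketch of the "native channel first, `<reasoning>` tag second" split described in the commit above. The controller itself is Java and its actual regex is not shown in this PR, so the pattern and the `split_reasoning` helper name here are assumptions for illustration only.

```python
import re

# Assumed pattern: the real controller's regex is not shown in this PR.
_REASONING_TAG = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def split_reasoning(full_text: str, native_reasoning: str) -> tuple[str, str]:
    """Return (reasoning, answer).

    Prefer the native delta.reasoning_content channel when it produced
    anything; otherwise fall back to an inline <reasoning>...</reasoning> tag.
    """
    if native_reasoning:
        return native_reasoning, full_text
    match = _REASONING_TAG.search(full_text)
    if match is None:
        # No reasoning available on either channel.
        return "", full_text
    reasoning = match.group(1).strip()
    # Remove the tagged span so it is not repeated in the visible answer.
    answer = (full_text[: match.start()] + full_text[match.end():]).strip()
    return reasoning, answer
```

The reasoning part would then feed the REASONING_MESSAGE_* frames and the answer part the TEXT_MESSAGE_* frames, in the order listed in the commit message.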
(cherry picked from commit 9527ec763f9093d96fe614c79c00769ceb20a1a8)
…adk (parity)

(cherry picked from commit 269cce835e9e92743b565ffd6ace91f73419c6dc)
…parity)

(cherry picked from commit 6cf8fb2f3f7008be82fc2dd012456907b2572265)
…s aimock customEvents + ADK interrupt route)

(cherry picked from commit aac16af8ee32172e6886a83bf66931c54091f180)
…ure ceiling

The currency bumps (@ag-ui/* → 0.0.55) retire 9 exact-pin FAIL lines, so the
validate-pins ratchet baseline drops 57 → 48 (new FAIL-set hash). The W3
google-adk demo ports (hitl / gen-ui-interrupt / threadid) add per-demo
fixtures that reuse google-adk's standard prebuilt-probe pills, raising the
known-duplicate-match-key ceiling 288 → 290 (runtime-disambiguated by demo
route, same documented pattern as prior bumps).

vercel Bot commented Jun 9, 2026

Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| chat-with-your-data | Ready | Preview, Comment | Jun 10, 2026 12:28am |
| docs | Ready | Preview, Comment | Jun 10, 2026 12:28am |
| form-filling | Ready | Preview, Comment | Jun 10, 2026 12:28am |
| research-canvas | Ready | Preview, Comment | Jun 10, 2026 12:28am |
| travel | Ready | Preview, Comment | Jun 10, 2026 12:28am |

github-actions Bot commented Jun 9, 2026

Contributor

📣 Social Copy Generator

Generate social media copies (Twitter/X, LinkedIn, Blog Post) for this PR using Claude.

  • Generate social media copies

github-actions Bot commented Jun 9, 2026

Contributor

Production Digest-Pinning Audit

All 28 services digest-pinned.

Run 2026-06-09 17:29:48 PDT — 0 finding(s).

jpr5 and others added 18 commits June 9, 2026 12:56
…langroid/spring-ai routes

The POST catch block in three integration runtime routes returned raw
error internals to HTTP clients via `{ error: err.message, stack: err.stack }`,
leaking internal paths, dependency versions, and stack traces, and used an
unsafe `as Error` cast. Ported the hardened ag2 reference shape: safe
`error instanceof Error ? error : new Error(String(error))` normalization,
structured server-side `console.error` (message + stack under a generated
`errorId`), and a client response of only `{ error: "internal runtime error", errorId }`.

Client-visible shape changes (POST /api/copilotkit catch block):
- showcase/integrations/crewai-crews/src/app/api/copilotkit/route.ts:
  `{ error: err.message, stack: err.stack }` -> `{ error: "internal runtime error", errorId }`
- showcase/integrations/langroid/src/app/api/copilotkit/route.ts:
  `{ error: err.message, stack: err.stack }` -> `{ error: "internal runtime error", errorId }`
- showcase/integrations/spring-ai/src/app/api/copilotkit/route.ts:
  `{ error: err.message, stack: err.stack }` -> `{ error: "internal runtime error", errorId }`
…ms-agent-dotnet/python (was latest)

Both ms-agent-dotnet and ms-agent-python pinned the @copilotkit/web-inspector >
@copilotkit/core override to the "latest" dist-tag, violating the showcase
exact-pin discipline (canonicalCopilotKitVersion=1.59.4). validate-pins.ts does
not scan override blocks, so this silently escaped the gate and broke
reproducible builds. Changed all 4 occurrences to "1.59.4":

  - ms-agent-dotnet/package.json: overrides["@copilotkit/web-inspector"]["@copilotkit/core"]
  - ms-agent-dotnet/package.json: pnpm.overrides["@copilotkit/web-inspector>@copilotkit/core"]
  - ms-agent-python/package.json: overrides["@copilotkit/web-inspector"]["@copilotkit/core"]
  - ms-agent-python/package.json: pnpm.overrides["@copilotkit/web-inspector>@copilotkit/core"]

Lockfiles already resolved @copilotkit/core to 1.59.4 (the latest tag pointed
there at lock time) and do not persist the override map, so no regeneration was
needed.
…ng content

_extract_user_input in the ag2, crewai-crews, and langroid reasoning agents
documented a str return but passed AG-UI message content straight through.
Multimodal content can be a list of parts, which would flow unmodified into
the single caller in each file (_run_reasoning_agent ->
messages=[{"role": "user", "content": user_input}]) sent to the OpenAI
chat-completions API. Coerce: str passes through, a list joins its text
parts (dict or attr form), anything else falls back to str().

Callers (one per file):
- ag2/src/agents/reasoning_agent.py:104 -> chat message at :119
- crewai-crews/src/agents/reasoning_agent.py:108 -> chat message at :123
- langroid/src/agents/reasoning_agent.py:103 -> chat message at :118
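
The coercion rule in the commit above (str passes through, a list joins its text parts, anything else falls back to `str()`) can be sketched roughly as follows. The function name `coerce_content`, the `"text"` key/attribute, and the empty join separator are assumptions; the real helper in the three `reasoning_agent.py` files may differ.

```python
from typing import Any


def coerce_content(content: Any) -> str:
    """Coerce AG-UI message content into a plain string (illustrative only)."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        texts = []
        for part in content:
            # Parts may arrive as dicts ({"type": "text", "text": ...}) or as
            # objects exposing a .text attribute; non-text parts are skipped.
            if isinstance(part, dict):
                text = part.get("text")
            else:
                text = getattr(part, "text", None)
            if isinstance(text, str):
                texts.append(text)
        return "".join(texts)  # separator is an assumption
    return str(content)
```

With this shape, a multimodal message such as `[{"type": "text", "text": "a"}, {"type": "image", "url": "x"}]` reaches the chat-completions call as the string `"a"` instead of a raw list.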
…ic client errors, protocol-correct terminal events

The four reasoning implementations (ag2/crewai-crews/langroid Python +
spring-ai Java) had divergent, unsafe error paths. This aligns them to
the protocol semantics proven from the installed @ag-ui/client
verifyEvents state machine.

Protocol evidence (@ag-ui/client verifyEvents, function `L`, dist mjs):
- On RUN_ERROR the verifier sets the errored flag (`c=!0`). The guard at
  the top of every subsequent event throws:
  "Cannot send event type '<t>': The run has already errored with
  'RUN_ERROR'. No further events can be sent."
  => RUN_ERROR is TERMINAL; a RUN_FINISHED (or anything) after it is a
  protocol violation.
- RUN_FINISHED explicitly rejects open frames (checks the text-message
  map `a.size`, tool-call map `o.size`, step map `u.size`). RUN_ERROR
  does NOT run those checks, but the apply layer (`I`) otherwise leaves a
  half-built REASONING/TEXT message in client state when a *_START has no
  matching *_END. So the clean contract is: close any open frame with its
  matching *_END, then emit RUN_ERROR as the sole terminal event.

Python x3 (ag2, crewai-crews, langroid — kept byte-identical except the
pre-existing FastAPI title literal):
- except Exception: log server-side via
  `print(f"[reasoning] run failed: {exc!r}", file=sys.stderr, flush=True)`
  + `traceback.print_exc(file=sys.stderr)` (previously NO server log).
- emit a generic client message
  `agent run failed: {type(exc).__name__} (see server logs)` instead of
  the raw `str(exc)` (which can carry provider URLs / request details
  into the SSE stream to the browser).
- track `reasoning_msg_id` / `text_msg_id`; close the open frame with its
  matching *_END before RUN_ERROR. No RUN_FINISHED is emitted.
- `except asyncio.CancelledError: raise` is preserved.

Java (spring-ai ReasoningController.runReasoning catch):
- removed the RUN_FINISHED that followed RUN_ERROR (protocol violation
  per the evidence above) — now matches the Python siblings (RUN_ERROR
  only).
- track `reasoningMsgId` / `textMsgId`; close the open frame with its
  matching *_END before RUN_ERROR.
- existing generic message + `log.error("Reasoning run failed", e)` kept.

Affected call sites: the three Python `_run_reasoning_agent` generators
and the Java `runReasoning` async task — the four terminal/catch blocks
that emit RUN_ERROR.

Verify: `python3 -m py_compile` clean on all three Python files; the
crewai-crews pytest suite (which import-mounts the reasoning sub-app)
passes 102/102. Java: no JDK/maven available locally; string/comment-
aware brace-balance check passes and the edit is javac-parseable.
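
A condensed sketch of the error contract described in the commit above: close whichever frame is open, then emit RUN_ERROR as the only terminal event. The event dicts are simplified stand-ins rather than the @ag-ui wire schema, and `run_reasoning`, `collect`, and the `(kind, delta)` chunk shape are hypothetical names chosen for the example.

```python
import asyncio
import sys
import traceback
import uuid
from typing import AsyncIterator, Callable


async def run_reasoning(chunks: Callable[[], AsyncIterator[tuple]]) -> AsyncIterator[dict]:
    """Yield simplified AG-UI-style events for a stream of (kind, delta) pairs."""
    reasoning_msg_id = None  # set while a REASONING_MESSAGE frame is open
    text_msg_id = None       # set while a TEXT_MESSAGE frame is open
    yield {"type": "RUN_STARTED"}
    try:
        async for kind, delta in chunks():
            if kind == "reasoning":
                if reasoning_msg_id is None:
                    reasoning_msg_id = str(uuid.uuid4())
                    yield {"type": "REASONING_MESSAGE_START", "messageId": reasoning_msg_id}
                yield {"type": "REASONING_MESSAGE_CONTENT", "messageId": reasoning_msg_id, "delta": delta}
            else:
                if reasoning_msg_id is not None:
                    yield {"type": "REASONING_MESSAGE_END", "messageId": reasoning_msg_id}
                    reasoning_msg_id = None
                if text_msg_id is None:
                    text_msg_id = str(uuid.uuid4())
                    yield {"type": "TEXT_MESSAGE_START", "messageId": text_msg_id}
                yield {"type": "TEXT_MESSAGE_CONTENT", "messageId": text_msg_id, "delta": delta}
        # Success path: close open frames, then finish the run.
        if reasoning_msg_id is not None:
            yield {"type": "REASONING_MESSAGE_END", "messageId": reasoning_msg_id}
        if text_msg_id is not None:
            yield {"type": "TEXT_MESSAGE_END", "messageId": text_msg_id}
        yield {"type": "RUN_FINISHED"}
    except asyncio.CancelledError:
        raise  # cancellation is never converted into RUN_ERROR
    except Exception as exc:
        # Full detail stays server-side.
        print(f"[reasoning] run failed: {exc!r}", file=sys.stderr, flush=True)
        traceback.print_exc(file=sys.stderr)
        # Close the open frame so the client is not left with a half-built message.
        if reasoning_msg_id is not None:
            yield {"type": "REASONING_MESSAGE_END", "messageId": reasoning_msg_id}
        if text_msg_id is not None:
            yield {"type": "TEXT_MESSAGE_END", "messageId": text_msg_id}
        # RUN_ERROR is terminal: no RUN_FINISHED follows it.
        yield {
            "type": "RUN_ERROR",
            "message": f"agent run failed: {type(exc).__name__} (see server logs)",
        }


async def collect(chunks: Callable[[], AsyncIterator[tuple]]) -> list:
    """Drain the generator into a list (handy for tests)."""
    return [event async for event in run_reasoning(chunks)]
```

A stream that raises after its first reasoning chunk should produce RUN_STARTED, REASONING_MESSAGE_START, REASONING_MESSAGE_CONTENT, REASONING_MESSAGE_END, RUN_ERROR, with only the exception class name in the client-visible message.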
…hread hop (spring-ai)

ReasoningController.run dispatched runReasoning via
CompletableFuture.runAsync onto a pre-existing ForkJoinPool.commonPool()
worker. AimockHeaderContext is an InheritableThreadLocal, which only
copies the parent value at child-thread CREATION time — a pooled worker
predates the request, so it snapshots an empty map and the outbound
chat-completions WebClient filter reads no x-aimock-context. On the D6
verification path this yields aimock strict-mode 503.

Capture the headers on the request thread and re-establish them on the
worker via AimockHeaderContext.capture()/runWith(...), mirroring the
canonical PropagatingLocalAgent idiom. Added java.util.Map import.

Call-site enumeration:
- ReasoningController.run (~L129): the one runAsync dispatch site — FIXED.
- No other thread hops in ReasoningController: runReasoning runs entirely
  on the runAsync worker; stream.toIterable() blocks on that same worker;
  the WebClient exchange filter reads AimockHeaderContext.get() at
  exchange time on the now-bound worker. No further hazard.
- Sibling @RestControllers route async dispatch through AgUiService ->
  PropagatingLocalAgent, which already does capture/runWith; this
  controller was the lone bespoke runAsync that bypassed that path.
…of commonPool

ReasoningController.run() dispatched runReasoning() via
CompletableFuture.runAsync(Runnable) (call site ~line 129), which uses
ForkJoinPool.commonPool(). runReasoning() then blocks its worker thread
for the entire streaming chat-completions call at stream.toIterable()
(~line 189). With the common pool sized to the CPU count, concurrent
reasoning requests could exhaust it and starve unrelated parallel work
in the JVM.

Add a dedicated bounded executor (fixed pool of 4 named daemon threads)
as a controller field and pass it to the two-arg
CompletableFuture.runAsync(Runnable, Executor) at the run() call site so
blocking reasoning runs no longer occupy commonPool workers. Daemon
threads keep JVM shutdown clean; a @PreDestroy shuts the pool down on
context teardown. Behavior is otherwise identical (no reactive refactor).
…easoning stream

Per-chunk JSON parse errors in the streaming loop were swallowed at debug
level (off by default). A systematic format change would drop every chunk,
leaving fullText/nativeReasoning empty while the run "succeeded" with an
empty assistant turn and zero operator signal.

Track a parse-failure counter (and last error) in the loop. After the loop,
if no usable content was produced (empty fullText AND empty nativeReasoning)
AND parseFailures > 0, emit one log.warn with the failure count and last
error. The per-chunk debug line is unchanged; the success and partial-parse
paths are untouched (a stream that produced any content stays as before).

Call site: streamReasoning(...) SSE chunk loop in ReasoningController — the
sole consumer of the chat-completions stream and the only producer of
fullText/nativeReasoning, so this is the only place the empty-output signal
can be surfaced.
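
The controller is Java; the sketch below restates the same counting logic in Python purely for illustration. The `consume_chunks` name, the logger name, and the assumed chunk shape (`choices[0].delta` with `content` / `reasoning_content`) are not taken from the controller source.

```python
import json
import logging

log = logging.getLogger("reasoning")


def consume_chunks(raw_chunks):
    """Accumulate stream content and surface a total parse failure once."""
    full_text = []
    native_reasoning = []
    parse_failures = 0
    last_error = None
    for raw in raw_chunks:
        try:
            delta = json.loads(raw)["choices"][0]["delta"]
            reasoning_piece = delta.get("reasoning_content") or ""
            text_piece = delta.get("content") or ""
        except (ValueError, KeyError, IndexError, TypeError, AttributeError) as exc:
            # Per-chunk line stays at debug level, as before.
            parse_failures += 1
            last_error = exc
            log.debug("skipping unparseable chunk: %r", exc)
            continue
        native_reasoning.append(reasoning_piece)
        full_text.append(text_piece)
    text = "".join(full_text)
    reasoning = "".join(native_reasoning)
    # Warn once, and only when the run produced nothing usable at all.
    if not text and not reasoning and parse_failures > 0:
        log.warning(
            "reasoning stream produced no content; %d chunk(s) failed to parse (last error: %r)",
            parse_failures,
            last_error,
        )
    return text, reasoning, parse_failures
```

A stream with some good chunks and some bad ones stays quiet at warn level, which matches the "partial-parse paths are untouched" note above.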
…nts (parity with agno reference)

The four custom reasoning backends built the chat-completions request from
only the LAST user message, discarding all prior turns so follow-up
questions lost their context. The agno reference threads full history via
Agno's Agent; these now match.

Call sites switched from single-turn extraction to full-history mapping:
  - showcase/integrations/ag2/src/agents/reasoning_agent.py
  - showcase/integrations/crewai-crews/src/agents/reasoning_agent.py
  - showcase/integrations/langroid/src/agents/reasoning_agent.py
      `_extract_user_input` -> `_to_chat_messages` (+ `_coerce_content`):
      system prompt first, then every prior user/assistant turn in order;
      tool/system input messages skipped. The three files stay token-for-
      token identical (only docstrings/fixture-name comments differ).
  - showcase/integrations/spring-ai/.../ReasoningController.java
      `extractUserInput` -> `buildRequestBody(List<BaseMessage>)`; same
      shape, built on the request thread and passed to the async worker.

CRITICAL invariants preserved:
  - A single user-message input yields EXACTLY `[{system}, {user: <text>}]`
    (byte-equal to the old path) so aimock D6 fixtures replay unchanged.
  - Empty / no-user-message input yields `[{system}, {user: ""}]` (an empty
    user turn), matching prior behaviour.

Adds red-green coverage (crewai-crews/tests/python/test_reasoning_history.py)
pinning the single-turn byte-equality, multi-turn ordering, tool/system
skipping, empty-input fallback, and multimodal/None content coercion.
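
A rough sketch of the full-history mapping and the two invariants listed above. It assumes messages arrive as dicts with `role` and `content` keys and uses a placeholder system prompt; the real `_to_chat_messages` / `_coerce_content` helpers are not reproduced here, and mapping `None` content to an empty string is an assumption.

```python
from typing import Any

# Placeholder: the real system prompt is not shown in this PR.
SYSTEM_PROMPT = "You are a helpful assistant."


def _text(content: Any) -> str:
    """Minimal content coercion (None -> "" is an assumption)."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for part in content:
            text = part.get("text") if isinstance(part, dict) else getattr(part, "text", None)
            if isinstance(text, str):
                parts.append(text)
        return "".join(parts)
    return str(content)


def to_chat_messages(messages: list) -> list:
    """System prompt first, then every user/assistant turn in order."""
    chat = [{"role": "system", "content": SYSTEM_PROMPT}]
    for msg in messages:
        role = msg.get("role")
        if role in ("user", "assistant"):  # tool/system input messages are skipped
            chat.append({"role": role, "content": _text(msg.get("content"))})
    # Empty / no-user-message input falls back to an empty user turn.
    if not any(m["role"] == "user" for m in chat):
        chat.append({"role": "user", "content": ""})
    return chat
```

For a single user message this yields exactly `[{system}, {user: <text>}]`, which is the property the aimock D6 fixtures depend on.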