Chat-loop Governance

Policy rules, audit-chain capture, and commerce-memory predicates now fire on every chat-loop tool call. This page covers what that changes for customers whose policies were tuned for direct-execute traffic, and how to verify the new behavior.

This page documents the AgentGate hooks that now fire on the chat-loop session.send() path. The behavior shipped in May 2026 — see the changelog for the dated release entries. If you've been writing policies and audit assertions against session.execute()-only traffic, read this before your next test pass.

The chat-loop is the natural-language tool-use path: session.send("Charge R$5 via Pix") runs a Claude tool-use loop on the backend, picks the tools, and dispatches each one through the runtime. The direct-execute path is session.execute("asaas/create_payment", { value: 100 }) — your code names the tool, the runtime dispatches it.

Before the May 2026 changes, the AgentGate stack (the policy engine, the audit chain, and commerce-memory capture) fired on every session.execute() call but not on the inner tool calls that a chat loop drove. That gap is closed. Every tool dispatch from session.send() now runs through the same hooks, in the same order, as a direct execute. This page covers what fires, when it fires, how it interacts with rules you've written for the direct-execute path, and what to check before your next release.

What this is, and what it isn't

This is a bugfix, not a new feature. The runtime was already advertising the AgentGate stack as the layer between your agent and the upstream provider — see Test Mode for the full ordering of deny-list, policy, mock store, and upstream. The chat-loop path was bypassing that ordering for tool calls dispatched inside session.send(). Customers writing policy rules to deny a tool, queue an approval, or cap a daily spend reasonably expected the rule to fire whether the call came from session.execute() or from inside session.send(). It did not. Now it does.

The hooks themselves are unchanged. The deny-list is the same non-overridable list (Projects → safety rails). The policy engine is the same allowed / approval-required / deny evaluation. The audit chain writes the same events. Commerce-memory capture fires on the same predicates. The change is where in the request lifecycle the hooks run — they now wrap every tool dispatch the chat loop makes, not just the outer session.send() call.

What fires on every chat-loop tool call

For each tool the chat loop picks during session.send(), the runtime evaluates in this fixed order — the same order documented for session.execute():

  1. Deny-list. Fund-transfer caps, NF-e issuance for contested carts, wallet-policy overrides, bulk outbound thresholds, cross-tenant agent-to-agent commitments. Non-overridable. A test cannot mock past these, and an autonomy-level setting cannot disable them.
  2. Policy engine. Your rules at /dashboard/policies evaluate against the tool name plus the input the LLM picked. A deny rule on *delete_* fires regardless of who picked the call. An approval-required rule writes a row to pending_approvals and returns an approval_required tool_result block to the loop — the loop sees the structured envelope and can recover or surface it.
  3. Session mock store (test-mode sessions only). Strict-mode lookup; static or stateful fixture return; counter advance on success. See Test Mode for the full mock-store contract.
  4. Upstream provider call (if not denied, queued, or mocked).
  5. Audit chain append. Every completed tool call writes a tool_call.succeeded or tool_call.failed event into audit_events. The chain HEAD advances; the next anchoring cycle commits it. Existing audit-export queries pick the chat-loop events up without changes — same event names, same shape.
  6. Commerce-memory capture. When the per-tool capture predicate matches (a successful charge, a fulfilled shipment, an approved KYC verification), the runtime writes the canonical commerce-memory row. The predicates are unchanged; what's new is that chat-loop traffic now reaches them.
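
The ordering above can be sketched as a small pipeline. This is only an illustrative model, not the runtime's implementation: the Hooks interface, the dispatch function, and the Outcome type are hypothetical names invented for this sketch, and it leaves out details such as the pending_approvals write and the strict-mode mock lookup.

```typescript
// Hypothetical model of the documented hook order. None of these names
// come from the CodeSpar SDK; they only illustrate the sequencing.
type Outcome =
  | { kind: "denied"; by: "deny-list" | "policy" }
  | { kind: "approval_required" }
  | { kind: "failed" }
  | { kind: "completed"; source: "mock" | "upstream"; output: unknown };

interface Hooks {
  onDenyList(tool: string, input: unknown): boolean;
  policy(tool: string, input: unknown): "allowed" | "approval-required" | "deny";
  mock(tool: string): unknown; // undefined means "no fixture for this tool"
  upstream(tool: string, input: unknown): unknown;
  audit(event: "tool_call.succeeded" | "tool_call.failed", tool: string): void;
  capture(tool: string, output: unknown): void; // decides internally whether the predicate matches
}

function dispatch(hooks: Hooks, tool: string, input: unknown): Outcome {
  // 1. Deny-list runs first and cannot be overridden downstream.
  if (hooks.onDenyList(tool, input)) return { kind: "denied", by: "deny-list" };

  // 2. Policy engine sees the tool name plus the input.
  const verdict = hooks.policy(tool, input);
  if (verdict === "deny") return { kind: "denied", by: "policy" };
  if (verdict === "approval-required") return { kind: "approval_required" };

  // 3. Mock store, then 4. upstream only if nothing was mocked.
  const mocked = hooks.mock(tool);
  let output: unknown;
  try {
    output = mocked !== undefined ? mocked : hooks.upstream(tool, input);
  } catch {
    hooks.audit("tool_call.failed", tool);
    return { kind: "failed" };
  }

  // 5. Audit append, then 6. commerce-memory capture.
  hooks.audit("tool_call.succeeded", tool);
  hooks.capture(tool, output);
  return { kind: "completed", source: mocked !== undefined ? "mock" : "upstream", output };
}

// Usage: record the order in which the hooks run for one mocked call.
const trace: string[] = [];
const hooks: Hooks = {
  onDenyList: () => { trace.push("deny-list"); return false; },
  policy: () => { trace.push("policy"); return "allowed"; },
  mock: (tool) => {
    trace.push("mock");
    return tool === "asaas/create_payment" ? { id: "pay_test_42" } : undefined;
  },
  upstream: () => { trace.push("upstream"); return {}; },
  audit: (event) => { trace.push(event); },
  capture: () => { trace.push("capture"); },
};
const outcome = dispatch(hooks, "asaas/create_payment", { value: 100 });
```

The point of the model is that the same dispatch function serves both call paths: whether the tool name came from your code or from the LLM, the hooks see the same arguments in the same sequence.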

How chat-loop rules interact with direct-execute rules

Policy rules are tool-and-input scoped, not call-site scoped. A rule that matches asaas/create_payment matches whether the call originated from session.execute("asaas/create_payment", ...) or from inside session.send("Charge R$5 via Pix"). You don't need to duplicate rules; the same rule fires on both paths.
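
To make "tool-and-input scoped, not call-site scoped" concrete, here is a minimal sketch of a matcher. The Rule and ToolCall types, the globToRegExp helper, and the asaas/delete_payment tool name are all hypothetical, and the assumption that * matches any run of characters (including /) is a guess about the rule grammar rather than documented behavior.

```typescript
// Hypothetical types and matcher; the real rule grammar is not shown on this page.
interface ToolCall {
  toolName: string;
  input: Record<string, unknown>;
}

interface Rule {
  pattern: string; // glob over the tool name, for example "*delete_*"
  effect: "allowed" | "approval-required" | "deny";
  when?: (input: Record<string, unknown>) => boolean; // optional input condition
}

// Assumption: "*" matches any run of characters, including "/".
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

// Note what is absent from the signature: the call site. The matcher
// only sees the tool name and the input.
function ruleMatches(rule: Rule, call: ToolCall): boolean {
  if (!globToRegExp(rule.pattern).test(call.toolName)) return false;
  return rule.when ? rule.when(call.input) : true;
}

const denyDeletes: Rule = { pattern: "*delete_*", effect: "deny" };

// The same call object describes a dispatch from session.execute() or
// from inside session.send(); the matcher cannot tell them apart.
const call: ToolCall = { toolName: "asaas/delete_payment", input: { id: "pay_1" } };
const matched = ruleMatches(denyDeletes, call);
```

Because the call site never reaches the matcher in this model, one rule covers both paths, which is the behavior the paragraph above describes.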

The interaction worth knowing about: the rule grammar today cannot express a rule that allows direct executes (because your code controls the inputs) but denies chat-loop calls (because the LLM is the dispatcher). If you need that distinction, the conservative pattern is an approval-required rule on the high-blast-radius tool. The chat loop hits the approval queue and your operators decide per call, while direct executes can be configured to skip approval at the SDK call site when your code has its own gate. Talk to support if this is the shape you need.

What to check before your next release

Three checks cover the surface area:

  • Approval queue volume. If you have an approval-required rule on tools the chat loop picks, check /dashboard/approvals after the next chat-loop test run. The queue grows on chat-loop dispatch the same way it grows on direct execute. The approval-replay flow is unchanged — approve the row, the runtime executes the originally-deferred call.

  • Audit-chain growth. Audit-chain volume goes up roughly proportionally to chat-loop tool calls per day. Storage cost on the managed tier is unchanged for the inclusive plans; if you have a custom retention contract, the recent SIEM-export rows include the new events. The tool_call.succeeded / tool_call.failed event names are the same — existing SIEM rules don't need to change.

  • Commerce-memory entries. If a chat-loop call hits a tool that the capture predicate matches (Asaas charges, Stone settlements, Melhor Envio fulfillments), the runtime writes one commerce-memory row per successful call. Check /dashboard/commerce-memory for entries that come from chat-loop traffic in workflows you previously thought lived only on the direct-execute path.
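
For the audit-chain check, one way to quantify growth is to tally the two event names in an export taken before and after a chat-loop test run. The sketch below assumes a hypothetical row shape with an event_name field; the actual audit_events columns may differ, so adjust the field name to match your export.

```typescript
// Hypothetical export row; the real audit_events columns may differ.
interface AuditEventRow {
  event_name: string;
}

// Count only the two tool-call event names this page mentions.
function tallyToolCallEvents(rows: AuditEventRow[]): { succeeded: number; failed: number } {
  let succeeded = 0;
  let failed = 0;
  for (const row of rows) {
    if (row.event_name === "tool_call.succeeded") succeeded += 1;
    else if (row.event_name === "tool_call.failed") failed += 1;
  }
  return { succeeded, failed };
}

// Illustrative sample, not real export data.
const sample: AuditEventRow[] = [
  { event_name: "tool_call.succeeded" },
  { event_name: "tool_call.failed" },
  { event_name: "tool_call.succeeded" },
];
const counts = tallyToolCallEvents(sample);
```

Comparing the tallies from two exports gives a rough per-run event count that you can scale by expected chat-loop traffic.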

Verifying with a fixture session

The cleanest way to confirm is a test-mode session that mocks the relevant tool and asserts on the round-trip. The chat-loop runs the AgentGate hooks identically to direct execute, so a fixture against session.send proves both paths in one round-trip:

import { CodeSpar } from "@codespar/sdk";

const cs = new CodeSpar({ apiKey: process.env.CODESPAR_API_KEY });

const session = await cs.create("user_test", {
  servers: ["asaas"],
  mocks: {
    "asaas/create_payment": { id: "pay_test_42", status: "PENDING" },
  },
});

const result = await session.send("Charge R$5 via Pix");

// By the time the await above resolves, the chat loop has picked
// asaas/create_payment, the policy engine has evaluated it, the mock
// store has returned the fixture, the audit chain has written a
// tool_call.succeeded event, and commerce-memory has captured the row.
const charge = result.tool_calls.find((tc) => tc.tool_name === "asaas/create_payment");
// `expect` is assumed to be a test-runner global (for example Jest or Vitest).
expect(charge?.output?.id).toBe("pay_test_42");

See Test Mode for the full mocks contract and the type-narrowed guards (isPolicyDenied, isApprovalRequired, etc.) for parsing the tool_result envelopes. See Sessions for the session.send() lifecycle and Tools for canonical tool names.
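
If you want to see the general shape of a type-narrowing guard before reading the Test Mode reference, the following is a minimal local sketch. It is not the SDK's isApprovalRequired: the hasResultCode helper and the ToolResultEnvelope type are hypothetical, and the code field name is an assumption. Only the approval_required value comes from this page. Prefer the SDK's own guards in real tests.

```typescript
// Assumed envelope shape. The field name `code` is a guess; only the
// "approval_required" value is named on this page.
interface ToolResultEnvelope {
  code: string;
  [key: string]: unknown;
}

// Hypothetical local guard that narrows an unknown value to an envelope
// carrying a specific code.
function hasResultCode<C extends string>(
  value: unknown,
  code: C,
): value is ToolResultEnvelope & { code: C } {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { code?: unknown }).code === code
  );
}

// Branch on the envelope instead of assuming the call completed.
function summarize(output: unknown): string {
  if (hasResultCode(output, "approval_required")) {
    return "queued for approval";
  }
  return "completed or other outcome";
}

const queued = summarize({ code: "approval_required" });
const done = summarize({ id: "pay_test_42", status: "PENDING" });
```

A guard like this lets a test assert on the approval path explicitly, rather than failing on a missing output field when a rule queues the call.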

Where to read more

  • Projects — environment field, safety rails, and how org-level scoping interacts with per-project policies.
  • Test Mode — the strict-mode mock store, the five tool_result codes, and how mocks coexist with policy evaluation.
  • Changelog — dated entries for the three behavior changes (policy on chat loop, audit-chain on chat loop, commerce-memory on chat loop).