
Session-first: How We Designed the CodeSpar SDK

Sessions as immutable context containers. 6 meta tools instead of 99. Managed auth that handles OAuth and token refresh. Framework adapters for Claude, OpenAI, and Vercel AI SDK. Here's every design decision and why we made it.

Fabiano Cruz
Co-founder, CodeSpar
2026.04.14
18 min

When we started building the CodeSpar SDK, we had 57 MCP servers with 453 tools published on npm. Developers could install any of them and wire them into their agents manually. But the developer experience was painful. You had to discover which servers existed, figure out which tools to use, manage authentication for each provider, and handle the orchestration yourself.

We needed an SDK that made all of this invisible. One import, one session, one line to send a message to a commerce agent.

agent.ts
import { CodeSpar } from "@codespar/sdk";

const codespar = new CodeSpar({ apiKey: "csk_..." });
const session = await codespar.create("user_123", { preset: "brazilian" });
const messages = await session.send("Charge R$150 via Pix and issue the NF-e");

Five lines. That is the target DX. Here is how we got there.

Sessions as context containers

The core abstraction is the Session. When you call codespar.create(userId, options), the SDK creates an immutable context container that bundles together:

  • A specific user (scoped by userId, isolated per organization)
  • A commerce preset (which MCP servers to connect)
  • Auth credentials for each provider (managed, auto-refreshed)
  • Policy rules (budget limits, rate limits, approval gates, time windows)
  • An MCP endpoint for framework-agnostic access
  • Billing context (every tool call is tracked as a billing unit)

Once created, a session does not change. Need different servers? Create a new session. This immutability simplifies debugging, makes sessions safe to pass between functions, and eliminates an entire class of state management bugs.
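
To make the immutability point concrete, here is a minimal sketch that uses a frozen plain object as a stand-in for a session. The names SessionContext, createSessionContext, and withPreset are hypothetical and are not part of @codespar/sdk; the only idea taken from the text above is that a "change" produces a new session with a new id.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual types.
interface SessionContext {
  readonly id: string;
  readonly userId: string;
  readonly preset: string;
  readonly budgetLimit: number | null;
}

let nextSessionNumber = 1;

// Build a frozen context. Any later "change" has to go through this
// function again, which yields a new id instead of mutating the old one.
function createSessionContext(
  userId: string,
  options: { preset: string; budgetLimit?: number },
): SessionContext {
  const context: SessionContext = {
    id: `ses_${nextSessionNumber++}`,
    userId,
    preset: options.preset,
    budgetLimit: options.budgetLimit ?? null,
  };
  return Object.freeze(context);
}

// Need different servers? Derive a new session rather than editing the old one.
function withPreset(session: SessionContext, preset: string): SessionContext {
  return createSessionContext(session.userId, {
    preset,
    budgetLimit: session.budgetLimit ?? undefined,
  });
}
```

Because the original object is never touched, any function that was handed the first session keeps seeing the same preset and budget, which is the debugging property described above.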

Sessions also scope billing. Every tool call within a session is logged to the session_tool_calls table with input/output payloads, duration, and status. These records are the input to metered billing via Stripe, so you can give each customer their own session and bill them precisely for what they used.

session-lifecycle.ts
// Sessions are cheap to create
const session = await codespar.create("user_123", {
  preset: "brazilian",
  budgetLimit: 5000,
});

// Session ID is a stable reference
console.log(session.id);  // "ses_a1b2c3..."

// Check remaining budget
const budget = await session.budget();
console.log(budget.remaining);  // 5000 (nothing spent yet)

// Get the MCP endpoint (for Claude Desktop, Cursor, etc.)
const { url, headers } = session.mcp;
// url: "https://api.codespar.dev/v1/sessions/ses_a1b2c3/mcp"

// Sessions persist server-side - reconnect anytime
const restored = await codespar.session("ses_a1b2c3");

6 meta tools, not 99

The Brazilian preset connects to 6 MCP servers with 99 raw tools between them. Passing all 99 tool definitions to an LLM would consume a significant portion of the context window and confuse the model with too many choices. We measured this: Claude Sonnet with 99 tools spends 40% more tokens reasoning about which tool to call and makes incorrect selections 3x more often than with 6 well-scoped meta tools.

Instead, we expose 6 meta tools that route dynamically to the correct underlying server:

meta-tools.ts
codespar_discover  // Search products, services, payment methods
codespar_checkout  // Cart, pricing, payment processing
codespar_pay       // Direct payment with policy enforcement
codespar_invoice   // NF-e/CFDI issuance, invoicing
codespar_ship      // Shipping quotes, labels, tracking
codespar_notify    // WhatsApp, email, SMS notifications

When an agent calls codespar_pay, the MetaToolExecutor resolves which MCP server handles payments for the current preset, finds the best matching tool on that server using keyword scoring against the input parameters, and executes it. The agent never sees the 99 raw tools. It sees 6 clear categories.
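
The keyword-scoring step can be sketched as follows. This is an illustration under assumptions, not the MetaToolExecutor itself: the raw tool names, the tokenizer, and the scoring rule (count of shared tokens) are all hypothetical, and the real executor may weight matches differently.

```typescript
// Hypothetical raw tools; real server and tool names will differ.
interface RawTool {
  server: string;
  name: string;
  description: string;
}

const paymentTools: RawTool[] = [
  { server: "zoop", name: "create_pix_charge", description: "Create a Pix charge for an amount" },
  { server: "zoop", name: "refund_transaction", description: "Refund a previous transaction" },
  { server: "zoop", name: "get_transaction_status", description: "Check the status of a transaction" },
];

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((token) => token.length > 0);
}

// Score = number of query tokens that also appear in the tool's name or description.
function scoreTool(tool: RawTool, queryTokens: string[]): number {
  const toolTokens = tokenize(`${tool.name} ${tool.description}`);
  return queryTokens.filter((token) => toolTokens.indexOf(token) !== -1).length;
}

// Build the query from the action, the parameter names, and any string values,
// then return the highest-scoring tool.
function pickTool(
  tools: RawTool[],
  action: string,
  params: Record<string, unknown>,
): RawTool | undefined {
  const parts: string[] = [action];
  for (const key of Object.keys(params)) {
    parts.push(key);
    const value = params[key];
    if (typeof value === "string") parts.push(value);
  }
  const queryTokens = tokenize(parts.join(" "));

  let best: RawTool | undefined;
  let bestScore = 0;
  for (const tool of tools) {
    const score = scoreTool(tool, queryTokens);
    if (score > bestScore) {
      best = tool;
      bestScore = score;
    }
  }
  return best; // undefined when nothing matched at all
}
```

With this rule, pickTool(paymentTools, "charge", { amount: 150, method: "pix" }) selects create_pix_charge because three query tokens (charge, amount, pix) appear in that tool's text and none appear in the others.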

The right number of tools for an agent is the smallest number that covers all the use cases.

Each meta tool definition uses input_schema rather than inputSchema. Note that input_schema is the field name in the Anthropic Messages API tool format; the MCP specification's own tool definition names this field inputSchema. The schema is intentionally loose (an action string and a flexible params object) so the LLM can express intent naturally and the MetaToolExecutor handles the mapping to the specific API parameters the underlying server expects.

meta-tool-definition.ts
// How a meta tool is defined internally (Anthropic-style tool format)
{
  name: "codespar_pay",
  description: "Process a payment. Supports Pix, credit card, boleto, and debit.",
  input_schema: {
    type: "object",
    properties: {
      action: {
        type: "string",
        description: "What to do: charge, refund, check_status, list_methods"
      },
      params: {
        type: "object",
        description: "Parameters for the action (amount, currency, method, etc.)"
      }
    },
    required: ["action"]
  }
}

Managed auth

Each MCP server requires different authentication. Zoop uses OAuth2. Nuvem Fiscal uses API keys. Melhor Envio uses OAuth2 with refresh tokens. Managing credentials for 6+ providers per session is a pain developers should not have to deal with.

The SDK's AuthManager handles this:

auth.ts
// API key auth (simple providers like Nuvem Fiscal)
await session.authorize("nuvem_fiscal", { token: "..." });

// OAuth2 flow (providers like Zoop, Melhor Envio)
const url = codespar.authManager.getConnectUrl("zoop");
// Redirect the user to `url`; `code` and `userId` below come from your callback handler
await codespar.authManager.handleOAuthCallback("zoop", code, userId);

// Multi-credential auth (providers like Omie, Stark Bank)
await session.authorize("omie", {
  appKey: "...",
  appSecret: "...",
});

await session.authorize("stark_bank", {
  privateKey: "...",
  projectId: "...",
});

Tokens are stored per-user and auto-refreshed when they expire. The AuthStore interface is pluggable, so you can use the default in-memory store for development or swap in PostgreSQL, Redis, or any other backend for production.

The key design decision: auth is per-user, not per-session. When a user authorizes Zoop once, every subsequent session for that user inherits the credential. This means the OAuth flow happens once during onboarding, and every session after that just works.
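
A minimal sketch of what a pluggable store with per-user keys and refresh-on-expiry could look like. The AuthStore shape, the InMemoryAuthStore class, the RefreshFn type, and getAccessToken are assumptions made for illustration; the SDK's real interface is not shown in this post and may differ.

```typescript
// Hypothetical shapes; the SDK's real AuthStore interface may differ.
interface StoredCredential {
  accessToken: string;
  refreshToken?: string;
  expiresAt: number; // epoch milliseconds
}

interface AuthStore {
  get(userId: string, provider: string): Promise<StoredCredential | undefined>;
  set(userId: string, provider: string, credential: StoredCredential): Promise<void>;
}

// Development-only store. Keys include the user and provider but no session id,
// which is what lets every session for a user share one credential.
class InMemoryAuthStore implements AuthStore {
  private readonly credentials = new Map<string, StoredCredential>();

  async get(userId: string, provider: string): Promise<StoredCredential | undefined> {
    return this.credentials.get(`${userId}:${provider}`);
  }

  async set(userId: string, provider: string, credential: StoredCredential): Promise<void> {
    this.credentials.set(`${userId}:${provider}`, credential);
  }
}

type RefreshFn = (credential: StoredCredential) => Promise<StoredCredential>;

// Return a usable access token, refreshing and persisting it first if it has expired.
async function getAccessToken(
  store: AuthStore,
  userId: string,
  provider: string,
  refresh: RefreshFn,
  now: number = Date.now(),
): Promise<string> {
  const credential = await store.get(userId, provider);
  if (credential === undefined) {
    throw new Error(`No ${provider} credential stored for ${userId}`);
  }
  if (credential.expiresAt > now) {
    return credential.accessToken;
  }
  const refreshed = await refresh(credential);
  await store.set(userId, provider, refreshed);
  return refreshed.accessToken;
}
```

Swapping in PostgreSQL or Redis would mean writing another class that satisfies the same two-method interface; the refresh logic would not need to change.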

auth-flow.ts
// Typical production flow:
// 1. During onboarding, user connects their providers
const connectUrl = codespar.authManager.getConnectUrl("zoop", {
  redirectUri: "https://yourapp.com/callback/zoop",
  userId: "user_123",
});
// Redirect user to connectUrl...

// 2. Handle the OAuth callback
app.get("/callback/zoop", async (req, res) => {
  await codespar.authManager.handleOAuthCallback(
    "zoop",
    req.query.code as string,
    req.query.state as string  // contains userId
  );
  res.redirect("/dashboard?connected=zoop");
});

// 3. Every future session inherits the credential automatically
const session = await codespar.create("user_123", { preset: "brazilian" });
// Zoop OAuth tokens are already available - no authorize() call needed

// 4. Token refresh happens silently
// If the Zoop token expires mid-session, the AuthManager
// refreshes it before the next API call. Zero developer code.

Framework adapters

The SDK works with any AI framework through adapter packages. Each package converts session tools into the format the framework expects. The adapters are thin: they depend only on @codespar/sdk for types and the session interface. No heavy dependencies.

adapters.ts
// Each adapter package exports getTools; aliases are used here only so
// the three variants can sit side by side in one file.

// Vercel AI SDK - includes execute() functions
import { getTools as getVercelTools } from "@codespar/vercel";
const vercelTools = await getVercelTools(session);
// Returns: Record<string, CoreTool> with execute() wired to session

// Claude API (Anthropic SDK) - native Tool format
import { getTools as getClaudeTools } from "@codespar/claude";
const claudeTools = await getClaudeTools(session);
// Returns: Tool[] with { name, description, input_schema }

// OpenAI - function calling format
import { getTools as getOpenAITools } from "@codespar/openai";
const openaiTools = await getOpenAITools(session);
// Returns: { type: "function", function: { name, description, parameters } }[]

// MCP - no adapter needed, use the session endpoint directly
const { url, headers } = session.mcp;
// Works with Claude Desktop, Cursor, VS Code, any MCP client

The await getTools(session) call is async because it fetches the current tool definitions from the session's connected MCP servers. Tools can change between sessions (different presets, different servers), so the adapter always fetches fresh definitions rather than caching stale ones.
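
To show how thin such an adapter can be, here is a sketch that converts the Claude-style tool shape into the OpenAI function calling shape, using only the field names listed in the comments above. The toOpenAITools function and both interfaces are hypothetical stand-ins written for this example, not code from @codespar/openai.

```typescript
// Claude-style tool shape, as listed for @codespar/claude.
interface ClaudeTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// OpenAI function calling shape, as listed for @codespar/openai.
interface OpenAITool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// Hypothetical converter: the JSON Schema is reused as-is, only the envelope changes.
function toOpenAITools(tools: ClaudeTool[]): OpenAITool[] {
  return tools.map((tool) => ({
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.input_schema,
    },
  }));
}

// The codespar_pay definition from earlier in this post.
const payTool: ClaudeTool = {
  name: "codespar_pay",
  description: "Process a payment. Supports Pix, credit card, boleto, and debit.",
  input_schema: {
    type: "object",
    properties: {
      action: { type: "string" },
      params: { type: "object" },
    },
    required: ["action"],
  },
};

const openAIPayTools = toOpenAITools([payTool]);
```

The schema object is passed through untouched, which is why a loose, JSON Schema based definition travels well between frameworks.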

Note: @codespar/claude is for the Anthropic API directly. @codespar/vercel is for the Vercel AI SDK (which can use Anthropic, OpenAI, or any provider underneath). @codespar/mcp generates MCP configuration for IDE-based clients.

Real-time streaming with session.sendStream()

For user-facing applications, waiting 8-15 seconds for a Complete Loop to finish before showing any UI update is unacceptable. session.sendStream() returns a Server-Sent Events stream that emits events as each tool call starts, progresses, and completes:

streaming.ts
const stream = await session.sendStream(
  "Process order #7721: charge R$1,249, issue NF-e, ship via PAC."
);

for await (const event of stream) {
  switch (event.type) {
    case "text_delta":
      // The model's reasoning, streamed token by token
      process.stdout.write(event.data.text);
      break;

    case "tool_call_start":
      // A meta tool is about to execute
      console.log(`Starting: ${event.data.tool_name}`);
      // event.data: { tool_name, call_id, input }
      break;

    case "tool_call_complete":
      // A meta tool finished
      console.log(`Done: ${event.data.tool_name} (${event.data.duration_ms}ms)`);
      // event.data: { tool_name, call_id, output, duration_ms, status }
      break;

    case "error":
      // A tool call or policy check failed
      console.error(`Error: ${event.data.code}`);
      // event.data: { code, message, tool_name?, recoverable }
      break;

    case "done":
      // All tool calls complete, final response ready
      console.log("Session finished:", event.data.usage);
      // event.data: { usage: { tool_calls, tokens, duration_ms } }
      break;
  }
}

The stream uses the standard SSE wire format, so any HTTP client that can read a streaming response can consume it. The CodeSpar dashboard sandbox uses this exact API to show real-time tool execution progress to users.
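
If you are not using the SDK's iterator, the SSE framing can be parsed by hand. The sketch below follows the standard rules (events separated by a blank line, data lines joined with a newline, lines starting with a colon treated as comments). It assumes, without confirmation from this post, that each event carries its type in the SSE event field and a JSON payload in the data field; parseSSE and ParsedEvent are hypothetical names.

```typescript
interface ParsedEvent {
  type: string;
  data: unknown;
}

// Parse a complete SSE payload into events. A real client would also need to
// buffer partial chunks; this sketch only handles a fully received string and
// LF or CRLF line endings.
function parseSSE(payload: string): ParsedEvent[] {
  const events: ParsedEvent[] = [];
  for (const block of payload.split(/\r?\n\r?\n/)) {
    let type = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional space
      if (field === "event") type = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) {
      // Assumption: the data field holds JSON.
      events.push({ type, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}
```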

One important detail: sendStream() handles the full agentic loop internally. The model receives the tools, reasons about which to call, executes them via MCP, reads the results, and continues until the task is complete. You do not need to implement a tool execution loop on the client side -- the SDK manages it server-side and streams events back.
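
For intuition about what such a loop does, here is a stripped-down sketch with a scripted function standing in for the model. It is not the server's implementation: ModelStep, runLoop, and the maxSteps guard are assumptions made for this example.

```typescript
// Hypothetical types: a model step either requests a tool call or finishes.
type ModelStep =
  | { kind: "tool_call"; tool: string; input: Record<string, unknown> }
  | { kind: "final"; text: string };

interface ToolResult {
  tool: string;
  output: unknown;
}

type Model = (history: ToolResult[]) => ModelStep;
type ToolRunner = (tool: string, input: Record<string, unknown>) => unknown;

// Run model -> tool -> model until the model returns a final answer.
// maxSteps guards against a model that never stops calling tools.
function runLoop(
  model: Model,
  runTool: ToolRunner,
  maxSteps: number = 10,
): { text: string; results: ToolResult[] } {
  const results: ToolResult[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(results);
    if (next.kind === "final") {
      return { text: next.text, results };
    }
    results.push({ tool: next.tool, output: runTool(next.tool, next.input) });
  }
  throw new Error(`Loop did not finish within ${maxSteps} steps`);
}

// Scripted stand-in for an LLM: pay, then invoice, then ship, then answer.
const plan = ["codespar_pay", "codespar_invoice", "codespar_ship"];
const scriptedModel: Model = (history) =>
  history.length < plan.length
    ? { kind: "tool_call", tool: plan[history.length], input: { order: "7721" } }
    : { kind: "final", text: `Order 7721 processed in ${history.length} tool calls` };
```

Each pass feeds the accumulated tool results back to the model, which is the "reads the results, and continues" step described above.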

Policy enforcement built in

Every session has a PolicyBridge that wraps the PolicyEngine. You can set budget limits, rate limits, time windows, and approval gates at session creation:

policies.ts
const session = await codespar.create("user_123", {
  preset: "brazilian",
  budgetLimit: 500,  // USD per session
  policies: [
    { name: "rate-limit", type: "rate-limit", config: { maxPerMinute: 30 } },
    { name: "business-hours", type: "time-window", config: { startHour: 8, endHour: 18 } },
    { name: "high-value-approval", type: "approval-gate", config: {
      threshold: 10000,  // R$ - require human approval above this
      approvers: ["finance@company.com"],
      timeoutMinutes: 30,
    }},
  ],
});

Policy checks happen before every tool execution. Budget usage is tracked per session. If a policy denies an action, the SDK throws a typed PolicyDeniedError or BudgetExceededError that your application can handle gracefully.

The approval gate is particularly important for high-value commerce operations. An AI agent processing a R$50,000 wholesale order should pause and wait for human confirmation. The gate sends a notification (email, Slack, or webhook), waits for approval, and then resumes execution. If the timeout expires without approval, the tool call is denied and the agent receives a clear error explaining why.
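
The order of checks can be sketched like this. The error class names come from the paragraph above, but their constructors and fields, the PolicyState shape, and the checkPolicies function are assumptions for illustration. The sketch also treats amounts and budget as being in one currency and reduces the approval gate to an approved flag rather than a real notification and wait.

```typescript
// Error names come from the post; constructors and fields here are assumptions.
class PolicyDeniedError extends Error {
  readonly policy: string;
  constructor(policy: string, message: string) {
    super(message);
    this.name = "PolicyDeniedError";
    this.policy = policy;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class BudgetExceededError extends Error {
  constructor(remaining: number, requested: number) {
    super(`Requested ${requested} but only ${remaining} remains`);
    this.name = "BudgetExceededError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface PolicyState {
  budgetRemaining: number;
  maxPerMinute: number;
  callTimestamps: number[]; // epoch ms of recent allowed calls
  startHour: number;
  endHour: number;
  approvalThreshold: number;
}

interface ToolRequest {
  amount: number;
  at: Date;
  approved?: boolean; // set once a human approver has confirmed
}

// Run every check before the tool executes; throw on the first failure.
// State is only updated when all checks pass.
function checkPolicies(state: PolicyState, request: ToolRequest): void {
  const now = request.at.getTime();
  const recent = state.callTimestamps.filter((t) => now - t < 60000);
  if (recent.length >= state.maxPerMinute) {
    throw new PolicyDeniedError("rate-limit", "Too many tool calls in the last minute");
  }
  const hour = request.at.getHours();
  if (hour < state.startHour || hour >= state.endHour) {
    throw new PolicyDeniedError("business-hours", "Outside the allowed time window");
  }
  if (request.amount > state.approvalThreshold && request.approved !== true) {
    throw new PolicyDeniedError("high-value-approval", "Human approval required");
  }
  if (request.amount > state.budgetRemaining) {
    throw new BudgetExceededError(state.budgetRemaining, request.amount);
  }
  state.budgetRemaining -= request.amount;
  state.callTimestamps = [...recent, now];
}
```

Application code can then branch on instanceof PolicyDeniedError versus BudgetExceededError to decide whether to retry later, ask for approval, or stop.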

Commerce events as triggers

The TriggerManager emits typed events when commerce actions are detected in agent responses:

triggers.ts
codespar.on("payment.completed", (event) => {
  console.log(event.data.amount, event.data.currency);
  // Trigger downstream workflows: send receipt, update CRM, etc.
});

codespar.on("nfe.issued", (event) => {
  console.log(event.data.number, event.data.chave_nfe);
  // Archive the NF-e XML, send DANFE PDF to customer
});

codespar.on("shipping.label_created", (event) => {
  console.log(event.data.tracking_code, event.data.carrier);
  // Update order status in your database
});

// Handle webhooks from providers (async confirmations)
app.post("/webhook", async (req, res) => {
  const events = await codespar.handleWebhook({
    body: req.body,
    headers: req.headers,
  });
  // events: typed array of commerce events that were detected
  for (const event of events) {
    console.log(event.type, event.data);
  }
  res.send("ok");
});

Events are detected by analyzing tool calls and response content using regex patterns that match both English and Portuguese terms. This means a payment confirmation triggers payment.completed whether the agent says "Payment confirmed" or "Pagamento confirmado."
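
A small sketch of bilingual pattern matching. The patterns below are hypothetical examples written for this illustration; the SDK's actual pattern list is not shown in this post. Note that JavaScript's \b word boundary is ASCII-based, so patterns ending in accented letters would need extra care.

```typescript
// Hypothetical patterns; the SDK's real pattern list is not shown in this post.
const eventPatterns: { type: string; pattern: RegExp }[] = [
  {
    type: "payment.completed",
    pattern: /\b(payment (confirmed|completed)|pagamento (confirmado|aprovado))\b/i,
  },
  {
    type: "nfe.issued",
    pattern: /\b(nf-e issued|nf-e emitida|nota fiscal emitida)\b/i,
  },
  {
    type: "shipping.label_created",
    pattern: /\b(shipping label created|etiqueta (de envio )?(gerada|criada))\b/i,
  },
];

// Return the event types whose pattern matches the text, in declaration order.
// The patterns have no "g" flag, so test() keeps no state between calls.
function detectEvents(text: string): string[] {
  return eventPatterns
    .filter(({ pattern }) => pattern.test(text))
    .map(({ type }) => type);
}
```

A response such as "Pagamento confirmado. NF-e emitida com sucesso." would yield both payment.completed and nfe.issued under these patterns.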

Triggers are also meant to fire server-side. The plan, listed in the roadmap at the end of this post, is to let you configure triggers in the dashboard that run automatically when specific events occur: for example, auto-issuing an NF-e whenever a payment completes, or sending a WhatsApp notification whenever a shipping label is created. This turns the SDK from a tool execution layer into a full commerce automation engine.

Billing integration

Every tool call is a billing unit. This is a deliberate design choice: billing at the tool call level gives you precise cost attribution per user, per session, per order. There is no ambiguity about what you are paying for.

billing.ts
// Every tool call is logged automatically
// You can query usage at any level of granularity

// Session-level usage
const usage = await session.usage();
console.log(usage.toolCalls);     // 6
console.log(usage.durationMs);    // 8131
console.log(usage.budgetUsed);    // 1249.00 (BRL)
console.log(usage.budgetRemaining); // 3751.00

// Organization-level usage (for billing dashboards)
const orgUsage = await codespar.usage({
  period: "current_month",
});
console.log(orgUsage.totalToolCalls);  // 12847
console.log(orgUsage.plan);            // "starter"
console.log(orgUsage.limit);           // 200000
console.log(orgUsage.remaining);       // 187153

The billing model maps directly to Stripe metered billing. At the end of each billing cycle, the session_tool_calls table is aggregated per organization and reported as metered usage. Overages are billed at the per-call rate for your plan tier.
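
The per-organization aggregation can be sketched as a simple count. The ToolCallRow shape is hypothetical (the post only says rows carry payloads, duration, and status), and counting failed calls as billable is an assumption of this sketch, not a statement about CodeSpar's billing rules.

```typescript
// Hypothetical row shape for session_tool_calls.
interface ToolCallRow {
  organizationId: string;
  sessionId: string;
  toolName: string;
  durationMs: number;
  status: "success" | "error";
}

// Count tool calls per organization: the quantity reported as metered usage.
// Assumption: every logged call counts, regardless of status.
function aggregateUsage(rows: ToolCallRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.organizationId] = (totals[row.organizationId] ?? 0) + 1;
  }
  return totals;
}
```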

Why not bill per session or per API call? Because sessions vary wildly in complexity. A simple "check my balance" query is one tool call. A Complete Loop is six. A customer support agent handling a return might make twelve. Per-tool-call billing ensures fair pricing regardless of how your agents use the SDK.

Plan        Tool calls/mo  Price    Per-call
Hobby       20,000         $0       Free
Starter     200,000        $29/mo   $0.000145
Growth      2,000,000      $229/mo  $0.000115
Enterprise  Custom         Custom   Custom
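
The per-call column is the monthly price divided by the included calls: $29 / 200,000 = $0.000145 for Starter, and $229 / 2,000,000 = $0.0001145 for Growth, which the table shows rounded to $0.000115. The sketch below works that arithmetic through, assuming overage is charged at that same effective rate; the Plan shape and function names are hypothetical.

```typescript
interface Plan {
  name: string;
  includedCalls: number;
  monthlyPrice: number; // USD
}

const starter: Plan = { name: "Starter", includedCalls: 200000, monthlyPrice: 29 };
const growth: Plan = { name: "Growth", includedCalls: 2000000, monthlyPrice: 229 };

// Effective per-call rate = monthly price / included calls.
function perCallRate(plan: Plan): number {
  return plan.monthlyPrice / plan.includedCalls;
}

// Monthly cost = base price plus any calls beyond the quota at the per-call rate.
// Assumption: overage uses the unrounded effective rate of the same plan.
function monthlyCost(plan: Plan, calls: number): number {
  const overageCalls = Math.max(0, calls - plan.includedCalls);
  return plan.monthlyPrice + overageCalls * perCallRate(plan);
}
```

Under these assumptions, 250,000 calls on Starter would cost the $29 base plus 50,000 overage calls at $0.000145 each, or about $36.25.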

What we shipped

Five packages. All typed. All building via Turborepo. Published on npm at v0.2.0.

Package           Purpose
@codespar/sdk     Core: sessions, tools, execute, send, sendStream, loop, managed auth
@codespar/claude  Claude API adapter (getTools, handleToolUse, toToolResultBlock)
@codespar/openai  OpenAI function calling adapter
@codespar/vercel  Vercel AI SDK adapter with execute() functions
@codespar/mcp     MCP config generator for IDE integration

The architecture decision that made shipping fast: all adapters depend on @codespar/sdk for types and the session interface, but they do not depend on each other. Installing @codespar/openai does not pull in Vercel AI SDK. Installing @codespar/vercel does not pull in the Anthropic SDK. You only pay for the dependencies you actually use.

What is next

The SDK at v0.2.0 covers the core surface: sessions, tools, send, sendStream, execute, loop, managed auth, and billing tracking. Here is what is coming next, in order of priority:

  1. Stripe billing integration (Milestone 3). Wire metered billing from session_tool_calls to Stripe. Quota enforcement in the execute and send endpoints. Self-serve plan upgrades from the dashboard. This is the monetization milestone.
  2. Real MCP routing. Currently, the meta tools execute against mock servers. The next step connects them to the 57 live MCP servers via Streamable HTTP transport. Tool calls will hit real Zoop, Nuvem Fiscal, Melhor Envio, and Omie APIs.
  3. Managed connection UI. An OAuth connection flow in the dashboard where users click "Connect Zoop" and complete the authorization without writing code. Tokens stored server-side, auto-refreshed, never exposed to the client.
  4. Python SDK. Same API surface, same session model, targeting LangChain and CrewAI adapters. Python is essential for reaching the ML/data science agent community.
  5. Server-side triggers. Define automation rules in the dashboard: "When payment.completed, auto-issue NF-e" or "When shipping.label_created, notify customer on WhatsApp." This turns CodeSpar from a tool execution layer into a commerce automation engine that runs with no code at all.
  6. Mexico and Colombia presets. The MCP server catalog already covers 4 countries. Adding presets for Mexico (SAT/CFDI, Conekta, Envia) and Colombia (DIAN, Wompi, Envia) unlocks the rest of LatAm.

The SDK is the product. Everything else -- dashboard, servers, docs -- exists to make the SDK easier to adopt.

The SDK is open source. The code is at github.com/codespar/codespar-core. Install it with npm install @codespar/sdk and tell us what breaks.

This post covers the public SDK (@codespar/sdk@0.2.0). For a hands-on tutorial, see the Complete Loop tutorial. The enterprise packages (PolicyEngine, MandateGenerator, PaymentGateway) are covered in the thesis post.