Router observability

Operator walkthrough for /dashboard/router — per (provider × canonical_tool) attempts, success rate, p50 latency, last error, and a 24h sparkline.

The /dashboard/router page is the operator surface for the meta-tool router. It renders one row per (provider × canonical_tool) pair with the live attempt and success counters, the p50 upstream latency, the last error code (if any), and a 24-hour sparkline.

Operators open this page when:

  • A rail is failing in production and you need to confirm it is the provider, not the dispatch logic.
  • You are about to flip META_TOOL_MOCK=0 for an org and want to confirm the live rail is healthy first.
  • You added a new provider to the catalog and want to verify traffic is flowing through it.

This page is service-key only. The dashboard reads your service key from a cookie set during login and forwards it on every request. There is no end-user equivalent.
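As an illustration of the service-key requirement, here is a minimal Python sketch of building an authenticated request to the stats endpoint. The header name X-Service-Key and the key value are assumptions for the example; in the real dashboard the key comes from the login cookie and is forwarded automatically.

```python
import urllib.request

# Illustrative only: the real dashboard reads the service key from a cookie
# set at login. The header name below is an assumption, not documented.
SERVICE_KEY = "sk_live_example"

def stats_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a service-key-authenticated stats request."""
    req = urllib.request.Request(f"{base_url}/v1/meta-tools/stats")
    req.add_header("X-Service-Key", SERVICE_KEY)  # assumed header name
    return req

req = stats_request("https://api.example.com")
```

A bearer token in place of the service key would get a 403 from these endpoints, per the API reference below.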

Columns explained

  • provider (from provider_success_telemetry.provider_id): which catalog provider serviced the calls (e.g. asaas, mercadopago, stripe).
  • canonical_tool (from provider_success_telemetry.canonical_tool): which meta-tool routed here, one of codespar_pay, codespar_charge, codespar_invoice, codespar_notify, codespar_ship, codespar_crypto_pay, codespar_kyc.
  • attempts (rolled up): total upstream calls in the rollup window.
  • successes (rolled up): calls that returned a 2xx upstream.
  • success_rate (derived): successes / attempts. Renders red below 95%, amber 95–99%, green at 99% and above.
  • latency_p50_ms (rolled up): 50th-percentile upstream call latency. This is the provider's own latency, not your dispatch overhead.
  • last_error_code (rolled up): most recent non-success error code seen on this rail. Empty when the rail is clean.
  • sparkline (hourly buckets): 24 hourly points of latency_p50_ms. Gaps render as null breaks in the line, not zero.

The rolled-up table comes from a single call to GET /v1/meta-tools/stats. The sparkline next to each row fans out one GET /v1/meta-tools/stats/hourly?provider_id=X&canonical_tool=Y per row.
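The derived success_rate column and its color bands can be sketched as follows. This is a minimal illustration, assuming the stats payload exposes attempts and successes fields mirroring the column names above.

```python
def success_rate(attempts: int, successes: int):
    """successes / attempts, or None when the rail saw no traffic."""
    if attempts == 0:
        return None
    return successes / attempts

def rate_band(rate: float) -> str:
    """Color band used by the table: red < 95%, amber 95-99%, green >= 99%."""
    if rate < 0.95:
        return "red"
    if rate < 0.99:
        return "amber"
    return "green"

rate = success_rate(attempts=1200, successes=1191)
print(rate_band(rate))  # prints: green
```

Note the zero-attempts guard: a rail with no traffic has no meaningful rate, which matters for the "zero attempts" scenario below.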

What good looks like

At steady state on a healthy rail:

  • Success rate ≥ 99%. Pix charge rails (Asaas, Mercado Pago, iugu, Stone) typically sit at 99.5%+, and card rails (Stripe, Mercado Pago) sit in the same range. Boleto is structurally lower because of buyer-side cancellations.
  • latency_p50_ms under the provider's published SLA. Asaas Pix charge p50 ~400ms; Stripe charge p50 ~600ms; Melhor Envio quote p50 ~250ms. Sustained values 2× the published SLA mean something is wrong.
  • last_error_code empty or rotating through one-off codes. A persistent error code (the same string row after row) is a strong signal of credentials drift or a provider-side breaking change.
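The three steady-state checks above can be folded into one triage sketch. The thresholds come from the bullets; the function shape and the three-identical-codes bar for "persistent" are assumptions for illustration.

```python
def rail_health(rate, p50_ms, sla_p50_ms, recent_error_codes):
    """Triage one rail against the steady-state expectations above.

    rate: success_rate over the rollup window (0..1)
    p50_ms / sla_p50_ms: observed vs published p50 latency
    recent_error_codes: last_error_code samples, oldest first, "" for clean
    """
    problems = []
    if rate < 0.99:
        problems.append("success_rate below 99%")
    if p50_ms > 2 * sla_p50_ms:
        problems.append("p50 latency sustained above 2x published SLA")
    codes = [c for c in recent_error_codes if c]
    # Three identical non-empty codes in a row is our (assumed) bar for
    # "persistent": a credentials-drift or breaking-change signal.
    if len(codes) >= 3 and len(set(codes)) == 1:
        problems.append(f"persistent error code: {codes[0]}")
    return problems or ["healthy"]

print(rail_health(0.995, 420, 400, ["", "", ""]))  # prints: ['healthy']
```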

What to do when it goes wrong

High error rate on one provider

The rail's success_rate is below 95% and last_error_code keeps reporting the same string every time you refresh.

  1. Open /dashboard/health, scroll to the connections panel. Confirm the org's connection for that provider is connected. A drifted credential surfaces here first.
  2. Check the provider's status page. Asaas, Stripe, and Mercado Pago all publish status. A real outage will show.
  3. If the provider is confirmed broken, set eligibility = false for that (org_id × provider_id × canonical_tool) row in meta_tool_eligibility. The router drains traffic to the next failover candidate within ~30s. Reverse the flag once the provider recovers.
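The drain behaviour in step 3 can be sketched with illustrative data. The real eligibility state lives server-side in meta_tool_eligibility and the router's failover ordering is its own; the shapes below are assumptions for the example.

```python
# Illustrative data shapes; the real eligibility table and failover order
# live server-side in meta_tool_eligibility and the router.
eligibility = {
    # (org_id, provider_id, canonical_tool) -> eligible?
    ("org_1", "asaas", "codespar_charge"): True,
    ("org_1", "mercadopago", "codespar_charge"): True,
}
failover_order = ["asaas", "mercadopago"]

def pick_provider(org_id, canonical_tool):
    """First eligible provider in failover order, as a draining router might."""
    for provider_id in failover_order:
        if eligibility.get((org_id, provider_id, canonical_tool)):
            return provider_id
    return None

assert pick_provider("org_1", "codespar_charge") == "asaas"
eligibility[("org_1", "asaas", "codespar_charge")] = False  # drain the rail
assert pick_provider("org_1", "codespar_charge") == "mercadopago"
```

Reversing the flag restores the original selection, which is why the runbook says to flip it back once the provider recovers.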

p50 latency degraded

latency_p50_ms is 2× or more the steady-state value.

  1. Look at the sparkline. A clean step at one hour boundary points at a deploy or a provider-side incident; a rising slope points at a capacity issue.
  2. Cross-reference with /dashboard/health — if the telemetry check is idle or counts are unexpectedly low, the issue may be on your side, not the provider's.
  3. If the latency is real and persistent, consider draining via meta_tool_eligibility while the provider investigates.
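The step-versus-slope reading in step 1 can be sketched as a heuristic over the 24 hourly points, with None standing in for gap buckets. The thresholds here are illustrative, not what the dashboard itself uses.

```python
def classify_latency_shape(points):
    """Rough step-vs-slope heuristic over 24 hourly p50 points.

    points: latency_p50_ms per hour, None for gap buckets (never zero).
    Returns "step" when a single hour-boundary jump explains the rise,
    "slope" for a gradual climb, "flat" otherwise.
    """
    xs = [p for p in points if p is not None]
    if len(xs) < 3:
        return "flat"
    total = xs[-1] - xs[0]
    if total <= 0.2 * xs[0]:          # no meaningful rise
        return "flat"
    biggest_jump = max(b - a for a, b in zip(xs, xs[1:]))
    if biggest_jump >= 0.8 * total:   # one boundary: deploy or incident
        return "step"
    return "slope"                    # steady climb: capacity pressure

print(classify_latency_shape([400] * 12 + [900] * 12))  # prints: step
```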

Zero attempts on a rail you expect traffic on

The row exists but attempts = 0 over the rollup window.

  1. Check whether the tenant has a connection for the provider — GET /v1/connections?provider_id=X. If there is no connection, the dispatcher cannot pick this rail.
  2. Check the classifier output for this tenant's recent calls — open /dashboard/router-candidates to see whether the LLM is putting traffic somewhere unexpected.
  3. Confirm the meta-tool is actually being invoked. GET /v1/tool-calls?canonical_tool=codespar_charge&limit=10 and inspect the chosen provider_id per call.
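Step 3 can be sketched locally: given recent tool-call records, tally which provider_id the dispatcher actually chose. The record shape below is assumed from the endpoint above; only the two fields used are taken from this page.

```python
from collections import Counter

# Record shape assumed from GET /v1/tool-calls; illustrative data.
recent_calls = [
    {"canonical_tool": "codespar_charge", "provider_id": "mercadopago"},
    {"canonical_tool": "codespar_charge", "provider_id": "mercadopago"},
    {"canonical_tool": "codespar_pay", "provider_id": "stripe"},
]

def chosen_providers(calls, canonical_tool):
    """Tally which provider_id the dispatcher chose for one meta-tool."""
    return Counter(
        c["provider_id"] for c in calls if c["canonical_tool"] == canonical_tool
    )

# If the rail you expected (e.g. asaas) never appears in the tally, traffic
# is landing elsewhere: check connections and the classifier output.
print(chosen_providers(recent_calls, "codespar_charge"))
```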

API reference

The underlying endpoints — GET /v1/meta-tools/stats for the rollup and GET /v1/meta-tools/stats/hourly?provider_id=X&canonical_tool=Y for the 24-bucket sparkline — are documented under Sessions API. Both are service-key only and return 403 for bearer-token requests.
