Router observability
Operator walkthrough for /dashboard/router — per (provider × canonical_tool) attempts, success rate, p50 latency, last error, and a 24h sparkline.
The /dashboard/router page is the operator surface for the meta-tool router. It renders one row per (provider × canonical_tool) pair with the live attempt and success counters, the p50 upstream latency, the last error code (if any), and a 24-hour sparkline.
Operators open this page when:
- A rail is failing in production and you need to confirm it is the provider, not the dispatch logic.
- You are about to flip `META_TOOL_MOCK=0` for an org and want to confirm the live rail is healthy first.
- You added a new provider to the catalog and want to verify traffic is flowing through it.
This page is service-key only. The dashboard reads your service key from a cookie set during login and forwards it on every request. There is no end-user equivalent.
Columns explained
| Column | Source | What it tells you |
|---|---|---|
| `provider` | provider_success_telemetry.provider_id | Which catalog provider serviced the calls (e.g. asaas, mercadopago, stripe). |
| `canonical_tool` | provider_success_telemetry.canonical_tool | Which meta-tool routed here: one of codespar_pay, codespar_charge, codespar_invoice, codespar_notify, codespar_ship, codespar_crypto_pay, codespar_kyc. |
| `attempts` | rolled up | Total upstream calls in the rollup window. |
| `successes` | rolled up | Calls that returned a 2xx upstream. |
| `success_rate` | derived | successes / attempts. Renders red below 95%, amber 95–99%, green 99%+. |
| `latency_p50_ms` | rolled up | 50th-percentile upstream call latency. The provider's own latency, not your dispatch overhead. |
| `last_error_code` | rolled up | Most recent non-success error code seen on this rail. Empty when the rail is clean. |
| sparkline | hourly buckets | 24 hourly points of latency_p50_ms. Gaps render as null breaks in the line, not as zero. |
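The derived `success_rate` column and its colour bands can be sketched as below. This is a minimal illustration, not the dashboard's actual code: the function names are hypothetical, and the `"none"` band for rows with zero attempts is an assumption, since the page's rendering for that case is not documented here.

```python
from typing import Optional


def success_rate(successes: int, attempts: int) -> Optional[float]:
    """Return successes / attempts, or None when there were no attempts."""
    if attempts <= 0:
        return None
    return successes / attempts


def rate_band(rate: Optional[float]) -> str:
    """Map a success rate to the colour bands described in the table."""
    if rate is None:
        return "none"   # assumption: no documented colour for zero attempts
    if rate < 0.95:
        return "red"    # below 95%
    if rate < 0.99:
        return "amber"  # 95% up to, but not including, 99%
    return "green"      # 99% and above
```

For example, a rail with 960 successes out of 1000 attempts lands in the amber band.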
The rolled-up table comes from a single call to GET /v1/meta-tools/stats. The sparkline next to each row fans out one GET /v1/meta-tools/stats/hourly?provider_id=X&canonical_tool=Y per row.
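The per-row fan-out and the null-gap behaviour of the sparkline can be sketched roughly as follows. Only the endpoint path and the two query parameters come from this page; the row field names and the shape of the hourly buckets (an hour offset mapped to a latency value) are assumptions made for illustration.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

HOURLY_PATH = "/v1/meta-tools/stats/hourly"


def hourly_url(provider_id: str, canonical_tool: str) -> str:
    """Build the hourly-stats request path for one (provider, tool) row."""
    query = urlencode({"provider_id": provider_id, "canonical_tool": canonical_tool})
    return f"{HOURLY_PATH}?{query}"


def fan_out(rows: List[Dict[str, str]]) -> List[str]:
    """One hourly request per rollup row (row keys are assumed names)."""
    return [hourly_url(r["provider_id"], r["canonical_tool"]) for r in rows]


def to_sparkline(buckets: Dict[int, float], hours: int = 24) -> List[Optional[float]]:
    """Lay out latency_p50_ms by hour offset; missing hours stay None, not 0."""
    return [buckets.get(h) for h in range(hours)]
```

Keeping missing hours as `None` rather than `0` matters because a zero would draw as a latency collapse, while a gap correctly reads as "no data".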
What good looks like
At steady state on a healthy rail:
- Success rate ≥ 99%. Pix charge rails (Asaas, Mercado Pago, iugu, Stone) typically sit at 99.5%+. Card rails (Stripe, Mercado Pago) sit in the same range. Boleto is structurally lower because of buyer-side cancellations.
- `latency_p50_ms` under the provider's published SLA. Asaas Pix charge p50 ~400ms; Stripe charge p50 ~600ms; Melhor Envio quote p50 ~250ms. Sustained values 2× the published SLA mean something is wrong.
- `last_error_code` empty or rotating through one-off codes. A persistent error code (the same string row after row) is a strong signal of credentials drift or a provider-side breaking change.
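If you want to script these steady-state checks, a minimal sketch might look like this. The helper names, the `min_run` threshold of three consecutive observations, and the sample error strings in the usage note are all hypothetical; only the 2× rule and the "same string row after row" signal come from the text above.

```python
from typing import List, Optional


def latency_degraded(p50_ms: float, baseline_ms: float, factor: float = 2.0) -> bool:
    """True when p50 is at or above `factor` times the baseline (e.g. the published SLA)."""
    return p50_ms >= factor * baseline_ms


def persistent_error(codes: List[Optional[str]], min_run: int = 3) -> Optional[str]:
    """Return the code if the last `min_run` observations share one non-empty code.

    `codes` is ordered oldest to newest; None or "" means a clean observation.
    """
    if len(codes) < min_run:
        return None
    tail = codes[-min_run:]
    first = tail[0]
    if first and all(code == first for code in tail):
        return first
    return None
```

With an Asaas Pix baseline of 400 ms, `latency_degraded(800, 400)` flags the rail, and `persistent_error(["timeout", "auth_failed", "auth_failed", "auth_failed"])` returns the repeated code, while a sequence of differing one-off codes returns `None`.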
What to do when it goes wrong
High error rate on one provider
The rail's success_rate is below 95% and last_error_code shows the same string across many consecutive rows.
- Open `/dashboard/health` and scroll to the connections panel. Confirm the org's connection for that provider is `connected`. A drifted credential surfaces here first.
- Check the provider's status page. Asaas, Stripe, and Mercado Pago all publish status. A real outage will show.
- If the provider is confirmed broken, set `eligibility = false` for that `(org_id × provider_id × canonical_tool)` row in `meta_tool_eligibility`. The router drains traffic to the next failover candidate within ~30s. Reverse the flag once the provider recovers.
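The drain step can be illustrated with a small, self-contained sketch. It assumes `meta_tool_eligibility` is a SQL table keyed by the three columns named above and uses an in-memory SQLite database as a stand-in; the real storage engine, schema, and access path are not described on this page, so treat every detail beyond the table and column names as hypothetical.

```python
import sqlite3


def set_eligibility(conn: sqlite3.Connection, org_id: str, provider_id: str,
                    canonical_tool: str, eligible: bool) -> int:
    """Flip the flag for exactly one (org, provider, tool) row; returns rows changed."""
    cur = conn.execute(
        "UPDATE meta_tool_eligibility SET eligibility = ? "
        "WHERE org_id = ? AND provider_id = ? AND canonical_tool = ?",
        (int(eligible), org_id, provider_id, canonical_tool),
    )
    conn.commit()
    return cur.rowcount


def demo_db() -> sqlite3.Connection:
    """Build a throwaway table with an assumed schema and two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE meta_tool_eligibility ("
        "org_id TEXT, provider_id TEXT, canonical_tool TEXT, eligibility INTEGER)"
    )
    conn.executemany(
        "INSERT INTO meta_tool_eligibility VALUES (?, ?, ?, ?)",
        [("org_1", "asaas", "codespar_charge", 1),
         ("org_1", "stripe", "codespar_charge", 1)],
    )
    conn.commit()
    return conn
```

Scoping the update to the full triple keeps the drain narrow: other orgs, and other tools on the same provider, keep routing normally. Calling the same helper with `eligible=True` reverses the flag once the provider recovers.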
p50 latency degraded
latency_p50_ms is 2× or more the steady-state value.
- Look at the sparkline. A clean step at one hour boundary points at a deploy or a provider-side incident; a rising slope points at a capacity issue.
- Cross-reference with `/dashboard/health`: if the `telemetry` check is `idle` or counts are unexpectedly low, the issue may be on your side, not the provider's.
- If the latency is real and persistent, consider draining via `meta_tool_eligibility` while the provider investigates.
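The step-versus-slope reading of the sparkline can be approximated in code. The heuristic below is an assumption, not the dashboard's logic: it calls the series a "step" when a single hour-over-hour jump accounts for most of the total rise, and a "slope" when the rise is spread across many hours. The thresholds (`min_rise_ratio`, `step_share`) are illustrative defaults.

```python
from typing import List, Optional


def classify_shape(points: List[Optional[float]],
                   min_rise_ratio: float = 2.0,
                   step_share: float = 0.7) -> str:
    """Classify 24 hourly latency_p50_ms points as 'step', 'slope', or 'flat'."""
    # Gaps (None) are dropped, so a jump across a gap counts as one delta.
    values = [p for p in points if p is not None]
    if len(values) < 3:
        return "insufficient"
    if values[0] <= 0 or values[-1] < min_rise_ratio * values[0]:
        return "flat"  # latency has not reached 2x its starting value
    rise = values[-1] - values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    if max(deltas) >= step_share * rise:
        return "step"   # one hour boundary explains most of the rise
    return "slope"      # rise is spread across many hours
```

A series that sits at 400 ms for twelve hours and then at 900 ms for the next twelve classifies as a step; one that climbs 25 ms every hour from 400 ms classifies as a slope.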
Zero attempts on a rail you expect traffic on
The row exists but attempts = 0 over the rollup window.
- Check whether the tenant has a connection for the provider: `GET /v1/connections?provider_id=X`. If there is no connection, the dispatcher cannot pick this rail.
- Check the classifier output for this tenant's recent calls: open `/dashboard/router-candidates` to see whether the LLM is putting traffic somewhere unexpected.
- Confirm the meta-tool is actually being invoked. Call `GET /v1/tool-calls?canonical_tool=codespar_charge&limit=10` and inspect the chosen `provider_id` per call.
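The last check, inspecting the chosen `provider_id` per call, can be sketched as a small tally. The request path and query parameters are taken from the step above; the response shape (a list of objects each carrying a `provider_id` field) is an assumption, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional
from urllib.parse import urlencode


def tool_calls_url(canonical_tool: str, limit: int = 10) -> str:
    """Build the recent-tool-calls request path for one meta-tool."""
    return "/v1/tool-calls?" + urlencode({"canonical_tool": canonical_tool, "limit": limit})


def providers_chosen(calls: List[Dict[str, Optional[str]]]) -> Counter:
    """Count how often each provider_id was chosen (assumed response shape)."""
    return Counter(call.get("provider_id") for call in calls)


def rail_unused(calls: List[Dict[str, Optional[str]]], expected_provider: str) -> bool:
    """True when none of the recent calls were routed to the expected provider."""
    return providers_chosen(calls)[expected_provider] == 0
```

If the tally shows recent calls going to another provider, the meta-tool is being invoked and the question becomes why the dispatcher or classifier is not selecting the rail you expected.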
API reference
The underlying endpoints — GET /v1/meta-tools/stats for the rollup and GET /v1/meta-tools/stats/hourly?provider_id=X&canonical_tool=Y for the 24-bucket sparkline — are documented under Sessions API. Both are service-key only and return 403 for bearer-token requests.