Billing and usage
Per-org caps, soft / hard cap behavior, and Stripe overage billing for agent runs. The cost-control surface lives between the run-stream entry gate, the usage-aggregator worker, and the billing dashboard — three moving parts that share one rolled-up table.
Why caps exist #
Free-tier orgs run on the platform Anthropic key — without a hard cap, a single runaway agent on a Friday afternoon racks up unbounded spend on our card. Paid-tier customers, conversely, get no signal before the invoice arrives if a workload silently 10x's mid-month. Caps close both loops: free tiers can't blow our budget, paid tiers see the overage coming before it lands on the bill.
The pricing model #
Tier-default caps live in TIER_DEFAULT_CAPS in
api/src/lib/cost-control/aggregator.ts — one source of truth, read
by both the worker and the run-stream gate.
| Tier | Included | Cap behavior | Overage |
|---|---|---|---|
free |
100,000 runs / month, no token cap | hard cap (refuse new runs at 100%) | none — upgrade to bill overage |
pro |
$49 / agent / month, 50,000,000 input tokens / month | soft cap (warn at 80%, allow overage) | metered via Stripe (PR-5, in flight) |
enterprise |
unlimited (NULL caps) | none — handled out-of-band by contract | custom contract |
A NULL cap on a dimension means "unlimited for that dimension". Free gates on runs and tracks tokens informationally; pro gates on input tokens and tracks runs informationally; enterprise tracks everything without ever refusing.
The tier column on organizations is the denormalised cache —
constrained to 'free' | 'pro' | 'enterprise' via a CHECK constraint
(organizations_tier_check). Stripe webhook flips it on subscription
status change; defaults to 'free' so anything pre-tier-assignment
lands on the safest cap.
The billing period #
Calendar-month UTC. Period 1 is 2026-01-01T00:00:00Z (inclusive) →
2026-02-01T00:00:00Z (exclusive); period 2 is 2026-02-01T00:00:00Z
→ 2026-03-01T00:00:00Z; etc. Half-open [periodStart, periodEnd) so
two consecutive periods don't double-count the boundary instant — the
same convention as agent_runs.started_at / ended_at.
Why calendar-month, not rolling-30-day:
- Predictable invoice cycles. An org's monthly cap resets on the 1st, not on a moving target tied to signup date. "I have N runs left this month" is a calendar-readable claim.
- Aligns with Stripe's default billing-cycle anchor when no
subscription exists. Paid orgs use their Stripe subscription's
currentPeriodStart/Enddirectly; this fallback math is for free / pre-subscribe orgs and aims at the same boundary Stripe will eventually pick. - Operationally simpler. The worker's "is this still the current
period?" check is
now < periodEnd, not a rolling-window subtraction that drifts across worker restarts.
Defined in api/src/lib/billing-period.ts (currentPeriodFor(now, tier) → { periodStart, periodEnd }).
What gets measured #
Three dimensions, all summed from the usage_records append-only
table by usage_records.kind:
| Dimension | usage_records.kind |
Cap column on the period row |
|---|---|---|
runs |
agent_run (one per agent run-stream) |
cap_runs |
input_tokens |
input_tokens (sum from agent_messages) |
cap_input_tokens |
output_tokens |
output_tokens (sum from agent_messages) |
cap_output_tokens |
The runtime appends usage_records rows at run-end via
POST /v1/internal/usage (org-scoped from the deployment lookup, not
the caller's claim). The PR-2 aggregator worker rolls those rows into
organization_usage_periods every ~60s — one row per (organization, periodStart), written via INSERT … ON CONFLICT (organization_id, period_start) DO UPDATE. Reads (the dashboard, the gate) hit the
rolled-up row, never re-aggregate usage_records on the hot path.
agent_messages is the source of truth for the observability surface
(see observability.md) because it's the most
honest view of what the user actually saw. usage_records is the
billing surface — what feeds Stripe — and may diverge from
agent_messages on retried persists. The aggregator uses
usage_records so caps and invoices agree on a single counting rule.
Cancellation semantics follow the runtime's usage_records writes —
see agent-runs.md for the cancellation contract;
runs that the runtime suppresses from usage_records won't count
against the cap, runs it doesn't won't.
Where to look #
The dashboard surfaces are in PR-4 (in flight at write time — this section reflects the spec in epic #296 PR-4; once shipped the actual behavior should match):
/settings/billing(org admin dashboard, planned for PR-4) — current-period usage chart per dimension, projection-to-period-end (linear extrapolation past today, drawn faded), the approaching-cap banner, and the overage rules editor.- API —
GET /v1/orgs/:slug/billingreturns subscription + plan- current-period usage today. PR-4 extends it to include the period
row's
cap_*,softCapWarnedAt, andhardCapHitAtmarkers programmatically. Auth: any org member; cross-org isolation enforced at the join.
- current-period usage today. PR-4 extends it to include the period
row's
audit.flaggedwebhook event — fires once per period when a soft-cap or hard-cap threshold is crossed, withdata.kindofusage_soft_caporusage_hard_caprespectively. See Soft-cap behavior / Hard-cap behavior for payload shapes.
Soft-cap behavior #
When any tracked dimension crosses the soft-cap threshold (default
80%, per-org configurable via organization_overage_rules. soft_cap_threshold_pct, range 0–100 enforced via DB CHECK), the
PR-2 worker stamps organization_usage_periods.soft_cap_warned_at on
its next tick and fires audit.flagged with data.kind = 'usage_soft_cap'.
{
"id": "1f2e3d4c-5b6a-7889-99aa-bbccddeeff00",
"type": "audit.flagged",
"createdAt": "2026-05-09T08:30:00.000Z",
"organizationId": "org_2t4b...",
"data": {
"kind": "usage_soft_cap",
"organizationId": "org_2t4b...",
"dimension": "input_tokens",
"currentUsage": 40500000,
"cap": 50000000,
"percentUsed": 81.0,
"thresholdPct": 80,
"periodStart": "2026-05-01T00:00:00.000Z",
"periodEnd": "2026-06-01T00:00:00.000Z"
}
}dimension is the first one to trip — order is runs →
input_tokens → output_tokens so subscribers see consistent
dimension labels across periods. Runs continue normally; the dashboard
surfaces an amber banner. For paid tiers, overage past 100% is billed
at end of period (see Overage billing).
The soft_cap_warned_at column is the one-shot dedupe key — the
worker's WHERE soft_cap_warned_at IS NULL predicate guarantees the
webhook fires at most once per period regardless of how many ticks
re-detect the over-threshold state.
Hard-cap behavior #
When any tracked dimension reaches 100% AND hard_cap_enabled is
true (default true for free tier, default false for pro / enterprise,
per-org override via organization_overage_rules.hard_cap_enabled):
The PR-2 worker stamps
organization_usage_periods.hard_cap_hit_atand fires a secondaudit.flaggedwithdata.kind = 'usage_hard_cap'(same payload shape asusage_soft_capminus thethresholdPctfield).The PR-3 run-stream entry gate refuses new runs with HTTP 402 Payment Required (RFC 9110 §15.5.2):
{ "error": "usage_cap_exceeded", "tripDimension": "runs", "currentUsage": { "runs": 100002, "inputTokens": 12345678, "outputTokens": 543210 }, "capUsage": { "runs": 100000, "inputTokens": null, "outputTokens": null }, "periodEnd": "2026-06-01T00:00:00.000Z", "reason": "free_tier_cap_exceeded", "upgradeUrl": "https://app.stech.com/settings/billing" }tripDimensionis the first dimension to trip (sameruns→input_tokens→output_tokensorder as the soft-cap fan-out).capUsage.*fields arenullfor unlimited dimensions.reasonis"free_tier_cap_exceeded"for free orgs and"hard_cap_exceeded"for paid orgs that have flipped to hard-cap via the override row — both map to the same 402 status; the string is purely cosmetic for nicer dashboard / SDK copy.
402 status is deliberate: 403 means RBAC-deny in this codebase,
429 means rate-limit. SDK consumers should branch on
response.status === 402 && body.error === "usage_cap_exceeded" to
surface the upgrade prompt cleanly.
In-flight runs at the time of the cap stamping continue to
completion — the gate only blocks NEW runs at the run-stream entry.
At period rollover (next 1st-of-month UTC) the next worker tick
materialises a fresh period row with fresh cap headroom and
hard_cap_hit_at = NULL, and the gate clears automatically.
The ~60s aggregation lag #
The usage_records writer appends rows at run END, not run start.
The aggregator worker rolls them into organization_usage_periods
every 60s by default (DEFAULT_POLL_INTERVAL_MS in
api/src/lib/cost-control/aggregator.ts). The gate's view of "current
usage" is therefore up to ~60s stale.
Worst case: two concurrent run-stream calls at 99% both pass the gate
check, both succeed, and push the period to 101%. The next worker
tick stamps hard_cap_hit_at and from then on every new run is
refused with 402. We accept the 1–2 over-cap runs as a known cost in
exchange for keeping the gate to a single SQL roundtrip — re-summing
usage_records on every run-stream entry would defeat the whole
point of the rolled-up table.
For free orgs the worst case is ~2 runs of platform-key spend; for paid orgs the same race surfaces as 1–2 runs of overage on the next invoice. Documented in epic #296's "Risks" section as an accepted trade-off; the right fix is per-agent token budgets pre-charged at run-stream entry, deferred as a follow-up.
If you need hard real-time refusal in your own pipeline, you can run
a pre-flight check via GET /v1/orgs/:slug/billing before each run
— but that's a 1-RTT round-trip on every call and we don't recommend
it for normal workloads.
Overage billing #
Spec — PR-5 (Stripe overage wiring) is in flight at write time and not yet merged. The reference here matches what's planned in epic #296 PR-5; once shipped the actual behavior should match.
At end of period, completed periods with overage past cap_* get
reported to Stripe via the metering API (stripe.billing. meterEvents.create). One Stripe meter event per dimension (runs /
input_tokens / output_tokens) per org per closed period. The
Stripe subscription's usage-based prices map meter events to dollar
amounts.
The reporter is web/lib/billing-overage.ts (already exists,
unwired). PR-5 mounts it behind an admin-only Next route at
/api/admin/report-overage and schedules a daily POST via a GitHub
Actions workflow at cron('5 1 * * *') (~02:00 UTC daily). Only
periods past their period_end get reported; the current
in-progress period rolls up at next 1st-of-month UTC.
Stripe's metering API has ~10s eventual consistency between meter
event creation and dashboard visibility — neither this code nor
this doc poll Stripe to confirm receipt; the meter_event_identifier
we send is the cite, and Stripe's dashboard is the source of truth.
Configuring per-org policy #
Hand-tuned overrides land in organization_overage_rules (one row
per org, most orgs have no row at all and use tier defaults):
| Column | Default | Range | Meaning |
|---|---|---|---|
hard_cap_enabled |
false |
bool | Override the tier default. Free is hard-cap by default; flipping a paid org to true makes the gate refuse instead of bill overage. |
soft_cap_threshold_pct |
80 |
0–100 (DB CHECK) | Percent of cap at which the soft-cap warning webhook fires. 60 / 90 are the typical asks from regulated customers. |
overage_input_token_micros, overage_output_token_micros, overage_run_micros |
NULL | bigint | Per-org overage rate overrides. NULL = use tier default from plans. |
notes |
NULL | text | Free-form admin annotation — "negotiated as part of MSA 2026-Q2", "test sandbox, do not bill", etc. |
updated_by_user_id |
NULL | FK → users.id (SET NULL on delete) |
Audit trail for who last touched the override. |
Edits surface in /settings/billing (PR-4) under the overage rules
editor; the underlying CRUD endpoints land alongside that PR.
Cross-references #
- observability.md — the run-history surface and
the failure-rate watchdog. Cancellation semantics for usage
counting follow the runtime's
usage_recordswrites; see agent-runs.md for the cancellation contract. - webhooks.md —
audit.flaggedenvelope, signing scheme, retry policy. Theusage_soft_cap/usage_hard_cappayloads above use the same envelope as every other curatedaudit.flaggedevent; receivers branch ondata.kindbefore touching dimension-specific fields. - audit-log.md — the retrospective view. Once PR-3
starts refusing runs with 402, the corresponding
agent_runs. cap_refusedadmin_action surfaces in the admin tab for "who got refused, when, on which dimension" forensics.
Limitations #
- No per-agent budgets yet (per-org only). The
organization_overage_rulestable is per-org by design; per-agent multiplies the gate's join cost on every run-stream entry. Filed as a future epic if customers ask. - No pre-pay credits flow. Different billing model; not on the current roadmap.
- No annual billing. Stripe handles month-to-month; annual-with-overage is a separate Stripe price config we haven't wired.
- Mid-period tier changes apply on the NEXT period boundary. The
current period keeps its cap snapshot — see the "no mid-period
retroactive re-cap" rule in
db/src/schema/organization-usage- periods.ts. A customer downgrading shouldn't suddenly see their period as over-cap; an upgrade gets the new headroom on the 1st. - Aggregation lag means the cap may admit one or two runs past the limit. See The ~60s aggregation lag.
audit.flaggedfrom the worker fires once per period per cap level. Subscribers wanting per-failed-request notifications should subscribe toadmin_action.recorded(PR-3 writesagent_runs.cap_refusedadmin_actions on every refused run).- Single-replica aggregator. The PR-2 worker assumes one api
process owns aggregation. Multi-replica setups would race two
workers on the same period row; the upsert's UNIQUE constraint
collapses the duplicate INSERT but the soft-cap dedupe
IS NULLcheck could double-fire if two ticks race between SELECT and UPDATE. Same posture as the webhook delivery worker; multi-replica advisory locks are deferred until we actually run >1 api replica.
Troubleshooting #
I got HTTP 402 but I'm a paid customer. Check
organization_overage_rules.hard_cap_enabled for your org. Paid
tiers default to false (overage is billed instead of refused) but
admins can flip the override row to true for hard-cap behavior.
The gate reads the override first, falls back to the tier default
otherwise.
Soft-cap warning fired but I'm under 80%. Check
organization_overage_rules.soft_cap_threshold_pct — an admin may
have set it lower (60 is a common ask from regulated customers).
The aggregator's DEFAULT_SOFT_CAP_THRESHOLD_PCT is 80; the
override wins if present.
I see my usage in /settings/billing but Stripe didn't bill.
Only periods past their period_end get reported by PR-5's reporter.
The current in-progress period rolls up at the next 1st-of-month UTC
and gets reported on the next scheduled cron tick (~02:00 UTC daily).
Stripe shows higher usage than the dashboard. Eventual consistency window — Stripe's metering API has ~10s lag on the Stripe side; the dashboard updates once per worker tick (~60s). The two converge but not in lockstep.
I want to test cap behavior in dev. Set organizations.tier = 'free' on a test org (or flip organization_overage_rules. hard_cap_enabled to true on a paid one) and crank up usage_records
via the smoke script. The aggregator's next tick (within 60s) will
stamp the cap markers and the gate will start refusing.
Webhook fired twice for the same cap crossing. Shouldn't happen
under normal operation — the worker's one-shot dedupe (WHERE soft_cap_warned_at IS NULL / WHERE hard_cap_hit_at IS NULL)
guarantees at-most-once. If you see it, check whether multiple api
replicas are running aggregator workers (single-replica is the
documented posture); the next worker tick after a duplicate fire
won't re-fire on the same row.