Billing and usage

Per-org caps, soft / hard cap behavior, and Stripe overage billing for agent runs. The cost-control surface lives between the run-stream entry gate, the usage-aggregator worker, and the billing dashboard — three moving parts that share one rolled-up table.

Why caps exist #

Free-tier orgs run on the platform Anthropic key — without a hard cap, a single runaway agent on a Friday afternoon racks up unbounded spend on our card. Paid-tier customers, conversely, get no signal before the invoice arrives if a workload silently 10x's mid-month. Caps close both loops: free tiers can't blow our budget, paid tiers see the overage coming before it lands on the bill.

The pricing model #

Tier-default caps live in TIER_DEFAULT_CAPS in api/src/lib/cost-control/aggregator.ts — one source of truth, read by both the worker and the run-stream gate.

| Tier | Included | Cap behavior | Overage |
| --- | --- | --- | --- |
| free | 100,000 runs / month, no token cap | hard cap (refuse new runs at 100%) | none; upgrade to bill overage |
| pro | $49 / agent / month, 50,000,000 input tokens / month | soft cap (warn at 80%, allow overage) | metered via Stripe (PR-5, in flight) |
| enterprise | unlimited (NULL caps) | none; handled out-of-band by contract | custom contract |

A NULL cap on a dimension means "unlimited for that dimension". Free gates on runs and tracks tokens informationally; pro gates on input tokens and tracks runs informationally; enterprise tracks everything without ever refusing.
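To make the NULL convention concrete, here is a minimal sketch of how a tier-default cap map could be shaped. The type names, the `TIER_CAPS_SKETCH` constant, and the `gatedDimensions` helper are illustrative assumptions; the real `TIER_DEFAULT_CAPS` in aggregator.ts may look different. The numbers come from the table above, and the pro tier's output-token cap is assumed to be unlimited because the table does not state one.

```typescript
// Hypothetical shape for illustration; not the actual TIER_DEFAULT_CAPS.
type Tier = "free" | "pro" | "enterprise";
type Dimension = "runs" | "input_tokens" | "output_tokens";

// null means "unlimited for that dimension".
type Caps = Record<Dimension, number | null>;

const TIER_CAPS_SKETCH: Record<Tier, Caps> = {
  free: { runs: 100_000, input_tokens: null, output_tokens: null },
  // Assumption: pro has no output-token cap (the table only lists input tokens).
  pro: { runs: null, input_tokens: 50_000_000, output_tokens: null },
  enterprise: { runs: null, input_tokens: null, output_tokens: null },
};

// Dimensions with a non-null cap are gated; the rest are tracked informationally.
function gatedDimensions(tier: Tier): Dimension[] {
  const caps = TIER_CAPS_SKETCH[tier];
  return (Object.keys(caps) as Dimension[]).filter((d) => caps[d] !== null);
}
```

With this shape, `gatedDimensions("free")` yields only `runs`, `gatedDimensions("pro")` yields only `input_tokens`, and enterprise yields an empty list.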

The tier column on organizations is the denormalised cache — constrained to 'free' | 'pro' | 'enterprise' via a CHECK constraint (organizations_tier_check). Stripe webhook flips it on subscription status change; defaults to 'free' so anything pre-tier-assignment lands on the safest cap.

The billing period #

Calendar-month UTC. Period 1 is 2026-01-01T00:00:00Z (inclusive) → 2026-02-01T00:00:00Z (exclusive); period 2 is 2026-02-01T00:00:00Z → 2026-03-01T00:00:00Z; etc. Periods are half-open, [periodStart, periodEnd), so two consecutive periods don't double-count the boundary instant. This is the same convention as agent_runs.started_at / ended_at.

Why calendar-month, not rolling-30-day:

  • Predictable invoice cycles. An org's monthly cap resets on the 1st, not on a moving target tied to signup date. "I have N runs left this month" is a calendar-readable claim.
  • Aligns with Stripe's default billing-cycle anchor when no subscription exists. Paid orgs use their Stripe subscription's currentPeriodStart / End directly; this fallback math is for free / pre-subscribe orgs and aims at the same boundary Stripe will eventually pick.
  • Operationally simpler. The worker's "is this still the current period?" check is now < periodEnd, not a rolling-window subtraction that drifts across worker restarts.

Defined in api/src/lib/billing-period.ts (currentPeriodFor(now, tier) → { periodStart, periodEnd }).
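The calendar-month fallback can be sketched as follows. This is not the body of `currentPeriodFor`: it ignores the `tier` argument and the Stripe-subscription path, and the names `BillingPeriod`, `calendarMonthPeriod`, and `isInPeriod` are assumptions made for the example.

```typescript
interface BillingPeriod {
  periodStart: Date; // inclusive
  periodEnd: Date; // exclusive
}

// Calendar-month UTC period containing `now` (fallback path only).
function calendarMonthPeriod(now: Date): BillingPeriod {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  return {
    periodStart: new Date(Date.UTC(year, month, 1)),
    // Date.UTC normalises month 12 to January of the following year.
    periodEnd: new Date(Date.UTC(year, month + 1, 1)),
  };
}

// Half-open membership test: the boundary instant belongs to the NEXT period.
function isInPeriod(instant: Date, p: BillingPeriod): boolean {
  const t = instant.getTime();
  return t >= p.periodStart.getTime() && t < p.periodEnd.getTime();
}
```

Because the interval is half-open, an event stamped exactly at 2026-02-01T00:00:00Z falls in period 2 and never in period 1.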

What gets measured #

Three dimensions, all summed from the usage_records append-only table by usage_records.kind:

| Dimension | usage_records.kind | Cap column on the period row |
| --- | --- | --- |
| runs | agent_run (one per agent run-stream) | cap_runs |
| input_tokens | input_tokens (sum from agent_messages) | cap_input_tokens |
| output_tokens | output_tokens (sum from agent_messages) | cap_output_tokens |

The runtime appends usage_records rows at run-end via POST /v1/internal/usage (org-scoped from the deployment lookup, not the caller's claim). The PR-2 aggregator worker rolls those rows into organization_usage_periods every ~60s — one row per (organization, periodStart), written via INSERT … ON CONFLICT (organization_id, period_start) DO UPDATE. Reads (the dashboard, the gate) hit the rolled-up row, never re-aggregate usage_records on the hot path.
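As an in-memory illustration of the roll-up, the sketch below folds usage records into one totals row per (organization, periodStart). It only mirrors the counting rule. The real worker does this in SQL through the upsert described above, and the `UsageRecord` fields (`quantity`, `recordedAt`) are assumed names, not the actual column names.

```typescript
type UsageKind = "agent_run" | "input_tokens" | "output_tokens";

// Assumed record shape for illustration only.
interface UsageRecord {
  organizationId: string;
  kind: UsageKind;
  quantity: number;
  recordedAt: Date;
}

interface PeriodTotals {
  runs: number;
  inputTokens: number;
  outputTokens: number;
}

// One totals row per "organizationId|periodStart" key, like the upsert target.
function rollUp(records: UsageRecord[]): Map<string, PeriodTotals> {
  const rows = new Map<string, PeriodTotals>();
  for (const r of records) {
    const periodStart = new Date(
      Date.UTC(r.recordedAt.getUTCFullYear(), r.recordedAt.getUTCMonth(), 1),
    );
    const key = `${r.organizationId}|${periodStart.toISOString()}`;
    const row = rows.get(key) ?? { runs: 0, inputTokens: 0, outputTokens: 0 };
    if (r.kind === "agent_run") row.runs += r.quantity;
    else if (r.kind === "input_tokens") row.inputTokens += r.quantity;
    else row.outputTokens += r.quantity;
    rows.set(key, row);
  }
  return rows;
}
```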

agent_messages is the source of truth for the observability surface (see observability.md) because it's the most honest view of what the user actually saw. usage_records is the billing surface — what feeds Stripe — and may diverge from agent_messages on retried persists. The aggregator uses usage_records so caps and invoices agree on a single counting rule.

Cancellation semantics follow the runtime's usage_records writes (see agent-runs.md for the cancellation contract): runs that the runtime suppresses from usage_records won't count against the cap; runs it doesn't suppress will.

Where to look #

The dashboard surfaces are in PR-4 (in flight at write time — this section reflects the spec in epic #296 PR-4; once shipped the actual behavior should match):

  • /settings/billing (org admin dashboard, planned for PR-4) — current-period usage chart per dimension, projection-to-period-end (linear extrapolation past today, drawn faded), the approaching-cap banner, and the overage rules editor.
  • API: GET /v1/orgs/:slug/billing returns subscription + plan + current-period usage today. PR-4 extends it to expose the period row's cap_*, softCapWarnedAt, and hardCapHitAt markers programmatically. Auth: any org member; cross-org isolation enforced at the join.
  • audit.flagged webhook event — fires once per period when a soft-cap or hard-cap threshold is crossed, with data.kind of usage_soft_cap or usage_hard_cap respectively. See Soft-cap behavior / Hard-cap behavior for payload shapes.
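The projection-to-period-end mentioned for the dashboard is a linear extrapolation. A minimal sketch of that arithmetic, assuming the chart simply scales current usage by the ratio of total period length to elapsed time (the function name and edge-case handling are assumptions, since PR-4 is not shipped):

```typescript
// Linear extrapolation: usage so far, scaled to the full period length.
function projectToPeriodEnd(
  currentUsage: number,
  periodStart: Date,
  periodEnd: Date,
  now: Date,
): number {
  const total = periodEnd.getTime() - periodStart.getTime();
  // Clamp elapsed time into [0, total] so out-of-range clocks don't distort the result.
  const elapsed = Math.min(Math.max(now.getTime() - periodStart.getTime(), 0), total);
  if (elapsed === 0) return currentUsage; // nothing to extrapolate from yet
  return currentUsage * (total / elapsed);
}
```

For example, 1,000 runs by the midpoint of a 30-day period projects to 2,000 runs at period end.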

Soft-cap behavior #

When any tracked dimension crosses the soft-cap threshold (default 80%, per-org configurable via organization_overage_rules.soft_cap_threshold_pct, range 0–100 enforced via DB CHECK), the PR-2 worker stamps organization_usage_periods.soft_cap_warned_at on its next tick and fires audit.flagged with data.kind = 'usage_soft_cap'.

{
  "id": "1f2e3d4c-5b6a-7889-99aa-bbccddeeff00",
  "type": "audit.flagged",
  "createdAt": "2026-05-09T08:30:00.000Z",
  "organizationId": "org_2t4b...",
  "data": {
    "kind": "usage_soft_cap",
    "organizationId": "org_2t4b...",
    "dimension": "input_tokens",
    "currentUsage": 40500000,
    "cap": 50000000,
    "percentUsed": 81.0,
    "thresholdPct": 80,
    "periodStart": "2026-05-01T00:00:00.000Z",
    "periodEnd": "2026-06-01T00:00:00.000Z"
  }
}

dimension is the first one to trip. The check order is runs → input_tokens → output_tokens, so subscribers see consistent dimension labels across periods. Runs continue normally; the dashboard surfaces an amber banner. For paid tiers, overage past 100% is billed at end of period (see Overage billing).

The soft_cap_warned_at column is the one-shot dedupe key — the worker's WHERE soft_cap_warned_at IS NULL predicate guarantees the webhook fires at most once per period regardless of how many ticks re-detect the over-threshold state.
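The detection rule can be sketched as a pure function over the period row. This is an illustration, not the worker's code: the `PeriodRow` shape is assumed, and "crosses the threshold" is interpreted here as usage at or above the threshold percentage.

```typescript
type Dim = "runs" | "input_tokens" | "output_tokens";
// Fixed check order, so the reported dimension is deterministic.
const DIMENSION_ORDER: Dim[] = ["runs", "input_tokens", "output_tokens"];

// Assumed row shape for illustration.
interface PeriodRow {
  usage: Record<Dim, number>;
  caps: Record<Dim, number | null>; // null = unlimited
  softCapWarnedAt: Date | null;
}

interface SoftCapTrip {
  dimension: Dim;
  currentUsage: number;
  cap: number;
  percentUsed: number;
  thresholdPct: number;
}

function detectSoftCap(row: PeriodRow, thresholdPct = 80): SoftCapTrip | null {
  // One-shot dedupe: a stamped row never fires again this period.
  if (row.softCapWarnedAt !== null) return null;
  for (const dimension of DIMENSION_ORDER) {
    const cap = row.caps[dimension];
    if (cap === null || cap <= 0) continue; // unlimited dimensions never trip
    const currentUsage = row.usage[dimension];
    const percentUsed = (currentUsage * 100) / cap;
    if (percentUsed >= thresholdPct) {
      return { dimension, currentUsage, cap, percentUsed, thresholdPct };
    }
  }
  return null;
}
```

Fed the numbers from the payload above (40,500,000 of 50,000,000 input tokens), this returns `input_tokens` at 81 percent.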

Hard-cap behavior #

When any tracked dimension reaches 100% AND hard_cap_enabled is true (default true for free tier, default false for pro / enterprise, per-org override via organization_overage_rules.hard_cap_enabled):

  1. The PR-2 worker stamps organization_usage_periods.hard_cap_hit_at and fires a second audit.flagged with data.kind = 'usage_hard_cap' (same payload shape as usage_soft_cap minus the thresholdPct field).

  2. The PR-3 run-stream entry gate refuses new runs with HTTP 402 Payment Required (RFC 9110 §15.5.3):

    {
      "error": "usage_cap_exceeded",
      "tripDimension": "runs",
      "currentUsage": {
        "runs": 100002,
        "inputTokens": 12345678,
        "outputTokens": 543210
      },
      "capUsage": {
        "runs": 100000,
        "inputTokens": null,
        "outputTokens": null
      },
      "periodEnd": "2026-06-01T00:00:00.000Z",
      "reason": "free_tier_cap_exceeded",
      "upgradeUrl": "https://app.stech.com/settings/billing"
    }

    tripDimension is the first dimension to trip (same runs → input_tokens → output_tokens order as the soft-cap fan-out). capUsage.* fields are null for unlimited dimensions. reason is "free_tier_cap_exceeded" for free orgs and "hard_cap_exceeded" for paid orgs that have flipped to hard-cap via the override row. Both map to the same 402 status; the string is purely cosmetic, for nicer dashboard / SDK copy.

402 status is deliberate: 403 means RBAC-deny in this codebase, 429 means rate-limit. SDK consumers should branch on response.status === 402 && body.error === "usage_cap_exceeded" to surface the upgrade prompt cleanly.
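On the consumer side, the recommended branch can be wrapped in a small type guard. The interface below lists only fields shown in the 402 example above; the guard name is an assumption, and how you obtain `status` and the parsed `body` depends on your HTTP client.

```typescript
// Subset of the documented 402 body that a client typically needs.
interface UsageCapExceededBody {
  error: "usage_cap_exceeded";
  tripDimension: string;
  periodEnd: string;
  reason: string;
  upgradeUrl: string;
}

// True only for the cap-refusal case: status 402 AND the documented error code.
function isUsageCapExceeded(status: number, body: unknown): body is UsageCapExceededBody {
  return (
    status === 402 &&
    typeof body === "object" &&
    body !== null &&
    (body as { error?: unknown }).error === "usage_cap_exceeded"
  );
}
```

Checking both the status and the error string keeps the upgrade prompt from appearing on unrelated 402 responses or on 403 / 429 errors.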

In-flight runs at the time of the cap stamping continue to completion — the gate only blocks NEW runs at the run-stream entry. At period rollover (next 1st-of-month UTC) the next worker tick materialises a fresh period row with fresh cap headroom and hard_cap_hit_at = NULL, and the gate clears automatically.
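A sketch of the gate decision, under stated assumptions: the gate is modelled as comparing the rolled-up usage against the cap for each dimension, and the `GateInput` shape and `checkGate` name are invented for illustration. The actual PR-3 gate may instead key off hard_cap_hit_at or combine both checks.

```typescript
type GateDim = "runs" | "inputTokens" | "outputTokens";
const GATE_ORDER: GateDim[] = ["runs", "inputTokens", "outputTokens"];
const TRIP_LABEL: Record<GateDim, string> = {
  runs: "runs",
  inputTokens: "input_tokens",
  outputTokens: "output_tokens",
};

// Assumed input shape: values read from the rolled-up period row.
interface GateInput {
  tier: "free" | "pro" | "enterprise";
  hardCapEnabled: boolean;
  currentUsage: Record<GateDim, number>;
  capUsage: Record<GateDim, number | null>;
  periodEnd: string;
}

interface CapExceededBody {
  error: "usage_cap_exceeded";
  tripDimension: string;
  currentUsage: Record<GateDim, number>;
  capUsage: Record<GateDim, number | null>;
  periodEnd: string;
  reason: "free_tier_cap_exceeded" | "hard_cap_exceeded";
}

type GateResult =
  | { allowed: true }
  | { allowed: false; status: 402; body: CapExceededBody };

function checkGate(input: GateInput): GateResult {
  if (!input.hardCapEnabled) return { allowed: true }; // soft-cap orgs are never refused
  for (const dim of GATE_ORDER) {
    const cap = input.capUsage[dim];
    if (cap !== null && input.currentUsage[dim] >= cap) {
      return {
        allowed: false,
        status: 402,
        body: {
          error: "usage_cap_exceeded",
          tripDimension: TRIP_LABEL[dim],
          currentUsage: input.currentUsage,
          capUsage: input.capUsage,
          periodEnd: input.periodEnd,
          reason: input.tier === "free" ? "free_tier_cap_exceeded" : "hard_cap_exceeded",
        },
      };
    }
  }
  return { allowed: true };
}
```

The sketch omits upgradeUrl for brevity; the documented response includes it.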

The ~60s aggregation lag #

The usage_records writer appends rows at run END, not run start. The aggregator worker rolls them into organization_usage_periods every 60s by default (DEFAULT_POLL_INTERVAL_MS in api/src/lib/cost-control/aggregator.ts). The gate's view of "current usage" is therefore up to ~60s stale.

Worst case: two concurrent run-stream calls at 99% both pass the gate check, both succeed, and push the period to 101%. The next worker tick stamps hard_cap_hit_at and from then on every new run is refused with 402. We accept the 1–2 over-cap runs as a known cost in exchange for keeping the gate to a single SQL roundtrip — re-summing usage_records on every run-stream entry would defeat the whole point of the rolled-up table.

For free orgs the worst case is ~2 runs of platform-key spend; for paid orgs the same race surfaces as 1–2 runs of overage on the next invoice. Documented in epic #296's "Risks" section as an accepted trade-off; the right fix is per-agent token budgets pre-charged at run-stream entry, deferred as a follow-up.
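The race is easy to reproduce in a toy model. The sketch below is a simplification built on assumptions: it treats the gate as reading a rolled-up counter that only changes on worker ticks, and it counts a run as soon as it is admitted, whereas the real runtime writes usage_records at run end. All names are hypothetical.

```typescript
// Toy state: what the gate sees versus what has actually happened.
interface GateSim {
  rolledUpRuns: number; // refreshed only on worker ticks
  actualRuns: number; // ground truth
  cap: number;
}

// The gate checks the (possibly stale) rolled-up value, never the ground truth.
function tryStartRun(s: GateSim): boolean {
  if (s.rolledUpRuns >= s.cap) return false;
  s.actualRuns += 1;
  return true;
}

// The aggregator tick brings the rolled-up value up to date.
function workerTick(s: GateSim): void {
  s.rolledUpRuns = s.actualRuns;
}
```

Starting from 99 of 100 runs, two calls to `tryStartRun` before the next tick both succeed and leave the period at 101; after `workerTick`, further calls are refused.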

If you need hard real-time refusal in your own pipeline, you can run a pre-flight check via GET /v1/orgs/:slug/billing before each run — but that's a 1-RTT round-trip on every call and we don't recommend it for normal workloads.

Overage billing #

Spec — PR-5 (Stripe overage wiring) is in flight at write time and not yet merged. The reference here matches what's planned in epic #296 PR-5; once shipped the actual behavior should match.

At end of period, completed periods with overage past cap_* get reported to Stripe via the metering API (stripe.billing.meterEvents.create). One Stripe meter event per dimension (runs / input_tokens / output_tokens) per org per closed period. The Stripe subscription's usage-based prices map meter events to dollar amounts.
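Since PR-5 is not merged, the following is only a sketch of how the per-dimension overage could be turned into meter event parameters. The event names (`overage_runs` and so on), the identifier format, and the `ClosedPeriodSketch` shape are all assumptions; the payload keys follow Stripe's default meter configuration (`stripe_customer_id` and `value`, both strings).

```typescript
type OverageDim = "runs" | "input_tokens" | "output_tokens";

// Assumed shape of a closed period row joined with its Stripe customer.
interface ClosedPeriodSketch {
  organizationId: string;
  stripeCustomerId: string;
  periodStart: string; // ISO timestamp
  usage: Record<OverageDim, number>;
  caps: Record<OverageDim, number | null>;
}

interface MeterEventParams {
  event_name: string;
  identifier: string;
  payload: { stripe_customer_id: string; value: string };
}

// One event per dimension that actually exceeded a finite cap.
function buildOverageEvents(p: ClosedPeriodSketch): MeterEventParams[] {
  const dims: OverageDim[] = ["runs", "input_tokens", "output_tokens"];
  const events: MeterEventParams[] = [];
  for (const dim of dims) {
    const cap = p.caps[dim];
    if (cap === null) continue; // unlimited: nothing to bill
    const overage = p.usage[dim] - cap;
    if (overage <= 0) continue; // under cap: nothing to bill
    events.push({
      event_name: `overage_${dim}`, // hypothetical meter event name
      // Deterministic identifier so a re-run of the reporter cannot double-report.
      identifier: `${p.organizationId}:${p.periodStart}:${dim}`,
      payload: { stripe_customer_id: p.stripeCustomerId, value: String(overage) },
    });
  }
  return events;
}
```

Each returned object is shaped to be passed to stripe.billing.meterEvents.create. Deriving the identifier from org, period, and dimension is one way to make daily re-runs idempotent.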

The reporter is web/lib/billing-overage.ts (already exists, unwired). PR-5 mounts it behind an admin-only Next route at /api/admin/report-overage and schedules a daily POST via a GitHub Actions workflow at cron('5 1 * * *') (01:05 UTC daily). Only periods past their period_end get reported; the current in-progress period rolls up at next 1st-of-month UTC.

Stripe's metering API has ~10s eventual consistency between meter event creation and dashboard visibility. Neither this code nor this doc polls Stripe to confirm receipt; the meter_event_identifier we send is our reference for each event, and Stripe's dashboard is the source of truth.

Configuring per-org policy #

Hand-tuned overrides land in organization_overage_rules (one row per org, most orgs have no row at all and use tier defaults):

| Column | Default | Range | Meaning |
| --- | --- | --- | --- |
| hard_cap_enabled | false | bool | Override the tier default. Free is hard-cap by default; flipping a paid org to true makes the gate refuse instead of bill overage. |
| soft_cap_threshold_pct | 80 | 0–100 (DB CHECK) | Percent of cap at which the soft-cap warning webhook fires. 60 / 90 are the typical asks from regulated customers. |
| overage_input_token_micros, overage_output_token_micros, overage_run_micros | NULL | bigint | Per-org overage rate overrides. NULL = use tier default from plans. |
| notes | NULL | text | Free-form admin annotation: "negotiated as part of MSA 2026-Q2", "test sandbox, do not bill", etc. |
| updated_by_user_id | NULL | FK → users.id (SET NULL on delete) | Audit trail for who last touched the override. |

Edits surface in /settings/billing (PR-4) under the overage rules editor; the underlying CRUD endpoints land alongside that PR.
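The override-then-tier-default lookup can be sketched as below. The function and type names are assumptions, and the sketch takes the simplest reading of the rule: when an override row exists, its values win outright; when none exists, the tier defaults apply (hard cap on for free, off for pro / enterprise, threshold 80).

```typescript
type OrgTier = "free" | "pro" | "enterprise";

// Assumed subset of an organization_overage_rules row.
interface OverrideRow {
  hardCapEnabled: boolean;
  softCapThresholdPct: number; // 0 to 100, enforced by the DB CHECK
}

interface EffectivePolicy {
  hardCapEnabled: boolean;
  softCapThresholdPct: number;
}

const SOFT_CAP_DEFAULT_PCT = 80;

// Override row first, tier default otherwise.
function resolvePolicy(tier: OrgTier, override?: OverrideRow): EffectivePolicy {
  if (override !== undefined) {
    return {
      hardCapEnabled: override.hardCapEnabled,
      softCapThresholdPct: override.softCapThresholdPct,
    };
  }
  return { hardCapEnabled: tier === "free", softCapThresholdPct: SOFT_CAP_DEFAULT_PCT };
}
```

Under this reading, a pro org with an override of `{ hardCapEnabled: true, softCapThresholdPct: 60 }` is warned at 60% and refused at 100%, while a pro org with no row is warned at 80% and billed for overage.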

Cross-references #

  • observability.md: the run-history surface and the failure-rate watchdog. Cancellation semantics for usage counting follow the runtime's usage_records writes; see agent-runs.md for the cancellation contract.
  • webhooks.md: the audit.flagged envelope, signing scheme, and retry policy. The usage_soft_cap / usage_hard_cap payloads above use the same envelope as every other curated audit.flagged event; receivers branch on data.kind before touching dimension-specific fields.
  • audit-log.md: the retrospective view. Once PR-3 starts refusing runs with 402, the corresponding agent_runs.cap_refused admin_action surfaces in the admin tab for "who got refused, when, on which dimension" forensics.

Limitations #

  • No per-agent budgets yet (per-org only). The organization_overage_rules table is per-org by design; per-agent multiplies the gate's join cost on every run-stream entry. Filed as a future epic if customers ask.
  • No pre-pay credits flow. Different billing model; not on the current roadmap.
  • No annual billing. Stripe handles month-to-month; annual-with-overage is a separate Stripe price config we haven't wired.
  • Mid-period tier changes apply on the NEXT period boundary. The current period keeps its cap snapshot; see the "no mid-period retroactive re-cap" rule in db/src/schema/organization-usage-periods.ts. A customer downgrading shouldn't suddenly see their period as over-cap; an upgrade gets the new headroom on the 1st.
  • Aggregation lag means the cap may admit one or two runs past the limit. See The ~60s aggregation lag.
  • audit.flagged from the worker fires once per period per cap level. Subscribers wanting per-failed-request notifications should subscribe to admin_action.recorded (PR-3 writes agent_runs.cap_refused admin_actions on every refused run).
  • Single-replica aggregator. The PR-2 worker assumes one api process owns aggregation. Multi-replica setups would race two workers on the same period row; the upsert's UNIQUE constraint collapses the duplicate INSERT but the soft-cap dedupe IS NULL check could double-fire if two ticks race between SELECT and UPDATE. Same posture as the webhook delivery worker; multi-replica advisory locks are deferred until we actually run >1 api replica.

Troubleshooting #

I got HTTP 402 but I'm a paid customer. Check organization_overage_rules.hard_cap_enabled for your org. Paid tiers default to false (overage is billed instead of refused) but admins can flip the override row to true for hard-cap behavior. The gate reads the override first, falls back to the tier default otherwise.

Soft-cap warning fired but I'm under 80%. Check organization_overage_rules.soft_cap_threshold_pct — an admin may have set it lower (60 is a common ask from regulated customers). The aggregator's DEFAULT_SOFT_CAP_THRESHOLD_PCT is 80; the override wins if present.

I see my usage in /settings/billing but Stripe didn't bill. Only periods past their period_end get reported by PR-5's reporter. The current in-progress period rolls up at the next 1st-of-month UTC and gets reported on the next scheduled daily cron tick (see the schedule under Overage billing).

Stripe shows higher usage than the dashboard. Eventual consistency window — Stripe's metering API has ~10s lag on the Stripe side; the dashboard updates once per worker tick (~60s). The two converge but not in lockstep.

I want to test cap behavior in dev. Set organizations.tier = 'free' on a test org (or flip organization_overage_rules.hard_cap_enabled to true on a paid one) and crank up usage_records via the smoke script. The aggregator's next tick (within 60s) will stamp the cap markers and the gate will start refusing.

Webhook fired twice for the same cap crossing. Shouldn't happen under normal operation — the worker's one-shot dedupe (WHERE soft_cap_warned_at IS NULL / WHERE hard_cap_hit_at IS NULL) guarantees at-most-once. If you see it, check whether multiple api replicas are running aggregator workers (single-replica is the documented posture); the next worker tick after a duplicate fire won't re-fire on the same row.