Policy and guardrails

Agent authors declare guardrails as an array of strings on defineAgent(). The platform parses them at deploy time, refuses unknown shapes with a 400, and enforces every parsed entry in both the dev runtime and the cloud runtime. A run that trips a terminating guardrail ends with a blocked:<kind> stop reason, a structured blocked envelope on the response, a row in the guardrail_violations audit table, an audit.flagged webhook, and a parallel agent_run.blocked webhook on the dedicated SIEM channel. (pii.redact and require_tool_allowlist act on a run without terminating it.)

This page is the catalog: what each guardrail does, where it fires, the wire shape of a block, and how to subscribe to or query the audit trail.

Authoring

Guardrails live on the AgentConfig object passed to defineAgent():

```typescript
import { defineAgent, tool } from "@stech/sdk";

// ticketSchema and crmSchema are your own tool input schemas, defined elsewhere.

export default defineAgent({
  name: "support-triage",
  model: "claude-sonnet-4-5",
  instructions: "Help users triage open tickets.",
  tools: [
    tool("ticket.lookup", ticketSchema),
    tool("crm.lookup", crmSchema),
  ],
  guardrails: [
    "pii.redact",
    "rate:10/min",
    "max_tokens=4096",
    "input_max_chars=8000",
    "require_tool_allowlist=ticket.lookup,crm.lookup",
  ],
});
```

The strings parse into a closed union (see cli/src/sdk/guardrails.ts). Adding a new kind is a coordinated four-step change: extend the union, add a parser case, add checker functions in both runtimes, and document it here. Opaque custom:<name> shapes are deliberately refused in v1: letting an author smuggle in a policy claim that the runtime can't actually enforce is worse than refusing the unknown string outright.

Validation

Two gates catch malformed guardrail strings before they reach the cloud runtime:

stech dev boot. AgentRunner construction calls installGuardrails() and hard-fails on bad-config strings (e.g. rate:10/foobar, max_tokens=-1). It warns and continues on unknown-kind strings (e.g. pii.shred) so a future SDK shape added in cloud doesn't brick a stale local checkout.
stech deploy. The api validates the guardrails form field with the SDK-mirrored parser and refuses the deploy with 400 invalid_guardrails plus a detail listing every bad string and the full menu of accepted shapes.
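As a rough illustration of the closed union and of the two failure classes the gates distinguish, here is a minimal TypeScript parser sketch. It is not the code in cli/src/sdk/guardrails.ts: the names Guardrail, ParseResult, and parseGuardrail are hypothetical, and the real parser's validation rules and messages may differ.

```typescript
// Hypothetical sketch; the real parser lives in cli/src/sdk/guardrails.ts.
type NumericKind = "max_tokens" | "max_cost" | "input_max_chars" | "output_max_chars";
type RateUnit = "sec" | "min" | "hour";

// The closed union: every accepted string maps to exactly one member.
type Guardrail =
  | { kind: "pii.redact" }
  | { kind: "rate"; limit: number; unit: RateUnit }
  | { kind: NumericKind; limit: number }
  | { kind: "block_models"; patterns: string[] }
  | { kind: "require_tool_allowlist"; tools: string[] };

// "unknown": the kind itself is not recognized (dev boot warns and continues).
// "invalid": a known kind with a malformed value (dev boot hard-fails).
type ParseResult =
  | { status: "ok"; guardrail: Guardrail }
  | { status: "unknown"; input: string }
  | { status: "invalid"; input: string; reason: string };

const NUMERIC_KINDS: readonly string[] = ["max_tokens", "max_cost", "input_max_chars", "output_max_chars"];
const POSITIVE_INT = /^[1-9][0-9]*$/;

function parseGuardrail(input: string): ParseResult {
  if (input === "pii.redact") {
    return { status: "ok", guardrail: { kind: "pii.redact" } };
  }
  if (input.startsWith("rate:")) {
    const m = /^rate:([1-9][0-9]*)\/(sec|min|hour)$/.exec(input);
    if (!m) return { status: "invalid", input, reason: "expected rate:N/{sec,min,hour}" };
    return { status: "ok", guardrail: { kind: "rate", limit: Number(m[1]), unit: m[2] as RateUnit } };
  }

  const eq = input.indexOf("=");
  if (eq === -1) return { status: "unknown", input }; // e.g. pii.shred, custom:foo
  const key = input.slice(0, eq);
  const value = input.slice(eq + 1);

  if (NUMERIC_KINDS.includes(key)) {
    if (!POSITIVE_INT.test(value)) {
      return { status: "invalid", input, reason: `${key} expects a positive integer` };
    }
    return { status: "ok", guardrail: { kind: key as NumericKind, limit: Number(value) } };
  }
  if (key === "block_models" || key === "require_tool_allowlist") {
    const items = value.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
    if (items.length === 0) {
      return { status: "invalid", input, reason: `${key} expects a comma-separated list` };
    }
    return key === "block_models"
      ? { status: "ok", guardrail: { kind: "block_models", patterns: items } }
      : { status: "ok", guardrail: { kind: "require_tool_allowlist", tools: items } };
  }
  return { status: "unknown", input };
}
```

Under this sketch, stech dev would hard-fail on any "invalid" result and only warn on "unknown", while the deploy gate would collect every non-"ok" result into the 400 detail.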

The mirror tests (api/tests/guardrails.test.ts, runtime/tests/guardrails-mirror.test.ts) pin the api and runtime parsers to the SDK parser on a representative set of valid and invalid inputs, so the three parsers cannot drift apart.

Catalog

Eight guardrail kinds ship in v1. Each has a string syntax, a runtime seam, and a worked example; the six that terminate a run also define a BlockedDetails shape.

`pii.redact`

Strip emails, US SSNs, and US/E.164 phone numbers from user prompts before they reach the model. Regex-based — fast, deterministic, with a documented false-positive cost (matches inside code samples or API docs the agent is summarizing).

| Field | Value |
| --- | --- |
| String form | `pii.redact` |
| Seam | user-prompt entry, before message append |
| Action | rewrites the prompt in place; does not block the run |
| `BlockedDetails` | n/a (this guardrail observes; it doesn't terminate) |
| Audit signal | runtime emits a `guardrail` SSE event (no audit row, no webhook) |

```typescript
guardrails: ["pii.redact"]
```

A prompt of email me at user@example.com becomes email me at [REDACTED:email] before the model sees it. The original text is not persisted anywhere: once redacted, the cleaned prompt is what lands in agent_messages.user_text.
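A minimal sketch of regex-based redaction, assuming illustrative patterns. The runtime's actual expressions are not documented here and likely cover more formats; redactPii and PII_PATTERNS are hypothetical names.

```typescript
// Hypothetical patterns; order matters (SSNs are replaced before the phone pass).
const PII_PATTERNS: ReadonlyArray<readonly [string, RegExp]> = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/g],
  // E.164 (+ then up to 15 digits) or US shapes like (415) 555-0100 / 415-555-0100.
  ["phone", /\+[1-9]\d{7,14}\b|\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b/g],
];

function redactPii(prompt: string): string {
  let out = prompt;
  for (const [label, pattern] of PII_PATTERNS) {
    // replace() with a /g regex resets lastIndex, so reusing the regex is safe.
    out = out.replace(pattern, `[REDACTED:${label}]`);
  }
  return out;
}
```

The false-positive cost is visible here: any digits shaped like 415-555-0100 inside a code sample or API doc are redacted too.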

LLM-judged redaction (more accurate, more expensive) is a future opt-in shape; the regex catches the common cases and is cheap enough to run on every turn.

`rate:N/UNIT`

Cap ask() calls to N per rolling UNIT window. The counter is scoped per AgentRunner instance in dev and per runtime machine in cloud.

| Field | Value |
| --- | --- |
| String form | `rate:N/{sec,min,hour}` |
| Seam | per-iteration boundary, before the first `provider.generate()` |
| Action | terminates with `stopReason='blocked:rate'` |
| `limit` | configured `N` |
| `observed` | actual in-window request count at trip time |

```typescript
guardrails: ["rate:10/min", "rate:100/hour"]
```

Multiple rate: entries stack: every entry is checked on each call, and the first one whose window is exhausted trips the block.

Distributed limitation. A customer running multiple Fly machines for one deployment gets N times the configured budget — each machine maintains its own bucket. Redis-backed distributed rate limiting is filed for a follow-up; the in-memory v1 is a per-machine ceiling, not a per-org one.
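The rolling-window behavior, including stacked entries and the per-instance scope, can be sketched as an in-memory sliding log. RollingRateLimiter, tryAcquire, and the injected clock are hypothetical names. Counting the refused call in observed is an assumption about how the runtime reports the in-window count.

```typescript
// Hypothetical sketch; one instance per AgentRunner (dev) or per machine (cloud).
type RateUnit = "sec" | "min" | "hour";
const WINDOW_MS: Record<RateUnit, number> = { sec: 1_000, min: 60_000, hour: 3_600_000 };

interface RateRule { limit: number; unit: RateUnit }
interface RateBlock { guardrail: "rate"; limit: number; observed: number }

class RollingRateLimiter {
  private readonly rules: RateRule[];
  private readonly now: () => number;
  private hits: number[] = []; // timestamps (ms) of admitted ask() calls

  constructor(rules: RateRule[], now: () => number = Date.now) {
    this.rules = rules;
    this.now = now;
  }

  // Returns null when the call is admitted, or block details for the first
  // exhausted rule. Refused calls are not recorded as hits.
  tryAcquire(): RateBlock | null {
    const t = this.now();
    // Forget timestamps older than the longest configured window.
    const longest = Math.max(0, ...this.rules.map((r) => WINDOW_MS[r.unit]));
    this.hits = this.hits.filter((h) => h > t - longest);

    for (const rule of this.rules) {
      const inWindow = this.hits.filter((h) => h > t - WINDOW_MS[rule.unit]).length;
      if (inWindow >= rule.limit) {
        // observed includes the call being refused (assumption).
        return { guardrail: "rate", limit: rule.limit, observed: inWindow + 1 };
      }
    }
    this.hits.push(t);
    return null;
  }
}
```

Because the state is a plain array in process memory, each machine keeps its own log, which is exactly the per-machine ceiling described above.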

`max_tokens=N`

Per-run cumulative output-token ceiling. Two checks:

Pre-iteration: clamp the provider's max_tokens request to min(provider_default, N - cumulative_output).
Post-iteration: recordUsage() checks cumulative_output > N and terminates if a provider returned more than the clamped request.
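A sketch of both checks as one per-run budget object. Only recordUsage() is named in the text above; OutputTokenBudget and clampRequest are hypothetical, and the message string simply mirrors the example envelope later on this page.

```typescript
// Hypothetical sketch of the clamp-then-verify pair for max_tokens=N.
interface TokenBlock { guardrail: "max_tokens"; limit: number; observed: number; message: string }

class OutputTokenBudget {
  private cumulativeOutput = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Pre-iteration: never request more than the remaining budget.
  // A result of 0 means the budget is spent; how the runtime reacts is not specified here.
  clampRequest(providerDefault: number): number {
    return Math.min(providerDefault, this.limit - this.cumulativeOutput);
  }

  // Post-iteration: defense in depth in case a provider over-returns.
  recordUsage(outputTokens: number): TokenBlock | null {
    this.cumulativeOutput += outputTokens;
    if (this.cumulativeOutput > this.limit) {
      return {
        guardrail: "max_tokens",
        limit: this.limit,
        observed: this.cumulativeOutput,
        message: `cumulative output ${this.cumulativeOutput} tokens > guardrail max_tokens=${this.limit}`,
      };
    }
    return null;
  }
}
```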

| Field | Value |
| --- | --- |
| String form | `max_tokens=N` (positive integer) |
| Seam | both pre-call (clamp) and post-call (verify) |
| Action | terminates with `stopReason='blocked:max_tokens'` |
| `limit` | configured `N` |
| `observed` | cumulative output tokens at trip time |

```typescript
guardrails: ["max_tokens=4096"]
```

The post-iteration check is defense in depth — Anthropic clamps internally, but a future provider may not.

`max_cost=N`

Per-run USD spend ceiling, in micro-cents (1_000_000 micros == 1 cent == $0.01). Tracked via the runtime's recordCost() seam. Cost is computed by the caller from usage + a model-pricing table — the runtime doesn't know prices.

| Field | Value |
| --- | --- |
| String form | `max_cost=N` (positive integer micros) |
| Seam | per-iteration `recordCost()` |
| Action | terminates with `stopReason='blocked:max_cost'` |
| `limit` | configured `N` (micros) |
| `observed` | cumulative micros at trip time |

```typescript
guardrails: ["max_cost=10000"]
// → 10000 micros == 0.01 cent == cap of $0.0001 per run
```

The micro-cent unit avoids floating-point drift on multi-iteration runs and matches the resolution of the billing aggregator's usage_records table.
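A sketch of the caller-side arithmetic, keeping everything in integer micros. The prices in the usage check are placeholders, not real model prices. iterationCostMicros, CostBudget, and microsToDollars are hypothetical names. Letting spend exactly equal to the cap through mirrors the max_tokens check, but that behavior is an assumption.

```typescript
// 1 cent = 1_000_000 micros, so $1 = 100_000_000 micros.
const MICROS_PER_CENT = 1_000_000;
const MICROS_PER_DOLLAR = 100 * MICROS_PER_CENT;

// Hypothetical pricing shape: micros per million tokens, so cost stays integral.
interface Pricing { inputMicrosPerMTok: number; outputMicrosPerMTok: number }
interface CostBlock { guardrail: "max_cost"; limit: number; observed: number }

function iterationCostMicros(usage: { input: number; output: number }, p: Pricing): number {
  const total = usage.input * p.inputMicrosPerMTok + usage.output * p.outputMicrosPerMTok;
  return Math.ceil(total / 1_000_000); // round up so spend is never under-counted
}

function microsToDollars(micros: number): number {
  return micros / MICROS_PER_DOLLAR; // display only; accumulate in integer micros
}

class CostBudget {
  private cumulative = 0;
  private readonly limit: number;

  constructor(limitMicros: number) {
    this.limit = limitMicros;
  }

  recordCost(micros: number): CostBlock | null {
    this.cumulative += micros;
    return this.cumulative > this.limit
      ? { guardrail: "max_cost", limit: this.limit, observed: this.cumulative }
      : null;
  }
}
```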

`block_models=PAT,...`

Refuse to even start the run if agent.model matches any pattern. Patterns are simple anchored globs — only * is supported as a wildcard, and the match is against the full model string. claude-* does NOT match my-claude-3 because both ends are anchored; metachars like . are escaped before compilation so a stray gpt-4.0 cannot widen to gpt-410.
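The anchoring and escaping rules above translate to a few lines of TypeScript. compileModelPattern and findBlockingPattern are hypothetical names for illustration, not the runtime's functions.

```typescript
// Hypothetical sketch of anchored-glob matching for block_models.
function compileModelPattern(glob: string): RegExp {
  // Escape every regex metacharacter (so "." is literal), then turn the
  // now-escaped "\*" back into ".*", the only supported wildcard.
  const escaped = glob.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\\\*/g, ".*")}$`);
}

// Returns the first matching pattern (for the block message), or null.
function findBlockingPattern(model: string, patterns: string[]): string | null {
  for (const p of patterns) {
    if (compileModelPattern(p).test(model)) return p;
  }
  return null;
}
```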

| Field | Value |
| --- | --- |
| String form | `block_models=pat1,pat2,...` |
| Seam | run boot, before provider construction |
| Action | terminates with `stopReason='blocked:block_models'` |
| `limit` | `null` (the trigger is a match, not a numeric ceiling) |
| `observed` | the offending model string |

```typescript
guardrails: ["block_models=gpt-3.5*,claude-2*"]
```

Use case: an org policy restricts which model families the agent can target. The shape is forward-compatible with PR-3 of the epic, where the same block_models= syntax is the surface an admin uses to ban model families across every agent in the org.

`require_tool_allowlist=tool_a,tool_b`

Only allow tool calls whose name appears in the comma-separated allowlist. Any other tool call returns an is_error tool_result block with a "blocked by policy" message, and the model can recover: a refused tool does NOT terminate the run, matching how unknown-tool errors are handled today.

| Field | Value |
| --- | --- |
| String form | `require_tool_allowlist=tool_a,tool_b,...` |
| Seam | tool dispatch, per call |
| Action | refused tool returns an `is_error` `tool_result`; run continues |
| Audit signal | runtime emits a `guardrail` SSE event per refusal |

```typescript
guardrails: ["require_tool_allowlist=ticket.lookup,crm.lookup"]
```

The model sees the refusal in the next turn's tool-result block and can choose to continue without the blocked tool, retry with a different name, or surrender by returning text.
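Here is a sketch of the per-call check at tool dispatch. The tool_result fields follow the Anthropic Messages API block shape (type, tool_use_id, content, is_error). The function name, the synchronous executor, and the exact refusal text are assumptions.

```typescript
// Hypothetical sketch of the allowlist check at the dispatch seam.
interface ToolUse { id: string; name: string; input: unknown }
interface ToolResult { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean }

function dispatchWithAllowlist(
  call: ToolUse,
  allowlist: ReadonlySet<string>,
  execute: (call: ToolUse) => string,
): ToolResult {
  if (!allowlist.has(call.name)) {
    // Refused, but the run continues: the model reads this on its next turn.
    return {
      type: "tool_result",
      tool_use_id: call.id,
      is_error: true,
      content: `tool "${call.name}" blocked by policy (require_tool_allowlist)`,
    };
  }
  return { type: "tool_result", tool_use_id: call.id, content: execute(call) };
}
```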

`input_max_chars=N`

Refuse user prompts longer than N characters, a cheap defense against accidental prompt-bombing. The check runs before the pii.redact pass, so an oversized prompt (say, 50 KB) is rejected without first paying for redaction.
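The ordering at the user-prompt seam can be sketched as follows. admitUserPrompt is hypothetical, and redact stands in for the pii.redact pass. The sketch measures length with String.prototype.length (UTF-16 code units). Whether the runtime counts code units or code points is not stated here.

```typescript
// Hypothetical sketch: length check first, redaction second.
interface InputBlock { guardrail: "input_max_chars"; limit: number; observed: number }

function admitUserPrompt(
  prompt: string,
  maxChars: number | null,
  redact: ((s: string) => string) | null,
): { ok: true; prompt: string } | { ok: false; blocked: InputBlock } {
  // Reject oversized prompts before spending any work on the regex pass.
  if (maxChars !== null && prompt.length > maxChars) {
    return { ok: false, blocked: { guardrail: "input_max_chars", limit: maxChars, observed: prompt.length } };
  }
  return { ok: true, prompt: redact ? redact(prompt) : prompt };
}
```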

| Field | Value |
| --- | --- |
| String form | `input_max_chars=N` (positive integer) |
| Seam | user-prompt entry, before message append |
| Action | terminates with `stopReason='blocked:input_max_chars'` |
| `limit` | configured `N` |
| `observed` | actual prompt length |

```typescript
guardrails: ["input_max_chars=8000"]
```

`output_max_chars=N`

Abort runs whose final assistant text exceeds N characters. The check runs at the end of each iteration, after the model has produced its text but before the loop returns the RunResult.

| Field | Value |
| --- | --- |
| String form | `output_max_chars=N` (positive integer) |
| Seam | iteration end, after `extractFinalText` |
| Action | terminates with `stopReason='blocked:output_max_chars'` |
| `limit` | configured `N` |
| `observed` | actual final-text length |

```typescript
guardrails: ["output_max_chars=12000"]
```

Wire shape of a block

When a guardrail fires, the run terminates cleanly with a structured envelope on both the synchronous /run JSON response and the streaming /run-stream done SSE frame.

Synchronous /run:

```json
{
  "runId": "run_8w3...",
  "stopReason": "blocked:max_tokens",
  "finalText": "",
  "iterations": 2,
  "usage": { "input": 4218, "output": 4521 },
  "blocked": {
    "guardrail": "max_tokens",
    "limit": 4096,
    "observed": 4521,
    "source": "agent",
    "message": "cumulative output 4521 tokens > guardrail max_tokens=4096"
  }
}
```

Streaming /run-stream done frame:

```jsonc
{
  "type": "done",
  "runId": "run_8w3...",
  "stopReason": "blocked:max_tokens",
  "finalText": "",
  "iterations": 2,
  "usage": { "input": 4218, "output": 4521 },
  "messages": [ /* ... */ ],
  "blocked": {
    "guardrail": "max_tokens",
    "limit": 4096,
    "observed": 4521,
    "source": "agent",
    "message": "cumulative output 4521 tokens > guardrail max_tokens=4096"
  }
}
```

The blocked key is omitted when the run is not blocked, so the wire shape of the normal path is unchanged. To detect a block, branch on blocked != null rather than on a stop-reason prefix.
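On the client side, a type guard keeps that branch honest. These interfaces are a hypothetical mirror of the JSON above, not types exported by @stech/sdk.

```typescript
// Hypothetical client-side mirror of the /run response.
interface BlockedDetails {
  guardrail: string;
  limit: number | null;
  observed: number | string | null;
  source: "agent" | "org_policy";
  message: string;
}

interface RunResult {
  runId: string;
  stopReason: string;
  finalText: string;
  iterations: number;
  usage: { input: number; output: number };
  blocked?: BlockedDetails; // omitted entirely on the normal path
}

// Branch on the envelope, not on stopReason.startsWith("blocked:").
function isBlocked(r: RunResult): r is RunResult & { blocked: BlockedDetails } {
  return r.blocked != null;
}

function summarizeRun(r: RunResult): string {
  return isBlocked(r) ? `blocked by ${r.blocked.guardrail}: ${r.blocked.message}` : r.finalText;
}
```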

source is "agent" for v1 — every guardrail in v1 originates from the agent's declared guardrails array. The field exists today so a future org-policy override surface widens to "org_policy" without a wire-shape change.

limit is null for guardrails where the trigger is a match, not a numeric ceiling (pii.redact, block_models).

observed is heterogeneous: number for size/count checks (rate, max_tokens, input_max_chars, output_max_chars, max_cost), string for name-match checks (block_models, require_tool_allowlist), null for boolean-style triggers.

Dashboard

/settings/guardrails (linked for org admins in the settings sidebar) is a server-rendered table of every blocked run in the org, with a stat strip, per-agent and per-guardrail filters, and cursor-paginated "load more". It uses the same chrome as the audit page.

The table is sourced from the guardrail_violations audit table (one row per terminal block), written by the api proxy in the same code path that fans out the agent_run.completed/failed/cancelled webhooks. The writer is fire-and-forget: a failed write is logged and swallowed, and the run is unaffected.

API

`GET /v1/orgs/:slug/guardrail-violations`

Paginated org-scoped list of guardrail violations. Same auth posture as the audit-log viewer (any org member; product policy is full audit transparency).

| Query param | Type | Default | Meaning |
| --- | --- | --- | --- |
| `limit` | int | 50 | Page size, clamped to [1, 200] |
| `cursor` | base64 string | none | Opaque cursor from a prior `nextCursor` |
| `agentId` | string | none | Narrow to one deployment |
| `guardrail` | string | none | Narrow to one guardrail kind (`max_tokens`, `rate`, …) |

Response:

```jsonc
{
  "violations": [
    {
      "id": "gv_2k3...",
      "agentId": "dep_4kp...",
      "agentName": "support-triage",
      "runId": "run_8w3...",
      "guardrailKind": "max_tokens",
      "source": "agent",
      "limit": "4096",
      "observed": 4521,
      "message": "cumulative output 4521 tokens > guardrail max_tokens=4096",
      "stopReason": "blocked:max_tokens",
      "occurredAt": "2026-05-08T14:09:51.103Z"
    }
    // ...
  ],
  "nextCursor": "eyJ0cyI6IjIwMjYtMDUtMDhUMTQ6MDk6NTEuMTAzWiIsImlkIjoiZ3ZfMmszIn0",
  "aggregations": {
    "total": 47,
    "byGuardrail": [
      { "guardrail": "max_tokens", "count": 31 },
      { "guardrail": "rate", "count": 12 },
      { "guardrail": "block_models", "count": 4 }
    ]
  }
}
```

nextCursor is null when there are no more rows. limit is a string on the wire (the audit table stores it as text to keep heterogeneous values in one column); coerce with parseInt if you need a number. observed is number | string | null.
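Here is a sketch of draining every page and coercing limit. The fetchPage parameter is injected so the loop has no hard dependency on a particular HTTP client. listAllViolations and numericLimit are hypothetical helpers, and only the fields used here are typed.

```typescript
// Hypothetical sketch of cursor pagination over guardrail-violations.
interface ViolationRow {
  id: string;
  agentId: string;
  runId: string;
  guardrailKind: string;
  limit: string | null; // stored as text, so it arrives as a string (or null)
  observed: number | string | null;
  occurredAt: string;
}
interface ViolationsPage { violations: ViolationRow[]; nextCursor: string | null }
type FetchPage = (url: string) => Promise<ViolationsPage>;

async function listAllViolations(base: string, org: string, fetchPage: FetchPage, guardrail?: string): Promise<ViolationRow[]> {
  const rows: ViolationRow[] = [];
  let cursor: string | null = null;
  do {
    const params = new URLSearchParams({ limit: "200" }); // the documented maximum page size
    if (guardrail !== undefined) params.set("guardrail", guardrail);
    if (cursor !== null) params.set("cursor", cursor); // opaque: pass back verbatim
    const url: string = `${base}/v1/orgs/${encodeURIComponent(org)}/guardrail-violations?${params.toString()}`;
    const page: ViolationsPage = await fetchPage(url);
    rows.push(...page.violations);
    cursor = page.nextCursor; // null means there are no more rows
  } while (cursor !== null);
  return rows;
}

function numericLimit(row: ViolationRow): number | null {
  return row.limit === null ? null : Number.parseInt(row.limit, 10);
}
```

In production, fetchPage could wrap the global fetch() with an Authorization: Bearer header and parse the JSON body.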

```shell
curl -fsSL "https://api.stech.com/v1/orgs/$ORG/guardrail-violations?guardrail=max_tokens&limit=100" \
  -H "Authorization: Bearer $STECH_API_KEY"
```

`GET /v1/orgs/:slug/agent-runs/metrics`

The org-metrics endpoint (see observability.md) adds a blockedRuns total + a topGuardrailsByBlocks breakdown so admins can answer "which policy fires the most this week" without hitting the violations route directly.

```json
{
  "totals": {
    "runs": 4218,
    "failedRuns": 47,
    "cancelledRuns": 12,
    "blockedRuns": 31,
    "inputTokens": 18223451,
    "outputTokens": 2811042
  },
  "topGuardrailsByBlocks": [
    { "guardrail": "max_tokens", "blocks": 18 },
    { "guardrail": "rate", "blocks": 8 },
    { "guardrail": "block_models", "blocks": 5 }
  ]
}
```

`?status=blocked` on the run history

/v1/orgs/:slug/agents/:id/runs?status=blocked filters to blocked runs only. The status field on each row is one of completed, failed, cancelled, blocked — blocked is disjoint from the other three, and the failedRuns aggregation excludes blocks.

Webhook events

Two events fire per blocked run, with identical data payloads. Pick whichever channel matches your subscriber:

audit.flagged with data.kind = "agent_run_blocked" — aggregate alerting channel. Subscribers already filter on audit.flagged for the failure-rate watchdog (see observability.md) and the SCIM admin alerts; blocked runs slot in behind the data.kind discriminator.
agent_run.blocked — dedicated SIEM channel. For consumers that don't want every audit.flagged (which mixes blocked-runs with admin actions + watchdog alerts).

A subscriber to both channels gets two deliveries per blocked run, with identical data bytes, identical createdAt, and distinct event ids. That is by design: belt-and-braces SIEM feeds are intentional, not a bug.

The full agent_run.blocked delivery is shown below. The audit.flagged delivery carries the identical data object under its own type and id:

```json
{
  "id": "1f2e3d4c-5b6a-7889-99aa-bbccddeeff00",
  "type": "agent_run.blocked",
  "createdAt": "2026-05-08T14:09:51.103Z",
  "organizationId": "org_2t4b...",
  "data": {
    "kind": "agent_run_blocked",
    "agentId": "dep_4kp...",
    "agentName": "support-triage",
    "runId": "run_8w3...",
    "conversationId": "cnv_7m1...",
    "guardrail": "max_tokens",
    "source": "agent",
    "limit": 4096,
    "observed": 4521,
    "message": "cumulative output 4521 tokens > guardrail max_tokens=4096",
    "stopReason": "blocked:max_tokens"
  }
}
```

limit is a number on the webhook payload (the api projects it from the runtime envelope's typed BlockedDetails); observed is number | string | null.
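Here is a sketch of routing and double-delivery handling for a receiver subscribed to both channels. BlockedRunSink is hypothetical, and signature verification (webhooks.md) is assumed to have happened already. The sketch also assumes that retries reuse the envelope id, as the dedupe-on-event.id advice in webhooks.md implies. Because the two channels carry distinct ids, collapsing them needs a second key; runId is used here.

```typescript
// Hypothetical receiver for agent_run.blocked and audit.flagged deliveries.
interface BlockedData {
  kind: "agent_run_blocked";
  agentId: string;
  runId: string;
  guardrail: string;
  limit: number | null;
  observed: number | string | null;
  message: string;
  stopReason: string;
}

interface WebhookEnvelope {
  id: string;
  type: string;
  createdAt: string;
  data: { kind?: string } & Record<string, unknown>;
}

class BlockedRunSink {
  private readonly seenEventIds = new Set<string>(); // retries of one delivery
  private readonly seenRunIds = new Set<string>(); // same run on both channels
  readonly alerts: BlockedData[] = [];

  handle(event: WebhookEnvelope): void {
    if (this.seenEventIds.has(event.id)) return;
    this.seenEventIds.add(event.id);

    const isBlock =
      event.type === "agent_run.blocked" ||
      (event.type === "audit.flagged" && event.data.kind === "agent_run_blocked");
    if (!isBlock) return; // watchdog and admin-action kinds belong to other handlers

    const data = event.data as unknown as BlockedData;
    if (this.seenRunIds.has(data.runId)) return; // second channel's copy
    this.seenRunIds.add(data.runId);
    this.alerts.push(data);
  }
}
```

A real receiver would persist or expire both sets instead of holding them in memory forever.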

Subscribing

Two endpoint configs — one for each channel:

```shell
# Dedicated SIEM channel: only blocked runs.
curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints \
  -H "Authorization: Bearer $STECH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://hooks.acme.example/stech-blocks",
    "description": "agent_run.blocked → siem",
    "events": ["agent_run.blocked"]
  }'

# Aggregate channel: failure-rate watchdog + admin actions + blocks.
# Branch on data.kind to route blocks vs other audit.flagged kinds.
curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints \
  -H "Authorization: Bearer $STECH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://hooks.acme.example/stech-audit",
    "description": "audit.flagged → slack",
    "events": ["audit.flagged"]
  }'
```

Wildcard (["*"]) subscriptions match both agent_run.blocked and audit.flagged automatically, so a wildcard endpoint receives both deliveries for every blocked run.

The full create / verify / rotate flow is in webhooks.md — signing scheme, signed-body verification, retry policy, dedupe on event.id.

Debugging a blocked run

Three surfaces, three answers to "why did this run terminate".

The run itself. The synchronous /run JSON or the streaming done SSE frame carries the full blocked envelope inline. The message field is the human-readable explanation; guardrail + limit + observed are the structured fields a dashboard renders.

The audit table. GET /v1/orgs/:slug/guardrail-violations lists every block in the org, filterable by agent + guardrail kind. Useful when the runtime caller already disconnected and you want to know "what blocked yesterday afternoon's runs".

The dashboard. /settings/guardrails is the same data with chrome — stat strip, agent filter, load-more.

If the run does not appear in any of those:

Did the runtime fire done? A runtime crash mid-stream never terminates with a blocked: reason — it just dies. Check the api logs for [agents] stream persist failed lines.
Did the parser refuse the guardrail at deploy time? A 400 invalid_guardrails from stech deploy means the runtime never received the policy. Re-run the deploy and read the detail array.
Is the dev runtime catching it but the cloud isn't? The dev runtime hard-fails on bad-config strings at boot; the cloud runtime warns and drops them, so a stale config that slips through doesn't brick a user-visible agent. Grep the runtime logs for [guardrails] dropping bad-config string.

Failure-rate exclusion

A blocked run is a policy success, not an agent quality failure. The failure-rate watchdog in api/src/lib/agent-failure-alert.ts excludes blocked runs from both numerator and denominator (same shape as cancelled-run exclusion in #289 PR-3). The AgentFailureAlertPayload includes a blockedCount field so the alert receiver can see "the agent is at 22% failure rate AND has 31 blocks this window" if both signals are firing simultaneously.

failedExpr in api/src/routes/agent-runs.ts classifies any stop_reason starting with blocked: as not failed. Any new guardrail kind added to the catalog is automatically excluded — the prefix match is over-conservative on purpose so the runtime can ship new kinds before the api knows about them.
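Here is a sketch of the exclusion arithmetic. It is not the code in agent-failure-alert.ts or failedExpr. RunRow and failureWindow are hypothetical, and the row shape is reduced to the two fields the calculation needs.

```typescript
// Hypothetical sketch of the watchdog's failure-rate window.
type RunStatus = "completed" | "failed" | "cancelled" | "blocked";
interface RunRow { status: RunStatus; stopReason: string }

// Prefix match: a blocked:<kind> the api has never heard of is still excluded.
function isBlockedStopReason(stopReason: string): boolean {
  return stopReason.startsWith("blocked:");
}

function failureWindow(runs: RunRow[]): { failureRate: number; blockedCount: number } {
  let failed = 0;
  let denominator = 0;
  let blockedCount = 0;
  for (const run of runs) {
    if (isBlockedStopReason(run.stopReason)) {
      blockedCount++; // policy success: out of numerator and denominator, reported separately
      continue;
    }
    if (run.status === "cancelled") continue; // same exclusion shape as blocks
    denominator++;
    if (run.status === "failed") failed++;
  }
  return { failureRate: denominator === 0 ? 0 : failed / denominator, blockedCount };
}
```

In this sketch, blockedCount is the number that would populate the blockedCount field on AgentFailureAlertPayload.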

v1 limitations

In-memory rate limiting. A customer running multiple Fly machines for one deployment gets N times the configured rate budget — each machine maintains its own bucket. Redis-backed distributed rate limiting is filed for a follow-up.
Regex-based PII redaction. Catches the common shapes (email, US SSN, US/E.164 phone) but has false positives inside code samples or API docs the agent is summarizing. LLM-judged redaction is a future opt-in shape.
No org-policy override. Guardrails today are declared by the agent author only. An org-admin override surface that clamps what a less-trusted author can opt out of is filed for a follow-up; the source: "agent" | "org_policy" field on the wire shape is forward-compatible with it.
No LLM-judged guardrails. Using a small model to detect prompt injection / jailbreak attempts is a separate epic — real value, but expensive (per-call inference cost) and complicated to make deterministic.
No per-tool guardrails. "block specific tool calls (no gh repo delete)" is a tool-policy concern; the seam is dispatchTool, not the policy engine. Filed separately.
No custom user-authored guardrail functions. An extension surface where customers ship code that runs in our runtime is a v2 concern after we know what customers actually want to extend.
max_cost requires caller-side pricing. The runtime doesn't know model prices. The recordCost() seam is wired but no callers feed it yet — a future PR adds a model-pricing table and wires the per-iteration cost computation.

Agent runs — cancellation — the parallel status-bucket pattern for cancellations. Same exclusion shape on the failure-rate watchdog; same agent_run.cancelled / agent_run.blocked dual-event posture.
Observability — topGuardrailsByBlocks aggregation, ?status=blocked filter, the failure-rate watchdog's blocked-exclusion behavior.
Webhooks — audit.flagged envelope, signing scheme, retry policy. The agent_run_blocked payload above is one of the curated audit.flagged data.kind values.
Audit log — the broader audit surface; the guardrail_violations table is a per-block audit trail that cross-references with the audit log via agentId + runId.
