
Policy and guardrails

Agent authors declare guardrails as an array of strings on defineAgent(). The platform parses them at deploy time, refuses unknown shapes with a 400, and enforces every parsed entry in both the dev runtime and the cloud runtime. A run that trips a terminating guardrail ends with a blocked:<kind> stop reason, a structured blocked envelope on the response, a row in the guardrail_violations audit table, an audit.flagged webhook, and a parallel agent_run.blocked webhook on the dedicated SIEM channel. Two guardrails, pii.redact and require_tool_allowlist, act on the run without terminating it.

This page is the catalog: what each guardrail does, where it fires, the wire shape of a block, and how to subscribe to or query the audit trail.

Authoring

Guardrails live on the AgentConfig object passed to defineAgent():

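import { defineAgent, tool } from "@stech/sdk";
// Your tools' input schemas; this module path is illustrative.
import { ticketSchema, crmSchema } from "./schemas";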

export default defineAgent({
  name: "support-triage",
  model: "claude-sonnet-4-5",
  instructions: "Help users triage open tickets.",
  tools: [
    tool("ticket.lookup", ticketSchema),
    tool("crm.lookup", crmSchema),
  ],
  guardrails: [
    "pii.redact",
    "rate:10/min",
    "max_tokens=4096",
    "input_max_chars=8000",
    "require_tool_allowlist=ticket.lookup,crm.lookup",
  ],
});

The strings parse into a closed union (see cli/src/sdk/guardrails.ts). Adding a new kind is a four-step coordinated change: extend the union, add a parser case, add checker functions in both runtimes, and document it here. Opaque custom:<name> shapes are deliberately refused in v1: letting an author smuggle in a policy claim that the runtime can't actually enforce is worse than refusing the unknown string outright.

Validation

Two gates catch malformed guardrail strings before they reach the cloud runtime:

  • stech dev boot. AgentRunner construction calls installGuardrails() and hard-fails on bad-config strings (e.g. rate:10/foobar, max_tokens=-1). It warns and continues on unknown-kind strings (e.g. pii.shred) so a future SDK shape added in cloud doesn't brick a stale local checkout.
  • stech deploy. The api validates the guardrails form field with the SDK-mirrored parser and refuses the deploy with 400 invalid_guardrails plus a detail listing every bad string and the full menu of accepted shapes.

The mirror tests (api/tests/guardrails.test.ts, runtime/tests/guardrails-mirror.test.ts) pin the api + runtime parsers to the SDK parser on a representative happy + sad input set so the three sides cannot drift.
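The split between the two gates comes down to classifying each string as valid, as a known kind with a bad argument, or as an unknown kind. Below is a minimal sketch of such a parser, assuming a discriminated-union representation; Guardrail, ParseResult, and parseGuardrail are hypothetical names, not the exports of cli/src/sdk/guardrails.ts.

```typescript
type RateUnit = "sec" | "min" | "hour";
type NumericKind = "max_tokens" | "max_cost" | "input_max_chars" | "output_max_chars";

// Hypothetical closed union for the eight v1 kinds (max_cost's n is in micro-cents).
type Guardrail =
  | { kind: "pii.redact" }
  | { kind: "rate"; n: number; unit: RateUnit }
  | { kind: NumericKind; n: number }
  | { kind: "block_models"; patterns: string[] }
  | { kind: "require_tool_allowlist"; tools: string[] };

// bad_config: a known kind with a malformed argument; unknown_kind: no kind matched.
type ParseResult =
  | { ok: true; guardrail: Guardrail }
  | { ok: false; error: "bad_config" | "unknown_kind"; input: string };

const NUMERIC_KINDS: readonly string[] = ["max_tokens", "max_cost", "input_max_chars", "output_max_chars"];

// Accepts only a positive base-10 integer: no sign, decimals, or whitespace.
function positiveInt(s: string): number | null {
  if (!/^[1-9]\d*$/.test(s)) return null;
  const n = Number(s);
  return Number.isSafeInteger(n) ? n : null;
}

// A comma-separated list in which every entry is non-empty.
function nonEmptyList(s: string): string[] | null {
  const items = s.split(",").map((item) => item.trim());
  return items.every((item) => item.length > 0) ? items : null;
}

function parseGuardrail(input: string): ParseResult {
  const bad: ParseResult = { ok: false, error: "bad_config", input };

  if (input === "pii.redact") return { ok: true, guardrail: { kind: "pii.redact" } };

  if (input.startsWith("rate:")) {
    const m = /^rate:(\d+)\/(sec|min|hour)$/.exec(input);
    const n = m ? positiveInt(m[1]) : null;
    return m && n !== null ? { ok: true, guardrail: { kind: "rate", n, unit: m[2] as RateUnit } } : bad;
  }

  const eq = input.indexOf("=");
  if (eq > 0) {
    const key = input.slice(0, eq);
    const value = input.slice(eq + 1);
    if (NUMERIC_KINDS.includes(key)) {
      const n = positiveInt(value);
      return n === null ? bad : { ok: true, guardrail: { kind: key as NumericKind, n } };
    }
    if (key === "block_models" || key === "require_tool_allowlist") {
      const items = nonEmptyList(value);
      if (items === null) return bad;
      return key === "block_models"
        ? { ok: true, guardrail: { kind: "block_models", patterns: items } }
        : { ok: true, guardrail: { kind: "require_tool_allowlist", tools: items } };
    }
  }
  // Unknown kinds (including custom:<name>) are reported separately from bad config.
  return { ok: false, error: "unknown_kind", input };
}
```

Keeping bad_config and unknown_kind apart is what lets stech dev hard-fail on the first while only warning on the second.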

Catalog

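Eight guardrail kinds ship in v1. Each has a string syntax, a runtime seam, a worked example, and either a BlockedDetails shape or, for the two non-terminating kinds (pii.redact, require_tool_allowlist), the audit signal it emits instead.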

pii.redact

Strip emails, US SSNs, and US/E.164 phone numbers from user prompts before they reach the model. Regex-based — fast, deterministic, with a documented false-positive cost (matches inside code samples or API docs the agent is summarizing).

Field | Value
--- | ---
String form | pii.redact
Seam | user-prompt entry, before message append
Action | rewrites the prompt in place; does not block the run
BlockedDetails | n/a (this guardrail observes; it doesn't terminate)
Audit signal | runtime emits a guardrail SSE event (no audit row, no webhook)
guardrails: ["pii.redact"]

A prompt of email me at user@example.com becomes email me at [REDACTED:email] before the model sees it. The original text is not persisted anywhere: once redacted, the cleaned prompt is what lands in agent_messages.user_text.

LLM-judged redaction (more accurate, more expensive) is a future opt-in shape; the regex catches the common cases and is cheap enough to run on every turn.
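A minimal sketch of regex-based redaction of the kind described above. The patterns are illustrative assumptions, not the platform's actual regexes, and they share the documented false-positive cost: an email-shaped string inside a code sample is redacted too.

```typescript
// Illustrative patterns only; order matters because each pass rewrites the text.
const PII_PATTERNS: ReadonlyArray<readonly [kind: string, pattern: RegExp]> = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/g],
  // E.164 (+ then 8 to 15 digits) or US forms such as (415) 555-2671 and 415-555-2671.
  ["phone", /\+[1-9]\d{7,14}\b|(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b/g],
];

// Replaces every match with a [REDACTED:<kind>] marker before the prompt is appended.
function redactPii(prompt: string): string {
  let out = prompt;
  for (const [kind, pattern] of PII_PATTERNS) {
    out = out.replace(pattern, `[REDACTED:${kind}]`);
  }
  return out;
}
```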

rate:N/UNIT

Cap ask() calls to N per rolling UNIT window. The counter is scoped per AgentRunner instance in dev and per runtime machine in cloud.

Field | Value
--- | ---
String form | rate:N/{sec,min,hour}
Seam | per-iteration boundary, before the first provider.generate()
Action | terminates with stopReason='blocked:rate'
limit | configured N
observed | actual in-window request count at trip time
guardrails: ["rate:10/min", "rate:100/hour"]

Multiple rate: entries stack. Each is enforced independently, and whichever window's budget runs out first trips the block.

Distributed limitation. A customer running multiple Fly machines for one deployment gets N times the configured budget — each machine maintains its own bucket. Redis-backed distributed rate limiting is filed for a follow-up; the in-memory v1 is a per-machine ceiling, not a per-org one.
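A sketch of a rolling-window limiter with stacked rules, assuming one in-memory instance per AgentRunner or machine. RollingRateLimiter and tryAcquire are hypothetical names, and how the real runtime computes observed at trip time is also an assumption here (the attempted call is counted).

```typescript
type RateUnit = "sec" | "min" | "hour";
interface RateRule { n: number; unit: RateUnit }
interface RateBlock { guardrail: "rate"; limit: number; observed: number }

const WINDOW_MS: Record<RateUnit, number> = { sec: 1_000, min: 60_000, hour: 3_600_000 };

// Hypothetical in-memory limiter. One instance lives per AgentRunner (dev) or per
// machine (cloud), which is why several machines multiply the effective budget.
class RollingRateLimiter {
  private readonly rules: RateRule[];
  private readonly hits: number[][];

  constructor(rules: RateRule[]) {
    this.rules = rules;
    this.hits = rules.map(() => []);
  }

  // Returns null when the call may proceed, else details for the first rule it would exceed.
  tryAcquire(now: number = Date.now()): RateBlock | null {
    for (let i = 0; i < this.rules.length; i++) {
      const windowStart = now - WINDOW_MS[this.rules[i].unit];
      // Forget calls that have aged out of this rule's rolling window.
      this.hits[i] = this.hits[i].filter((t) => t > windowStart);
      if (this.hits[i].length >= this.rules[i].n) {
        // Assumption: observed counts the attempted call on top of the admitted ones.
        return { guardrail: "rate", limit: this.rules[i].n, observed: this.hits[i].length + 1 };
      }
    }
    // Record the call only after every stacked rule has admitted it.
    for (const bucket of this.hits) bucket.push(now);
    return null;
  }
}
```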

max_tokens=N

Per-run cumulative output-token ceiling, enforced by two checks:

  1. Pre-iteration: clamp the provider's max_tokens request to min(provider_default, N - cumulative_output).
  2. Post-iteration: recordUsage() terminates the run if cumulative_output > N, which can only happen when a provider returned more tokens than the clamped request.

Field | Value
--- | ---
String form | max_tokens=N (positive integer)
Seam | both pre-call (clamp) and post-call (verify)
Action | terminates with stopReason='blocked:max_tokens'
limit | configured N
observed | cumulative output tokens at trip time
guardrails: ["max_tokens=4096"]

The post-iteration check is defense in depth — Anthropic clamps internally, but a future provider may not.
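The clamp-then-verify pair can be sketched as a small per-run budget object. OutputTokenBudget and clampRequest are hypothetical names (only recordUsage() is named above), and the message string copies the format used in the blocked example further down this page.

```typescript
interface MaxTokensBlock { guardrail: "max_tokens"; limit: number; observed: number; message: string }

// Hypothetical per-run tracker for max_tokens=N.
class OutputTokenBudget {
  private readonly limit: number;
  private cumulativeOutput = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Pre-iteration: never request more than what is left of the budget.
  clampRequest(providerDefault: number): number {
    return Math.max(0, Math.min(providerDefault, this.limit - this.cumulativeOutput));
  }

  // Post-iteration: catches a provider that returned more than the clamped request.
  recordUsage(outputTokens: number): MaxTokensBlock | null {
    this.cumulativeOutput += outputTokens;
    if (this.cumulativeOutput <= this.limit) return null;
    return {
      guardrail: "max_tokens",
      limit: this.limit,
      observed: this.cumulativeOutput,
      message: `cumulative output ${this.cumulativeOutput} tokens > guardrail max_tokens=${this.limit}`,
    };
  }
}
```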

max_cost=N

Per-run USD spend ceiling, in micro-cents (1_000_000 micros == 1 cent == $0.01). Tracked via the runtime's recordCost() seam. Cost is computed by the caller from usage + a model-pricing table — the runtime doesn't know prices.

Field | Value
--- | ---
String form | max_cost=N (positive integer micros)
Seam | per-iteration recordCost()
Action | terminates with stopReason='blocked:max_cost'
limit | configured N (micros)
observed | cumulative micros at trip time
guardrails: ["max_cost=10000"]
// → 10000 micros == 0.01 cent == cap of $0.0001 per run

The micro-cent unit avoids floating-point drift on multi-iteration runs and matches the resolution of the billing aggregator's usage_records table.
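A sketch of the caller-side arithmetic, assuming a hypothetical pricing table expressed in micro-cents per token. The rates below are illustrative only, not real model prices; at $3 per million tokens the per-token price is exactly 300 micro-cents, so every sum stays an integer.

```typescript
// Illustrative rates in micro-cents per token; NOT a real price list.
// $3 per million tokens = 300 cents / 1_000_000 tokens = 300 micro-cents per token.
interface Pricing {
  inputMicrosPerToken: number;
  outputMicrosPerToken: number;
}
const EXAMPLE_PRICING: Pricing = { inputMicrosPerToken: 300, outputMicrosPerToken: 1_500 };

// Integer-only arithmetic, so repeated per-iteration sums never drift.
function costMicros(usage: { input: number; output: number }, pricing: Pricing): number {
  return usage.input * pricing.inputMicrosPerToken + usage.output * pricing.outputMicrosPerToken;
}

// 1_000_000 micros == 1 cent, so 100_000_000 micros == $1. Floating point is used for display only.
function microsToUsd(micros: number): string {
  return (micros / 100_000_000).toFixed(6);
}

const maxCostMicros = 10_000; // guardrails: ["max_cost=10000"]
const spent = costMicros({ input: 4218, output: 4521 }, EXAMPLE_PRICING);
const tripped = spent > maxCostMicros;
```

With these rates, the usage from the blocked example further down this page (4218 input, 4521 output tokens) costs 8,046,900 micros, about 8 cents, so a max_cost=10000 cap would already have tripped. Prices that are not a whole number of micro-cents per token would need a finer unit or explicit rounding.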

block_models=PAT,...

Refuse to even start the run if agent.model matches any pattern. Patterns are simple anchored globs — only * is supported as a wildcard, and the match is against the full model string. claude-* does NOT match my-claude-3 because both ends are anchored; metachars like . are escaped before compilation so a stray gpt-4.0 cannot widen to gpt-410.
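A sketch of how such an anchored glob can be compiled to a regular expression: escape every regex metacharacter in the literal parts, turn * into .*, and anchor both ends. compileModelGlob and firstBlockingPattern are hypothetical names, not the runtime's own functions.

```typescript
// Hypothetical compiler for block_models patterns: anchored, '*' is the only wildcard.
function compileModelGlob(pattern: string): RegExp {
  const body = pattern
    .split("*")
    // Escape regex metacharacters in the literal pieces, so '.' stays a literal dot.
    .map((piece) => piece.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`);
}

// Returns the first pattern that matches the full model string, or null if none do.
function firstBlockingPattern(model: string, patterns: readonly string[]): string | null {
  return patterns.find((p) => compileModelGlob(p).test(model)) ?? null;
}
```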

Field | Value
--- | ---
String form | block_models=pat1,pat2,...
Seam | run boot, before provider construction
Action | terminates with stopReason='blocked:block_models'
limit | null (the trigger is a match, not a numeric ceiling)
observed | the offending model string
guardrails: ["block_models=gpt-3.5*,claude-2*"]

Use case: an org policy clamps which model families the agent can target. This is forward-compatible with PR-3 of the epic, where the same block_models= string is the surface an admin uses to ban model families across every agent in the org.

require_tool_allowlist=tool_a,tool_b

Only allow tool calls whose name appears in the comma-separated allowlist. Any other tool call returns an is_error tool_result block with a "blocked by policy" message, and the model can recover: a refused tool does NOT terminate the run, matching how unknown-tool errors are handled today.

Field | Value
--- | ---
String form | require_tool_allowlist=tool_a,tool_b,...
Seam | tool dispatch, per call
Action | refused tool returns an is_error tool_result; run continues
Audit signal | runtime emits a guardrail SSE event per refusal
guardrails: ["require_tool_allowlist=ticket.lookup,crm.lookup"]

The model sees the refusal in the next turn's tool-result block and can choose to continue without the blocked tool, retry with a different name, or surrender by returning text.
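A sketch of the per-call check, assuming tool_use / tool_result blocks shaped like those in Anthropic's Messages API (the runtime's internal types may differ). checkToolAllowed is a hypothetical name, and the refusal text beyond "blocked by policy" is illustrative.

```typescript
// Shapes modeled on Anthropic-style tool_use / tool_result blocks; an assumption.
interface ToolUse { id: string; name: string; input: unknown }
interface ToolResultBlock { type: "tool_result"; tool_use_id: string; content: string; is_error: true }

// Hypothetical pre-dispatch check: null means "dispatch normally"; otherwise hand the
// returned is_error block back to the model and keep the run going.
function checkToolAllowed(call: ToolUse, allowlist: ReadonlySet<string>): ToolResultBlock | null {
  if (allowlist.has(call.name)) return null;
  return {
    type: "tool_result",
    tool_use_id: call.id,
    content: `blocked by policy: tool "${call.name}" is not in require_tool_allowlist`,
    is_error: true,
  };
}
```

The allowlist itself is just the split guardrail argument, e.g. new Set("ticket.lookup,crm.lookup".split(",")).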

input_max_chars=N

Refuse user prompts longer than N characters: a cheap defense against accidental prompt-bombing. The check runs before the pii.redact pass, so an oversized prompt (say, 50KB) is rejected without first paying for the redaction regexes.

Field | Value
--- | ---
String form | input_max_chars=N (positive integer)
Seam | user-prompt entry, before message append
Action | terminates with stopReason='blocked:input_max_chars'
limit | configured N
observed | actual prompt length
guardrails: ["input_max_chars=8000"]
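A sketch of the ordering at prompt entry: the length check runs first and returns early, so the redaction callback is never invoked on an oversized prompt. admitPrompt and its options object are hypothetical.

```typescript
interface InputBlock { guardrail: "input_max_chars"; limit: number; observed: number }
type PromptOutcome = { ok: true; prompt: string } | { ok: false; blocked: InputBlock };

// Hypothetical prompt-entry step: the O(1) length check runs before any redaction regex.
// Note: String#length counts UTF-16 code units; how the platform counts characters is not specified here.
function admitPrompt(
  prompt: string,
  opts: { inputMaxChars?: number; redact?: (text: string) => string },
): PromptOutcome {
  const max = opts.inputMaxChars;
  if (max !== undefined && prompt.length > max) {
    return { ok: false, blocked: { guardrail: "input_max_chars", limit: max, observed: prompt.length } };
  }
  return { ok: true, prompt: opts.redact ? opts.redact(prompt) : prompt };
}
```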

output_max_chars=N

Terminate runs whose final assistant text exceeds N characters. The check runs at iteration end, after the model has produced its text but before the loop returns the RunResult.

Field | Value
--- | ---
String form | output_max_chars=N (positive integer)
Seam | iteration end, after extractFinalText
Action | terminates with stopReason='blocked:output_max_chars'
limit | configured N
observed | actual final-text length
guardrails: ["output_max_chars=12000"]

Wire shape of a block

When a guardrail fires, the run terminates cleanly with a structured envelope on both the synchronous /run JSON response and the streaming /run-stream done SSE frame.

Synchronous /run:

{
  "runId": "run_8w3...",
  "stopReason": "blocked:max_tokens",
  "finalText": "",
  "iterations": 2,
  "usage": { "input": 4218, "output": 4521 },
  "blocked": {
    "guardrail": "max_tokens",
    "limit": 4096,
    "observed": 4521,
    "source": "agent",
    "message": "cumulative output 4521 tokens > guardrail max_tokens=4096"
  }
}

Streaming /run-stream done frame:

{
  "type": "done",
  "runId": "run_8w3...",
  "stopReason": "blocked:max_tokens",
  "finalText": "",
  "iterations": 2,
  "usage": { "input": 4218, "output": 4521 },
  "messages": [ /* ... */ ],
  "blocked": {
    "guardrail": "max_tokens",
    "limit": 4096,
    "observed": 4521,
    "source": "agent",
    "message": "cumulative output 4521 tokens > guardrail max_tokens=4096"
  }
}

The blocked key is omitted when the run is not blocked, so the wire shape of the normal path is unchanged. To detect a block, receivers should branch on blocked != null, not on a stop-reason prefix.

source is "agent" for v1 — every guardrail in v1 originates from the agent's declared guardrails array. The field exists today so a future org-policy override surface widens to "org_policy" without a wire-shape change.

limit is null for guardrails whose trigger is a match rather than a numeric ceiling (block_models). pii.redact never produces a blocked envelope, since it rewrites the prompt instead of terminating the run.

observed is heterogeneous: number for size/count checks (rate, max_tokens, input_max_chars, output_max_chars, max_cost), string for name-match checks (block_models, require_tool_allowlist), null for boolean-style triggers.
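A receiver-side sketch that transcribes the envelope fields above into TypeScript and detects a block by the presence of blocked. The interface names are hypothetical, and the SDK's own BlockedDetails type may differ.

```typescript
// Hypothetical transcription of the blocked envelope described on this page.
interface BlockedDetails {
  guardrail: string;                 // e.g. "max_tokens", "rate", "block_models"
  limit: number | null;              // null when the trigger is a match, not a ceiling
  observed: number | string | null;
  source: "agent" | "org_policy";    // only "agent" in v1
  message: string;
}

interface RunResponse {
  runId: string;
  stopReason: string;
  finalText: string;
  iterations: number;
  usage: { input: number; output: number };
  blocked?: BlockedDetails;          // omitted entirely on the normal path
}

// Branch on the presence of the envelope, not on a "blocked:" stop-reason prefix.
function describeOutcome(res: RunResponse): string {
  if (res.blocked != null) {
    const { guardrail, limit, observed } = res.blocked;
    return `blocked by ${guardrail} (limit=${limit ?? "n/a"}, observed=${observed ?? "n/a"})`;
  }
  return `finished with ${res.stopReason}`;
}
```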

Dashboard

The /settings/guardrails page (an org-admin link in the settings sidebar) is a server-rendered table of every blocked run in the org, with a stat strip, per-agent and per-guardrail filters, and cursor-paginated "load more". It uses the same chrome as the audit page.

The table is sourced from the guardrail_violations audit table (one row per terminal block), written by the api proxy in the same code path that fans out the agent_run.completed/failed/cancelled webhooks. The writer is fire-and-forget: a write failure is logged and swallowed, and the run is unaffected.

API

GET /v1/orgs/:slug/guardrail-violations

Paginated org-scoped list of guardrail violations. Same auth posture as the audit-log viewer (any org member; product policy is full audit transparency).

Query param | Type | Default | Meaning
--- | --- | --- | ---
limit | int | 50 | Page size, clamped to [1, 200]
cursor | base64 string | none | Opaque cursor from a prior response's nextCursor
agentId | string | none | Narrow to one deployment
guardrail | string | none | Narrow to one guardrail kind (max_tokens, rate, …)

Response:

{
  "violations": [
    {
      "id": "gv_2k3...",
      "agentId": "dep_4kp...",
      "agentName": "support-triage",
      "runId": "run_8w3...",
      "guardrailKind": "max_tokens",
      "source": "agent",
      "limit": "4096",
      "observed": 4521,
      "message": "cumulative output 4521 tokens > guardrail max_tokens=4096",
      "stopReason": "blocked:max_tokens",
      "occurredAt": "2026-05-08T14:09:51.103Z"
    }
    // ...
  ],
  "nextCursor": "eyJ0cyI6IjIwMjYtMDUtMDhUMTQ6MDk6NTEuMTAzWiIsImlkIjoiZ3ZfMmszIn0",
  "aggregations": {
    "total": 47,
    "byGuardrail": [
      { "guardrail": "max_tokens", "count": 31 },
      { "guardrail": "rate", "count": 12 },
      { "guardrail": "block_models", "count": 4 }
    ]
  }
}

nextCursor is null when there are no more rows. Each row's limit is a string on the wire (the audit table stores it as text to keep heterogeneous values in one column); coerce it with parseInt if you need a number. observed is number | string | null.

curl -fsSL "https://api.stech.com/v1/orgs/$ORG/guardrail-violations?guardrail=max_tokens&limit=100" \
  -H "Authorization: Bearer $STECH_API_KEY"
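The same query from TypeScript, following nextCursor until it comes back null. This is a sketch assuming a runtime with a global fetch (Node 18+ or a browser); violationsUrl, numericLimit, and fetchAllViolations are hypothetical helper names, and only the query params documented in the table above are used.

```typescript
interface ViolationRow {
  id: string;
  agentId: string;
  agentName: string;
  runId: string;
  guardrailKind: string;
  source: string;
  limit: string; // text column on the wire
  observed: number | string | null;
  message: string;
  stopReason: string;
  occurredAt: string;
}

interface ViolationsPage { violations: ViolationRow[]; nextCursor: string | null }

// Builds the list URL, skipping params that were not provided.
function violationsUrl(
  org: string,
  q: { limit?: number; cursor?: string; agentId?: string; guardrail?: string } = {},
): string {
  const url = new URL(`https://api.stech.com/v1/orgs/${encodeURIComponent(org)}/guardrail-violations`);
  for (const [key, value] of Object.entries(q)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// The row's limit arrives as text; return a number where it parses, else null.
function numericLimit(row: Pick<ViolationRow, "limit">): number | null {
  const n = Number.parseInt(row.limit, 10);
  return Number.isNaN(n) ? null : n;
}

// Walks every page by following nextCursor until it comes back null.
async function fetchAllViolations(org: string, apiKey: string, guardrail?: string): Promise<ViolationRow[]> {
  const rows: ViolationRow[] = [];
  let cursor: string | undefined;
  do {
    const res = await fetch(violationsUrl(org, { limit: 200, cursor, guardrail }), {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) throw new Error(`guardrail-violations: HTTP ${res.status}`);
    const page = (await res.json()) as ViolationsPage;
    rows.push(...page.violations);
    cursor = page.nextCursor ?? undefined;
  } while (cursor !== undefined);
  return rows;
}
```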

GET /v1/orgs/:slug/agent-runs/metrics

The org-metrics endpoint (see observability.md) adds a blockedRuns total + a topGuardrailsByBlocks breakdown so admins can answer "which policy fires the most this week" without hitting the violations route directly.

{
  "totals": {
    "runs": 4218,
    "failedRuns": 47,
    "cancelledRuns": 12,
    "blockedRuns": 31,
    "inputTokens": 18223451,
    "outputTokens": 2811042
  },
  "topGuardrailsByBlocks": [
    { "guardrail": "max_tokens", "blocks": 18 },
    { "guardrail": "rate", "blocks": 8 },
    { "guardrail": "block_models", "blocks": 5 }
  ]
}

?status=blocked on the run history

/v1/orgs/:slug/agents/:id/runs?status=blocked filters to blocked runs only. The status field on each row is one of completed, failed, cancelled, or blocked; blocked is disjoint from the other three, and the failedRuns aggregation excludes blocks.

Webhook events

Two events fire per blocked run, with identical data payloads. Pick whichever channel matches your subscriber:

  • audit.flagged with data.kind = "agent_run_blocked" — aggregate alerting channel. Subscribers already filter on audit.flagged for the failure-rate watchdog (see observability.md) and the SCIM admin alerts; blocked runs slot in behind the data.kind discriminator.
  • agent_run.blocked — dedicated SIEM channel. For consumers that don't want every audit.flagged (which mixes blocked-runs with admin actions + watchdog alerts).

A subscriber to both channels gets two deliveries per blocked run with identical data bytes, an identical createdAt, and distinct event ids. That is the documented design: belt-and-braces SIEM feeds are intentional, not a bug.

The data payload, identical on both events:

{
  "id": "1f2e3d4c-5b6a-7889-99aa-bbccddeeff00",
  "type": "agent_run.blocked",
  "createdAt": "2026-05-08T14:09:51.103Z",
  "organizationId": "org_2t4b...",
  "data": {
    "kind": "agent_run_blocked",
    "agentId": "dep_4kp...",
    "agentName": "support-triage",
    "runId": "run_8w3...",
    "conversationId": "cnv_7m1...",
    "guardrail": "max_tokens",
    "source": "agent",
    "limit": 4096,
    "observed": 4521,
    "message": "cumulative output 4521 tokens > guardrail max_tokens=4096",
    "stopReason": "blocked:max_tokens"
  }
}

limit is a number on the webhook payload (the api projects it from the runtime envelope's typed BlockedDetails); observed is number | string | null.
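A sketch of a receiver subscribed to both channels (or to ["*"]). It dedupes redeliveries on the event id and, because the two channels carry the same data under distinct ids, collapses them on data.runId. handleEvent is a hypothetical name, signature verification (see webhooks.md) is omitted, and the in-memory sets stand in for whatever durable store a real receiver would use.

```typescript
interface StechEvent {
  id: string;     // distinct per delivery channel
  type: string;   // "agent_run.blocked", "audit.flagged", ...
  createdAt: string;
  organizationId: string;
  data: { kind?: string; runId?: string; [key: string]: unknown };
}

type Outcome = "duplicate" | "blocked_run" | "other";

// Hypothetical handler; call it only after the signature has been verified.
function handleEvent(ev: StechEvent, seenEvents: Set<string>, seenBlockedRuns: Set<string>): Outcome {
  // Redelivery of the same event (same id): process it once.
  if (seenEvents.has(ev.id)) return "duplicate";
  seenEvents.add(ev.id);

  const isBlock =
    ev.type === "agent_run.blocked" ||
    (ev.type === "audit.flagged" && ev.data.kind === "agent_run_blocked");
  if (!isBlock) return "other";

  // Both channels carry identical data under distinct ids, so collapse them on runId.
  const runId = ev.data.runId;
  if (typeof runId === "string") {
    if (seenBlockedRuns.has(runId)) return "duplicate";
    seenBlockedRuns.add(runId);
  }
  return "blocked_run";
}
```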

Subscribing

Two endpoint configs — one for each channel:

# Dedicated SIEM channel — only blocked runs.
curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints \
  -H "Authorization: Bearer $STECH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://hooks.acme.example/stech-blocks",
    "description": "agent_run.blocked → siem",
    "events": ["agent_run.blocked"]
  }'

# Aggregate channel — failure-rate watchdog + admin actions + blocks.
# Branch on data.kind to route blocks vs other audit.flagged kinds.
curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints \
  -H "Authorization: Bearer $STECH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://hooks.acme.example/stech-audit",
    "description": "audit.flagged → slack",
    "events": ["audit.flagged"]
  }'

Wildcard (["*"]) subscriptions match agent_run.blocked and audit.flagged automatically.

The full create / verify / rotate flow is in webhooks.md — signing scheme, signed-body verification, retry policy, dedupe on event.id.

Debugging a blocked run

Three surfaces, three answers to "why did this run terminate".

The run itself. The synchronous /run JSON or the streaming done SSE frame carries the full blocked envelope inline. The message field is the human-readable explanation; guardrail + limit + observed are the structured fields a dashboard renders.

The audit table. GET /v1/orgs/:slug/guardrail-violations lists every block in the org, filterable by agent + guardrail kind. Useful when the runtime caller has already disconnected and you want to know "what blocked yesterday afternoon's runs".

The dashboard. /settings/guardrails is the same data with chrome — stat strip, agent filter, load-more.

If the run does not appear in any of those:

  • Did the runtime fire done? A runtime crash mid-stream never terminates with a blocked: reason — it just dies. Check the api logs for [agents] stream persist failed lines.
  • Did the parser refuse the guardrail at deploy time? A 400 invalid_guardrails from stech deploy means the runtime never received the policy. Read the detail array, fix the listed strings, and redeploy.
  • Is the dev runtime catching it but the cloud isn't? The dev runtime hard-fails on bad-config strings at boot; the cloud warns and drops them, so that a stale config that slips through doesn't brick a user-visible agent. Grep the runtime logs for [guardrails] dropping bad-config string.

Failure-rate exclusion

A blocked run is a policy success, not an agent quality failure. The failure-rate watchdog in api/src/lib/agent-failure-alert.ts excludes blocked runs from both numerator and denominator (same shape as cancelled-run exclusion in #289 PR-3). The AgentFailureAlertPayload includes a blockedCount field so the alert receiver can see "the agent is at 22% failure rate AND has 31 blocks this window" if both signals are firing simultaneously.

failedExpr in api/src/routes/agent-runs.ts classifies any stop_reason starting with blocked: as not failed. Any new guardrail kind added to the catalog is automatically excluded — the prefix match is over-conservative on purpose so the runtime can ship new kinds before the api knows about them.
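A sketch of the exclusion arithmetic described above, not the code in api/src/lib/agent-failure-alert.ts: blocked and cancelled runs are dropped from both numerator and denominator, and blocks are counted separately for the alert payload. failureStats and isBlockedStopReason are hypothetical names.

```typescript
type RunStatus = "completed" | "failed" | "cancelled" | "blocked";

// Mirrors the prefix rule: any "blocked:" stop reason, including future kinds, is never a failure.
function isBlockedStopReason(stopReason: string): boolean {
  return stopReason.startsWith("blocked:");
}

// Hypothetical watchdog math: blocked and cancelled runs leave numerator and denominator alike.
function failureStats(statuses: readonly RunStatus[]): { failureRate: number; blockedCount: number } {
  let failed = 0;
  let considered = 0;
  let blockedCount = 0;
  for (const s of statuses) {
    if (s === "blocked") blockedCount++;
    if (s === "blocked" || s === "cancelled") continue;
    considered++;
    if (s === "failed") failed++;
  }
  return { failureRate: considered === 0 ? 0 : failed / considered, blockedCount };
}
```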

v1 limitations

  • In-memory rate limiting. A customer running multiple Fly machines for one deployment gets N times the configured rate budget — each machine maintains its own bucket. Redis-backed distributed rate limiting is filed for a follow-up.
  • Regex-based PII redaction. Catches the common shapes (email, US SSN, US/E.164 phone) but has false positives inside code samples or API docs the agent is summarizing. LLM-judged redaction is a future opt-in shape.
  • No org-policy override. Guardrails today are agent-author-declared only. An org-admin override surface that clamps what a less-trusted author can opt out of is filed for a follow-up; the source: "agent" | "org_policy" field on the wire shape is forward-compat for it.
  • No LLM-judged guardrails. Using a small model to detect prompt injection / jailbreak attempts is a separate epic — real value, but expensive (per-call inference cost) and complicated to make deterministic.
  • No per-tool guardrails. "block specific tool calls (no gh repo delete)" is a tool-policy concern; the seam is dispatchTool, not the policy engine. Filed separately.
  • No custom user-authored guardrail functions. An extension surface where customers ship code that runs in our runtime is a v2 concern after we know what customers actually want to extend.
  • max_cost requires caller-side pricing. The runtime doesn't know model prices. The recordCost() seam is wired but no callers feed it yet — a future PR adds a model-pricing table and wires the per-iteration cost computation.

Related

  • Agent runs: cancellation. The parallel status-bucket pattern for cancellations, with the same exclusion shape on the failure-rate watchdog and the same agent_run.cancelled / agent_run.blocked dual-event posture.
  • Observability. The topGuardrailsByBlocks aggregation, the ?status=blocked filter, and the failure-rate watchdog's blocked-exclusion behavior.
  • Webhooks. The audit.flagged envelope, signing scheme, and retry policy. The agent_run_blocked payload above is one of the curated audit.flagged data.kind values.
  • Audit log. The broader audit surface; the guardrail_violations table is a per-block audit trail that cross-references the audit log via agentId + runId.