Webhooks

Outbound HTTP notifications when something happens in your org: deployments, agent runs, SCIM identity changes, admin actions. Webhooks replace polling: subscribe once and receive a signed POST per event. Stripe-style signing, exponential-backoff retry (up to six retries), auto-disable on sustained failure.

Quick start

# 1. create the endpoint — admin or owner only
curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints \
  -H "Authorization: Bearer $STECH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://hooks.acme.example/stech",
    "description": "prod alerting",
    "events": ["deployment.failed", "agent_run.failed"]
  }'
# → { "endpoint": { ... }, "signingSecret": "whsec_<64-hex>" }

# 2. paste signingSecret into your receiver's env. The api never returns
#    it again — rotate (below) if you lose it.

# 3. verify a delivery hits your endpoint (the dashboard delivery log at
#    /settings/webhooks/<id>/deliveries shows POST status + body).

# 4. rotate later if needed
curl -fsSL -X POST \
  https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/rotate-secret \
  -H "Authorization: Bearer $STECH_API_KEY"
# → { "endpoint": { ... }, "signingSecret": "whsec_<new-64-hex>" }

A minimal Express receiver that verifies + responds 200:

import express from "express";
import { createHmac, timingSafeEqual } from "node:crypto";

const app = express();
const SECRET = process.env.STECH_WEBHOOK_SECRET!;

// IMPORTANT: read the raw body bytes. Parsing to JSON before HMAC will
// re-serialize and produce a different byte sequence — signature will
// never verify.
app.post("/stech", express.raw({ type: "application/json" }), (req, res) => {
  const ts = req.get("X-Stech-Timestamp");
  const sig = req.get("X-Stech-Signature");
  if (!ts || !sig) return res.status(400).send("missing headers");

  const tsNum = Number(ts);
  if (!Number.isFinite(tsNum) || Math.abs(Date.now() / 1000 - tsNum) > 300) {
    return res.status(400).send("replay window");
  }

  // Two-call .update() — keeps the body bytes uninterpreted. A
  // template-literal `${ts}.${req.body}` would coerce the Buffer
  // through .toString() (UTF-8) and silently break for any byte
  // sequence that isn't valid UTF-8.
  const expected =
    "sha256=" +
    createHmac("sha256", SECRET)
      .update(`${ts}.`)
      .update(req.body)
      .digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(sig);
  if (a.length !== b.length || !timingSafeEqual(a, b)) {
    return res.status(400).send("bad signature");
  }

  const event = JSON.parse(req.body.toString());
  console.log(event.type, event.id, event.data);
  res.status(200).send("ok");
});

Event catalog

Every event body has the same envelope:

{
  "id": "<uuid>",
  "type": "deployment.created",
  "createdAt": "2026-05-08T12:34:56.789Z",
  "organizationId": "<cuid2>",
  "data": { ... }
}

id is server-minted at fan-out time and is stable across retries and across multiple subscribed endpoints — dedupe on it. type is one of the values below. createdAt is ISO-8601 UTC. data shape is event-specific:

Type	Trigger	`data` keys
`deployment.created`	provisioner reports a new live deploy	`deploymentId`, `agentId`, `agentName`, `machineUrl`, `deployedByUserId`
`deployment.failed`	provisioner gives up on a build, or throws	`deploymentId`, `agentId`, `agentName`, `reason`, `deployedByUserId`
`agent_run.completed`	streaming `/run` produced a `done` frame	`runId`, `deploymentId`, `agentName`, `conversationId`, `stopReason`, `finalText`, `iterations`, `usage.input`, `usage.output`
`agent_run.failed`	streaming `/run` ended without a `done` frame	`runId`, `deploymentId`, `agentName`, `conversationId`, `reason`
`agent_run.cancelled`	streaming `/run` ended with `stop_reason='cancelled'` — see agent-runs.md	`runId`, `deploymentId`, `agentName`, `conversationId`, `stopReason`, `iterations`, `usage.input`, `usage.output`, `cancelledByUserId`, `cancellationReason`, `requestedAt`, `acknowledgedAt`
`agent_run.blocked`	streaming `/run` ended with `stop_reason='blocked:<kind>'`, meaning a guardrail terminated the run. The `data` object is identical to the one on the `audit.flagged` event with `data.kind='agent_run_blocked'`; subscribe here to skip the noisier aggregate channel. See policy-and-guardrails.md	`kind`, `agentId`, `agentName`, `runId`, `conversationId`, `guardrail`, `source`, `limit`, `observed`, `message`, `stopReason`
`agent_version.deployed`	`stech deploy` minted a new version row (upload time — the deployment is queued, not yet live; pair with `deployment.created` for the live signal)	`agentName`, `versionLabel`, `versionId`, `deploymentId`, `actorUserId`
`agent_version.promoted_to_canary`	the canary channel now points at this deployment (via `stech canary set` / dashboard)	`agentName`, `channelName`, `versionLabel`, `deploymentId`, `trafficWeight`, `actorUserId`, `previousDeploymentId`
`agent_version.promoted_to_stable`	the stable channel now points at this deployment (via `stech canary promote` / dashboard / direct stable PUT)	same shape as `agent_version.promoted_to_canary`
`agent_version.rolled_back`	`stech rollback` reverted stable to a previous version — distinct type from `promoted_to_stable` so PagerDuty / SIEM tooling can branch ("production was reverted" vs "canary graduated")	same shape as `agent_version.promoted_to_canary`, plus `versionId`
`scim.user_added`	SCIM POST /Users created a new identity	`userId`, `externalId`, `email`, `membershipId`
`scim.user_deactivated`	SCIM PUT/PATCH/DELETE flipped a user inactive	`userId`, `externalId`, `email`, `membershipId`, `deactivatedAt`
`scim.group_added`	SCIM POST /Groups created a new group	`groupId`, `externalId`, `displayName`, `memberCount`
`admin_action.recorded`	any row written to `admin_actions` (audit log)	`actionId`, `action`, `actorUserId`, `targetType`, `targetId`, `reason`
`audit.flagged`	curated subset of admin actions + watchdog signals worth alerting on (token revocation, OAuth disconnect, source deletion, SSO updates, webhook secret rotation, plan changes, deployment supersede, agent failure-rate alerts, agent-run guardrail blocks). The `data.kind` discriminator routes between them — `agent_failure_rate`, `agent_run_blocked`, etc.	varies by `data.kind`; see linked surfaces

Fields can be null where the upstream column is nullable: deployedByUserId, runId, conversationId, externalId, email, stopReason, iterations, usage.*, actionId, targetType, targetId, reason, previousDeploymentId (null on a fresh channel attach). Receivers should treat the data object as forward-compatible: we may add keys, but we won't remove any without a deprecation cycle.

finalText on agent_run.completed is truncated to 500 chars (with a trailing …); the full text lives on the conversation row, not the event.
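Because every event shares the envelope above, a receiver typically parses it once (after verifying the signature over the raw bytes) and branches on `type`, and on `data.kind` for `audit.flagged`. The sketch below models three event types from the catalog table; the names `Envelope`, `routeEvent` and the returned action labels are hypothetical, and nullable fields follow the list above. Unknown types fall through to a default branch, which is what keeps the receiver forward-compatible when new types ship.

```typescript
// Envelope shared by every event type.
interface Envelope<T extends string, D> {
  id: string; // dedupe key: stable across retries and endpoints
  type: T;
  createdAt: string; // ISO-8601 UTC
  organizationId: string;
  data: D;
}

// Hypothetical types for three catalog entries; extend as you subscribe to more.
type DeploymentFailed = Envelope<"deployment.failed", {
  deploymentId: string;
  agentId: string;
  agentName: string;
  reason: string | null;
  deployedByUserId: string | null;
}>;

type AgentRunCompleted = Envelope<"agent_run.completed", {
  runId: string | null;
  deploymentId: string;
  agentName: string;
  stopReason: string | null;
  finalText: string; // truncated to 500 chars
  iterations: number | null;
}>;

// audit.flagged routes on data.kind; the remaining keys vary by kind.
type AuditFlagged = Envelope<"audit.flagged", { kind: string } & Record<string, unknown>>;

// Returns a label for the action taken so the routing is easy to test.
function routeEvent(rawBody: string): string {
  const event = JSON.parse(rawBody) as Envelope<string, unknown>;
  switch (event.type) {
    case "deployment.failed": {
      const e = event as DeploymentFailed;
      return `page: deploy of ${e.data.agentName} failed`;
    }
    case "agent_run.completed": {
      const e = event as AgentRunCompleted;
      return `log: run ${e.data.runId ?? "?"} stopped with ${e.data.stopReason ?? "?"}`;
    }
    case "audit.flagged": {
      const e = event as AuditFlagged;
      return e.data.kind === "agent_failure_rate"
        ? "page: agent failure-rate alert"
        : `audit: ${e.data.kind}`;
    }
    default:
      // Types added after this code was written: acknowledge and ignore.
      return `ignore: ${event.type}`;
  }
}
```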

Subscribing to events

Each endpoint stores an events array. Two shapes:

{ "events": ["deployment.failed", "agent_run.failed"] }   // exact match
{ "events": ["*"] }                                       // wildcard — every type

Wildcard subscriptions match new event types added in the future automatically. Mix-and-match (["*", "deployment.failed"]) collapses to ["*"] server-side.
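The matching rule is small enough to state as code. This is a sketch of the documented semantics (exact match or `*`, with mixed arrays collapsing to `["*"]`), not the server's implementation; `normalizeEvents` and `isSubscribed` are hypothetical names. Prefix patterns such as `deployment.*` aren't among the documented shapes, so the sketch doesn't support them.

```typescript
// Mirror of the documented save-time rule: any array containing "*"
// collapses to ["*"]; an empty array is rejected.
function normalizeEvents(events: string[]): string[] {
  if (events.length === 0) throw new Error("events must be a non-empty array");
  return events.includes("*") ? ["*"] : events;
}

// An endpoint receives an event if it subscribed to "*" or to the exact type.
function isSubscribed(events: string[], eventType: string): boolean {
  const normalized = normalizeEvents(events);
  return normalized[0] === "*" || normalized.includes(eventType);
}
```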

Example bodies

deployment.created:

{
  "id": "0d8c5e44-3e2a-4f8d-9d2e-9b6f8a1f1e25",
  "type": "deployment.created",
  "createdAt": "2026-05-08T14:02:11.482Z",
  "organizationId": "org_2t4b...",
  "data": {
    "deploymentId": "dep_4kp...",
    "agentId": "dep_4kp...",
    "agentName": "support-triage",
    "machineUrl": "https://support-triage-acme.fly.dev",
    "deployedByUserId": "usr_9q1..."
  }
}

agent_run.completed:

{
  "id": "e1c2a16b-8b71-4d88-9d3e-9b8f0d6c5a02",
  "type": "agent_run.completed",
  "createdAt": "2026-05-08T14:09:51.103Z",
  "organizationId": "org_2t4b...",
  "data": {
    "runId": "run_8w3...",
    "deploymentId": "dep_4kp...",
    "agentName": "support-triage",
    "conversationId": "cnv_7m1...",
    "stopReason": "end_turn",
    "finalText": "Ticket 42 was resolved at 13:48 UTC by an SRE rollback…",
    "iterations": 3,
    "usage": { "input": 4218, "output": 612 }
  }
}

scim.user_deactivated:

{
  "id": "f3a8d211-77b6-4e3c-9f12-2d6c0a3e5b91",
  "type": "scim.user_deactivated",
  "createdAt": "2026-05-08T14:15:09.001Z",
  "organizationId": "org_2t4b...",
  "data": {
    "userId": "usr_9q1...",
    "externalId": "okta-00u1abcd",
    "email": "user@acme.example",
    "membershipId": "mem_2k3...",
    "deactivatedAt": "2026-05-08T14:15:09.001Z"
  }
}

agent_version.deployed (upload-time — deployment is queued, not yet live; pair with the later deployment.created for the reachability signal):

{
  "id": "8c0b91a4-2147-4a8c-8e09-1d6e2a3f9012",
  "type": "agent_version.deployed",
  "createdAt": "2026-05-09T08:21:11.001Z",
  "organizationId": "org_2t4b...",
  "data": {
    "agentName": "support-triage",
    "versionLabel": "v4",
    "versionId": "ver_8h3...",
    "deploymentId": "dep_4kp...",
    "actorUserId": "usr_9q1..."
  }
}

agent_version.promoted_to_canary (operator pointed canary at v4 with 10% traffic; previousDeploymentId is null on the first attach):

{
  "id": "1a7f3e22-8c91-4a02-b7d4-3e1f8c0a2b54",
  "type": "agent_version.promoted_to_canary",
  "createdAt": "2026-05-09T08:25:42.310Z",
  "organizationId": "org_2t4b...",
  "data": {
    "agentName": "support-triage",
    "channelName": "canary",
    "versionLabel": "v4",
    "deploymentId": "dep_4kp...",
    "trafficWeight": 0.1,
    "actorUserId": "usr_9q1...",
    "previousDeploymentId": null
  }
}

agent_version.rolled_back (operator reverted stable from v4 to v3 — distinct type from agent_version.promoted_to_stable so alerting ("production was reverted") and graduation tracking ("canary became stable on green metrics") don't get confused):

{
  "id": "f92c0a15-3b8d-4e1f-a7c2-9d8e1c5b3027",
  "type": "agent_version.rolled_back",
  "createdAt": "2026-05-09T08:33:15.778Z",
  "organizationId": "org_2t4b...",
  "data": {
    "agentName": "support-triage",
    "channelName": "stable",
    "versionLabel": "v3",
    "versionId": "ver_7g2...",
    "deploymentId": "dep_3jp...",
    "trafficWeight": 1.0,
    "actorUserId": "usr_9q1...",
    "previousDeploymentId": "dep_4kp..."
  }
}

admin_action.recorded (this verb is one of the curated ones, so a paired audit.flagged event fires too):

{
  "id": "9b2e1c44-15a3-4f0e-91d4-7c8a3f1d2e10",
  "type": "admin_action.recorded",
  "createdAt": "2026-05-08T14:22:08.554Z",
  "organizationId": "org_2t4b...",
  "data": {
    "actionId": null,
    "action": "webhook_endpoints.secret_rotated",
    "actorUserId": "usr_9q1...",
    "targetType": "webhook_endpoint",
    "targetId": "whe_3z9...",
    "reason": null
  }
}

Signing scheme

Every POST carries four headers:

Header	Value
`X-Stech-Event`	the event `type` (also in the body)
`X-Stech-Delivery`	the delivery id (stable across retries of one delivery; a manual redeliver gets a fresh one)
`X-Stech-Timestamp`	unix epoch seconds at sign time
`X-Stech-Signature`	`sha256=<hex>` — see below

The signed string is ${timestamp}.${body} — the literal timestamp header value, a dot, then the raw POST body bytes. The HMAC key is the plaintext signing secret (whsec_…, the value the api returned once on create / rotate). Algorithm is HMAC-SHA256. The header value is the algorithm prefix sha256= concatenated with the lowercase-hex digest:

X-Stech-Signature: sha256=4f2ed3bce1...d7  (64 hex chars after the prefix)

Receivers MUST:

Re-compute the HMAC over the raw body bytes, not the parsed-and-reserialized JSON. JSON re-serialization changes whitespace and key order; the signature won't match.
Compare digests in constant time (e.g. crypto.timingSafeEqual, hmac.compare_digest, subtle.ConstantTimeCompare). A naive == returns at the first mismatched byte, so response timing tells an attacker how much of a forged signature is correct and lets them build a valid one byte by byte.
Enforce a 5-minute replay window: reject if abs(now - timestamp) > 300. We apply the same bound on our side. The window limits how long a captured request stays replayable; within it, dedupe on the event id (see Idempotency) keeps a replayed request from being processed twice.
Pin the algorithm prefix: require sha256= rather than accepting whatever prefix arrives, so we can ship sha384= later without breaking pinned receivers.
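These rules can be packaged as one framework-independent function, which makes them easy to unit test. A sketch using only `node:crypto`; `verifyStechSignature` and its parameters are hypothetical, and `nowSec` is a parameter only so tests can pin the clock.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const REPLAY_WINDOW_SEC = 300;

// Returns null when the delivery is authentic, otherwise a short reason.
function verifyStechSignature(
  rawBody: Buffer,
  timestampHeader: string | undefined,
  signatureHeader: string | undefined,
  secret: string,
  nowSec: number = Date.now() / 1000,
): string | null {
  if (!timestampHeader || !signatureHeader) return "missing headers";
  // Pin the algorithm: only "sha256=" is accepted.
  if (!signatureHeader.startsWith("sha256=")) return "unsupported algorithm";

  const ts = Number(timestampHeader);
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > REPLAY_WINDOW_SEC) {
    return "replay window";
  }

  // Signed string: the literal timestamp header, ".", then the raw body bytes.
  const expected =
    "sha256=" +
    createHmac("sha256", secret)
      .update(`${timestampHeader}.`)
      .update(rawBody)
      .digest("hex");

  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on unequal lengths, so check the length first.
  if (a.length !== b.length || !timingSafeEqual(a, b)) return "bad signature";
  return null;
}
```

In an Express handler you'd call `verifyStechSignature(req.body, req.get("X-Stech-Timestamp"), req.get("X-Stech-Signature"), SECRET)` and answer 400 whenever it returns a reason.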

The plaintext signing secret is stored encrypted at rest (AES-256-GCM) on our side. We surface it exactly once on create and on rotate; lose it and you must rotate.

Verifier snippets

Three working receivers. Apart from the web framework (Express for Node, Flask for Python; the Go version needs none), each uses only the standard library, and each handles missing headers, the replay window, and the constant-time compare.

TypeScript / Node (Express)

import { createHmac, timingSafeEqual } from "node:crypto";
import express from "express";

const app = express();
const SECRET = process.env.STECH_WEBHOOK_SECRET!;
const REPLAY_WINDOW_SEC = 300;

// `express.raw` keeps `req.body` as a Buffer of the exact bytes we
// received. Parsing to JSON first would re-encode the bytes and break
// the HMAC.
app.post("/stech", express.raw({ type: "application/json" }), (req, res) => {
  const ts = req.get("X-Stech-Timestamp");
  const sig = req.get("X-Stech-Signature");
  if (!ts || !sig) return res.status(400).send("missing headers");

  const tsNum = Number(ts);
  if (
    !Number.isFinite(tsNum) ||
    Math.abs(Date.now() / 1000 - tsNum) > REPLAY_WINDOW_SEC
  ) {
    return res.status(400).send("replay window");
  }

  // Two-call .update() — keeps the body bytes uninterpreted. A
  // template-literal `${ts}.${req.body}` would coerce the Buffer
  // through .toString() (UTF-8) and silently break for any byte
  // sequence that isn't valid UTF-8.
  const expected =
    "sha256=" +
    createHmac("sha256", SECRET)
      .update(`${ts}.`)
      .update(req.body)
      .digest("hex");

  // Constant-time compare. A naive `===` short-circuits on the first
  // mismatched byte, so response timing would reveal how much of a
  // forged signature is correct.
  const a = Buffer.from(expected);
  const b = Buffer.from(sig);
  if (a.length !== b.length || !timingSafeEqual(a, b)) {
    return res.status(400).send("bad signature");
  }

  const event = JSON.parse(req.body.toString());
  // …handle event.type / event.id / event.data here…
  res.status(200).send("ok");
});

app.listen(8080);

Python (Flask)

import hashlib, hmac, json, os, time
from flask import Flask, request, abort

app = Flask(__name__)
SECRET = os.environ["STECH_WEBHOOK_SECRET"].encode()
REPLAY_WINDOW_SEC = 300

@app.post("/stech")
def stech():
    ts = request.headers.get("X-Stech-Timestamp")
    sig = request.headers.get("X-Stech-Signature")
    if not ts or not sig:
        abort(400, "missing headers")

    try:
        ts_num = int(ts)
    except ValueError:
        abort(400, "bad timestamp")
    if abs(time.time() - ts_num) > REPLAY_WINDOW_SEC:
        abort(400, "replay window")

    # Use request.get_data() not request.json: we MUST hash the raw
    # bytes the api signed, not Flask's parsed-then-reserialized JSON.
    body = request.get_data()
    mac = hmac.new(SECRET, f"{ts}.".encode() + body, hashlib.sha256)
    expected = "sha256=" + mac.hexdigest()

    # hmac.compare_digest is constant-time. Compare bytes, not str: with
    # str arguments it raises TypeError on non-ASCII input, which would
    # turn a malformed header into a 500 instead of a 400.
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        abort(400, "bad signature")

    event = json.loads(body)
    # …handle event["type"] / event["id"] / event["data"] here…
    return "ok", 200

Go (net/http)

package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"io"
	"net/http"
	"os"
	"strconv"
	"time"
)

// Read at startup and fail fast. With an empty secret every expected
// signature would be HMAC("", ts + "." + body), so every delivery would
// fail the compare with no obvious cause.
var secret = mustEnv("STECH_WEBHOOK_SECRET")

const replayWindowSec = 300

func mustEnv(name string) []byte {
	v := os.Getenv(name)
	if v == "" {
		panic("missing env " + name)
	}
	return []byte(v)
}

func handle(w http.ResponseWriter, r *http.Request) {
	ts := r.Header.Get("X-Stech-Timestamp")
	sig := r.Header.Get("X-Stech-Signature")
	if ts == "" || sig == "" {
		http.Error(w, "missing headers", 400)
		return
	}
	tsNum, err := strconv.ParseInt(ts, 10, 64)
	if err != nil || abs(time.Now().Unix()-tsNum) > replayWindowSec {
		http.Error(w, "replay window", 400)
		return
	}

	// Read the raw body bytes — re-encoding the parsed JSON would
	// produce different bytes than the api signed.
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "read body", 400)
		return
	}

	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(ts + "."))
	mac.Write(body)
	expected := "sha256=" + hex.EncodeToString(mac.Sum(nil))

	// hmac.Equal is constant-time; a plain string `==` would let response
	// timing reveal how much of a forged signature is correct.
	if !hmac.Equal([]byte(expected), []byte(sig)) {
		http.Error(w, "bad signature", 400)
		return
	}

	var event struct {
		ID, Type string
		Data     json.RawMessage
	}
	if err := json.Unmarshal(body, &event); err != nil {
		http.Error(w, "bad json", http.StatusBadRequest)
		return
	}
	// …handle event.Type / event.ID / event.Data here…
	w.WriteHeader(200)
}

func abs(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

func main() {
	http.HandleFunc("/stech", handle)
	_ = http.ListenAndServe(":8080", nil)
}

Retry policy

The delivery worker classifies each attempt and retries on transient failures only.

Outcome	Worker behavior
2xx	`delivered`. Endpoint failure counter resets.
408, 429	retryable — schedule next attempt
4xx (anything else)	`gave_up`. We stop retrying this delivery.
3xx	`gave_up` (`redirect_blocked`): we don't follow redirects, because the signed body is bound to the host you registered
5xx	retryable
network error / timeout (30s)	retryable
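Read as a function, the table tells a receiver author which status to return. A sketch restating the documented rules, not the worker's code; `classifyAttempt` and the outcome labels are hypothetical, `null` stands for a network error or the 30s timeout, and treating codes outside 2xx to 5xx as `gave_up` is an assumption the table doesn't cover.

```typescript
type AttemptOutcome = "delivered" | "retry" | "gave_up" | "redirect_blocked";

// status === null models a network error or a 30s timeout.
function classifyAttempt(status: number | null): AttemptOutcome {
  if (status === null) return "retry";
  if (status >= 200 && status < 300) return "delivered";
  if (status >= 300 && status < 400) return "redirect_blocked"; // a flavor of gave_up
  if (status === 408 || status === 429) return "retry";
  if (status >= 400 && status < 500) return "gave_up";
  if (status >= 500 && status < 600) return "retry";
  return "gave_up"; // assumption: codes the table doesn't list are terminal
}
```

The practical consequence: answer 2xx only once you've durably accepted the event, 5xx (or 429) when you want the worker to try again, and any other 4xx only when retrying can't help, since it ends that delivery.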

Backoff schedule, applied after each retryable failure:

Attempt	Wait before next
1 → 2	1 minute
2 → 3	5 minutes
3 → 4	25 minutes
4 → 5	2 hours
5 → 6	12 hours
6 → 7	24 hours

A delivery gets up to 7 attempts: the initial POST plus the six scheduled retries above. If the 7th attempt also fails retryably, the schedule is exhausted and the delivery becomes terminally failed; we don't retry it again. Per the schedule above, the last attempt fires roughly 38.5 hours after the first (1m + 5m + 25m + 2h + 12h + 24h = 38h31m, plus per-attempt latency). A delivery's terminal status is one of delivered, gave_up, or failed. The full attempt history lives in the dashboard delivery log.
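The worst-case timeline is the running sum of the backoff table. A small worked calculation that ignores per-attempt latency (each attempt can add up to the 30s timeout); the constant and function names are illustrative.

```typescript
// Waits between consecutive attempts, in minutes, copied from the backoff table.
const BACKOFF_MINUTES = [1, 5, 25, 2 * 60, 12 * 60, 24 * 60];

// Offset of each attempt from the first POST, in minutes (attempt 1 is at 0).
function attemptOffsetsMinutes(waits: number[]): number[] {
  const offsets = [0];
  for (const wait of waits) offsets.push(offsets[offsets.length - 1] + wait);
  return offsets;
}

function formatHM(minutes: number): string {
  const h = Math.floor(minutes / 60);
  const m = minutes % 60;
  return `${h}h${m < 10 ? "0" : ""}${m}m`;
}

const offsets = attemptOffsetsMinutes(BACKOFF_MINUTES);
offsets.forEach((m, i) => console.log(`attempt ${i + 1}: +${formatHM(m)}`));
// Last line printed: "attempt 7: +38h31m"
```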

Per-endpoint failure counter increments on every non-2xx attempt (retryable or terminal). When the counter hits 50 consecutive failures the endpoint is auto-disabled (enabled = false). Successful 2xx delivery resets the counter. The dashboard shows failureCount + lastFailedAt + lastFailureStatus so an operator can see the receiver's pathology without reading worker logs.

Idempotency

The id field on the event body is a server-minted uuid. The same uuid is reused across:

all retries of one delivery (same delivery id, same body bytes, same signature inputs except timestamp);
all endpoints in the same fan-out (one event going to two subscribed endpoints sends the same id to both).

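Receivers should dedupe on id. The statement below is Postgres syntax, with $1 as the libpq / node-postgres positional placeholder for event.id from the parsed body (use :event_id with SQLAlchemy's text()). SQLite 3.35+ accepts the same statement with a ? placeholder; MySQL has no ON CONFLICT clause, so use INSERT IGNORE there and check the affected-row count instead of RETURNING.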

CREATE TABLE webhook_dedupe (
  event_id uuid PRIMARY KEY,
  received_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO webhook_dedupe (event_id) VALUES ($1)
ON CONFLICT (event_id) DO NOTHING
RETURNING event_id;
-- 0 rows back → already processed; ack + return early.
-- 1 row back → first time; do the work.

The dashboard's redeliver button (per row) re-fires the same id + body — receivers that dedupe correctly silently no-op on re-deliveries.
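One caveat the SQL leaves implicit: if you record the id and the work then fails, the next retry finds the row and silently skips the work. Either claim the id and do the work in one transaction, or release the claim when the work fails. The sketch below shows release-on-failure against a hypothetical `DedupeStore` interface, with an in-memory store for tests only (a real store would run the `INSERT … ON CONFLICT` statement above). A single transaction is the more robust option, because release-on-failure can't undo a claim if the process crashes mid-handler.

```typescript
// Hypothetical storage interface; a Postgres version would run the
// INSERT ... ON CONFLICT DO NOTHING RETURNING statement from this section.
interface DedupeStore {
  claim(eventId: string): Promise<boolean>; // true on first sighting
  release(eventId: string): Promise<void>; // undo a claim so a retry can redo the work
}

// In-memory store for tests and single-process demos only.
class MemoryDedupeStore implements DedupeStore {
  private seen = new Set<string>();
  async claim(eventId: string): Promise<boolean> {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
  async release(eventId: string): Promise<void> {
    this.seen.delete(eventId);
  }
}

// Runs `work` at most once per event id. If `work` throws, the claim is
// released and the error rethrown, so the handler can answer 5xx and the
// worker's retry gets another chance.
async function processOnce(
  store: DedupeStore,
  eventId: string,
  work: () => Promise<void>,
): Promise<"processed" | "duplicate"> {
  if (!(await store.claim(eventId))) return "duplicate";
  try {
    await work();
    return "processed";
  } catch (err) {
    await store.release(eventId);
    throw err;
  }
}
```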

Operating webhooks

Endpoint settings

Setting	Dashboard	API
Create	`/settings/webhooks` → new endpoint	`POST /v1/orgs/:slug/webhook-endpoints`
List	`/settings/webhooks`	`GET /v1/orgs/:slug/webhook-endpoints`
Edit (url / events / enabled / description)	row → edit	`PATCH /v1/orgs/:slug/webhook-endpoints/:id`
Delete	row → delete	`DELETE /v1/orgs/:slug/webhook-endpoints/:id`

url must be https://, ≤ 2048 chars, and not resolve to a private host (see Security model). events is a non-empty string array of known types or the * wildcard. enabled is a boolean — flip back to true after fixing a receiver that auto-disabled.
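A client-side pre-flight check can catch these shape errors before the API rejects them. A sketch, assuming you maintain your own list of known event types; `checkEndpointInput` is hypothetical, and it deliberately skips the private-host rule, which needs DNS resolution and is enforced by the API at create time and at every delivery.

```typescript
import { URL } from "node:url";

interface EndpointInput {
  url: string;
  events: string[];
}

// Returns human-readable problems; an empty array means the input passes
// the checks this sketch knows about.
function checkEndpointInput(input: EndpointInput, knownTypes: ReadonlySet<string>): string[] {
  const problems: string[] = [];
  let parsed: URL | null = null;
  try {
    parsed = new URL(input.url);
  } catch {
    problems.push("url is not a valid absolute URL");
  }
  if (parsed !== null && parsed.protocol !== "https:") problems.push("url must use https://");
  if (input.url.length > 2048) problems.push("url must be at most 2048 chars");
  if (input.events.length === 0) problems.push("events must be a non-empty array");
  for (const type of input.events) {
    if (type !== "*" && !knownTypes.has(type)) problems.push(`unknown event type: ${type}`);
  }
  return problems;
}
```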

The plaintext signing secret is only returned on create and on rotate. Subsequent reads return hasSecret: true and never the ciphertext.

Rotating the signing secret

curl -fsSL -X POST \
  https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/rotate-secret \
  -H "Authorization: Bearer $STECH_API_KEY"
# → { "endpoint": { ... }, "signingSecret": "whsec_<new-64-hex>" }

Rotation itself is atomic: a single UPDATE swaps the secret. Attempts the worker had already claimed into its current batch before that UPDATE still go out signed with the previous secret; attempts claimed afterwards use the new one. In practice you'll see a brief window where both signatures appear, so if your receiver verifies against a single secret, keep the old one around for a minute past rotation.

The zero-downtime pattern is dual-secret verification: store the new secret as primary and the old as fallback, accept either, then drop the old after the worker batch flushes (one minute is enough, five if you're cautious):

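import { createHmac, timingSafeEqual } from "node:crypto";

// Same signed string as the verifiers above: `${ts}.` then the raw body bytes.
const sign = (secret: string, ts: string, body: Buffer): string =>
  "sha256=" + createHmac("sha256", secret).update(`${ts}.`).update(body).digest("hex");
// Constant-time compare; a byte-length mismatch can never be a match.
const timingEq = (a: string, b: string): boolean => {
  const x = Buffer.from(a);
  const y = Buffer.from(b);
  return x.length === y.length && timingSafeEqual(x, y);
};
// Inside the handler: primarySecret and oldSecret (string | undefined) come from your config.
const ok =
  timingEq(sign(primarySecret, ts, req.body), sig) ||
  (oldSecret !== undefined && timingEq(sign(oldSecret, ts, req.body), sig));
if (!ok) return res.status(400).send("bad signature");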

Use a constant-time compare (timingSafeEqual) for each candidate. Short-circuiting after a primary-secret match is fine: both secrets are valid, and "the primary matched" isn't secret information.

A rotation also fires audit.flagged (verb webhook_endpoints.secret_rotated), so receivers subscribed to that channel learn in real time that an endpoint's signing secret was rotated and can alert on it.

Redelivering a single delivery

Open /settings/webhooks/<id>/deliveries, find the row, click redeliver. From scripts:

curl -fsSL -X POST \
  https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/deliveries/$DEL/redeliver \
  -H "Authorization: Bearer $STECH_API_KEY"

Redeliver clones the row with a fresh delivery id, attempt_count = 0, status = pending, next_attempt_at = now(). The event_id and the payload bytes are preserved — receivers dedupe naturally.

Viewing the delivery log

Dashboard: /settings/webhooks/<id>/deliveries. API:

curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/deliveries \
  -H "Authorization: Bearer $STECH_API_KEY"
# → { "deliveries": [...], "hasMore": false }

Cursor pagination via ?before=<deliveryId>&limit=<n> (default 50, max 200). Rows include eventType, status, attemptCount, nextAttemptAt, lastResponseStatus, deliveredAt, createdAt. The response body the receiver returned isn't on the list response — open the row in the dashboard for the truncated body (≤ 8 KiB).
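Walking the whole log means following the cursor until `hasMore` is false. A sketch under two assumptions the field list above doesn't confirm: each row carries its delivery id in an `id` field, and the next page's `before` is the last row's id. `fetchPage` is injected so the loop can be tested offline; in production it would wrap a GET on the deliveries URL with the bearer token.

```typescript
import { URLSearchParams } from "node:url";

interface DeliveryRow {
  id: string; // assumed field name for the delivery id
  eventType: string;
  status: string;
  attemptCount: number;
}

interface DeliveryPage {
  deliveries: DeliveryRow[];
  hasMore: boolean;
}

// Fetches one page for the given query string (?limit=...&before=...).
type FetchPage = (query: URLSearchParams) => Promise<DeliveryPage>;

// Collects every row by following ?before=<deliveryId> until hasMore is false.
async function listAllDeliveries(fetchPage: FetchPage, limit = 200): Promise<DeliveryRow[]> {
  const all: DeliveryRow[] = [];
  let before: string | undefined;
  for (;;) {
    const query = new URLSearchParams({ limit: String(limit) });
    if (before !== undefined) query.set("before", before);
    const page = await fetchPage(query);
    all.push(...page.deliveries);
    if (!page.hasMore || page.deliveries.length === 0) return all;
    before = page.deliveries[page.deliveries.length - 1].id;
  }
}
```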

Security model

HTTPS only. Endpoint creation rejects http://. Webhooks ship signed payloads but the body itself can carry information you don't want plaintext on the wire — we don't compromise on transport.
Signing secret encrypted at rest via AES-256-GCM, same facade as SCIM bearer tokens and CLI source secrets. Surfaced once on create / rotate; the dashboard / api never echo the ciphertext column or decrypt-on-list.
5-minute replay window on the timestamp header (both past and future skew). Pairs with receiver-side dedupe on event_id.
SSRF guard at every delivery, not just at create. A customer could register a public hostname whose A record later flips to point at metadata.google.internal or RFC1918. The worker re-runs the hostname blocklist + a DNS-resolution-time check before every fetch — same posture as the CLI source binary downloader. redirect: "manual" on the POST stops a 3xx from bouncing the signed body to a different host.
Constant-time signature compare on our side; receivers should do the same (the verifier snippets above use timingSafeEqual / compare_digest / hmac.Equal).
No body retention beyond the delivery log. We keep the rendered payload + truncated response body (≤ 8 KiB) for delivery rows so operators can debug; we don't archive your event stream beyond that.
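The delivery-time SSRF guard described above amounts to: resolve the hostname, then refuse if any resolved address is loopback, private, link-local (which covers the 169.254.169.254 address that metadata.google.internal resolves to), or not an IP at all. The sketch below is illustrative, not the worker's code: the range list is a common subset rather than a complete one, `isBlockedAddress` and `resolveAndCheck` are hypothetical, and a production guard must also connect to the address it checked so a second DNS lookup can't swap in a different one.

```typescript
import { isIP } from "node:net";
import { lookup } from "node:dns/promises";

// IPv4 CIDR blocks that should never receive a webhook (illustrative subset).
const BLOCKED_V4: Array<[string, number]> = [
  ["0.0.0.0", 8],
  ["10.0.0.0", 8],
  ["100.64.0.0", 10], // carrier-grade NAT
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, incl. cloud metadata 169.254.169.254
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
];

function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

function inCidr(ip: string, base: string, bits: number): boolean {
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

function isBlockedAddress(address: string): boolean {
  const family = isIP(address);
  if (family === 4) return BLOCKED_V4.some(([base, bits]) => inCidr(address, base, bits));
  if (family !== 6) return true; // not an IP literal: fail closed
  const a = address.toLowerCase();
  // IPv4-mapped IPv6 (::ffff:a.b.c.d) inherits the IPv4 verdict.
  const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/.exec(a);
  if (mapped) return isBlockedAddress(mapped[1]);
  return (
    a === "::" ||
    a === "::1" ||
    /^f[cd][0-9a-f]{2}:/.test(a) || // fc00::/7 unique-local
    /^fe[89ab][0-9a-f]:/.test(a) // fe80::/10 link-local
  );
}

// Resolve every A/AAAA record and refuse if any of them is blocked.
async function resolveAndCheck(hostname: string): Promise<string[]> {
  const records = await lookup(hostname, { all: true });
  const blocked = records.filter((r) => isBlockedAddress(r.address));
  if (blocked.length > 0) throw new Error(`ssrf_blocked: ${hostname} -> ${blocked[0].address}`);
  return records.map((r) => r.address);
}
```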

Limitations

Single-replica delivery worker. Today's worker assumes one api process owns delivery for the deployment. Multi-replica setups would race on the same pending row. Per-row advisory locks are filed but deferred until we run multi-replica api.
No payload templating / no per-event filters. Subscriptions match on the event type (or *) and that's it. "Only deployment.failed for agent X" needs to be filtered receiver-side.
No email-on-auto-disable yet. When an endpoint trips the 50-consecutive-failure threshold we flip enabled = false and log it, but the org-admin notification email is filed under #264 and not yet shipped. The dashboard shows the disabled state immediately.
System-event audit gap. A handful of system-emitted admin actions (the worker's own bookkeeping rows) don't fan out to webhooks by design; tracked in #266.
No outbound IP allowlist published. Deliveries originate from the api host. If your receiver is behind a strict allowlist, contact support — we don't yet publish a stable egress range.

Troubleshooting

My endpoint flipped to enabled=false — the delivery worker auto-disables an endpoint after 50 consecutive non-2xx attempts. Check failureCount + lastFailureStatus + the delivery log. Fix the receiver, then PATCH … {"enabled": true}. The failure counter resets on the next 2xx.

Signature verification fails — most likely your code computes the HMAC over the parsed-then-reserialized JSON instead of the raw bytes the api signed. The signed string is ${timestamp}.${rawBody}. In Express that means express.raw({ type: "application/json" }), not express.json(); in Flask, request.get_data() not request.json; in Go, io.ReadAll(r.Body) before any decode. Compare bytes.

replay window expired — your server's clock is more than 5 minutes off ours. Run NTP. We deliberately bound the window in both directions so a clock that's set forward also fails closed.

I never get any events — check, in order: (a) the endpoint is enabled = true; (b) the events array on the endpoint matches the type you're expecting (or contains *); (c) the event was actually emitted (/settings/webhooks/<id>/deliveries shows pending + delivered + failed rows for this endpoint — empty list = nothing matched + fanned out). The * wildcard matches every type, including ones added after you subscribed.

I'm getting duplicates — that's intended. We retry on 5xx / network / 408 / 429, and one event fans out to every subscribed endpoint. Dedupe on event.id (server-minted uuid, stable across retries and across endpoints). The redeliver button also re-fires the same id — your dedupe table catches it.

opaqueredirect / redirect_blocked in the delivery log: your endpoint (or something in front of it) answered with a 3xx. Depending on the fetch implementation, a redirect is recorded either as redirect_blocked with the 3xx status or as an opaqueredirect response. We don't follow redirects because the signed body is bound to the host you registered, so point the endpoint URL directly at the final host.

ssrf_blocked in the delivery log — your endpoint's hostname resolved to a private / loopback / metadata IP at delivery time, even though it passed the synchronous shape check at create time. Most common cause: a CDN / DNS provider returning an internal address. Re-resolve the hostname; if it's intentional (e.g. internal-only testing) you'll need a public receiver instead.

Related

Agent runs — cancellation — the agent_run.cancelled event's full payload (including the cancelledByUserId / cancellationReason / requestedAt / acknowledgedAt audit context), the cancel API, and the runtime lifecycle that fires it.
Audit log — retrospective forensics on dispatch failures across all endpoints in the org (the /settings/audit?tab=webhooks org-wide cross-endpoint view; this doc's /settings/webhooks/<id>/deliveries is the per-endpoint drill-down).
Observability — the agent failure-rate watchdog fires audit.flagged with data.kind = "agent_failure_rate"; the docs cover thresholds, dedupe, and a copy-paste Slack bridge.
CLI tool sources — agent runs that fork CLI binaries fire agent_run.completed / agent_run.failed like any other run.
Magic-link sign-in — same encrypted-at-rest posture for sensitive credentials (signing secret encrypted via SecretsCrypto on our side).
Billing and usage — the cost-control worker fires audit.flagged with data.kind = "usage_soft_cap" / "usage_hard_cap" once per period when an org crosses a soft- or hard-cap threshold; payload shapes and dedupe semantics live there.
