Webhooks
Outbound HTTP notifications when something happens in your org — deployments, agent runs, SCIM identity changes, admin actions. Replaces polling: subscribe once and receive a signed POST per event. Stripe-shape signing, six-step exponential-backoff retry, auto-disable on sustained failure.
Quick start #
Register an endpoint, copy the signing secret once, paste it into your receiver, verify.
# 1. create the endpoint — admin or owner only
curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints \
-H "Authorization: Bearer $STECH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://hooks.acme.example/stech",
"description": "prod alerting",
"events": ["deployment.failed", "agent_run.failed"]
}'
# → { "endpoint": { ... }, "signingSecret": "whsec_<64-hex>" }
# 2. paste signingSecret into your receiver's env. The api never returns
# it again — rotate (below) if you lose it.
# 3. verify a delivery hits your endpoint (the dashboard delivery log at
# /settings/webhooks/<id>/deliveries shows POST status + body).
# 4. rotate later if needed
curl -fsSL -X POST \
https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/rotate-secret \
-H "Authorization: Bearer $STECH_API_KEY"
# → { "endpoint": { ... }, "signingSecret": "whsec_<new-64-hex>" }
A minimal Express receiver that verifies + responds 200:
import express from "express";
import { createHmac, timingSafeEqual } from "node:crypto";
const app = express();
const SECRET = process.env.STECH_WEBHOOK_SECRET!;
// IMPORTANT: read the raw body bytes. Parsing to JSON before HMAC will
// re-serialize and produce a different byte sequence — signature will
// never verify.
app.post("/stech", express.raw({ type: "application/json" }), (req, res) => {
const ts = req.get("X-Stech-Timestamp");
const sig = req.get("X-Stech-Signature");
if (!ts || !sig) return res.status(400).send("missing headers");
const tsNum = Number(ts);
if (!Number.isFinite(tsNum) || Math.abs(Date.now() / 1000 - tsNum) > 300) {
return res.status(400).send("replay window");
}
// Two-call .update() — keeps the body bytes uninterpreted. A
// template-literal `${ts}.${req.body}` would coerce the Buffer
// through .toString() (UTF-8) and silently break for any byte
// sequence that isn't valid UTF-8.
const expected =
"sha256=" +
createHmac("sha256", SECRET)
.update(`${ts}.`)
.update(req.body)
.digest("hex");
const a = Buffer.from(expected);
const b = Buffer.from(sig);
if (a.length !== b.length || !timingSafeEqual(a, b)) {
return res.status(400).send("bad signature");
}
const event = JSON.parse(req.body.toString());
console.log(event.type, event.id, event.data);
res.status(200).send("ok");
});
Event catalog #
Every event body has the same envelope:
{
"id": "<uuid>",
"type": "deployment.created",
"createdAt": "2026-05-08T12:34:56.789Z",
"organizationId": "<cuid2>",
"data": { ... }
}
id is server-minted at fan-out time and is stable across retries
and across multiple subscribed endpoints, so dedupe on it. type is
one of the values below. createdAt is ISO-8601 UTC. data shape is
event-specific:
| Type | Trigger | data keys |
|---|---|---|
| deployment.created | provisioner reports a new live deploy | deploymentId, agentId, agentName, machineUrl, deployedByUserId |
| deployment.failed | provisioner gives up on a build, or throws | deploymentId, agentId, agentName, reason, deployedByUserId |
| agent_run.completed | streaming /run produced a done frame | runId, deploymentId, agentName, conversationId, stopReason, finalText, iterations, usage.input, usage.output |
| agent_run.failed | streaming /run ended without a done frame | runId, deploymentId, agentName, conversationId, reason |
| agent_run.cancelled | streaming /run ended with stop_reason='cancelled' (see agent-runs.md) | runId, deploymentId, agentName, conversationId, stopReason, iterations, usage.input, usage.output, cancelledByUserId, cancellationReason, requestedAt, acknowledgedAt |
| agent_run.blocked | streaming /run ended with stop_reason='blocked:<kind>' (a guardrail terminated the run). Same payload bytes as the audit.flagged data.kind='agent_run_blocked' envelope; pick this channel to skip the noisier aggregate. See policy-and-guardrails.md | kind, agentId, agentName, runId, conversationId, guardrail, source, limit, observed, message, stopReason |
| agent_version.deployed | stech deploy minted a new version row (upload time: the deployment is queued, not yet live; pair with deployment.created for the live signal) | agentName, versionLabel, versionId, deploymentId, actorUserId |
| agent_version.promoted_to_canary | the canary channel now points at this deployment (via stech canary set / dashboard) | agentName, channelName, versionLabel, deploymentId, trafficWeight, actorUserId, previousDeploymentId |
| agent_version.promoted_to_stable | the stable channel now points at this deployment (via stech canary promote / dashboard / direct stable PUT) | same shape as agent_version.promoted_to_canary |
| agent_version.rolled_back | stech rollback reverted stable to a previous version; a distinct type from promoted_to_stable so PagerDuty / SIEM tooling can branch ("production was reverted" vs "canary graduated") | same shape as agent_version.promoted_to_canary, plus versionId |
| scim.user_added | SCIM POST /Users created a new identity | userId, externalId, email, membershipId |
| scim.user_deactivated | SCIM PUT/PATCH/DELETE flipped a user inactive | userId, externalId, email, membershipId, deactivatedAt |
| scim.group_added | SCIM POST /Groups created a new group | groupId, externalId, displayName, memberCount |
| admin_action.recorded | any row written to admin_actions (audit log) | actionId, action, actorUserId, targetType, targetId, reason |
| audit.flagged | curated subset of admin actions + watchdog signals worth alerting on (token revocation, OAuth disconnect, source deletion, SSO updates, webhook secret rotation, plan changes, deployment supersede, agent failure-rate alerts, agent-run guardrail blocks). The data.kind discriminator routes between them (agent_failure_rate, agent_run_blocked, etc.) | varies by data.kind; see linked surfaces |
Strings can be null where the upstream column is nullable —
deployedByUserId, runId, conversationId, externalId, email,
stopReason, iterations, usage.*, actionId, targetType,
targetId, reason, previousDeploymentId (null on a fresh channel
attach). Receivers should treat the data object as forward-
compatible: we may add keys, we won't remove them without a
deprecation cycle.
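A receiver that tolerates added keys and null values might parse the envelope along these lines. This is a minimal sketch, not an official client: the names Envelope, parse_envelope, and field are hypothetical, and it assumes only the envelope keys listed above.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Envelope:
    id: str
    type: str
    created_at: str
    organization_id: str
    data: dict


def parse_envelope(raw: bytes) -> Envelope:
    """Parse the common envelope; `data` stays a plain dict so that
    keys added later do not break the receiver."""
    doc = json.loads(raw)
    return Envelope(
        id=doc["id"],
        type=doc["type"],
        created_at=doc["createdAt"],
        organization_id=doc["organizationId"],
        data=doc.get("data") or {},
    )


def field(data: dict, key: str, default: Any = None) -> Any:
    """Read a data key that may be absent or explicitly null."""
    value = data.get(key)
    return default if value is None else value
```

Reading event-specific values through field(env.data, "reason", "unknown") treats a missing key and an explicit null the same way, which is usually what an alerting receiver wants.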
finalText on agent_run.completed is truncated to 500 chars (with a
trailing …); the full text lives on the conversation row, not the
event.
Subscribing to events #
Each endpoint stores an events array. Two shapes:
{ "events": ["deployment.failed", "agent_run.failed"] } // exact match
{ "events": ["*"] } // wildcard: every type
Wildcard subscriptions match new event types added in the future
automatically. Mix-and-match (["*", "deployment.failed"]) collapses
to ["*"] server-side.
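The matching rules above can be illustrated with a short sketch. The function names normalize_events and is_subscribed are hypothetical, and the server's real implementation is not shown in this doc; the code only mirrors the documented behavior (exact match, wildcard, and the collapse to ["*"]).

```python
from typing import List


def normalize_events(events: List[str]) -> List[str]:
    """Collapse any list containing the wildcard to ["*"]."""
    if not events:
        raise ValueError("events must be a non-empty list")
    if "*" in events:
        return ["*"]
    return list(events)


def is_subscribed(events: List[str], event_type: str) -> bool:
    """True when an endpoint's events array matches the event type."""
    return "*" in events or event_type in events
```

A receiver can use the same predicate locally to predict which deliveries an endpoint should expect.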
Example bodies #
deployment.created:
{
"id": "0d8c5e44-3e2a-4f8d-9d2e-9b6f8a1f1e25",
"type": "deployment.created",
"createdAt": "2026-05-08T14:02:11.482Z",
"organizationId": "org_2t4b...",
"data": {
"deploymentId": "dep_4kp...",
"agentId": "dep_4kp...",
"agentName": "support-triage",
"machineUrl": "https://support-triage-acme.fly.dev",
"deployedByUserId": "usr_9q1..."
}
}
agent_run.completed:
{
"id": "e1c2a16b-8b71-4d88-9d3e-9b8f0d6c5a02",
"type": "agent_run.completed",
"createdAt": "2026-05-08T14:09:51.103Z",
"organizationId": "org_2t4b...",
"data": {
"runId": "run_8w3...",
"deploymentId": "dep_4kp...",
"agentName": "support-triage",
"conversationId": "cnv_7m1...",
"stopReason": "end_turn",
"finalText": "Ticket 42 was resolved at 13:48 UTC by an SRE rollback…",
"iterations": 3,
"usage": { "input": 4218, "output": 612 }
}
}
scim.user_deactivated:
{
"id": "f3a8d211-77b6-4e3c-9f12-2d6c0a3e5b91",
"type": "scim.user_deactivated",
"createdAt": "2026-05-08T14:15:09.001Z",
"organizationId": "org_2t4b...",
"data": {
"userId": "usr_9q1...",
"externalId": "okta-00u1abcd",
"email": "[email protected]",
"membershipId": "mem_2k3...",
"deactivatedAt": "2026-05-08T14:15:09.001Z"
}
}
agent_version.deployed (upload-time: the deployment is queued, not yet
live; pair with the later deployment.created for the reachability
signal):
{
"id": "8c0b91a4-2147-4a8c-8e09-1d6e2a3f9012",
"type": "agent_version.deployed",
"createdAt": "2026-05-09T08:21:11.001Z",
"organizationId": "org_2t4b...",
"data": {
"agentName": "support-triage",
"versionLabel": "v4",
"versionId": "ver_8h3...",
"deploymentId": "dep_4kp...",
"actorUserId": "usr_9q1..."
}
}
agent_version.promoted_to_canary (operator pointed canary at v4 with
10% traffic; previousDeploymentId is null on the first attach):
{
"id": "1a7f3e22-8c91-4a02-b7d4-3e1f8c0a2b54",
"type": "agent_version.promoted_to_canary",
"createdAt": "2026-05-09T08:25:42.310Z",
"organizationId": "org_2t4b...",
"data": {
"agentName": "support-triage",
"channelName": "canary",
"versionLabel": "v4",
"deploymentId": "dep_4kp...",
"trafficWeight": 0.1,
"actorUserId": "usr_9q1...",
"previousDeploymentId": null
}
}
agent_version.rolled_back (operator reverted stable from v4 to v3;
a distinct type from agent_version.promoted_to_stable so alerting
("production was reverted") and graduation tracking ("canary became
stable on green metrics") don't get confused):
{
"id": "f92c0a15-3b8d-4e1f-a7c2-9d8e1c5b3027",
"type": "agent_version.rolled_back",
"createdAt": "2026-05-09T08:33:15.778Z",
"organizationId": "org_2t4b...",
"data": {
"agentName": "support-triage",
"channelName": "stable",
"versionLabel": "v3",
"versionId": "ver_7g2...",
"deploymentId": "dep_3jp...",
"trafficWeight": 1.0,
"actorUserId": "usr_9q1...",
"previousDeploymentId": "dep_4kp..."
}
}
admin_action.recorded (curated verbs also fire a paired audit.flagged):
{
"id": "9b2e1c44-15a3-4f0e-91d4-7c8a3f1d2e10",
"type": "admin_action.recorded",
"createdAt": "2026-05-08T14:22:08.554Z",
"organizationId": "org_2t4b...",
"data": {
"actionId": null,
"action": "webhook_endpoints.secret_rotated",
"actorUserId": "usr_9q1...",
"targetType": "webhook_endpoint",
"targetId": "whe_3z9...",
"reason": null
}
}
Signing scheme #
Every POST carries four headers:
| Header | Value |
|---|---|
| X-Stech-Event | the event type (also in the body) |
| X-Stech-Delivery | the per-attempt delivery id (changes on retry) |
| X-Stech-Timestamp | unix epoch seconds at sign time |
| X-Stech-Signature | sha256=<hex> (see below) |
The signed string is ${timestamp}.${body} — the literal timestamp
header value, a dot, then the raw POST body bytes. The HMAC key is the
plaintext signing secret (whsec_…, the value the api returned once on
create / rotate). Algorithm is HMAC-SHA256. The header value is the
algorithm prefix sha256= concatenated with the lowercase-hex digest:
X-Stech-Signature: sha256=4f2ed3bce1...d7 (64 hex chars after the prefix)
Receivers MUST:
- Re-compute the HMAC over the raw body bytes, not the parsed-and-reserialized JSON. JSON re-serialization changes whitespace and key order; the signature won't match.
- Compare digests in constant time (e.g. crypto.timingSafeEqual, hmac.compare_digest, subtle.ConstantTimeCompare). A naive == returns as soon as a byte differs, so response timing can reveal the expected signature one byte at a time.
- Enforce a 5-minute replay window: reject if abs(now - timestamp) > 300. The api side does the same check; the belt-and-suspenders pair stops a captured-and-replayed request from being honored more than once.
- Pin the algorithm prefix (sha256=). We can ship sha384= later without breaking pinned receivers.
The plaintext signing secret is stored encrypted at rest (AES-256-GCM) on our side. We surface it exactly once on create and on rotate; lose it and you must rotate.
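To exercise a receiver locally without waiting for a real event, the signing side can be reproduced in a few lines. This is a sketch under the scheme described above; sign and build_test_headers are hypothetical helper names, and the X-Stech-Delivery value is a placeholder rather than a real delivery id.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional


def sign(secret: str, timestamp: str, body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<raw body bytes>", keyed with the
    plaintext signing secret, rendered as sha256=<lowercase hex>."""
    mac = hmac.new(secret.encode(), timestamp.encode() + b"." + body, hashlib.sha256)
    return "sha256=" + mac.hexdigest()


def build_test_headers(
    secret: str, body: bytes, event_type: str, now: Optional[float] = None
) -> Dict[str, str]:
    """Headers for a locally generated test POST."""
    ts = str(int(time.time() if now is None else now))
    return {
        "Content-Type": "application/json",
        "X-Stech-Event": event_type,
        "X-Stech-Delivery": "local-test",  # placeholder, not a real id
        "X-Stech-Timestamp": ts,
        "X-Stech-Signature": sign(secret, ts, body),
    }
```

Posting the same body bytes with these headers to a local receiver should pass verification; changing a single byte of the body or the timestamp should produce "bad signature".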
Verifier snippets #
Three working receivers using only the standard library. Each is ~20 LOC and handles missing headers, the replay window, and the constant-time compare.
TypeScript / Node (Express) #
import { createHmac, timingSafeEqual } from "node:crypto";
import express from "express";
const app = express();
const SECRET = process.env.STECH_WEBHOOK_SECRET!;
const REPLAY_WINDOW_SEC = 300;
// `express.raw` keeps `req.body` as a Buffer of the exact bytes we
// received. Parsing to JSON first would re-encode the bytes and break
// the HMAC.
app.post("/stech", express.raw({ type: "application/json" }), (req, res) => {
const ts = req.get("X-Stech-Timestamp");
const sig = req.get("X-Stech-Signature");
if (!ts || !sig) return res.status(400).send("missing headers");
const tsNum = Number(ts);
if (
!Number.isFinite(tsNum) ||
Math.abs(Date.now() / 1000 - tsNum) > REPLAY_WINDOW_SEC
) {
return res.status(400).send("replay window");
}
// Two-call .update() — keeps the body bytes uninterpreted. A
// template-literal `${ts}.${req.body}` would coerce the Buffer
// through .toString() (UTF-8) and silently break for any byte
// sequence that isn't valid UTF-8.
const expected =
"sha256=" +
createHmac("sha256", SECRET)
.update(`${ts}.`)
.update(req.body)
.digest("hex");
// Constant-time compare. A naive `===` would short-circuit on the
// first mismatched byte and leak the expected signature through
// response timing.
const a = Buffer.from(expected);
const b = Buffer.from(sig);
if (a.length !== b.length || !timingSafeEqual(a, b)) {
return res.status(400).send("bad signature");
}
const event = JSON.parse(req.body.toString());
// …handle event.type / event.id / event.data here…
res.status(200).send("ok");
});
app.listen(8080);
Python (Flask) #
import hashlib, hmac, json, os, time
from flask import Flask, request, abort

app = Flask(__name__)
SECRET = os.environ["STECH_WEBHOOK_SECRET"].encode()
REPLAY_WINDOW_SEC = 300

@app.post("/stech")
def stech():
    ts = request.headers.get("X-Stech-Timestamp")
    sig = request.headers.get("X-Stech-Signature")
    if not ts or not sig:
        abort(400, "missing headers")
    try:
        ts_num = int(ts)
    except ValueError:
        abort(400, "bad timestamp")
    if abs(time.time() - ts_num) > REPLAY_WINDOW_SEC:
        abort(400, "replay window")
    # Use request.get_data() not request.json: we MUST hash the raw
    # bytes the api signed, not flask's parsed-then-reserialized JSON.
    body = request.get_data()
    mac = hmac.new(SECRET, f"{ts}.".encode() + body, hashlib.sha256)
    expected = "sha256=" + mac.hexdigest()
    # hmac.compare_digest is constant-time; `==` would leak the expected
    # signature byte-by-byte through response timing. Comparing bytes
    # avoids a TypeError when the header contains non-ASCII characters.
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        abort(400, "bad signature")
    event = json.loads(body)
    # …handle event["type"] / event["id"] / event["data"] here…
    return "ok", 200
Go (net/http) #
package main
import (
"crypto/hmac"
"crypto/sha256"
"encoding/hex"
"encoding/json"
"io"
"net/http"
"os"
"strconv"
"time"
)
// Read at startup: failing fast on a missing env beats silently
// keying the HMAC with an empty secret and then rejecting every
// delivery with "bad signature".
var secret = mustEnv("STECH_WEBHOOK_SECRET")
const replayWindowSec = 300
func mustEnv(name string) []byte {
v := os.Getenv(name)
if v == "" {
panic("missing env " + name)
}
return []byte(v)
}
func handle(w http.ResponseWriter, r *http.Request) {
ts := r.Header.Get("X-Stech-Timestamp")
sig := r.Header.Get("X-Stech-Signature")
if ts == "" || sig == "" {
http.Error(w, "missing headers", 400)
return
}
tsNum, err := strconv.ParseInt(ts, 10, 64)
if err != nil || abs(time.Now().Unix()-tsNum) > replayWindowSec {
http.Error(w, "replay window", 400)
return
}
// Read the raw body bytes — re-encoding the parsed JSON would
// produce different bytes than the api signed.
body, err := io.ReadAll(r.Body)
if err != nil {
http.Error(w, "read body", 400)
return
}
mac := hmac.New(sha256.New, secret)
mac.Write([]byte(ts + "."))
mac.Write(body)
expected := "sha256=" + hex.EncodeToString(mac.Sum(nil))
// hmac.Equal is constant-time; a plain string `==` would leak the
// expected signature one byte at a time through response timing.
if !hmac.Equal([]byte(expected), []byte(sig)) {
http.Error(w, "bad signature", 400)
return
}
var event struct {
ID, Type string
Data json.RawMessage
}
if err := json.Unmarshal(body, &event); err != nil {
http.Error(w, "bad json", 400)
return
}
// …handle event.Type / event.ID / event.Data here…
w.WriteHeader(200)
}
func abs(n int64) int64 {
if n < 0 {
return -n
}
return n
}
func main() {
http.HandleFunc("/stech", handle)
_ = http.ListenAndServe(":8080", nil)
}
Retry policy #
The delivery worker classifies each attempt and retries on transient failures only.
| Outcome | Worker behavior |
|---|---|
| 2xx | delivered. Endpoint failure counter resets. |
| 408, 429 | retryable — schedule next attempt |
| 4xx (anything else) | gave_up. We stop retrying this delivery. |
| 3xx | gave_up (redirect_blocked) — we don't follow redirects, the signed body is bound to the host you registered |
| 5xx | retryable |
| network error / timeout (30s) | retryable |
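The classification table can be read as a single function. The sketch below is an illustration of the table, not the worker's code; classify is a hypothetical name, None stands in for a network error or timeout, and treating any other unlisted status as gave_up is an assumption.

```python
from typing import Optional


def classify(status: Optional[int]) -> str:
    """Map one attempt's outcome to delivered / retryable / gave_up.

    `status` is the HTTP status code, or None for a network error or
    a timeout (no response received).
    """
    if status is None:
        return "retryable"
    if 200 <= status < 300:
        return "delivered"
    if status in (408, 429) or 500 <= status < 600:
        return "retryable"
    # Redirects (3xx) and every other 4xx end the delivery.
    return "gave_up"
```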
Backoff schedule, applied after each retryable failure:
| Attempt | Wait before next |
|---|---|
| 1 → 2 | 1 minute |
| 2 → 3 | 5 minutes |
| 3 → 4 | 25 minutes |
| 4 → 5 | 2 hours |
| 5 → 6 | 12 hours |
| 6 → 7 | 24 hours |
A delivery gets up to 7 attempts: the initial POST plus the six
scheduled retries above. If the 7th attempt fails retryably the
schedule is exhausted and the delivery flips to terminal failed;
we don't retry it again. The last attempt fires roughly 38h31m after
the first (1 + 5 + 25 + 120 + 720 + 1440 = 2311 minutes of waits). A
delivery's terminal status is one of delivered, gave_up, failed. The
full attempt history lives in the dashboard delivery log.
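As a quick check of the schedule arithmetic, the sketch below sums the waits from the backoff table to get each attempt's offset from the first. It counts only the scheduled waits and ignores time spent inside each request (for example a 30s timeout), so real offsets will be slightly larger.

```python
from itertools import accumulate

# Waits from the backoff table, in minutes, in order.
WAITS_MIN = [1, 5, 25, 2 * 60, 12 * 60, 24 * 60]

# Offset of attempts 2..7 from the first attempt, in minutes.
offsets = list(accumulate(WAITS_MIN))

max_attempts = len(WAITS_MIN) + 1
total_minutes = offsets[-1]
hours, minutes = divmod(total_minutes, 60)

print(f"{max_attempts} attempts, last one at +{hours}h{minutes:02d}m")
```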
Per-endpoint failure counter increments on every non-2xx attempt
(retryable or terminal). When the counter hits 50 consecutive
failures the endpoint is auto-disabled (enabled = false). Successful
2xx delivery resets the counter. The dashboard shows
failureCount + lastFailedAt + lastFailureStatus so an operator
can see the receiver's pathology without reading worker logs.
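The counter semantics can be modelled in a few lines, which is handy when reasoning about how close an endpoint is to being disabled. This is a sketch of the documented behavior only; EndpointState and record_attempt are hypothetical names, and None again stands in for a network error or timeout.

```python
from dataclasses import dataclass
from typing import Optional

DISABLE_THRESHOLD = 50  # consecutive non-2xx attempts


@dataclass
class EndpointState:
    failure_count: int = 0
    enabled: bool = True


def record_attempt(state: EndpointState, status: Optional[int]) -> EndpointState:
    """Update the per-endpoint counter after one delivery attempt."""
    if status is not None and 200 <= status < 300:
        state.failure_count = 0  # any 2xx resets the streak
    else:
        state.failure_count += 1  # retryable or terminal, both count
        if state.failure_count >= DISABLE_THRESHOLD:
            state.enabled = False
    return state
```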
Idempotency #
The id field on the event body is a server-minted uuid. The same uuid
is reused across:
- all retries of one delivery (same delivery id, same body bytes, same signature inputs except timestamp);
- all endpoints in the same fan-out (one event going to two subscribed endpoints sends the same id to both).
Receivers should dedupe on id. Postgres-shape ($1 is the libpq /
node-postgres positional placeholder for event.id from the parsed
body — substitute ? for MySQL / SQLite, :event_id for SQLAlchemy /
Oracle, etc.):
CREATE TABLE webhook_dedupe (
event_id uuid PRIMARY KEY,
received_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO webhook_dedupe (event_id) VALUES ($1)
ON CONFLICT (event_id) DO NOTHING
RETURNING event_id;
-- 0 rows back → already processed; ack + return early.
-- 1 row back → first time; do the work.
The dashboard's redeliver button (per row) re-fires the same
id + body, so receivers that dedupe correctly silently no-op on
re-deliveries.
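For a receiver without Postgres, the same dedupe idea can be sketched with the standard-library sqlite3 module. The helper names open_dedupe and first_time are hypothetical, and the sketch swaps in SQLite's INSERT OR IGNORE plus the cursor's rowcount in place of ON CONFLICT ... RETURNING.

```python
import sqlite3


def open_dedupe(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and create if needed) the dedupe table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS webhook_dedupe ("
        " event_id TEXT PRIMARY KEY,"
        " received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def first_time(conn: sqlite3.Connection, event_id: str) -> bool:
    """Record event_id; True only on the first sighting."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO webhook_dedupe (event_id) VALUES (?)",
            (event_id,),
        )
    # rowcount is 1 when the row was inserted, 0 when it was ignored.
    return cur.rowcount == 1
```

In a handler, a False result means the event was already processed: acknowledge with a 2xx and return early so the delivery is not retried.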
Operating webhooks #
Endpoint settings #
| Setting | Dashboard | API |
|---|---|---|
| Create | /settings/webhooks → new endpoint | POST /v1/orgs/:slug/webhook-endpoints |
| List | /settings/webhooks | GET /v1/orgs/:slug/webhook-endpoints |
| Edit (url / events / enabled / description) | row → edit | PATCH /v1/orgs/:slug/webhook-endpoints/:id |
| Delete | row → delete | DELETE /v1/orgs/:slug/webhook-endpoints/:id |
url must be https://, ≤ 2048 chars, and not resolve to a private
host (see Security model). events is a non-empty string array of
known types or the * wildcard. enabled is a boolean — flip back to
true after fixing a receiver that auto-disabled.
The plaintext signing secret is only returned on create and on
rotate. Subsequent reads return hasSecret: true and never the
ciphertext.
Rotating the signing secret #
curl -fsSL -X POST \
https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/rotate-secret \
-H "Authorization: Bearer $STECH_API_KEY"
# → { "endpoint": { ... }, "signingSecret": "whsec_<new-64-hex>" }
Rotation is atomic: every delivery dispatched after the rotate response is signed with the new secret. Retries already in flight (already loaded into the worker's batch) finish on the previous secret if they were claimed before the UPDATE; new attempts after that pick up the rotated value. In practice you'll see a brief window where both signatures appear, so keep the old secret around for a minute past rotation if your receiver verifies a single secret at a time.
The zero-downtime pattern is dual-secret verification: store the new secret as primary and the old as fallback, accept either, then drop the old after the worker batch flushes (one minute is enough, five if you're cautious):
const expected1 = sign(primarySecret, ts, body);
const expected2 = oldSecret ? sign(oldSecret, ts, body) : null;
if (
!timingEq(expected1, sig) &&
!(expected2 && timingEq(expected2, sig))
) {
return res.status(400).send("bad signature");
}
Always use timingSafeEqual for each compare; the && between them
short-circuits on a primary-secret match (which is fine: both
secrets are valid, and "primary matched" isn't a secret).
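The dual-secret pattern translates directly to other languages. Below is a self-contained Python sketch; sign and verify_any are hypothetical helper names, and the secrets are passed in as plain strings for illustration.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: str, ts: str, body: bytes) -> str:
    """sha256=<hex> over "<ts>.<raw body>" keyed with the secret."""
    mac = hmac.new(secret.encode(), ts.encode() + b"." + body, hashlib.sha256)
    return "sha256=" + mac.hexdigest()


def verify_any(
    sig: str, ts: str, body: bytes, primary: str, fallback: Optional[str] = None
) -> bool:
    """Accept a signature made with the primary or the fallback secret."""
    candidates = [primary] + ([fallback] if fallback else [])
    ok = False
    for secret in candidates:
        # Each individual comparison is constant-time.
        ok |= hmac.compare_digest(sign(secret, ts, body).encode(), sig.encode())
    return ok
```

Once the old secret is dropped, passing fallback=None turns this back into a single-secret check with no other code changes.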
A rotation also fires audit.flagged (verb
webhook_endpoints.secret_rotated), so receivers subscribed to the
flagged channel see "your customer rotated their signing secret" in
real time and can alert on it.
Redelivering a single delivery #
Open /settings/webhooks/<id>/deliveries, find the row, click
redeliver. From scripts:
curl -fsSL -X POST \
https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/deliveries/$DEL/redeliver \
-H "Authorization: Bearer $STECH_API_KEY"
Redeliver clones the row with a fresh delivery id, attempt_count = 0,
status = pending, next_attempt_at = now(). The event_id and the
payload bytes are preserved, so receivers dedupe naturally.
Viewing the delivery log #
Dashboard: /settings/webhooks/<id>/deliveries. API:
curl -fsSL https://api.stech.com/v1/orgs/$ORG/webhook-endpoints/$ID/deliveries \
-H "Authorization: Bearer $STECH_API_KEY"
# → { "deliveries": [...], "hasMore": false }
Cursor pagination via ?before=<deliveryId>&limit=<n> (default 50, max
200). Rows include eventType, status, attemptCount,
nextAttemptAt, lastResponseStatus, deliveredAt, createdAt. The
response body the receiver returned isn't on the list response; open
the row in the dashboard for the truncated body (≤ 8 KiB).
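Walking the whole log from a script means following the before cursor until hasMore is false. The sketch below shows one way to do that with the standard library. It assumes each row carries its delivery id under an "id" key, which the field list above does not confirm; http_fetcher and iter_deliveries are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

FetchPage = Callable[[Optional[str], int], dict]


def http_fetcher(org: str, endpoint_id: str, api_key: str) -> FetchPage:
    """Build a page fetcher for one endpoint's delivery log."""
    base = (
        f"https://api.stech.com/v1/orgs/{org}"
        f"/webhook-endpoints/{endpoint_id}/deliveries"
    )

    def fetch(before: Optional[str], limit: int) -> dict:
        params = {"limit": str(limit)}
        if before:
            params["before"] = before
        req = urllib.request.Request(
            base + "?" + urllib.parse.urlencode(params),
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)

    return fetch


def iter_deliveries(fetch_page: FetchPage, limit: int = 200) -> Iterator[dict]:
    """Yield every delivery row, following the `before` cursor."""
    before: Optional[str] = None
    while True:
        page = fetch_page(before, limit)
        rows = page.get("deliveries", [])
        yield from rows
        if not rows or not page.get("hasMore"):
            return
        before = rows[-1]["id"]  # assumption: the row's delivery id key
```

Keeping the HTTP call behind a fetch_page callable makes the paging loop easy to test with canned pages before pointing it at the real API.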
Security model #
- HTTPS only. Endpoint creation rejects http://. Webhooks ship signed payloads but the body itself can carry information you don't want plaintext on the wire; we don't compromise on transport.
- Signing secret encrypted at rest via AES-256-GCM, same facade as SCIM bearer tokens and CLI source secrets. Surfaced once on create / rotate; the dashboard / api never echo the ciphertext column or decrypt-on-list.
- 5-minute replay window on the timestamp header (both past and future skew). Pairs with receiver-side dedupe on event_id.
- SSRF guard at every delivery, not just at create. A customer could register a public hostname whose A record later flips to point at metadata.google.internal or RFC1918. The worker re-runs the hostname blocklist + a DNS-resolution-time check before every fetch (same posture as the CLI source binary downloader). redirect: "manual" on the POST stops a 3xx from bouncing the signed body to a different host.
- Constant-time signature compare on our side; receivers should do the same (the verifier snippets above use timingSafeEqual / compare_digest / hmac.Equal).
- No body retention beyond the delivery log. We keep the rendered payload + truncated response body (≤ 8 KiB) for delivery rows so operators can debug; we don't archive your event stream beyond that.
Limitations #
- Single-replica delivery worker. Today's worker assumes one api process owns delivery for the deployment. Multi-replica setups would race on the same pending row. Per-row advisory locks are filed but deferred until we run multi-replica api.
- No payload templating / no per-event filters. Subscriptions match on the event type (or *) and that's it. "Only deployment.failed for agent X" needs to be filtered receiver-side.
- No email-on-auto-disable yet. When an endpoint trips the 50-consecutive-failure threshold we flip enabled = false and log it, but the org-admin notification email is filed under #264 and not yet shipped. The dashboard shows the disabled state immediately.
- System-event audit gap. A handful of system-emitted admin actions (the worker's own bookkeeping rows) don't fan out to webhooks by design; tracked in #266.
- No outbound IP allowlist published. Deliveries originate from the api host. If your receiver is behind a strict allowlist, contact support; we don't yet publish a stable egress range.
Troubleshooting #
My endpoint flipped to enabled=false — the delivery worker
auto-disables an endpoint after 50 consecutive non-2xx attempts. Check
failureCount + lastFailureStatus + the delivery log. Fix the
receiver, then PATCH … {"enabled": true}. The failure counter resets
on the next 2xx.
Signature verification fails — most likely your code computes the
HMAC over the parsed-then-reserialized JSON instead of the raw bytes
the api signed. The signed string is ${timestamp}.${rawBody}. In
Express that means express.raw({ type: "application/json" }), not
express.json(); in Flask, request.get_data() not request.json; in
Go, io.ReadAll(r.Body) before any decode. Compare bytes.
replay window expired — your server's clock is more than 5
minutes off ours. Run NTP. We deliberately bound the window in both
directions so a clock that's set forward also fails closed.
I never get any events — check, in order: (a) the endpoint is
enabled = true; (b) the events array on the endpoint matches the
type you're expecting (or contains *); (c) the event was actually
emitted (/settings/webhooks/<id>/deliveries shows pending +
delivered + failed rows for this endpoint — empty list = nothing
matched + fanned out). The * wildcard matches every type, including
ones added after you subscribed.
I'm getting duplicates — that's intended. We retry on 5xx /
network / 408 / 429, and one event fans out to every subscribed
endpoint. Dedupe on event.id (server-minted uuid, stable across
retries and across endpoints). The redeliver button also re-fires
the same id — your dedupe table catches it.
opaqueredirect / redirect_blocked in the delivery log — your
endpoint returned a 3xx, or your runtime is presenting one to us as an
opaque redirect. We don't follow redirects (the signed body is bound
to the host you registered) — point the endpoint URL directly at the
final host.
ssrf_blocked in the delivery log — your endpoint's hostname
resolved to a private / loopback / metadata IP at delivery time, even
though it passed the synchronous shape check at create time. Most
common cause: a CDN / DNS provider returning an internal address.
Re-resolve the hostname; if it's intentional (e.g. internal-only
testing) you'll need a public receiver instead.
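When debugging an ssrf_blocked row, it helps to see what the hostname resolves to from a public vantage point. The sketch below is a rough approximation using the standard library; it is not the worker's actual blocklist, and is_blocked_ip and resolve_and_check are hypothetical names.

```python
import ipaddress
import socket
from typing import List, Tuple


def is_blocked_ip(text: str) -> bool:
    """Rough test for addresses a public webhook target should not use."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(text.split("%", 1)[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # includes 169.254.169.254 metadata IPs
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def resolve_and_check(hostname: str, port: int = 443) -> List[Tuple[str, bool]]:
    """Resolve hostname and flag each address as blocked or not."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return [(addr, is_blocked_ip(addr)) for addr in addresses]
```

If any resolved address comes back flagged, the DNS answer is the thing to fix; results can differ between resolvers, so run the check from outside your own network as well.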
Related #
- Agent runs - cancellation: the agent_run.cancelled event's full payload (including the cancelledByUserId / cancellationReason / requestedAt / acknowledgedAt audit context), the cancel API, and the runtime lifecycle that fires it.
- Audit log: retrospective forensics on dispatch failures across all endpoints in the org (the /settings/audit?tab=webhooks org-wide cross-endpoint view; this doc's /settings/webhooks/<id>/deliveries is the per-endpoint drill-down).
- Observability: the agent failure-rate watchdog fires audit.flagged with data.kind = "agent_failure_rate"; the docs cover thresholds, dedupe, and a copy-paste Slack bridge.
- CLI tool sources: agent runs that fork CLI binaries fire agent_run.completed / agent_run.failed like any other run.
- Magic-link sign-in: same encrypted-at-rest posture for sensitive credentials (signing secret encrypted via SecretsCrypto on our side).
- Billing and usage: the cost-control worker fires audit.flagged with data.kind = "usage_soft_cap" / "usage_hard_cap" once per period when an org crosses a soft- or hard-cap threshold; payload shapes and dedupe semantics live there.