Knowledge

The knowledge platform turns org-curated text into retrievable context that agents can search at runtime. A customer-support bot that answers "how do I reset my SSO?" by looking up the org's onboarding guide is the canonical use; the dashboard is where humans see what got ingested, what got cited, and what didn't match.

The original v1 (epic #43) shipped the schema + the SDK + a thin index page. Epic #299 fleshed out the surface: paginated browser, ingestion-source configuration, query log, citation persistence, and the chat-shell chip strip that closes the loop ("the bot said X because it cited entry Y").

The data model

Five tables, all org-scoped, all cascade-on-org-delete:

knowledge_entries ──┬──── knowledge_chunks  (1:N, cascade)
                    ├──── knowledge_citations  (N:M via assistant turn)
                    └──── knowledge_sources?  (FK SET NULL — see below)

knowledge_query_log  (standalone log of search calls)

knowledge_entries (db/src/schema/knowledge-entries.ts) — the human-readable record. title + content + tags[] + type (open enum-by-convention: note | rule | pattern | reference | decision | …; new categories don't need a migration). Carries embedding_status (ready | pending | error, CHECK-locked) so the dashboard's status pill comes from a real column rather than being inferred from chunk count.
knowledge_chunks (db/src/schema/knowledge-chunks.ts): the embedded pieces. embedding is vector(1536), sized for OpenAI's text-embedding-3-small; an HNSW index (m=16, ef_construction=64) serves the ANN leg and a GIN index on to_tsvector('english', content) serves the keyword leg (Postgres full-text ranking via ts_rank_cd, not true BM25). organization_id is denormalized off the entry so the search hot path stays single-table.
knowledge_sources (db/src/schema/knowledge-sources.ts) — managed-ingestion configuration. One row per configured source; kind is locked to url_crawl | file_upload | mcp via CHECK. Entries produced by a source carry source_id (FK SET NULL — deleting a source preserves the ingested content but unlinks it).
knowledge_citations (db/src/schema/knowledge-citations.ts) — one row per (assistant turn × cited chunk) pair. Cascade on every FK (org / message / entry / chunk) so deletes sweep cleanly. Documented analytics consequence: deleting an entry drops its historical citations — the table is "what's working now", not a permanent audit log.
knowledge_query_log (db/src/schema/knowledge-query-log.ts) — one row per search call. origin locked to sdk | dashboard | cli via CHECK; optional agent_deployment_id (SET NULL on delete) links SDK-origin queries back to the deployment that issued them.

The dashboard

AppShell mounts the knowledge surface under /knowledge. Three pages.

/knowledge — the entries browser (app/app/knowledge/page.tsx). Paginated table over GET /v1/orgs/:slug/knowledge-entries. Filters (URL-driven, plain <form method="get">):

q — full-text across title + chunk content.
source_type — COALESCE(s.kind, e.type) — managed-source entries surface their kind (url_crawl / file_upload / mcp), manual entries surface their type (note / rule / …).
embedding_status — ready | pending | error. Aggregations live inside the same WHERE so date / status / source narrowing narrows the totals too.
from / to — ISO 8601 bounds on created_at.

Pagination is cursor-based: tuple-base64url over (createdAt, id), over-fetch by 1 to detect hasMore. Mirrors the agent-runs pattern from observability epic #270. Filter changes drop the cursor so a stale page-2 marker doesn't cling to a narrowed window.
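
The cursor scheme above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the names Cursor, encodeCursor, decodeCursor, and paginate are hypothetical, and JSON is assumed as the tuple serialization (the source only says the tuple is base64url-encoded).

```typescript
import { Buffer } from "node:buffer";

// Hypothetical cursor shape: the sort key of the last row on the current page.
interface Cursor {
  createdAt: string; // ISO 8601 timestamp
  id: string;
}

// Serialize the (createdAt, id) tuple as JSON, then base64url-encode it.
function encodeCursor(c: Cursor): string {
  return Buffer.from(JSON.stringify([c.createdAt, c.id]), "utf8").toString("base64url");
}

// Returns null for anything that is not a well-formed cursor, so a stale or
// tampered marker degrades to "first page" instead of raising an error.
function decodeCursor(raw: string): Cursor | null {
  try {
    const parsed: unknown = JSON.parse(Buffer.from(raw, "base64url").toString("utf8"));
    if (
      Array.isArray(parsed) &&
      parsed.length === 2 &&
      typeof parsed[0] === "string" &&
      typeof parsed[1] === "string"
    ) {
      return { createdAt: parsed[0], id: parsed[1] };
    }
    return null;
  } catch {
    return null;
  }
}

// `rows` is assumed to come from a query with LIMIT pageSize + 1, already in page order.
function paginate<T extends Cursor>(
  rows: T[],
  pageSize: number,
): { items: T[]; hasMore: boolean; nextCursor: string | null } {
  const hasMore = rows.length > pageSize; // the extra row only signals "more exists"
  const items = hasMore ? rows.slice(0, pageSize) : rows;
  const last = items[items.length - 1];
  const nextCursor = hasMore && last ? encodeCursor(last) : null;
  return { items, hasMore, nextCursor };
}
```

Including id in the tuple keeps the ordering stable when two entries share the same created_at, which is the usual reason to page on a pair rather than on the timestamp alone.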

The cites 30d column reads from knowledge_citations (org-scoped, 30-day window) — zero is "0" (entry seen but never cited), not "—". Different signal from "data not available".

/knowledge/[id] — entry detail (app/app/knowledge/[id]/page.tsx). Renders the entry header + an expandable chunks list (200-char preview → full body on click), embedding-status pill, per-chunk character count + provider, and the citations in last 30d row. For entries linked to a managed source, surfaces the source kind chip + a view original ↗ link when source_url is set.

Raw embedding vectors are not exposed: 1536 floats is noise for a human reader. The embedding_provider column tells you which model produced the vector (openai/text-embedding-3-small, etc.); if "why didn't this chunk match?" becomes a real debugging need, a per-chunk re-similarity probe can be added later.

/knowledge/sources — managed-ingestion configuration (app/app/knowledge/sources/page.tsx). Elevated-only (owners + admins, same gate as the existing entry DELETE). Add / remove / re-run dialogs over the /v1/orgs/:slug/knowledge-sources endpoints. Last-run pill (ok | partial | error) renders from the column directly. Re-run now enqueues an immediate ingestion run — see Ingestion below for what's wired today vs. deferred.

The search API

GET /v1/orgs/:slug/knowledge/search?q=...&limit=20&types=rule,pattern (api/src/routes/knowledge.ts). Hybrid retrieval: embed the query once, run vector + keyword in parallel, blend with Reciprocal Rank Fusion.

Vector leg — cosine over knowledge_chunks.embedding. HNSW index, org-filtered.
Keyword leg — plainto_tsquery + ts_rank_cd over to_tsvector('english', content). Same GIN index.
Blend — RRF with k=60 (the standard recipe). Each leg fetches min(limit*3, 100) so the rank-fusion has overlap to work with; the blender returns the top limit (default 20, max 100).
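
The blend step can be illustrated with a small, self-contained sketch of Reciprocal Rank Fusion. The function names (legFetchSize, reciprocalRankFusion) and the plain chunk-id input shape are assumptions made for this example; the route's actual implementation may differ.

```typescript
// Per-leg over-fetch described above: min(limit * 3, 100).
function legFetchSize(limit: number): number {
  return Math.min(limit * 3, 100);
}

// Each leg is a list of chunk ids, best match first, with no duplicates inside a leg.
// RRF score for a chunk = sum over the legs that contain it of 1 / (k + rank),
// where rank is the 1-based position in that leg.
function reciprocalRankFusion(
  legs: string[][],
  k = 60,
  limit = 20,
): { chunkId: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const leg of legs) {
    leg.forEach((chunkId, index) => {
      const rank = index + 1;
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([chunkId, score]) => ({ chunkId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Example: "b" appears in both legs, so it outranks "a" even though "a" is
// first in the vector leg.
const vectorLeg = ["a", "b", "c"];
const keywordLeg = ["b", "d"];
const fused = reciprocalRankFusion([vectorLeg, keywordLeg]);
console.log(fused.map((hit) => hit.chunkId).join(","));
```

Because RRF only looks at ranks, the cosine similarities and ts_rank_cd scores never need to be normalized against each other, which is the main reason this recipe is popular for hybrid search.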

Pre-filtering: ?types= narrows to specific entry types (or COALESCE-d source kinds, post-PR-2). Defensive caps: q capped at 1000 chars (paste-accident guard), limit clamped to [1, 100].
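
A minimal sketch of the defensive caps, assuming the query string has been parsed into a URLSearchParams. The function name normalizeSearchParams is hypothetical, and truncating an over-long q (rather than rejecting the request) is an assumption; the source only says q is capped at 1000 characters.

```typescript
interface NormalizedSearch {
  q: string;
  limit: number;
  types: string[];
}

const MAX_QUERY_CHARS = 1000;
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

function normalizeSearchParams(params: URLSearchParams): NormalizedSearch {
  // Assumption: over-long queries are truncated, not rejected.
  const q = (params.get("q") ?? "").slice(0, MAX_QUERY_CHARS);

  // Missing or non-numeric limit falls back to the default; everything else
  // is clamped into [1, 100].
  const parsedLimit = Number.parseInt(params.get("limit") ?? "", 10);
  const limit = Number.isNaN(parsedLimit)
    ? DEFAULT_LIMIT
    : Math.min(MAX_LIMIT, Math.max(1, parsedLimit));

  // ?types=rule,pattern becomes ["rule", "pattern"]; empty segments are dropped.
  const types = (params.get("types") ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);

  return { q, limit, types };
}
```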

Each call writes a knowledge_query_log row. The optional ?origin= query param (sdk | dashboard | cli) tags which surface issued the search; defaults to sdk — the dominant caller; dashboard + CLI surfaces stamp explicitly. The write is fire-and-forget — analytics infra can never block the search response. ?agent_deployment_id= links SDK-origin queries back to the deployment that issued them.
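
The fire-and-forget write can be sketched like this. The row shape (QueryLogRow and its field names) and the helper name logQueryFireAndForget are hypothetical, and the actual insert is passed in as a function so the sketch stays independent of any database client.

```typescript
type QueryOrigin = "sdk" | "dashboard" | "cli";

// Hypothetical row shape for illustration; the real columns live in
// db/src/schema/knowledge-query-log.ts.
interface QueryLogRow {
  origin: QueryOrigin;
  resultCount: number;
  agentDeploymentId?: string;
}

function logQueryFireAndForget(
  write: (row: QueryLogRow) => Promise<void>,
  row: QueryLogRow,
  onError: (err: unknown) => void = (err) =>
    console.error("knowledge_query_log write failed", err),
): void {
  // Deliberately not awaited. Starting from Promise.resolve() means even a
  // synchronous throw inside `write` is routed to onError instead of
  // propagating into the request handler.
  void Promise.resolve()
    .then(() => write(row))
    .catch(onError);
}
```

The search handler would call this after computing the results and return the response immediately; a slow or failing log insert then costs an error log line at most.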

The dashboard search bar (app/app/knowledge/KnowledgeSearchBar.tsx) calls the same endpoint as the SDK. By design — parity between "what the agent retrieves" and "what the human sees when they search the same query" is the whole point of the dashboard.

Calling search from an agent (developer)

import { defineAgent, createMemoryClient } from "@stech/agent";

export default defineAgent({
  name: "support",
  model: "claude-sonnet-4-6",
  async run({ input }) {
    const memory = createMemoryClient({
      endpoint: process.env.STECH_API_URL!,
      apiKey: process.env.STECH_API_KEY!,
      orgSlug: process.env.STECH_ORG_SLUG!,
    });

    const hits = await memory.knowledge.search(input.text, {
      limit: 5,
      types: ["rule", "pattern"],
    });
    // hits: { entryId, chunkId, score, content, entryTitle? }[]
  },
});

The runtime exposes the same client at memory.knowledge.search(...) inside the agent. When the agent runs through run-stream, the runtime emits a citation SSE frame for each hit that lands in the model's prompt context — the chat shell renders chips, the api proxy persists rows to knowledge_citations. See Citation contract below for the wire shape.

Ingestion configuration

knowledge_sources ships three kinds:

kind	`config` shape	source of `knowledge_entries`
`url_crawl`	`{ url, depth?, prefixes? }`	one entry per fetched page (`source_url` set)
`file_upload`	`{ filename, r2Key }` (uses r2.ts)	one entry per uploaded file
`mcp`	`{ mcpSourceId, resourcePattern? }`	one entry per matched MCP resource

Optional cron_expression schedules automatic re-fetch passes — if you set a cron schedule, the runner reads it; null means manual-run only via the /run endpoint. There is no built-in default schedule. Last-run state (last_run_at, last_run_status, last_run_error, last_run_entry_count) gets stamped after each pass — the dashboard reads these columns directly.

The CRUD surface is at /v1/orgs/:slug/knowledge-sources (api/src/routes/knowledge-sources.ts): GET is any-member, write paths (POST / PATCH / DELETE / POST .../:id/run) are elevated-only.

The actual ingestion runner is deferred. PR-2 shipped the table, the api, and the dashboard. The /run endpoint stamps last_run_at = now() and last_run_status = 'ok' so the dashboard contract is stable from day one, but the actual URL-crawl / R2-fetch / MCP-resource-list pass is a follow-up. Until that lands, configured sources are an organizing primitive (you can stamp existing knowledge_entries with a source_id manually), and re-run now doesn't re-fetch any content.

When the runner lands, two existing pieces come into play:

SSRF guard — url_crawl URLs go through api/src/lib/ssrf-guard.ts (same multi-layer guard as compute-sha for CLI sources): hostname blocklist at validate time, DNS-resolution-time guard at fetch time, redirect: "manual" to block 3xx-to-private-IP bounces.
R2 storage — file_upload reuses api/src/lib/storage/r2.ts. Same shape MCP file artifacts use.
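
To make the DNS-resolution-time layer of the SSRF guard concrete, here is a deliberately simplified sketch. It is not the contents of api/src/lib/ssrf-guard.ts: the function names are hypothetical, the IPv4 range list is non-exhaustive, and IPv6 addresses are refused outright instead of enumerating their reserved ranges.

```typescript
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

// Loopback, RFC 1918 private ranges, link-local (includes the 169.254.169.254
// cloud metadata address), and 0.0.0.0/8. Non-exhaustive by design.
function isPrivateIPv4(ip: string): boolean {
  if (isIP(ip) !== 4) return false;
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Conservative policy for this sketch: anything that is not a public-looking
// IPv4 literal is blocked, including every IPv6 address.
function isBlockedIp(ip: string): boolean {
  return isIP(ip) === 4 ? isPrivateIPv4(ip) : true;
}

// Resolve the hostname and require every returned address to pass the check.
async function resolvesToPublicAddress(hostname: string): Promise<boolean> {
  const records = await lookup(hostname, { all: true });
  return records.length > 0 && records.every((r) => !isBlockedIp(r.address));
}
```

A check like this only helps if the fetch then connects to the address that was validated, and if each redirect hop (surfaced by redirect: "manual") is re-validated the same way; otherwise a DNS change between check and fetch, or a 3xx to a private IP, can still slip through.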

Citation contract

The chat shell renders a footnote-style chip strip ([1] Entry title) below assistant turns that cited knowledge entries. Click → /knowledge/<entryId>. End-to-end:

1. The MCP server formats hits with a structured header. cli/src/runtime/mcp/server.ts emits one text block per hit:

[entry <entryId> · chunk <chunkId> · score <s>] <title?>
<chunk content>

2. The runtime parses the header and emits a citation frame. After a successful knowledge_search tool result, the agent loop (runtime/src/agent/loop.ts) matches each line against CITATION_HEADER_RE and emits a {type:"citation", entryId, chunkId, score, entryTitle?} frame on the SSE stream. Tool-name match (knowledge_search) is the gate so non-knowledge tools never spuriously cite.

3. The api proxy tees the SSE stream and persists. The run-stream handler (api/src/routes/agents.ts) accumulates citation frames per stream; once persistConversationTurn returns the new agent_message_id, it calls writeCitations (api/src/routes/knowledge-citations.ts) which bulk-inserts to knowledge_citations. Best-effort — a write failure logs but doesn't break the chat surface.

4. The chat shell renders the chips live + on rehydrate. app/app/agents/[id]/MessageList.tsx accumulates citations in the streaming reducer (de-duped by chunkId) and renders <CitationStrip> under each assistant bubble. Page reload re-hydrates from GET .../conversations/:cid (joins knowledge_citations → knowledge_entries per assistant message, returns {citations: [...]} per row).
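
Steps 1, 2, and 4 can be sketched together: parse the structured header into citation frames, then de-duplicate by chunkId before rendering. The regular expression here is a guess at what CITATION_HEADER_RE might look like, based only on the header format shown in step 1; parseCitationFrames and dedupeByChunk are hypothetical names.

```typescript
interface CitationFrame {
  type: "citation";
  entryId: string;
  chunkId: string;
  score: number;
  entryTitle?: string;
}

// Assumed pattern for "[entry <entryId> · chunk <chunkId> · score <s>] <title?>".
const HEADER_RE = /^\[entry (\S+) · chunk (\S+) · score ([^\]\s]+)\](?: (.+))?$/;

function parseCitationFrames(toolName: string, resultText: string): CitationFrame[] {
  // Tool-name gate: only knowledge_search results can produce citations.
  if (toolName !== "knowledge_search") return [];

  const frames: CitationFrame[] = [];
  for (const line of resultText.split("\n")) {
    const match = HEADER_RE.exec(line);
    if (!match) continue; // chunk body lines fall through here
    const [, entryId, chunkId, rawScore, title] = match;
    const score = Number(rawScore);
    if (!Number.isFinite(score)) continue;
    const frame: CitationFrame = { type: "citation", entryId, chunkId, score };
    if (title) frame.entryTitle = title;
    frames.push(frame);
  }
  return frames;
}

// Mirrors the chat shell's reducer behavior: keep the first frame per chunkId.
function dedupeByChunk(frames: CitationFrame[]): CitationFrame[] {
  const seen = new Set<string>();
  const unique: CitationFrame[] = [];
  for (const frame of frames) {
    if (seen.has(frame.chunkId)) continue;
    seen.add(frame.chunkId);
    unique.push(frame);
  }
  return unique;
}
```

One caveat with any line-based parser of this kind: a chunk whose own content contains a line shaped like the header would be picked up as a citation, so the gate on the tool name limits where false positives can come from but does not rule them out inside a knowledge_search result.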

Opt-in semantics. The runtime emits citations only when the model received the search results in its prompt context for that turn: attribution comes explicitly from the search call, not from post-hoc substring matching. Matching response text back to chunks after the fact is brittle (the model can paraphrase, edit, or omit chunks); opt-in is cleaner because the SDK already gives the runtime the chunk ids on each search call, so the data is right there.

Sync /run doesn't emit citations. Only /run-stream has the SSE channel. The chat shell uses /run-stream exclusively; /run is the legacy fallback for older runtime deployments. Acceptable gap, documented inline in the run-stream handler.

Operator runbook

Add a manual entry — /knowledge/new → fill title + content + optional type/tags, submit. The api inserts with embedding_status='pending', embeds, flips to ready (or error if the embedder is down — the entry persists either way so you can retry without losing the content).
Re-index a single entry: there is no PATCH yet. Today: delete + recreate from /knowledge/new. The entry id changes (and any citation FKs cascade-delete), which is acceptable for v1; the PATCH path is tracked as a known gap in #299.
Configure a managed source — /knowledge/sources → add source. Fill kind + label + kind-shaped config + optional cron. Save. The runner is deferred (see Ingestion configuration) so re-run now stamps the contract but doesn't fetch content yet.
Surface a content gap — analytics for zero-result queries lives in knowledge_query_log (WHERE result_count = 0 is the gap signal). The /knowledge/insights UI page is deferred; query the table directly until it ships.
Why didn't the bot cite the right entry? — open the agent conversation in /agents/<id>/conversations/<cid>. Each assistant turn shows its citation strip; click through to the cited entry to see the chunk that ranked. If no chips render, the model didn't receive any memory.knowledge.search() results in that turn — either the agent didn't call search, or the call returned [] (check knowledge_query_log for the matching row + result_count=0).

Multi-agent visibility

Knowledge is org-scoped, not agent-scoped. Agent A's runs cite from the same knowledge_entries pool as agent B's runs in the same org. Per-agent scoping (some agents see only some entries) is explicitly out of scope for this epic — it would need a knowledge_entry_agent_scope join + a new auth predicate, large enough for its own epic.

In practice this matches the natural unit of curation: org-level knowledge ("our docs", "our runbooks") is what customers actually maintain, and the simpler model means there's one place to look when asking "what does the agent know?".

Limitations (v1)

Ingestion runner is deferred. Sources can be configured + the re-run button stamps the dashboard contract, but URL crawl / R2 fetch / MCP resource fetch don't actually pull content yet. Track in #299.
No PATCH on entries. Re-indexing requires DELETE + re-POST, which loses the entry id and cascade-deletes any historical citations. Captured in #299; the PATCH endpoint is a follow-up.
English-only FTS. to_tsvector('english', content) is hardcoded; multi-language search needs a migration + index rebuild.
OpenAI text-embedding-3-small only. The chunks table is pinned to vector(1536). Providers with a different vector width would land in a separate column when added; the pluggable Embedder interface (api/src/lib/embeddings/) makes that change non-breaking on the runtime side.
No /knowledge/insights page yet. knowledge_query_log + knowledge_citations are populated on every search / run; the read views (top queries, top-cited entries, zero-result queries) ship later. Today: query the tables directly.
No retention worker yet. knowledge_query_log and knowledge_citations grow unbounded until the daily cleanup job lands (default 90d per epic decision). Indexes are wired so the eventual DELETE WHERE created_at < now() - interval '<N> days' is cheap.
Citations cascade-delete with entries. Deleting a knowledge_entries row drops its citation history. The analytics story is "what's working now", not a permanent audit log.

Persisted conversations — the chat shell that renders the citation chip strip.
Agent runs — the run-stream proxy that tees citation frames into knowledge_citations.
CLI tool sources — same SSRF guard + R2-storage shape the url_crawl / file_upload ingestion kinds reuse.
Observability — the observability epic the entries-browser pagination pattern came from.
