Knowledge
The knowledge platform turns org-curated text into retrievable context that agents can search at runtime. A customer-support bot that answers "how do I reset my SSO?" by looking up the org's onboarding guide is the canonical use; the dashboard is where humans see what got ingested, what got cited, and what didn't match.
The original v1 (epic #43) shipped the schema + the SDK + a thin index page. Epic #299 fleshed out the surface: paginated browser, ingestion-source configuration, query log, citation persistence, and the chat-shell chip strip that closes the loop ("the bot said X because it cited entry Y").
The data model
Five tables, all org-scoped, all cascade-on-org-delete:
knowledge_entries ──┬──── knowledge_chunks (1:N, cascade)
                    ├──── knowledge_citations (N:M via assistant turn)
                    └──── knowledge_sources? (FK SET NULL, see below)
knowledge_query_log (standalone log of search calls)

- knowledge_entries (db/src/schema/knowledge-entries.ts): the human-readable record. title + content + tags[] + type (open enum-by-convention: note | rule | pattern | reference | decision | …; new categories don't need a migration). Carries embedding_status (ready | pending | error, CHECK-locked) so the dashboard's status pill comes from a real column rather than being inferred from chunk count.
- knowledge_chunks (db/src/schema/knowledge-chunks.ts): the embedded pieces. embedding vector(1536) for OpenAI's text-embedding-3-small; HNSW (m=16, ef_construction=64) for the ANN leg + GIN-on-to_tsvector('english', content) for the keyword (full-text) leg. organization_id is denormalized off the entry so the search path stays single-table on the hot loop.
- knowledge_sources (db/src/schema/knowledge-sources.ts): managed-ingestion configuration. One row per configured source; kind is locked to url_crawl | file_upload | mcp via CHECK. Entries produced by a source carry source_id (FK SET NULL: deleting a source preserves the ingested content but unlinks it).
- knowledge_citations (db/src/schema/knowledge-citations.ts): one row per (assistant turn × cited chunk) pair. Cascade on every FK (org / message / entry / chunk) so deletes sweep cleanly. Documented analytics consequence: deleting an entry drops its historical citations, so the table is "what's working now", not a permanent audit log.
- knowledge_query_log (db/src/schema/knowledge-query-log.ts): one row per search call. origin is locked to sdk | dashboard | cli via CHECK; the optional agent_deployment_id (SET NULL on delete) links SDK-origin queries back to the deployment that issued them.
The dashboard
AppShell mounts the knowledge surface under /knowledge. Three
pages.
/knowledge — the entries browser
(app/app/knowledge/page.tsx).
Paginated table over GET /v1/orgs/:slug/knowledge-entries. Filters
(URL-driven, plain <form method="get">):
- q: full-text across title + chunk content.
- source_type: COALESCE(s.kind, e.type). Managed-source entries surface their kind (url_crawl / file_upload / mcp), manual entries surface their type (note / rule / …).
- embedding_status: ready | pending | error. Aggregations live inside the same WHERE so date / status / source narrowing narrows the totals too.
- from / to: ISO 8601 bounds on created_at.
Pagination is cursor-based: tuple-base64url over (createdAt, id),
over-fetch by 1 to detect hasMore. Mirrors the agent-runs pattern
from observability epic #270.
Filter changes drop the cursor so a stale page-2 marker doesn't cling
to a narrowed window.
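The cursor scheme above can be sketched roughly as follows. This is a minimal illustration, not the code in the api: the Cursor type and the function names encodeCursor, decodeCursor, and paginate are hypothetical, and the JSON-array payload inside the base64url string is an assumption about how the (createdAt, id) tuple is serialized.

```typescript
// Hypothetical cursor shape; the fields follow the (createdAt, id) tuple described above.
interface Cursor {
  createdAt: string; // ISO 8601 timestamp of the last row on the page
  id: string; // tie-breaker for rows that share a timestamp
}

// Encode the tuple as base64url (URL-safe alphabet, padding stripped).
// btoa/atob are used so the sketch runs in Node and in browsers; inputs are assumed ASCII.
function encodeCursor(c: Cursor): string {
  const json = JSON.stringify([c.createdAt, c.id]);
  return btoa(json).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Decode a cursor; returns null for anything malformed so the caller can fall back to page 1.
function decodeCursor(raw: string): Cursor | null {
  try {
    const b64 = raw.replace(/-/g, "+").replace(/_/g, "/");
    const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
    const parsed: unknown = JSON.parse(atob(padded));
    if (
      Array.isArray(parsed) &&
      parsed.length === 2 &&
      typeof parsed[0] === "string" &&
      typeof parsed[1] === "string"
    ) {
      return { createdAt: parsed[0], id: parsed[1] };
    }
    return null;
  } catch {
    return null;
  }
}

// Given rows fetched with LIMIT pageSize + 1, trim the extra row and derive the next cursor.
function paginate<T extends Cursor>(rows: T[], pageSize: number) {
  const hasMore = rows.length > pageSize;
  const page = hasMore ? rows.slice(0, pageSize) : rows;
  const last = page[page.length - 1];
  const nextCursor = hasMore && last ? encodeCursor(last) : null;
  return { page, hasMore, nextCursor };
}
```

The extra row is never returned to the client; it only answers "is there another page?" without a separate COUNT query.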
The cites 30d column reads from knowledge_citations (org-scoped,
30-day window) — zero is "0" (entry seen but never cited), not "—".
Different signal from "data not available".
/knowledge/[id] — entry detail
(app/app/knowledge/[id]/page.tsx).
Renders the entry header + an expandable chunks list (200-char
preview → full body on click), embedding-status pill, per-chunk
character count + provider, and the citations in last 30d row.
For entries linked to a managed source, surfaces the source kind
chip + a view original ↗ link when source_url is set.
Raw embedding vectors are not exposed: 1536 floats is noise for a
human reader. The embedding_provider column tells you which model
produced the vector (openai/text-embedding-3-small, etc.); a
debugging story for "why didn't this chunk match?" can grow a
per-chunk re-similarity probe later if anyone asks.
/knowledge/sources — managed-ingestion configuration
(app/app/knowledge/sources/page.tsx).
Elevated-only (owners + admins, same gate as the existing entry
DELETE). Add / remove / re-run dialogs over the
/v1/orgs/:slug/knowledge-sources endpoints. Last-run pill
(ok | partial | error) renders from the column directly. Re-run
now enqueues an immediate ingestion run — see Ingestion below for
what's wired today vs. deferred.
The search API
GET /v1/orgs/:slug/knowledge/search?q=...&limit=20&types=rule,pattern
(api/src/routes/knowledge.ts).
Hybrid retrieval: embed the query once, run vector + keyword in
parallel, blend with Reciprocal Rank Fusion.
- Vector leg: cosine over knowledge_chunks.embedding. HNSW index, org-filtered.
- Keyword leg: plainto_tsquery + ts_rank_cd over to_tsvector('english', content). Same GIN index.
- Blend: RRF with k=60 (the standard recipe). Each leg fetches min(limit*3, 100) so the rank-fusion has overlap to work with; the blender returns the top limit (default 20, max 100).
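For readers unfamiliar with Reciprocal Rank Fusion, the blend step can be sketched as below. This is an illustrative version under stated assumptions, not the code in api/src/routes/knowledge.ts: the function names rrfBlend and legFetchSize are hypothetical, each leg is assumed to be a list of unique chunk ids ordered best-first, and ranks are assumed to start at 1.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over legs of 1 / (k + rank of id in that leg).
// An id missing from a leg simply contributes nothing for that leg.
function rrfBlend(
  legs: string[][],
  limit: number,
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const leg of legs) {
    leg.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based in this sketch
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Per-leg fetch size as described above: min(limit * 3, 100).
function legFetchSize(limit: number): number {
  return Math.min(limit * 3, 100);
}

// Example: "b" appears in both legs, so it outranks "a" even though "a" leads the vector leg.
const blended = rrfBlend([["a", "b", "c"], ["b", "d"]], 3);
console.log(blended.map((h) => h.id)); // ["b", "a", "d"]
```

Because RRF uses only ranks, the cosine distances and ts_rank_cd scores never need to be normalized onto a common scale.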
Pre-filtering: ?types= narrows to specific entry types (or
COALESCE-d source kinds, post-PR-2). Defensive caps: q capped at
1000 chars (paste-accident guard), limit clamped to [1, 100].
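The defensive caps can be illustrated with a small normalizer. This is a sketch only: normalizeSearchParams is a hypothetical name, and truncating an over-long q (rather than rejecting the request) is an assumption, since the text above says only that q is capped.

```typescript
interface SearchParams {
  q: string;
  limit: number;
}

// Apply the caps described above: q at most 1000 chars, limit within [1, 100], default 20.
function normalizeSearchParams(rawQ: string, rawLimit?: string): SearchParams {
  const q = rawQ.slice(0, 1000); // assumption: truncate instead of returning an error
  const parsed = Number.parseInt(rawLimit ?? "", 10);
  const limit = Number.isNaN(parsed) ? 20 : Math.min(100, Math.max(1, parsed));
  return { q, limit };
}
```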
Each call writes a knowledge_query_log row. The optional ?origin=
query param (sdk | dashboard | cli) tags which surface issued the
search; defaults to sdk — the dominant caller; dashboard + CLI
surfaces stamp explicitly. The write is fire-and-forget — analytics
infra can never block the search response. ?agent_deployment_id=
links SDK-origin queries back to the deployment that issued them.
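The fire-and-forget write might look something like the sketch below. All names here (QueryLogRow, LogWriter, logQueryFireAndForget, searchWithLogging) are hypothetical, and the row fields are illustrative rather than the actual knowledge_query_log columns; the point is only that the search response never awaits the log write or sees its failure.

```typescript
// Illustrative row shape; field names are assumptions, not the real column list.
interface QueryLogRow {
  query: string;
  origin: "sdk" | "dashboard" | "cli";
  resultCount: number;
}

type LogWriter = (row: QueryLogRow) => Promise<void>;

// Start the write without awaiting it. Both synchronous throws and rejected
// promises from the writer are routed to onError and never reach the caller.
function logQueryFireAndForget(
  write: LogWriter,
  row: QueryLogRow,
  onError: (err: unknown) => void = () => {},
): void {
  void Promise.resolve()
    .then(() => write(row))
    .catch(onError);
}

// Run the search, kick off the log write, and return the hits immediately.
async function searchWithLogging<T>(
  runSearch: () => Promise<T[]>,
  write: LogWriter,
  query: string,
  origin: QueryLogRow["origin"] = "sdk", // default mirrors the ?origin= default above
): Promise<T[]> {
  const hits = await runSearch();
  logQueryFireAndForget(write, { query, origin, resultCount: hits.length });
  return hits;
}
```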
The dashboard search bar (app/app/knowledge/KnowledgeSearchBar.tsx) calls the same endpoint as the SDK. By design — parity between "what the agent retrieves" and "what the human sees when they search the same query" is the whole point of the dashboard.
Calling search from an agent (developer)
import { defineAgent, createMemoryClient } from "@stech/agent";

export default defineAgent({
  name: "support",
  model: "claude-sonnet-4-6",
  async run({ input }) {
    const memory = createMemoryClient({
      endpoint: process.env.STECH_API_URL!,
      apiKey: process.env.STECH_API_KEY!,
      orgSlug: process.env.STECH_ORG_SLUG!,
    });
    const hits = await memory.knowledge.search(input.text, {
      limit: 5,
      types: ["rule", "pattern"],
    });
    // hits: { entryId, chunkId, score, content, entryTitle? }[]
  },
});

The runtime exposes the same client at memory.knowledge.search(...)
inside the agent. When the agent runs through run-stream, the
runtime emits a citation SSE frame for each hit that lands in the
model's prompt context — the chat shell renders chips, the api
proxy persists rows to knowledge_citations. See Citation
contract below for the wire shape.
Ingestion configuration
knowledge_sources ships three kinds:
| kind | config shape | source of knowledge_entries |
|---|---|---|
| url_crawl | { url, depth?, prefixes? } | one entry per fetched page (source_url set) |
| file_upload | { filename, r2Key } (uses r2.ts) | one entry per uploaded file |
| mcp | { mcpSourceId, resourcePattern? } | one entry per matched MCP resource |
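The kind-shaped config can be modeled as a discriminated union. The sketch below takes the field names from the table; the field types (depth as a number, prefixes as a string array), the type names, and the validateSource helper are assumptions for illustration and may not match the api's actual validation.

```typescript
// Field names come from the table above; the field types are assumed.
type UrlCrawlConfig = { url: string; depth?: number; prefixes?: string[] };
type FileUploadConfig = { filename: string; r2Key: string };
type McpConfig = { mcpSourceId: string; resourcePattern?: string };

// The kind discriminates which config shape applies.
type KnowledgeSourceInput =
  | { kind: "url_crawl"; config: UrlCrawlConfig }
  | { kind: "file_upload"; config: FileUploadConfig }
  | { kind: "mcp"; config: McpConfig };

// Required (non-optional) config fields per kind.
const REQUIRED_FIELDS: Record<KnowledgeSourceInput["kind"], string[]> = {
  url_crawl: ["url"],
  file_upload: ["filename", "r2Key"],
  mcp: ["mcpSourceId"],
};

// Returns a list of problems; an empty list means the required fields are present.
function validateSource(kind: string, config: Record<string, unknown>): string[] {
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, kind)) {
    return [`unknown kind: ${kind}`];
  }
  const required = REQUIRED_FIELDS[kind as KnowledgeSourceInput["kind"]];
  return required
    .filter((field) => typeof config[field] !== "string" || config[field] === "")
    .map((field) => `${field} is required`);
}

// A well-typed example value.
const example: KnowledgeSourceInput = {
  kind: "url_crawl",
  config: { url: "https://example.com/docs", depth: 2 },
};
```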
Optional cron_expression schedules automatic re-fetch passes — if
you set a cron schedule, the runner reads it; null means manual-run
only via the /run endpoint. There is no built-in default schedule.
Last-run state (last_run_at, last_run_status, last_run_error,
last_run_entry_count) gets stamped after each pass — the dashboard
reads these columns directly.
The CRUD surface is at /v1/orgs/:slug/knowledge-sources
(api/src/routes/knowledge-sources.ts):
GET is any-member, write paths (POST / PATCH / DELETE /
POST .../:id/run) are elevated-only.
The actual ingestion runner is deferred. PR-2 shipped the
table, the api, and the dashboard. The /run endpoint stamps
last_run_at = now() and last_run_status = 'ok' so the dashboard
contract is stable from day one, but the actual URL-crawl /
R2-fetch / MCP-resource-list pass is a follow-up. Until that lands,
configured sources are an organizing primitive (you can stamp
existing knowledge_entries with a source_id manually), but
re-run now doesn't re-fetch any content.
When the runner lands, two existing pieces come into play:
- SSRF guard: url_crawl URLs go through api/src/lib/ssrf-guard.ts (same multi-layer guard as compute-sha for CLI sources): hostname blocklist at validate time, DNS-resolution-time guard at fetch time, redirect: "manual" to block 3xx-to-private-IP bounces.
- R2 storage: file_upload reuses api/src/lib/storage/r2.ts. Same shape MCP file artifacts use.
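To make the validate-time layer of the SSRF guard concrete, here is a rough sketch of a hostname and literal-IP check. It is not the contents of api/src/lib/ssrf-guard.ts: the function names and the exact blocklist are assumptions, IPv6 is not handled, and a real guard also needs the DNS-resolution-time check and manual redirect handling described above, because a public hostname can still resolve to a private address.

```typescript
// Parse a dotted-quad IPv4 literal; returns null if the host is not one.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// True for "this network", loopback, RFC 1918 private, and link-local IPv4 ranges.
function isPrivateIPv4(host: string): boolean {
  const o = parseIPv4(host);
  if (!o) return false;
  const [a, b] = o;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Validate-time check only: rejects unparseable URLs, non-HTTP(S) schemes,
// localhost names, and private IPv4 literals.
function isBlockedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  const host = url.hostname.toLowerCase();
  return host === "localhost" || host.endsWith(".localhost") || isPrivateIPv4(host);
}
```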
Citation contract
The chat shell renders a footnote-style chip strip
([1] Entry title) below assistant turns that cited knowledge
entries. Click → /knowledge/<entryId>. End-to-end:
1. The MCP server formats hits with a structured header. cli/src/runtime/mcp/server.ts emits one text block per hit:
[entry <entryId> · chunk <chunkId> · score <s>] <title?>
<chunk content>
2. The runtime parses the header and emits a citation frame.
After a successful knowledge_search tool result, the agent loop
(runtime/src/agent/loop.ts) matches
each line against CITATION_HEADER_RE and emits a
{type:"citation", entryId, chunkId, score, entryTitle?} frame on
the SSE stream. Tool-name match (knowledge_search) is the gate
so non-knowledge tools never spuriously cite.
3. The api proxy tees the SSE stream and persists. The
run-stream handler (api/src/routes/agents.ts)
accumulates citation frames per stream; once
persistConversationTurn returns the new agent_message_id, it
calls writeCitations
(api/src/routes/knowledge-citations.ts)
which bulk-inserts to knowledge_citations. Best-effort — a write
failure logs but doesn't break the chat surface.
4. The chat shell renders the chips live + on rehydrate.
app/app/agents/[id]/MessageList.tsx
accumulates citations in the streaming reducer (de-duped by
chunkId) and renders <CitationStrip> under each assistant
bubble. Page reload re-hydrates from
GET .../conversations/:cid (joins knowledge_citations →
knowledge_entries per assistant message, returns
{citations: [...]} per row).
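Step 2 of the pipeline above (header line in, citation frame out) can be sketched as follows. The regular expression is a guess at what CITATION_HEADER_RE might look like given the documented header format; the real pattern in runtime/src/agent/loop.ts may differ. The parseCitations name is hypothetical, and de-duping by chunkId inside the parser is a simplification, since the text above places that step in the chat shell's streaming reducer.

```typescript
interface CitationFrame {
  type: "citation";
  entryId: string;
  chunkId: string;
  score: number;
  entryTitle?: string;
}

// Assumed pattern for "[entry <entryId> · chunk <chunkId> · score <s>] <title?>".
const HEADER_RE = /^\[entry (\S+) · chunk (\S+) · score (\d+(?:\.\d+)?)\](?: (.+))?$/;

// Scan a tool result line by line and build one frame per distinct cited chunk.
function parseCitations(toolName: string, toolText: string): CitationFrame[] {
  // Tool-name gate: only knowledge_search results can produce citations.
  if (toolName !== "knowledge_search") return [];
  const seen = new Set<string>();
  const frames: CitationFrame[] = [];
  for (const line of toolText.split("\n")) {
    const m = HEADER_RE.exec(line.trim());
    if (!m) continue; // chunk content lines do not match the header pattern
    const [, entryId, chunkId, score, title] = m;
    if (seen.has(chunkId)) continue;
    seen.add(chunkId);
    const frame: CitationFrame = { type: "citation", entryId, chunkId, score: Number(score) };
    if (title) frame.entryTitle = title;
    frames.push(frame);
  }
  return frames;
}
```

One caveat of any line-based match: chunk content that happens to contain a line in the header format would be read as a citation, which is part of why the tool-name gate matters.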
Opt-in semantics. The runtime emits citations only when the model received the search results in its prompt context for that turn: explicit attribution from the search call, not post-hoc substring-matching. Post-hoc attribution by matching chunk text against the model's output is brittle (the model can paraphrase, edit, or omit chunks); opt-in is cleaner because the SDK already gives the runtime the chunk ids on each search call, so the data is right there.
Sync /run doesn't emit citations. Only /run-stream has the
SSE channel. The chat shell uses /run-stream exclusively; /run
is the legacy fallback for older runtime deployments. Acceptable
gap, documented inline in the run-stream handler.
Operator runbook
- Add a manual entry: /knowledge/new → fill title + content + optional type/tags, submit. The api inserts with embedding_status='pending', embeds, flips to ready (or error if the embedder is down; the entry persists either way so you can retry without losing the content).
- Re-index a single entry: there is no PATCH yet. Today: delete + recreate from /knowledge/new. The entry id changes (and any citation FKs cascade-delete), which is acceptable for v1; the PATCH path is tracked as a known gap in #299.
- Configure a managed source: /knowledge/sources → add source. Fill kind + label + kind-shaped config + optional cron. Save. The runner is deferred (see Ingestion configuration) so re-run now stamps the contract but doesn't fetch content yet.
- Surface a content gap: analytics for zero-result queries lives in knowledge_query_log (WHERE result_count = 0 is the gap signal). The /knowledge/insights UI page is deferred; query the table directly until it ships.
- Why didn't the bot cite the right entry? Open the agent conversation in /agents/<id>/conversations/<cid>. Each assistant turn shows its citation strip; click through to the cited entry to see the chunk that ranked. If no chips render, the model didn't receive any memory.knowledge.search() results in that turn: either the agent didn't call search, or the call returned [] (check knowledge_query_log for the matching row + result_count=0).
Multi-agent visibility
Knowledge is org-scoped, not agent-scoped. Agent A's runs cite
from the same knowledge_entries pool as agent B's runs in the same
org. Per-agent scoping (some agents see only some entries) is
explicitly out of scope for this epic — it would need a
knowledge_entry_agent_scope join + a new auth predicate, large
enough for its own epic.
In practice this matches the natural unit of curation: org-level knowledge ("our docs", "our runbooks") is what customers actually maintain, and the simpler model means there's one place to look when asking "what does the agent know?".
Limitations (v1)
- Ingestion runner is deferred. Sources can be configured + the re-run button stamps the dashboard contract, but URL crawl / R2 fetch / MCP resource fetch don't actually pull content yet. Track in #299.
- No PATCH on entries. Re-indexing requires DELETE + re-POST, which loses the entry id and cascade-deletes any historical citations. Captured in #299; the PATCH endpoint is a follow-up.
- English-only FTS. to_tsvector('english', content) is hardcoded; multi-language search needs a migration + index rebuild.
- OpenAI text-embedding-3-small only. The chunks table is pinned to vector(1536). Wider providers would land in a separate column when added; the pluggable Embedder interface (api/src/lib/embeddings/) makes the change non-breaking on the runtime side.
- No /knowledge/insights page yet. knowledge_query_log + knowledge_citations are populated on every search / run; the read views (top queries, top-cited entries, zero-result queries) ship later. Today: query the tables directly.
- No retention worker yet. knowledge_query_log and knowledge_citations grow unbounded until the daily cleanup job lands (default 90d per epic decision). Indexes are wired so the eventual DELETE WHERE created_at < now() - interval '<N> days' is cheap.
- Citations cascade-delete with entries. Deleting a knowledge_entries row drops its citation history. The analytics story is "what's working now", not a permanent audit log.
Related
- Persisted conversations: the chat shell that renders the citation chip strip.
- Agent runs: the run-stream proxy that tees citation frames into knowledge_citations.
- CLI tool sources: same SSRF guard + R2-storage shape the url_crawl / file_upload ingestion kinds reuse.
- Observability: the observability epic the entries-browser pagination pattern came from.