Compounding Memory: Pages

A page is the smallest unit of compiled memory — the harness’s structured-knowledge primitive. When you ask your agent “what do you know about Austin?”, it opens a page. When you tap a memory in the mobile app, you can see the pages that memory helps build. Everything in the wiki system revolves around the page. Pages are also where the Traceability operating guarantee shows up at the knowledge tier: every page links back to the source memories that wrote it, every section is dated, every mention is resolvable — there is no “where did the agent get this” mystery.

This doc walks through what’s inside a page, and the supporting cast of aliases, links, mentions, and sources that make pages useful.

A page has:

  • A type — entity, topic, or decision. This determines what sections it gets by default.
  • A title — human-readable. “Austin, Texas.”
  • A slug — URL-safe. “austin-texas.”
  • A summary — a one-liner. “The city in Texas where Eric lives.”
  • A body — longer-form markdown assembled from the sections below.
  • A parent page id — set when this page was promoted from a section on another page. See Hierarchy below.
  • A hubness score — a coarse monotonic number that orders pages by “how much of a hub is this.” Recomputed whenever the page is touched.
  • A set of tags — soft hints used for clustering, never structural forcing. Populated by the aggregation planner from record metadata.
  • A last-compiled timestamp — when the agent last touched this page.
  • A status — active or archived.
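The field list above can be sketched as a TypeScript shape. This is illustrative only — the names mirror the doc's descriptions, not the actual schema exports:

```typescript
// Hypothetical page shape mirroring the field list above.
type PageType = 'entity' | 'topic' | 'decision';

interface WikiPage {
  type: PageType;               // determines default sections
  title: string;                // "Austin, Texas"
  slug: string;                 // "austin-texas"
  summary: string;              // one-liner
  body: string;                 // markdown assembled from sections
  parentPageId: string | null;  // set on promotion (see Hierarchy)
  hubnessScore: number;         // coarse monotonic "how much of a hub" ordering
  tags: string[];               // soft clustering hints, never structural
  lastCompiledAt: Date | null;  // when the agent last touched this page
  status: 'active' | 'archived';
}

const austin: WikiPage = {
  type: 'entity',
  title: 'Austin, Texas',
  slug: 'austin-texas',
  summary: 'The city in Texas where Eric lives.',
  body: '',
  parentPageId: null,
  hubnessScore: 0,
  tags: ['travel'],
  lastCompiledAt: null,
  status: 'active',
};
```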

Every page belongs to exactly one agent. Two agents in the same workspace can each have a page called “Austin, Texas” and those are two different pages, not the same one.

Sections are the atomic unit of content. The compile pipeline never rewrites a whole page — it patches individual sections.

The default sections depend on page type:

Entity pages get overview, notes, visits, and related. “Austin, Texas” would have an overview of what the city is, notes from conversations, a history of your visits, and links to related topics.

Topic pages get summary, highlights, related_entities, and recent. “Austin Kids Activities, April 2026” would summarize what’s on during that window, highlight standout options, list the specific places or events, and keep a rolling log.

Decision pages get context, decision, rationale, and consequences. These are structured the way a decision log typically is: here’s the situation, here’s what we chose, here’s why, here’s what happened as a result.

The planner isn’t locked into these defaults. If a batch of memories clearly justifies a section called food_recommendations on an entity page, the planner can propose it and the compiler will write it. The defaults just bias toward consistency.

Some sections are rollups — their body is a list of links to other pages in the scope. When the aggregation pass emits parentSectionUpdates with linked_page_slugs populated, the compiler skips the section writer and renders the body deterministically:

- [**Franklin Barbecue**](/wiki/entity/franklin-barbecue) — BBQ joint in Austin, TX.
- [**Momofuku Daishō**](/wiki/entity/momofuku-daisho) — Korean-inspired restaurant in Toronto.

Every bullet is a real markdown link pointing at /wiki/<type>/<slug> — the mobile router path. Rollup sections also carry aggregation metadata: the list of linked page ids, the count of supporting records, the section’s promotion score, and the timestamps of first and last evidence. That metadata is what drives section promotion (next section).
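The deterministic render step is just string assembly — no LLM involved. A minimal sketch, with the input shape assumed from the doc rather than taken from the real compiler types:

```typescript
// Sketch of the deterministic rollup renderer: builds the bullet list the
// compiler writes when linked_page_slugs is populated. Shapes are illustrative.
interface LinkedChild {
  type: 'entity' | 'topic' | 'decision';
  slug: string;
  title: string;
  summary: string;
}

function renderRollupBody(children: LinkedChild[]): string {
  return children
    .map((c) => `- [**${c.title}**](/wiki/${c.type}/${c.slug}) — ${c.summary}`)
    .join('\n');
}
```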

When a rollup section gets dense, coherent, and durable enough — more than about 15 linked children, 20+ supporting records, 30+ days of evidence, and visible tag overlap — the aggregation pass promotes it into its own topic page.
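The gate can be sketched as a predicate over the section's aggregation metadata. The thresholds below are the approximate values from the text ("more than about 15 children, 20+ records, 30+ days"), and the tag-overlap check is stubbed — the real pass scores it properly:

```typescript
// Rough sketch of the promotion gate; thresholds and shapes are assumptions.
interface RollupAggregation {
  linkedPageIds: string[];
  supportingRecordCount: number;
  firstSourceAt: Date;
  lastSourceAt: Date;
  observedTags: string[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

function isPromotionCandidate(agg: RollupAggregation): boolean {
  const evidenceDays =
    (agg.lastSourceAt.getTime() - agg.firstSourceAt.getTime()) / DAY_MS;
  return (
    agg.linkedPageIds.length > 15 &&    // dense: many linked children
    agg.supportingRecordCount >= 20 &&  // coherent: enough supporting records
    evidenceDays >= 30 &&               // durable: a month of evidence
    agg.observedTags.length > 0         // stand-in for the real tag-overlap score
  );
}
```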

The promotion:

  1. Creates a new topic page with the promoted section’s content.
  2. Sets parent_page_id on the new page, pointing back at the original.
  3. Writes parent_of / child_of link rows so graph traversal works both directions.
  4. Rewrites the parent’s section body down to a short summary + top highlights + a See: link to the promoted child.
  5. Marks the parent section’s aggregation.promotion_status = "promoted" — sticky, no flapping.

An example from Marco’s wiki: Austin (parent topic) → Austin Activities (promoted child topic) was born when the Activities section on Austin crossed the threshold.

Pages can have a parent. parent_page_id is set when the page was promoted from a section on another page. The full hierarchy is also mirrored in wiki_page_links as parent_of / child_of edges — the kind column on a link row captures which role it plays:

  • reference — the ordinary “A mentions B” wikilink.
  • parent_of — parent → promoted child.
  • child_of — promoted child → parent.

Hierarchy is deliberately emergent, not declared. The planner doesn’t decide “this is a hub page”; the aggregation pass notices that a section has become dense enough to deserve its own page and promotes it. The parent keeps a summary and a link to the child; the child becomes its own compounding surface that can itself grow sections and eventually promote again.

An alias is a name that points to a page.

The page “Austin, Texas” has aliases “austin”, “austin texas”, “austin tx”, and “atx”. When any of these shows up in a conversation, the system knows they all mean the same page. Without aliases, the compiler would create a separate page for each name and never link them.

Aliases come from three places:

  • Compiler — seeded from the title when a page is first created, or emitted by the planner when it sees a new nickname.
  • Manual — an operator can add an alias through admin tooling (not yet exposed to end users).
  • Import — applied during bulk ingest, e.g. when the journal importer pulls in alternate place names from Google Places data.

When the leaf planner proposes a new page, the compiler tries to merge the proposal into an existing page before creating anything. The match runs in two passes: exact alias match first (against wiki_page_aliases in the same scope), then a Postgres pg_trgm similarity fallback at similarity() >= 0.85. The fuzzy pass is strictly same-type — an entity proposal can only merge into an existing entity, a topic into a topic. Cross-type fuzzy matches are always rejected.
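The two-pass decision reduces to a small branch. In this sketch, `similarity` stands in for the Postgres pg_trgm similarity() score of the best same-scope candidate; function and field names are illustrative:

```typescript
// Sketch of the two-pass dedupe decision: exact alias match first, then a
// same-type-only fuzzy fallback at the doc's 0.85 cutoff.
type MergeOutcome = 'exact' | 'fuzzy' | 'create';

function mergeDecision(opts: {
  exactAliasHit: boolean;   // exact match against wiki_page_aliases in scope
  similarity: number;       // best pg_trgm similarity() among candidates
  proposalType: string;
  candidateType: string;
}): MergeOutcome {
  if (opts.exactAliasHit) return 'exact';
  // Fuzzy pass is strictly same-type; cross-type matches are always rejected.
  if (opts.proposalType === opts.candidateType && opts.similarity >= 0.85) {
    return 'fuzzy';
  }
  return 'create';
}
```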

The fuzzy path bumps its own metric (fuzzy_dedupe_merges), separate from the exact path’s (alias_dedup_merged), because it carries the over-collapse risk and deserves its own time series.

Links are directed references between pages. “Taberna dos Mercadores” has a link to “Lisbon”; “Lisbon” has a link back to “Taberna dos Mercadores” (they’re different rows because links are directional).

Each link carries a kind:

  • reference (the default) — ordinary wikilink. “Taberna mentions Lisbon.”
  • parent_of — parent hub → promoted child. Written by setParentPage.
  • child_of — promoted child → parent. Written by setParentPage.

The planner proposes reference links whenever it sees one page meaningfully referenced from another. The compiler auto-writes parent_of / child_of pairs when a section is promoted. This is where the wiki starts to behave like a real knowledge graph — when you open “Austin” you see the child topics; when you open “Austin Activities” you can jump back to its parent.

On top of planner-emitted references, two deterministic emitters write additional reference edges each compile (LLM-free, gated on WIKI_DETERMINISTIC_LINKING_ENABLED):

  • Entity leaf page → matching city/journal parent topic, tagged context='deterministic:city:<slug>' or 'deterministic:journal:<slug>'.
  • Entity ↔ entity reciprocal edges for pages that share a memory_unit source, tagged context='co_mention:<memory_unit_id>', capped at 10 directed edges per memory.

The context tags exist so the deterministic rows can be dropped cleanly — a single DELETE FROM wiki_page_links WHERE context LIKE 'deterministic:%' OR context LIKE 'co_mention:%' scopes a rollback without touching planner-emitted references. See the pipeline doc for the rules, and docs/metrics/wiki-link-density.md in the repo for the rollback SQL.
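The co-mention emitter's pairing and cap behavior can be sketched as a pure function. The row shape, pairing order, and function name are assumptions — only the context tag format and the 10-edge cap come from the doc:

```typescript
// Sketch of the co-mention emitter: reciprocal edges for pages sharing a
// memory_unit source, capped at 10 directed edges per memory.
interface LinkRow {
  fromPageId: string;
  toPageId: string;
  kind: 'reference';
  context: string;
}

function coMentionEdges(memoryUnitId: string, pageIds: string[]): LinkRow[] {
  const rows: LinkRow[] = [];
  const context = `co_mention:${memoryUnitId}`;
  outer: for (let i = 0; i < pageIds.length; i++) {
    for (let j = i + 1; j < pageIds.length; j++) {
      // Each pair contributes two directed edges; stop before exceeding the cap.
      if (rows.length + 2 > 10) break outer;
      rows.push({ fromPageId: pageIds[i], toPageId: pageIds[j], kind: 'reference', context });
      rows.push({ fromPageId: pageIds[j], toPageId: pageIds[i], kind: 'reference', context });
    }
  }
  return rows;
}
```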

Links only exist between pages in the same scope. An agent can’t link to another agent’s page, even in the same workspace.

When the planner sees a name it can’t confidently place, it doesn’t create a page. It holds the name as an unresolved mention.

The row tracks:

  • The alias as it appeared (“Chef João”).
  • A normalized version (“chef joao”) for matching.
  • How many times it’s been seen.
  • Up to five recent quoted contexts — snippets of prose that mentioned it.
  • A suggested type, if the planner has an opinion.
  • A status: open, promoted, or ignored.

When the same mention shows up for the third time within thirty days, the nightly lint sweep enqueues a promotion job. On the next compile, the planner produces a real page from the accumulated evidence, and the mention’s status flips to promoted.
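Normalization and the promotion trigger can be sketched as follows. The diacritic-stripping approach and both function names are assumptions; the real normalizer and sweep may differ:

```typescript
// Sketch: normalize an alias as it appeared ("Chef João" → "chef joao"),
// and check the third-sighting-within-thirty-days promotion trigger.
function normalizeMention(alias: string): string {
  return alias
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '') // strip combining diacritics
    .toLowerCase()
    .trim();
}

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function shouldEnqueuePromotion(sightings: Date[], now: Date): boolean {
  const recent = sightings.filter((s) => now.getTime() - s.getTime() <= THIRTY_DAYS_MS);
  return recent.length >= 3; // third sighting within thirty days
}
```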

This is deliberately slow. An aggressive wiki that created pages on first sight would be full of noise — every passing reference would get its own page, most of which would never be useful again. Holding names in the unresolved state until they earn a page keeps the wiki clean.

Every section tracks which memories were used to write it.

A source row says “this specific memory unit was a source for this specific section.” Not “this memory is related to this page.” Not “this memory was in the same batch as this page.” Specific and section-scoped.

Why this matters:

  • Trust. You can always trace a claim on a page back to the memories that produced it. If something looks wrong, you can read the source.
  • Reverse lookup. Given a memory, you can ask “which pages cite this?” That’s how the mobile memory surface shows “this memory helped build these pages.”
  • Blast radius. If a memory turns out to be wrong or needs to be forgotten, you know exactly which sections need rebuilding.

The pipeline’s single strictest rule is that provenance is never written speculatively. If the planner didn’t cite a memory as a source for a section, no source row gets written for that pair. The pipeline doc has the story of why this rule exists.
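The rule means source rows are derived only from explicit planner citations, never inferred. A minimal sketch, with all names and shapes hypothetical:

```typescript
// Sketch of the strict provenance rule: a source row exists only for a
// (section, memory) pair the planner explicitly cited. Nothing speculative.
interface SourceRow {
  sectionId: string;
  sourceKind: 'memory_unit';
  sourceRef: string;
}

function sourceRowsFromCitations(
  citations: Map<string, string[]>, // sectionId → memory unit ids the planner cited
): SourceRow[] {
  const rows: SourceRow[] = [];
  for (const [sectionId, memoryIds] of citations) {
    for (const ref of memoryIds) {
      rows.push({ sectionId, sourceKind: 'memory_unit', sourceRef: ref });
    }
  }
  return rows; // uncited pairs simply never appear
}
```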

Two surfaces read from the wiki.

wikiSearch is Postgres full-text search over pages in one agent’s scope. Give it a query, get back pages ranked by how well the page’s title + summary + body match the query, with a bonus if any alias matches.

mobileWikiSearch uses the same compiled-page full-text search path, scoped to the signed-in user’s wiki, then preserves the mobile wire shape (matchingMemoryIds is currently []). It also supports normalized prefix matching so mobile searches can find pages from partial input like empan for “empanada”, while still using the GIN-indexed search_tsv column rather than slow semantic recall.
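The matching idea behind the prefix path can be shown with a pure function. The real implementation goes through the GIN-indexed search_tsv column rather than comparing strings in application code; this sketch only illustrates the normalization-then-prefix logic:

```typescript
// Sketch of normalized prefix matching: "empan" should find "empanada",
// case- and diacritic-insensitively. Not the real query path.
function prefixMatches(query: string, term: string): boolean {
  const norm = (s: string) =>
    s.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase().trim();
  return norm(term).startsWith(norm(query));
}
```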

A page in archived status isn’t gone — it still exists in the database, its sections still exist, its source rows still exist. It just doesn’t show up in search or reads.

A recompile will resurrect an archived page when it touches the same slug. This is useful: you can archive pages wholesale to “reset” an agent’s wiki without losing anything, knowing the next compile will rebuild what the memories still support.

To actually purge a page — including its provenance rows — it has to be hard-deleted. Cascade constraints take the sections, aliases, links, and source rows with it.


All tables are defined in packages/database-pg/src/schema/wiki.ts:

  • wiki_pages — the page rows. owner_id NOT NULL; unique on (tenant, owner, type, slug). Generated search_tsv tsvector column with GIN index for full-text search.
    • Hierarchy columns: parent_page_id uuid (self-ref, nullable), hubness_score int, tags text[].
    • Index on parent_page_id for child-page listing.
  • wiki_page_sections — one row per section. Unique on (page, section_slug). Has a nullable body_embedding vector(1024) column reserved for semantic retrieval, not populated in v1.
    • Aggregation metadata in a nullable aggregation jsonb column:
      {
        "linked_page_ids": ["<uuid>", "..."],
        "supporting_record_count": 12,
        "first_source_at": "2026-04-01T00:00:00Z",
        "last_source_at": "2026-04-19T00:00:00Z",
        "observed_tags": ["bbq", "food"],
        "promotion_status": "none|candidate|promoted|suppressed",
        "promotion_score": 0.62,
        "promoted_page_id": null
      }
      Leaf-style sections (overview / notes / visits) leave this NULL.
  • wiki_page_aliases — normalized alternate names. Unique on (page, alias). Index on alias backs reverse lookup. Migration 0015_pg_trgm_alias_title_indexes.sql adds pg_trgm GIN indexes on wiki_page_aliases.alias and wiki_pages.title for the compiler’s fuzzy-dedupe fallback (findAliasMatchesFuzzy, similarity ≥ 0.85).
  • wiki_page_links — directed (from_page_id, to_page_id, kind) triples. kind is 'reference' / 'parent_of' / 'child_of'; unique on the full triple so the same two pages can have a reference AND a hierarchy edge. Index on to_page_id backs wikiBacklinks; index on kind filters hierarchy vs references.
  • wiki_unresolved_mentions — accumulating mentions. Unique on (tenant, owner, alias_normalized, status) so the same alias can exist in open, promoted, and ignored states. Optional cluster jsonb column carries richer context (co_mentions, candidate_parent_page_id, observed_tags) for the aggregation pipeline.
  • wiki_section_sources — provenance. section_id → wiki_page_sections(id) ON DELETE CASCADE. Unique on (section, source_kind, source_ref). Index on (source_kind, source_ref) for reverse lookup.
  • wiki_compile_jobs — the job ledger. Metrics JSON captures leaf stats, aggregation stats (parent_sections_updated, sections_promoted, deterministic_parents_derived, aggregation_planner_calls, aggregation_input_tokens, aggregation_output_tokens, aggregation_error), and densification + dedupe stats (links_written_deterministic, links_written_co_mention, duplicate_candidates_count, alias_dedup_merged, fuzzy_dedupe_merges, deterministic_linking_flag_suppressed).
  • wiki_compile_cursors — per-scope pagination state, composite primary key (tenant_id, owner_id).

// packages/api/src/lib/wiki/templates.ts (shape)
entity: ['overview', 'notes', 'visits', 'related']
topic: ['summary', 'highlights', 'related_entities', 'recent']
decision: ['context', 'decision', 'rationale', 'consequences']

-- wikiSearch resolver
WITH alias_hits AS (
  SELECT DISTINCT a.page_id, a.alias
  FROM wiki_page_aliases a
  INNER JOIN wiki_pages p ON p.id = a.page_id
  WHERE p.tenant_id = $1 AND p.owner_id = $2 AND p.status = 'active'
    AND (a.alias = $3 OR a.alias ILIKE '%' || $3 || '%')
)
SELECT p.*,
       COALESCE(ts_rank(p.search_tsv, plainto_tsquery('english', $3)), 0)
         + (COALESCE(ts_rank(p.search_tsv, to_tsquery('english', $4)), 0) * 0.5)
         + CASE WHEN ah.page_id IS NOT NULL THEN 1.0 ELSE 0.0 END AS score,
       ah.alias AS matched_alias
FROM wiki_pages p
LEFT JOIN alias_hits ah ON ah.page_id = p.id
WHERE p.tenant_id = $1 AND p.owner_id = $2 AND p.status = 'active'
  AND (
    p.search_tsv @@ plainto_tsquery('english', $3)
    OR p.search_tsv @@ to_tsquery('english', $4)
    OR ah.page_id IS NOT NULL
  )
ORDER BY score DESC, p.last_compiled_at DESC NULLS LAST
LIMIT $5

Deleting a page cascades to sections, aliases, links, and source rows. Deleting a section cascades to its source rows. Deleting the page is the cleanest way to fully purge provenance.

Archived pages (status = 'archived') retain all rows including wiki_section_sources. A subsequent compile touching the same slug resurrects the page back to active via the markCompiled: true path in upsertPage. Hard-delete is for when you explicitly want provenance gone.