
Context and Memory

The single most important question for an agent is what does it know right now? That answer is rebuilt from scratch every turn by the Context Assembly Engine — a pipeline that selects, ranks, and token-budgets information from up to nine distinct memory stores before handing the final prompt to the LLM.

Codebolt treats memory as a two-sided system:

  • Read path — the Context Assembly Engine pulls from storage and composes a bounded context window each turn.
  • Write path — the Memory Ingestion pipeline listens for lifecycle events, processes raw data through a sequence of processors, and routes the results to one or more storage backends.

Both sides share the same storage layer; they just move in opposite directions.

The memory layers available to a Codebolt agent:

| Layer | Lifetime | Holds |
| --- | --- | --- |
| Working | one turn | scratchpad |
| Episodic | one run | turn history |
| KV | forever | small key → value |
| JSON | forever | structured records |
| Markdown | forever | human-editable notes |
| Knowledge graph | forever | entities + relations |
| Vector | forever | semantic search |

Context vs. memory

| Term | What it is | Lifetime |
| --- | --- | --- |
| Memory | The persistent substrate — all the stores shown above | Durable across sessions |
| Context | The bag of tokens passed to the LLM this turn | Ephemeral — discarded after the call |

Context is assembled fresh each turn from memory + the current task + recent history. You can think of memory as the database and context as the query result.

The nine memory layers

Each layer has a distinct access pattern and is optimised for a different class of knowledge.

Working memory

A per-turn scratchpad. The agent can write intermediate notes or partial results here during a multi-step tool chain; the contents are discarded once the turn is complete. Nothing is persisted.

Episodic memory

Stores every turn of the current agent run — what the LLM said, what tools were called, what came back. The episodic record is what lets an agent say "three steps ago I fetched this file." It is bounded to the lifetime of a single run and is the primary source of short-term coherence.

Persistent KV store

Key → value storage backed by a simple JSON file. Suited to small, frequently-read values: user preferences, feature flags, counters, state that must survive across runs. The entire store is loaded eagerly into context whenever a context rule references it.
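Because the backing format is plain JSON, the store is easy to inspect by hand. A KV file of this kind might look roughly like the following — the keys shown are illustrative, not a fixed schema:

```json
{
  "user.preferredLanguage": "typescript",
  "ci.runsSinceLastFailure": 14,
  "flags.useStrictMode": true
}
```

Keeping values small matters here: because the whole store is loaded eagerly, every byte in this file is a byte of context budget.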

Persistent JSON store

Structured JSON documents with a named schema. Larger and more complex than KV. Useful for configuration objects, per-project metadata, or structured records that agents need to read and update programmatically.

Markdown notes

Human-editable .md files that both the agent and the developer can read and write. Useful for long-form notes, decisions, or shared knowledge that benefits from being readable without tooling. Notes live inside .codebolt/memory/ and are versioned with the project.

Knowledge graph (Kuzu)

An embedded Kuzu graph database stored at .codebolt/knowledgegraph/kuzu/. Entities (files, symbols, people, concepts) are nodes; relationships (calls, depends-on, authored-by) are edges. The graph supports Cypher queries so agents can answer structural questions — "which modules depend on this function?" — without reading every file.

The knowledge graph is populated by the llm_extract processor in the ingestion pipeline (see write path below) and queried via graph_view_read steps in persistent memory retrieval pipelines.
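A structural question like the one above might be expressed as a Cypher query of roughly this shape — the node labels (`Module`, `Function`) and relationship name (`DEPENDS_ON`) are illustrative, since the actual schema is whatever the extraction pipeline produces:

```cypher
MATCH (m:Module)-[:DEPENDS_ON]->(f:Function {name: "parseConfig"})
RETURN m.name;
```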

Vector store

Semantic embeddings stored at .codebolt/vectordb/. Any text — code chunks, conversation summaries, document passages — can be embedded and stored. At retrieval time the agent's current query is embedded and the nearest neighbours are fetched. This is the primary mechanism for "recall something relevant I encountered before" queries.

Event log

An append-only structured log of every significant event in the agent's lifetime: tool calls, errors, user messages, task transitions, memory writes. Used primarily for auditing and for log_search queries in retrieval pipelines. Also the source that drives the ingestion pipeline's onAction trigger.

Persistent memory (declarative retrieval)

Persistent memory is not a storage layer — it is a retrieval abstraction that sits above the other stores. A persistent memory definition is a YAML/JSON file stored in .codebolt/PersistentMemory/ that describes, declaratively, how to retrieve a particular class of knowledge.

Each definition contains a pipeline of typed steps:

| Step type | What it does |
| --- | --- |
| derive_query | Generates a search query from the current task context |
| vector_search | Runs a similarity search against the vector store |
| graph_view_read | Executes a named Kuzu graph view |
| kv_get | Reads one or more KV keys |
| log_search | Filters the event log by type, time, or field |
| filter | Drops results that don't match a predicate |
| rank | Re-orders results by relevance score |
| format | Renders results into a string for inclusion in context |

The Context Assembly Engine discovers all persistent memory definitions, executes their pipelines concurrently, and merges the formatted outputs into the context window. This makes it easy to add a new class of recalled knowledge without touching the assembly engine itself — just add a definition file.
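A definition file built from the step types above might look roughly like this. The step types are the ones listed in the table; every other field name (`name`, `top_k`, `predicate`, `template`) is illustrative, not a documented schema:

```yaml
# .codebolt/PersistentMemory/recall-similar-incidents.yaml (hypothetical)
name: recall-similar-incidents
steps:
  - type: derive_query          # build a search query from the current task
  - type: vector_search         # nearest neighbours from .codebolt/vectordb/
    top_k: 20
  - type: filter                # drop weak matches
    predicate: "score > 0.6"
  - type: rank                  # re-order by relevance
  - type: format                # render for the context window
    template: "- {{text}} (score {{score}})"
```

Adding a file like this is the whole integration: the assembly engine discovers it, runs it, and merges its output.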

The read path: Context Assembly Engine

The context assembly pipeline: the system prompt, knowledge-graph and vector memory, episodic history, and task input are compacted, redacted, reranked, and loop-checked before the LLM call.

The assembly pipeline runs once per turn, before the LLM call:

1 — Validate

The incoming task and any explicit @mention references are parsed and validated. Invalid memory references are surfaced as warnings rather than silently dropped so the agent knows its context is incomplete.

2 — Evaluate context rules

Context rules are JSON documents in .codebolt/ContextRuleEngine/. Each rule is a condition → action pair: if the condition matches the current task, the action adds or suppresses specific memory references.

Rules are evaluated with a small DSL that supports 13 operators:

| Operator | Matches when… |
| --- | --- |
| contains | Field string contains the value |
| startsWith | Field string starts with the value |
| endsWith | Field string ends with the value |
| matches | Field matches a regex |
| equals | Field equals the value exactly |
| not_equals | Field does not equal the value |
| in | Field value is in a set |
| not_in | Field value is not in a set |
| gt / lt / gte / lte | Numeric comparison |
| exists | Field is present |

Multiple conditions within a rule combine with AND logic. Rules are evaluated in definition order; the first matching rule wins.

Example: a rule that injects the project's architecture notes whenever the task mentions "refactor":

{
"condition": { "field": "task.text", "op": "contains", "value": "refactor" },
"action": { "include": "markdown://architecture-notes" }
}
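The matching behaviour — dotted field paths, AND-combined conditions, first match wins — can be sketched in TypeScript. This is an illustrative re-implementation of the semantics described above, not the engine's actual code, and it covers only a subset of the operators:

```typescript
// Shapes mirror the JSON rule example above.
type Condition = { field: string; op: string; value?: unknown };
type Rule = { condition: Condition | Condition[]; action: unknown };

// Resolve a dotted path like "task.text" against the task context.
function getField(ctx: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (o, k) => (o as Record<string, unknown> | undefined)?.[k], ctx);
}

function matchesCond(c: Condition, ctx: Record<string, unknown>): boolean {
  const v = getField(ctx, c.field);
  switch (c.op) {
    case "contains":   return typeof v === "string" && v.includes(String(c.value));
    case "startsWith": return typeof v === "string" && v.startsWith(String(c.value));
    case "equals":     return v === c.value;
    case "in":         return Array.isArray(c.value) && c.value.includes(v);
    case "gt":         return typeof v === "number" && v > Number(c.value);
    case "exists":     return v !== undefined;
    default:           return false; // remaining operators omitted for brevity
  }
}

// Conditions within a rule AND together; rules run in definition
// order and the first matching rule wins.
function firstMatch(rules: Rule[], ctx: Record<string, unknown>): Rule | undefined {
  return rules.find(r =>
    (Array.isArray(r.condition) ? r.condition : [r.condition])
      .every(c => matchesCond(c, ctx)));
}
```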

3 — Execute persistent memory pipelines

All enabled persistent memory definitions are executed concurrently. Each pipeline resolves its steps against the current query derived from the task, returning a ranked, formatted list of relevant chunks. Steps like vector_search and graph_view_read can be expensive; results are cached with a short TTL so repeated turns on the same topic don't re-run identical queries.
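A short-TTL cache of this kind is straightforward: key an entry by step and query, and re-run the step only when the entry is missing or expired. A minimal sketch, assuming nothing about the engine's real cache beyond "short TTL, keyed per query":

```typescript
type CacheEntry<T> = { value: T; expires: number };

// Memoizes an expensive async step (e.g. a vector search) for ttlMs.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number) {}

  async getOrRun(key: string, run: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // fresh hit: skip the work
    const value = await run();
    this.entries.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```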

4 — Merge and deduplicate

Outputs from all persistent memory pipelines, plus explicitly-included KV/JSON/Markdown references, are merged. Duplicate content (same chunk appearing in both a vector search result and an explicit @mention) is deduplicated by content hash.
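Content-hash deduplication is worth seeing concretely: two chunks from different sources collapse to one entry when their text is byte-identical, and the first occurrence wins. An illustrative sketch:

```typescript
import { createHash } from "node:crypto";

type Chunk = { source: string; text: string };

// Keep the first occurrence of each distinct text; later duplicates
// (e.g. the same chunk from both vector search and an @mention) are dropped.
function dedupeByContentHash(chunks: Chunk[]): Chunk[] {
  const seen = new Set<string>();
  return chunks.filter(c => {
    const h = createHash("sha256").update(c.text).digest("hex");
    if (seen.has(h)) return false;
    seen.add(h);
    return true;
  });
}
```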

5 — Apply contribution rules

Each memory source has a maximum contribution — a cap on how many tokens it can contribute to the final context. Contribution rules prevent a single noisy source (e.g., a verbose event log) from crowding out everything else. The rules are also stored in .codebolt/ContextRuleEngine/.
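A contribution rule might be expressed along these lines — the exact schema is defined by the rule engine, so every field name here is illustrative:

```json
{
  "contributions": [
    { "source": "log://events", "maxTokens": 400 },
    { "source": "vector://default", "maxTokens": 1500 }
  ]
}
```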

6 — Enforce token budget

The merged context is ordered by a fixed section priority:

state → warnings → constraints → knowledge → history → suggestions → working_memory

Sections are packed into the budget in order. If the budget is exhausted before all sections are included, lower-priority sections (suggestions, working_memory) are truncated first. The assembly engine emits a structured trace of which sections were included, how many tokens each consumed, and which were truncated — visible via WebSocket assembly events.
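The packing step amounts to a greedy fill in priority order: take whole sections while they fit, truncate the first one that overflows, and drop everything below it. A minimal sketch — word count stands in for a real tokenizer, and the truncate-then-drop policy is an assumption consistent with the description above:

```typescript
type Section = { name: string; text: string };

// Crude token count: whitespace-separated words.
const tokens = (s: string) => s.split(/\s+/).filter(Boolean).length;

function pack(sections: Section[], budget: number): Section[] {
  const out: Section[] = [];
  let used = 0;
  for (const s of sections) {            // sections arrive in priority order
    const need = tokens(s.text);
    if (used + need <= budget) {
      out.push(s);
      used += need;
    } else {
      const keep = budget - used;        // truncate the first overflowing section
      if (keep > 0) out.push({ name: s.name,
        text: s.text.split(/\s+/).filter(Boolean).slice(0, keep).join(" ") });
      break;                             // lower-priority sections are dropped
    }
  }
  return out;
}
```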

7 — Hand off to LLM

The packed context window is passed to the LLM along with the current turn's user message.

The write path: Memory Ingestion

Memory ingestion write path: trigger → processors → routing → storage. Lifecycle triggers (onConversationEnd, onTaskCompleted, onTaskFailure, onAction, manual) fire the ingestion pipeline; processors (chunker, vector_embed, llm_extract, normalizer, custom) run in sequence; routing rules (foreach · condition · write_to, with template variables resolved at runtime) write to one or more storage backends: knowledge graph, vector DB, KV store, event log, or blob.

Memory is written by the Ingestion Pipeline — a configurable sequence of processors triggered by agent lifecycle events and routed to one or more storage backends.

Triggers

| Trigger | Fires when… |
| --- | --- |
| onConversationEnd | The agent's current conversation completes |
| onTaskCompleted | A task transitions to the completed state |
| onTaskFailure | A task fails or times out |
| onAction | A tool call or significant action is logged |
| manual | An agent or human explicitly calls the ingestion API |

Multiple triggers can activate the same pipeline. Triggers are defined in .codebolt/MemoryIngestion/ alongside the pipeline configuration.

Processors

Once triggered, the raw input passes through a sequence of processors in order. Each processor transforms or augments the data:

| Processor | What it does |
| --- | --- |
| chunker | Splits large inputs into overlapping chunks of a configured token size |
| vector_embed | Embeds each chunk and writes the embeddings to the vector store |
| llm_extract | Calls the LLM to extract structured entities and relationships; writes to the knowledge graph |
| normalizer | Cleans, trims, and standardises text (whitespace, encoding, PII redaction) |
| parser | Parses structured input formats (code AST, JSON, Markdown) into normalised records |
| custom | An arbitrary TypeScript function registered via the SDK — full access to the pipeline context |

Processors run in sequence; later processors see the output of earlier ones. A pipeline that both embeds and graph-extracts will run chunker → normalizer → vector_embed → llm_extract so the graph extractor operates on already-normalised chunks.
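The chunker's overlapping-window behaviour is easy to sketch. This illustrative version uses word count in place of tokens; the real processor's window and overlap semantics may differ in detail:

```typescript
// Split text into windows of `size` words, each overlapping the
// previous window by `overlap` words.
function chunk(text: string, size: number, overlap: number): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = Math.max(1, size - overlap);  // how far each window advances
  for (let i = 0; i < words.length; i += step) {
    chunks.push(words.slice(i, i + size).join(" "));
    if (i + size >= words.length) break;     // last window reached the end
  }
  return chunks;
}
```

Overlap matters for the downstream processors: a sentence that straddles a chunk boundary still appears whole in at least one chunk, so vector_embed and llm_extract never see it split in half.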

Routing rules

After processing, routing rules decide where to write. A routing rule is a foreach · condition · write_to triple:

  • foreach — iterate over items (e.g., each chunk produced by the chunker)
  • condition — only write if a predicate is true (e.g., score > 0.7)
  • write_to — a destination reference with optional template variables resolved at runtime (e.g., kv://sessions/{{sessionId}})

One pipeline run can write to multiple destinations simultaneously.
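Putting triggers, processors, and routing together, a pipeline configuration might look roughly like this. The trigger names, processor names, and the foreach · condition · write_to structure come from the tables above; everything else (file layout, parameter names) is illustrative:

```yaml
# .codebolt/MemoryIngestion/conversation-recall.yaml (hypothetical)
triggers: [onConversationEnd]
processors:
  - chunker: { size: 512, overlap: 64 }
  - normalizer
  - vector_embed
  - llm_extract
routing:
  - foreach: chunks
    condition: "score > 0.7"
    write_to: "vector://conversations/{{sessionId}}"   # template var resolved at runtime
  - write_to: "graph://entities"
```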

Destinations

| Destination | Storage backend |
| --- | --- |
| Knowledge Graph | Kuzu embedded graph (graph://) |
| Vector DB | Vector store (vector://) |
| KV Store | Key-value store (kv://) |
| Event Log | Append-only event log (log://) |
| Blob | Raw file storage (blob://) |

Storage locations on disk

All memory configuration and data lives inside the project's .codebolt/ directory:

| Path | Contents |
| --- | --- |
| .codebolt/PersistentMemory/ | Declarative retrieval pipeline definitions |
| .codebolt/MemoryIngestion/ | Ingestion pipeline and trigger configuration |
| .codebolt/ContextRuleEngine/ | Context rules (conditions, contribution caps) |
| .codebolt/memory/episodic/ | Per-run turn history |
| .codebolt/memory/context/ | Assembled context snapshots (debug) |
| .codebolt/vectordb/ | Vector embeddings |
| .codebolt/knowledgegraph/kuzu/ | Kuzu graph database |

Because everything is inside the project directory, memory configuration is version-controlled alongside code. Checking out a branch gives you that branch's memory rules, ingestion pipelines, and persistent memory definitions — not just its code.

Observability

The Context Assembly Engine emits structured WebSocket events during assembly. Each event carries:

  • which memory sources were queried
  • how many tokens each contributed
  • which sections were truncated and why
  • the total assembly latency

These events are surfaced in the Codebolt UI as an "assembly trace" alongside the turn — useful for diagnosing why an agent made an unexpected decision.

Why this matters

Bad context = bad answers. A confused agent is almost always a context problem — too much, too little, or the wrong information at the wrong moment. The fix is rarely the prompt; it's usually the context rules or the ingestion pipeline. Understanding the read/write duality gives you the right lever to pull.
