Storage Backends
Codebolt uses several storage systems beyond the main database. Each has its own backend choice, its own scaling properties, and its own backup story.
Overview
| Store | Data | Default backend | Alternatives at scale |
|---|---|---|---|
| Main DB | Runs, events, memory, settings | SQLite (desktop) / Postgres (team) | — |
| Vector DB | Embeddings for semantic search | Embedded (LanceDB-style) | pgvector, dedicated vector DB |
| Knowledge graph | Typed entities + edges | Kuzu (embedded) | — |
| Shadow git | Rollback-able file history | Local filesystem | Object storage (S3-compatible) |
| Project files | Real user files | Local filesystem | Network filesystem |
| Capability bundles | Installed extensions | Local filesystem | — |
| Logs | Server logs | Local files / stdout | Centralised log aggregator |
Vector DB
Stores chunks of code, docs, and chat content with their embeddings. Used for semantic search and the memory ingestion pipeline.
Embedded (default)
Codebolt ships an embedded vector store that lives in $DATA_DIR/vectordb/. No separate service, no configuration. Good for:
- A single project with up to ~500k chunks.
- Team deployments with modest memory usage.
pgvector (at scale)
For large projects or multi-user deployments with heavy vector workloads, move to pgvector:
vector:
  backend: pgvector
  table: vectors
  # Uses the main Postgres database
Advantages:
- Same DB = same backup.
- Postgres's ecosystem (extensions, tools, replication).
- Scales to millions of vectors with proper indexing.
Vector queries need an appropriate index, such as CREATE INDEX ON vectors USING hnsw (embedding vector_cosine_ops). The migration creates it for you.
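To make the role of that index concrete, here is a minimal sketch of the kind of nearest-neighbour query an HNSW index with vector_cosine_ops can serve. The table name `vectors` and the `embedding` column come from the text above; the `chunk_id` column, the psycopg-style placeholders, and the helper names are assumptions for illustration, not Codebolt's actual schema or code. The pure-Python function shows the metric that pgvector's `<=>` operator computes (cosine distance).

```python
import math

# SQL a pgvector-backed store might run. `vectors` and `embedding` come from
# the config and index statement above; `chunk_id` is a hypothetical column.
# Placeholders use the psycopg named-parameter style (an assumption).
NEAREST_CHUNKS_SQL = """
SELECT chunk_id, embedding <=> %(query)s::vector AS distance
FROM vectors
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the metric behind <=>."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def to_pgvector_literal(vec):
    """Format a Python sequence as pgvector's text input, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"
```

The ORDER BY ... LIMIT shape matters: it is the query form that lets Postgres use the HNSW index rather than scanning every row.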
Dedicated vector DB
For very large deployments, point at an external vector DB:
vector:
  backend: custom
  url: https://vectordb.internal:8080
  api_key_env: VECTORDB_KEY
Codebolt ships drivers for several external vector databases; check the provider list in the codebolt-server.yaml schema.
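The api_key_env setting names an environment variable rather than holding the secret itself, which keeps the key out of the config file. A minimal sketch of how such an indirection could be resolved follows; the function name and error handling are hypothetical, and this is not Codebolt's actual loader.

```python
import os


def resolve_vector_api_key(vector_config, environ=None):
    """Return the API key named by `api_key_env` in the vector config.

    `vector_config` is the parsed `vector:` mapping from the YAML above.
    `environ` defaults to the process environment and can be overridden
    in tests. Both the function and its behaviour are illustrative.
    """
    env = os.environ if environ is None else environ
    var_name = vector_config.get("api_key_env")
    if not var_name:
        raise KeyError("vector.api_key_env is not set")
    key = env.get(var_name)
    if not key:
        raise RuntimeError(f"environment variable {var_name} is empty or unset")
    return key
```

With the config shown above, the server process would need VECTORDB_KEY exported in its environment before it starts.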
Index rebuilds
If you switch embedding providers or the embedding model changes, the existing vectors become stale (different models produce incompatible vectors). A full rebuild is required:
codebolt project reindex --full --project all
This is slow — it re-reads every file, re-chunks, re-embeds, re-writes. Do it overnight.
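The rebuild steps described above (re-read, re-chunk, re-embed, re-write) can be sketched as follows. This is a toy model under stated assumptions: the VectorIndex class, the fixed-size chunker, and toy_embed are all hypothetical stand-ins, not what `codebolt project reindex` does internally. The point it illustrates is that an index is tied to the model that produced it, so a model change invalidates every stored vector at once.

```python
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    model: str                                 # embedding model that produced the rows
    rows: dict = field(default_factory=dict)   # (path, chunk number) -> vector


def chunk(text, size=40):
    """Split text into fixed-size pieces (a deliberately naive chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def needs_full_rebuild(index, configured_model):
    """Vectors from different models are not comparable, so any mismatch is stale."""
    return index.model != configured_model


def rebuild(files, model, embed):
    """Re-read every file, re-chunk it, re-embed each chunk, and write a fresh index."""
    index = VectorIndex(model=model)
    for path, text in files.items():
        for n, piece in enumerate(chunk(text)):
            index.rows[(path, n)] = embed(piece)
    return index


def toy_embed(piece):
    """Stand-in for a real embedding call; returns a tiny 2-d vector."""
    return [len(piece), sum(map(ord, piece)) % 97]
```

Every chunk passes through the embedding call again, which is why the cost scales with total project size rather than with what changed.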
Knowledge graph (Kuzu)
Kuzu is an embedded graph database. Stores typed entities (files, functions, runs, decisions) and edges (calls, imports, caused-by).
$DATA_DIR/kg/
├── schema.cypher