Observability and Tracing

Finding out what happened — or what's happening right now — in a multi-agent run. Codebolt's event log plus the trace tools give you complete visibility.

The levels of observability

Four levels, from high-level to deepest:

Level	View	Best for
Run tree	`codebolt agent tree`	"What's the shape?"
Flow view	UI flow panel	"Which node is running?"
Phase trace	`codebolt agent trace`	"What did each step do?"
Event log query	`codebolt events query`	"Show me every X"

Start high-level; drill down only when needed.

Run tree

For hierarchies of parent and child runs:

codebolt agent tree <root-run-id>

Output:

run_xyz (orchestrator) ▶ running · 2m15s
├── run_aaa (planner) ✓ 12s
├── run_bbb (coder) ✓ 45s
│   └── run_eee (test-runner) ✓ 8s
├── run_ccc (reviewer) ▶ running · 18s
└── (pending)

Shows every descendant, their status, and elapsed time.

Flow view (UI)

For flows specifically. See Reading a Flow. Graph-style rendering with click-through to node detail.

Phase trace

For a specific run:

codebolt agent trace <run-id>
codebolt agent trace <run-id> --phase 3
codebolt agent trace <run-id> --type llm.chat
codebolt agent trace <run-id> --type tool.call
codebolt agent trace <run-id> --tail 20

The phase trace is the first place to look when something looks wrong. It's the ground truth.

Event log queries

For cross-cutting questions that span many runs:

"Show me all failing agents today"

codebolt events query 'type == run.ended and status == "failed"' --since "today"

"Which tool is slowest?"

codebolt events query 'type == tool.call' --since "1 day ago" --json | \
  jq -r '.tool + " " + (.duration_ms | tostring)' | sort -k2 -n | tail -20

"How much has agent X cost this week?"

codebolt events query 'type == llm.chat and agent == "my-agent"' --since "7 days ago" --json | \
  jq '[.[] | .cost_usd] | add'

"Every guardrail denial in the last hour"

codebolt events query 'type == guardrail.verdict and verdict == "deny"' --since "1 hour ago"

See Query the event log for the full query DSL.

Real-time watching

codebolt events watch
codebolt events watch --type run.started
codebolt events watch --filter "agent == 'reviewer'"
codebolt events watch --filter "descendent_of <run-id>"

Streams events as they happen. Good for "what is this flow doing right now".

Metrics (for self-hosted)

Prometheus metrics are exposed when enabled. Key dashboards:

Run throughput — runs/minute by agent, status.
LLM latency — p50/p95/p99 by provider, model.
Tool latency — p50/p95/p99 by tool.
Queue depth — backlog of agent runs waiting to start.
Event log ingest lag — how far behind ingestion is.

See Self-hosting → Scaling.

Distributed tracing (OpenTelemetry)

For deep trace analysis, Codebolt emits OpenTelemetry traces:

# codebolt-server.yaml
telemetry:
  otlp:
    endpoint: https://otlp.my-observability.com
    protocol: grpc

Traces from Codebolt span:

The initial HTTP/WS request.
The agent run.
Each LLM call.
Each tool call.
Each guardrail check.

Visualise in Jaeger, Tempo, Honeycomb, or any OTLP-compatible backend.

Debugging patterns

"Agent X is suddenly slow"

Compare recent trace timings to older ones (codebolt agent trace with older run IDs).
Is the slowness in LLM calls (provider issue)?
In tool calls (tool or plugin issue)?
In context assembly (memory/vector growing too big)?

The trace breakdown tells you where the time went.

"One run is making a burst of tool calls"

codebolt events query 'type == tool.call and run_id == "<id>"' --json | \
  jq '.[] | .tool' | sort | uniq -c | sort -rn

Shows the tool call frequency. If one tool dominates, the agent is stuck on it.

"Cost is exploding"

codebolt provider usage --since "1 day ago" --by agent

Identifies which agent is burning the most. Drill into its runs from there.

"Users report intermittent failures"

codebolt events query 'type == run.ended and status == "failed"' --since "today" --json | \
  jq 'group_by(.error) | map({error: .[0].error, count: length}) | sort_by(.count) | reverse'

Groups failures by error message. Often one root cause dominates.

The levels of observability​

Run tree​

Flow view (UI)​

Phase trace​

Event log queries​

"Show me all failing agents today"​

"Which tool is slowest?"​

"How much has agent X cost this week?"​

"Every guardrail denial in the last hour"​

Real-time watching​

Metrics (for self-hosted)​

Distributed tracing (OpenTelemetry)​

Debugging patterns​

"Agent X is suddenly slow"​

"One run is making a burst of tool calls"​

"Cost is exploding"​

"Users report intermittent failures"​

See also​