Evaluation & Optimization
Evaluate how well your agents, skills, and action blocks perform on specific tasks — then automatically optimize them using an agent-driven improvement loop.
What It Does
- Define experiments — tasks with instructions, environments, and evaluators.
- Run subjects (agents, skills, action blocks, capabilities, MCP servers) against those experiments.
- Score results with weighted evaluators (expected-output matching, script, agent-judge, deliberation).
- Optimize automatically — an optimizer agent reads the subject's code, makes targeted changes, re-evaluates, and keeps improvements.
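The improvement loop described above can be pictured as a greedy hill climb: propose a change, re-evaluate, and keep the change only if the score goes up. This is a minimal sketch under stated assumptions, not Codebolt's implementation. `optimize`, `evaluate`, and `propose_change` are hypothetical names, a subject's overall score is assumed to be a single float where higher is better, and `propose_change` stands in for the optimizer agent's targeted code edit.

```python
from typing import Callable, TypeVar

S = TypeVar("S")  # hypothetical: any representation of a subject's code or config


def optimize(
    subject: S,
    evaluate: Callable[[S], float],
    propose_change: Callable[[S], S],
    max_iterations: int = 5,
) -> tuple[S, float]:
    """Greedy keep-if-better loop: a candidate replaces the best only if it scores higher."""
    best = subject
    best_score = evaluate(best)
    for _ in range(max_iterations):
        candidate = propose_change(best)  # stands in for the optimizer agent's edit
        score = evaluate(candidate)       # re-run the experiment's evaluators
        if score > best_score:            # keep improvements, discard regressions
            best, best_score = candidate, score
    return best, best_score
```

For example, `optimize(0, lambda x: -abs(x - 3), lambda x: x + 1)` climbs to `3` and then rejects every further change, because each later candidate scores worse. Accepting only strict improvements keeps the best-known version safe from regressions, at the cost of never exploring changes that temporarily lower the score.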
Architecture
Key Concepts
| Concept | What it is |
|---|---|
| Task (Experiment) | Defines what to test: instruction, environment, evaluators, optional optimization |
| Subject | The thing being evaluated: agent, skill, action-block, capability, or MCP |
| Suite | A folder grouping related tasks |
| Run | Executes subjects against tasks and produces scored results |
| Evaluator | Scores the subject's output (expected-output, script, agent-judge, deliberation) |
| Optimization | Agent-driven iterative improvement of the subject |
Subject Types
| Type | What it is |
|---|---|
| `agent` | An installed agent |
| `skill` | A skill |
| `action-block` | An action block |
| `capability` | A capability |
| `mcp` | An MCP server |
Data Storage
All eval data is stored as JSON files in `.codebolt/evals/`:

```
.codebolt/evals/
├── index.json
├── tasks/
├── suites/
└── runs/
```
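Because everything is plain JSON on disk, eval data can be inspected with ordinary file tools. The helper below is a hypothetical, read-only sketch, not a Codebolt API: `load_eval_data` is an illustrative name, the default path assumes you run it from the project root, and since the schema of each file is not documented here, every file is treated as opaque JSON keyed by its path relative to the evals directory.

```python
import json
from pathlib import Path
from typing import Any


def load_eval_data(root: Path = Path(".codebolt/evals")) -> dict[str, Any]:
    """Load every JSON file under the evals directory, keyed by relative path."""
    data: dict[str, Any] = {}
    for path in sorted(root.rglob("*.json")):  # includes files in tasks/, suites/, runs/
        with path.open(encoding="utf-8") as f:
            data[path.relative_to(root).as_posix()] = json.load(f)
    return data


if __name__ == "__main__":
    for name in load_eval_data():
        print(name)  # e.g. "index.json", then any files under tasks/, suites/, runs/
```

Reading rather than writing keeps the sketch safe to run alongside the Eval Panel, which owns these files.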
Workflow
- Open the Eval Panel in Codebolt and go to the Experiments tab.
- Create an experiment: define its instruction, environment, and evaluators.
- Switch to the Runs tab and create a run, selecting the subjects to evaluate.
- Click Start. Subjects execute, evaluators score their output, and results update in real time.
- Optionally, enable optimization so the optimizer agent iterates to improve scores.
- Review the leaderboard, which ranks subjects by score.
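The exact ranking rule behind the leaderboard is not described here. As a hedged sketch, assume each subject receives one aggregate score per task in a run and subjects are ranked by the mean of those scores, with ties broken alphabetically. `SubjectResult` and `leaderboard` are hypothetical names, and treating a subject with no scores as 0.0 is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class SubjectResult:
    subject: str         # hypothetical identifier, e.g. "agent:my-agent"
    scores: list[float]  # one aggregate score per task in the run


def leaderboard(results: list[SubjectResult]) -> list[tuple[int, str, float]]:
    """Rank subjects by mean score across tasks, highest first."""
    averaged = [
        (r.subject, sum(r.scores) / len(r.scores) if r.scores else 0.0)  # no scores -> 0.0
        for r in results
    ]
    averaged.sort(key=lambda item: (-item[1], item[0]))  # ties broken by subject name
    return [(rank, name, score) for rank, (name, score) in enumerate(averaged, start=1)]


board = leaderboard([
    SubjectResult("skill:b", [0.5, 0.5]),
    SubjectResult("agent:a", [1.0, 0.5]),
    SubjectResult("mcp:c", [0.5, 0.5]),
])
print(board)  # → [(1, 'agent:a', 0.75), (2, 'mcp:c', 0.5), (3, 'skill:b', 0.5)]
```

Averaging rather than summing keeps subjects comparable even if some of them were run against fewer tasks.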
See Also
- Creating Experiments — define tasks, instructions, environments
- Evaluators — configure scoring methods
- Optimization Loop — agent-driven iterative improvement
- Running Evals and Results — execute runs, view results