Auto-Optimize Agents
Once your agent basically works, the next question is not "does it run?" but "does it reliably perform well?"
That is where Codebolt's eval and optimization system fits.
The full Evaluation & Optimization section lives outside this guide because the same system applies to more than agents:
- agents
- skills
- capabilities
- tools and MCP integrations
- prompt and context strategies
But for agent authors, this is the natural next step after Testing and Debugging.
When to use optimization
Reach for optimization when:
- your agent works, but quality is inconsistent
- you want to compare prompt or model variants
- you added tools or capabilities and want evidence they help
- you want to reduce cost or latency without harming quality
- you are preparing an agent for publishing or wider internal use
Do not start here. First make the agent correct enough to be worth measuring.
The practical sequence
For custom agents, the workflow is usually:
- Build the agent.
- Run it manually on real tasks.
- Add tests and replay coverage.
- Create an eval set from the kinds of tasks the agent should handle well.
- Run optimization loops to compare variants.
- Promote the winning version.
In short:
build -> test -> replay -> eval -> optimize -> publish
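The eval-and-optimize steps of that sequence can be sketched as a small loop: run each candidate variant over a fixed eval set, score the outputs, and promote the best scorer. This is an illustrative sketch only; `run_agent`, `EVAL_SET`, and `VARIANTS` are hypothetical stand-ins, not Codebolt APIs.

```python
import statistics

# Hypothetical stand-ins -- the real Codebolt eval system has its own API.
EVAL_SET = [
    {"task": "summarize release notes", "expected_keyword": "summary"},
    {"task": "triage a bug report", "expected_keyword": "severity"},
]

VARIANTS = {
    "baseline": {"model": "model-a", "temperature": 0.7},
    "low-temp": {"model": "model-a", "temperature": 0.2},
}

def run_agent(task: str, config: dict) -> str:
    """Stand-in for executing the agent on a task with a given config."""
    # A real implementation would invoke the agent and capture its output.
    return f"{task}: summary severity (t={config['temperature']})"

def score(output: str, case: dict) -> float:
    """Toy metric: 1.0 if the expected keyword appears in the output."""
    return 1.0 if case["expected_keyword"] in output else 0.0

def evaluate(config: dict) -> float:
    """Mean score of one variant across the whole eval set."""
    return statistics.mean(
        score(run_agent(case["task"], config), case) for case in EVAL_SET
    )

results = {name: evaluate(cfg) for name, cfg in VARIANTS.items()}
winner = max(results, key=results.get)
print(winner, results)
```

The point of the sketch is the shape, not the metric: a fixed eval set, one score per variant, and a mechanical comparison, so "promote the winning version" is a data-driven step rather than a judgment call.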
What you can optimize
For agents, common optimization targets are:
- system prompt wording
- model choice
- decoding settings (temperature and other sampling parameters)
- tool allowlists
- capability activation
- context assembly choices
The goal is not "make it smarter" in the abstract. The goal is to improve a measurable outcome on a known task set.
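One way to make those targets concrete is to treat each one as a dimension of a search space and expand the combinations into candidate configs. The dimension names below are hypothetical examples, not a Codebolt schema.

```python
import itertools

# Illustrative search space; keys and values are hypothetical examples.
SEARCH_SPACE = {
    "system_prompt": ["concise-v1", "detailed-v2"],
    "model": ["model-a", "model-b"],
    "temperature": [0.2, 0.7],
    "tool_allowlist": [("search",), ("search", "code_exec")],
}

def variants(space: dict):
    """Expand a search space into concrete variant configs (Cartesian product)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

all_variants = list(variants(SEARCH_SPACE))
print(len(all_variants))  # 2 x 2 x 2 x 2 = 16 candidate configs
```

A full grid grows multiplicatively, so in practice you would usually vary one or two dimensions at a time against a fixed baseline rather than sweep everything at once.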
Why this stays outside Creating Agents
The eval system is broader than agent authoring.
It is also the right place to measure:
- whether a skill improves a task class
- whether an MCP tool is called correctly
- whether a capability helps or harms
- whether a provider or model swap changes cost, latency, or quality
That is why Evaluation & Optimization is a top-level section rather than part of Creating Agents. This page is just the bridge for agent builders.
Start here next
- Evaluation & Optimization Overview — the full system
- Replay and Traces — use real runs as eval material
- Writing Evals — build a useful eval set
- Optimization Loop — generate and compare variants
- Metrics & Scoring — decide what "better" means