Golden conversation sets

Curate named collections of test conversations per agent. Each case asserts one specific behaviour: the agent must cite a doc, call a tool, refuse, escalate, or give any grounded answer. The eval runner replays every case against the agent end-to-end.
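As a rough illustration of the description above, a golden set could be modelled as a named list of cases, each carrying one expected behaviour, with a runner that replays each conversation and checks the result. This is only a minimal sketch: every name here (`Expectation`, `GoldenCase`, `GoldenSet`, `AgentResult`, `run_golden_set`) is hypothetical and not the product's actual schema or API, and the rule used for "any grounded answer" (a non-refused, non-empty answer with at least one citation) is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Expectation(Enum):
    """The five behaviours a case can assert (names are hypothetical)."""
    MUST_CITE_DOC = "must_cite_doc"
    MUST_CALL_TOOL = "must_call_tool"
    MUST_REFUSE = "must_refuse"
    MUST_ESCALATE = "must_escalate"
    ANY_GROUNDED_ANSWER = "any_grounded_answer"


@dataclass
class AgentResult:
    """Hypothetical summary of one end-to-end agent run."""
    answer: str = ""
    cited_docs: List[str] = field(default_factory=list)
    tool_calls: List[str] = field(default_factory=list)
    refused: bool = False
    escalated: bool = False


@dataclass
class GoldenCase:
    messages: List[str]            # the test conversation, in order
    expectation: Expectation       # the single behaviour this case asserts
    target: Optional[str] = None   # doc id or tool name, when one is required


@dataclass
class GoldenSet:
    name: str
    agent: str                     # the agent this set belongs to
    cases: List[GoldenCase] = field(default_factory=list)


def check(case: GoldenCase, result: AgentResult) -> bool:
    """Return True if the result satisfies the case's expectation."""
    exp = case.expectation
    if exp is Expectation.MUST_CITE_DOC:
        # With no target, any citation counts; otherwise the target must be cited.
        if case.target is None:
            return bool(result.cited_docs)
        return case.target in result.cited_docs
    if exp is Expectation.MUST_CALL_TOOL:
        if case.target is None:
            return bool(result.tool_calls)
        return case.target in result.tool_calls
    if exp is Expectation.MUST_REFUSE:
        return result.refused
    if exp is Expectation.MUST_ESCALATE:
        return result.escalated
    # ANY_GROUNDED_ANSWER: assumed to mean a real answer backed by a citation.
    return bool(result.answer) and not result.refused and bool(result.cited_docs)


def run_golden_set(
    golden_set: GoldenSet,
    agent: Callable[[List[str]], AgentResult],
) -> List[bool]:
    """Replay every case against the agent and return pass/fail per case."""
    return [check(case, agent(case.messages)) for case in golden_set.cases]
```

In this sketch the agent is passed in as a plain callable that takes the conversation and returns an `AgentResult`, so the runner can be exercised with a stub before it is wired to a real agent.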

No golden sets yet. Create one to start curating eval cases.