Evals & quality
Evals are the test cases, judges, A/B experiments, and alerts that keep agents on-brief.
- Golden sets are versioned collections of test cases; an eval run replays a golden set against any agent config.
- A/B tests compare two configs on live production traffic and report the result with a confidence bound.
- Alert rules trigger on metric thresholds and feed the suggestions inbox.
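The three mechanisms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dashboard's implementation. `GoldenCase`, `replay`, `diff_ci`, and `alert_fires` are hypothetical names. Exact-match scoring stands in for whatever judges the product actually uses. The confidence bound is a normal-approximation (Wald) interval on the difference in pass rates, which may not be the method the A/B tests use. The alert rule assumes a "fire when the metric drops below a threshold" shape.

```python
import math
from dataclasses import dataclass
from typing import Callable


# Hypothetical shape of a golden-set entry; not the dashboard's real schema.
@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str


def replay(cases: list[GoldenCase], agent: Callable[[str], str]) -> float:
    """Replay a golden set against one agent config and return the pass rate.

    Assumption: a case passes on an exact string match (a stand-in for a judge).
    """
    passed = sum(1 for c in cases if agent(c.prompt) == c.expected)
    return passed / len(cases)


def diff_ci(pass_a: int, n_a: int, pass_b: int, n_b: int,
            z: float = 1.96) -> tuple[float, float]:
    """Wald interval for p_b - p_a (normal approximation; z=1.96 is about 95%)."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    d = p_b - p_a
    return d - z * se, d + z * se


def alert_fires(value: float, threshold: float) -> bool:
    """Assumed rule shape: fire when the metric falls below its threshold."""
    return value < threshold
```

For example, with 80 of 100 passes for config A and 90 of 100 for config B, `diff_ci(80, 100, 90, 100)` gives roughly (0.002, 0.198). The interval excludes zero, so under this approximation B's higher pass rate is significant at about the 5% level.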
Press ? from anywhere in the dashboard to open this drawer.