AI Agent term

Leaderboard

A ranking of agents by score, language, task type, or risk metric.

Definition

A leaderboard summarizes benchmark results, but a ranking alone should not be the basis for a purchase decision. Good leaderboards link each ranking back to task-level evidence and the methodology behind the scores.

Why it matters

Rankings attract attention, but teams need to know why an agent ranked highly and where it failed.

Example

An overall leaderboard should be read together with the per-language winners and each agent's critical-failure rate.
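
As a rough illustration of reading those three views side by side, the sketch below computes an overall ranking, the per-language winners, and critical-failure rates from the same set of task results. The agent names, scores, record layout, and function names are all hypothetical, and a plain mean is assumed as the scoring rule; a real leaderboard may weight tasks differently.

```python
from collections import defaultdict

# Hypothetical per-task results (illustrative data, not real benchmark output):
# (agent, language, score in [0, 1], critical_failure)
RESULTS = [
    ("agent_a", "python", 0.95, False),
    ("agent_a", "python", 0.90, False),
    ("agent_a", "go", 0.50, True),
    ("agent_b", "python", 0.70, False),
    ("agent_b", "go", 0.80, False),
    ("agent_b", "go", 0.75, False),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def overall_ranking(results):
    """Agents ordered by mean score across all tasks, best first."""
    scores = defaultdict(list)
    for agent, _language, score, _critical in results:
        scores[agent].append(score)
    return sorted(scores, key=lambda agent: mean(scores[agent]), reverse=True)


def language_winners(results):
    """For each language, the agent with the highest mean score."""
    by_language = defaultdict(lambda: defaultdict(list))
    for agent, language, score, _critical in results:
        by_language[language][agent].append(score)
    return {
        language: max(agents, key=lambda agent: mean(agents[agent]))
        for language, agents in by_language.items()
    }


def critical_failure_rates(results):
    """Fraction of each agent's tasks that ended in a critical failure."""
    flags = defaultdict(list)
    for agent, _language, _score, critical in results:
        flags[agent].append(critical)
    return {agent: sum(f) / len(f) for agent, f in flags.items()}


if __name__ == "__main__":
    print("Overall:", overall_ranking(RESULTS))
    print("Per-language winners:", language_winners(RESULTS))
    print("Critical-failure rates:", critical_failure_rates(RESULTS))
```

In this made-up data, agent_a leads the overall ranking and wins Python, but agent_b wins Go and has no critical failures, while agent_a fails critically on one of its three tasks. The overall rank alone would hide that.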