AI Agent Glossary
A searchable long-term reference for AI agent evaluation, task types, failure tags, and business-safety terms.
AI Agent
An AI system that can follow goals, use context, and complete workflow steps.
Agent Benchmark
A repeatable test set for comparing AI agents on documented tasks.
Critical Failure
A failure that would make an agent output unsafe, misleading, unusable, or structurally invalid in real work.
Structured Extraction
Turning unstructured text into reliable fields such as JSON, dates, amounts, and labels.
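As an illustration of what "reliable fields" can mean in practice, here is a minimal sketch that checks an agent's extracted record before downstream use. The field names (`date`, `amount`, `label`), the label set, and the function name `validate_extraction` are assumptions made for this example, not part of any particular benchmark.

```python
import json
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical label set, chosen only for illustration.
ALLOWED_LABELS = {"invoice", "receipt", "refund_request"}


def validate_extraction(raw: str) -> dict:
    """Parse an agent's JSON output and coerce each field to a strict type.

    Raises ValueError if the output is not a JSON object or any field is
    missing or malformed.
    """
    record = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")

    try:
        parsed_date = date.fromisoformat(record["date"])  # e.g. "2024-03-15"
        amount = Decimal(str(record["amount"]))           # avoids float rounding
    except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
        raise ValueError(f"invalid extraction: {exc!r}") from exc

    label = record.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")

    return {"date": parsed_date, "amount": amount, "label": label}


record = validate_extraction(
    '{"date": "2024-03-15", "amount": "129.90", "label": "invoice"}'
)
print(record["date"].year, record["amount"], record["label"])
```

Coercing to `date` and `Decimal` rather than keeping strings is one possible design choice: it surfaces malformed values at extraction time instead of later in the workflow.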
Business Safety
The ability to avoid unsafe commitments, unsupported claims, and policy violations.
Failure Tag
A label that explains what went wrong in an agent output.
Literal Translation
A localization failure where the words are translated correctly but the tone does not fit local business conventions.
Valid JSON
A structured output that can be parsed by software without repair.
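A minimal sketch of a "without repair" check, assuming the agent's raw output arrives as a string: the output passes only if Python's standard `json` parser accepts it exactly as written. The helper name `is_valid_json` is hypothetical.

```python
import json


def is_valid_json(output: str) -> bool:
    """Return True only if the raw output parses as JSON with no cleanup."""
    try:
        json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return False
    return True


print(is_valid_json('{"amount": 42.5, "currency": "EUR"}'))   # True
print(is_valid_json("Here is the JSON: {'amount': 42.5}"))    # False
```

Note that `json.loads` accepts the non-standard constants `NaN` and `Infinity` by default, so a stricter check might reject those through the `parse_constant` argument.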
Multilingual Evaluation
Testing agents across the languages and markets where they will actually be used.
Leaderboard
A ranking of agents by score, language, task type, or risk metric.
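One way such a ranking could be computed is sketched below. The `Result` record, its field names, and the tie-breaking rule (higher score first, then fewer critical failures) are assumptions for illustration; a real leaderboard may weight these differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Result:
    agent: str
    language: str
    score: float            # higher is better
    critical_failures: int  # lower is better


def leaderboard(results: Iterable[Result],
                language: Optional[str] = None) -> List[Result]:
    """Rank results by score (descending), breaking ties by fewer
    critical failures. Optionally restrict to one language."""
    rows = [r for r in results if language is None or r.language == language]
    return sorted(rows, key=lambda r: (-r.score, r.critical_failures))


results = [
    Result("agent-a", "en", 0.91, 2),
    Result("agent-b", "en", 0.91, 0),
    Result("agent-c", "de", 0.95, 1),
]
for rank, row in enumerate(leaderboard(results, language="en"), start=1):
    print(rank, row.agent, row.score, row.critical_failures)
```

Filtering before sorting keeps per-language rankings comparable, since agents are then ranked only against others tested in the same language.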
Task Family
A group of related evaluation tasks such as support, writing, or extraction.
Unsafe Refund Promise
A support failure where an agent promises a refund, credit, or compensation without the authority to grant it.