AI Agent Glossary

A searchable long-term reference for AI agent evaluation, task types, failure tags, and business-safety terms.

ai-agent

AI Agent

An AI system that can follow goals, use context, and complete workflow steps.

agent-benchmark

Agent Benchmark

A repeatable test set for comparing AI agents on documented tasks.

critical-failure

Critical Failure

A failure that would make an output unsafe, misleading, unusable, or structurally invalid in real work.

structured-extraction

Structured Extraction

Turning unstructured text into reliable structured fields, such as dates, amounts, and labels, typically returned as JSON.
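
As a rough illustration of what "reliable" can mean here, the sketch below converts raw extracted strings into typed values and rejects anything that does not parse. The field names (`issued_on`, `amount`, `label`), the `ExtractedRecord` type, and the allowed label set are hypothetical and chosen only for this example; a real evaluation would define its own schema.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical label set, assumed for illustration only.
ALLOWED_LABELS = {"invoice", "receipt", "refund_request"}


@dataclass(frozen=True)
class ExtractedRecord:
    issued_on: date
    amount: Decimal
    label: str


def parse_record(raw: dict) -> ExtractedRecord:
    """Convert raw string fields into typed values; raise ValueError if any field is unusable."""
    try:
        issued_on = date.fromisoformat(raw["issued_on"])
        amount = Decimal(raw["amount"])
    except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
        raise ValueError(f"unreliable field: {exc}") from exc
    if not amount.is_finite():
        # Decimal accepts "NaN" and "Infinity", which are not usable amounts.
        raise ValueError("amount is not a finite number")
    label = raw.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return ExtractedRecord(issued_on, amount, label)
```

Calling `parse_record({"issued_on": "2024-03-05", "amount": "19.99", "label": "invoice"})` yields a typed record, while an amount such as `"19,99"` raises `ValueError` instead of passing through silently.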

business-safety

Business Safety

The ability to avoid unsafe commitments, unsupported claims, and policy violations.

failure-tag

Failure Tag

A label that explains what went wrong in an agent output.

literal-translation

Literal Translation

A localization failure where the words are translated but the local business tone is wrong.

valid-json

Valid JSON

Structured output that a standard JSON parser accepts as-is, without repair or cleanup.
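
One way to check this property is to pass the raw agent output to a parser without any pre-processing. The sketch below uses Python's standard `json` module; the helper names `is_valid_json` and `_reject_constant` are illustrative. Because `json.loads` accepts `NaN` and `Infinity` by default, even though they are not part of the JSON grammar, the sketch rejects them through the `parse_constant` hook.

```python
import json


def _reject_constant(name: str):
    # json.loads calls this hook for "NaN", "Infinity", and "-Infinity".
    raise ValueError(f"non-standard JSON constant: {name}")


def is_valid_json(text: str) -> bool:
    """Return True only if the text parses as JSON with no repair step."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

Under this check, output wrapped in a Markdown code fence, output with a trailing comma, and output using single-quoted keys all count as invalid, since each would need repair before software could use it.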

multilingual-evaluation

Multilingual Evaluation

Testing agents across the languages and markets where they will actually be used.

leaderboard

Leaderboard

A ranking of agents by score, language, task type, or risk metric.
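
A minimal sketch of such a ranking, assuming each result carries an agent name, a language, a score where higher is better, and a count of critical failures where lower is better. The `Result` fields and the tie-breaking order are assumptions made for this example, not a description of any particular leaderboard.

```python
from typing import Iterable, List, NamedTuple, Optional


class Result(NamedTuple):
    agent: str
    language: str
    score: float            # assumed: higher is better
    critical_failures: int  # assumed: lower is better


def rank(results: Iterable[Result], language: Optional[str] = None) -> List[Result]:
    """Order results by score (descending), then by fewest critical failures, then by name."""
    rows = [r for r in results if language is None or r.language == language]
    return sorted(rows, key=lambda r: (-r.score, r.critical_failures, r.agent))
```

Passing `language="en"` restricts the ranking to English-language results, and the same pattern could filter by task type. Using critical failures as the tie-breaker is one design choice among several; a risk-focused leaderboard might sort on that column first.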

task-family

Task Family

A group of related evaluation tasks such as support, writing, or extraction.

unsafe-refund-promise

Unsafe Refund Promise

A support failure where an agent promises a refund, credit, or compensation that it has no authority to grant.