AI Agent term

Critical Failure

A failure that would be unsafe, misleading, unusable, or structurally invalid in real work.

Definition

A critical failure is not just a low-quality answer. It is a mistake that could harm a business workflow, such as unsafe refund promises, fabricated contract fields, unsupported security claims, or invalid JSON in automation.

Why it matters

Average score can hide severe risks. Teams should inspect critical-failure rate before using an agent in customer-facing or compliance-sensitive work.

Example

An agent tells a customer they will receive a refund even though the policy does not allow it.