Definition
A critical failure is not just a low-quality answer. It is a mistake that could harm a business workflow, such as unsafe refund promises, fabricated contract fields, unsupported security claims, or invalid JSON in automation.
A failure that would be unsafe, misleading, unusable, or structurally invalid in real work.
A critical failure is not just a low-quality answer. It is a mistake that could harm a business workflow, such as unsafe refund promises, fabricated contract fields, unsupported security claims, or invalid JSON in automation.
Average score can hide severe risks. Teams should inspect critical-failure rate before using an agent in customer-facing or compliance-sensitive work.
An agent tells a customer they will receive a refund even though the policy does not allow it.