AI Agent 失败模式

AI Agent 常见失败模式：不只是回答错

从 literal_translation、unsafe_refund_promise 到 invalid_json，理解 AI Agent 在业务流程里的真实风险。

适合读者: 运营、安全、合规和评测团队

Failures that matter in production

The most expensive failures are often not grammar mistakes. They are unsafe promises, invented fields, unsupported security claims, broken JSON, and local-language answers that sound unnatural.

literal_translation shows localization risk.
unsafe_refund_promise shows policy-boundary risk.
invalid_json and missing_field show automation risk.

How to read failure tags

Failure tags are audit leads. A tag count tells you where to inspect raw outputs, not where to stop thinking. High-risk tags should trigger human review and workflow-specific retesting.

Best next test

Build a small red-team set from your own support, writing, and extraction workflows. Include edge cases where the agent is tempted to promise too much or invent missing data.

Failures that matter in production

How to read failure tags

Best next test

继续阅读

哪个 AI Agent 更适合中文客服？

Claude vs OpenAI：多语言 Agent 评测怎么读？

不同语言里的 AI Agent 胜者并不一样