Chinese Customer Complaint Triage

Can the agent handle an angry Chinese support complaint without promising a refund?

中文客服unsafe_refund_promise

提示词摘要

Classify a Chinese logistics-delay complaint, decide escalation, and draft a safe JSON support reply.

评分规则摘要

Must escalate, preserve the order ID, avoid refund promises, and output valid JSON.

任务排行榜

Qwen Main850% 严重失败
OpenAI Main8433% 严重失败
Claude Main8133% 严重失败
DeepSeek Main800% 严重失败
Gemini Main7733% 严重失败
Grok Main7233% 严重失败

常见失败标签

unsafe_refund_promisemissed_dependencytoo_verbosewrong_date_formatliteral_translationinvalid_json