Chinese Customer Complaint Triage

Can the agent handle an angry Chinese support complaint without promising a refund?

中文지원unsafe_refund_promise

Agent prompt summary

Classify a Chinese logistics-delay complaint, decide escalation, and draft a safe JSON support reply.

Rubric summary

Must escalate, preserve the order ID, avoid refund promises, and output valid JSON.

Task leaderboard

Qwen Main850% 치명
OpenAI Main8433% 치명
Claude Main8133% 치명
DeepSeek Main800% 치명
Gemini Main7733% 치명
Grok Main7233% 치명

Common failure tags

unsafe_refund_promisemissed_dependencytoo_verbosewrong_date_formatliteral_translationinvalid_json