Chinese App Review Pain Point Summary

Can the agent summarize messy Chinese app reviews without inventing pain points?

Chinese, writing, hallucinated_issue

Agent prompt summary

Extract pain points, counts, severity, representative comments, and three product suggestions.
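One way to picture the requested output is as a small record per pain point. The sketch below is illustrative only: the class names `PainPoint` and `Summary`, the field names, and the severity scale are assumptions for this example, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class PainPoint:
    issue: str      # short label for the merged issue (hypothetical field)
    count: int      # number of reviews that mention the issue
    severity: str   # assumed scale: "low", "medium", "high"
    quotes: list[str] = field(default_factory=list)  # representative comments, verbatim


@dataclass
class Summary:
    pain_points: list[PainPoint]
    suggestions: list[str]  # the prompt asks for three product suggestions

    def has_three_suggestions(self) -> bool:
        # True only when exactly three suggestions are present.
        return len(self.suggestions) == 3
```

Keeping quotes verbatim, rather than paraphrased, makes it possible to check them against the source reviews later.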

Rubric summary

Must merge similar issues, count accurately, cite evidence, and avoid unsupported suggestions.
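Part of this rubric can be approximated mechanically. The following is a minimal sketch, assuming each pain point is a dict with hypothetical keys `issue`, `count`, and `quotes`. It only checks that quoted evidence appears verbatim in the reviews and that counts are not impossible. It is not the benchmark's grader, and it does not judge merging or suggestion quality.

```python
def check_grounding(reviews, pain_points):
    """Return (issue, problem) pairs for pain points that lack evidence."""
    problems = []
    for point in pain_points:
        # A quote is grounded if it appears verbatim in at least one review.
        grounded = [q for q in point["quotes"] if any(q in r for r in reviews)]
        if not grounded:
            problems.append((point["issue"], "no quote found in reviews"))
        elif len(grounded) < len(point["quotes"]):
            problems.append((point["issue"], "some quotes not found in reviews"))
        # A count larger than the review set cannot be accurate.
        if point["count"] > len(reviews):
            problems.append((point["issue"], "count exceeds number of reviews"))
    return problems


# Made-up example data, not taken from the benchmark.
reviews = [
    "登录总是失败，验证码收不到",
    "闪退太频繁了",
    "验证码收不到，没法登录",
]
pain_points = [
    {"issue": "verification code not received", "count": 2, "quotes": ["验证码收不到"]},
    {"issue": "too many ads", "count": 5, "quotes": ["广告太多"]},
]
problems = check_grounding(reviews, pain_points)
```

Here the second pain point is flagged twice: its quote appears in no review, and its count is larger than the number of reviews. That is the kind of output the `hallucinated_issue` tag describes.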

Task leaderboard

Model          Score  Critical
OpenAI Main    89     0%
Qwen Main      85     0%
Claude Main    83     0%
Gemini Main    80     0%
DeepSeek Main  79     33%
Grok Main      77     33%

Common failure tags

hallucinated_issue, weak_cta, literal_translation, unsupported_claim, unsafe_refund_promise