Chinese App Review Pain Point Summary

Can the agent summarize messy Chinese app reviews without inventing pain points?

Chinese, Redaction, hallucinated_issue

Agent prompt summary

Extract pain points, counts, severity, representative comments, and three product suggestions.

Rubric summary

Must merge similar issues, count accurately, cite evidence, and avoid unsupported suggestions.
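The rubric's checks on merging, counting, and evidence can be partly automated. The following is a minimal sketch, not the benchmark's actual grader: the `PainPoint` structure, the `check_summary` function, and the idea that each pain point cites reviews by index are all assumptions made for illustration. It flags duplicate labels, counts that do not match the cited reviews, and representative comments that do not appear verbatim in any cited review.

```python
from dataclasses import dataclass


@dataclass
class PainPoint:
    # Hypothetical output record; the real task format is not specified here.
    label: str               # merged issue name, e.g. "login failure"
    count: int               # number of reviews the summary claims mention it
    evidence_ids: list[int]  # indices of the reviews cited as evidence
    quote: str               # representative comment, expected verbatim


def check_summary(reviews: list[str], pain_points: list[PainPoint]) -> list[str]:
    """Return rubric violations found; an empty list means none were detected."""
    problems = []
    seen_labels = set()
    for p in pain_points:
        # Merging: the same label appearing twice means issues were not merged.
        key = p.label.strip().lower()
        if key in seen_labels:
            problems.append(f"{p.label}: duplicate label, similar issues not merged")
        seen_labels.add(key)

        # Evidence: every cited id must be unique and point at a real review.
        valid_ids = {i for i in p.evidence_ids if 0 <= i < len(reviews)}
        if len(valid_ids) != len(p.evidence_ids):
            problems.append(f"{p.label}: cites unknown or repeated review ids")

        # Counting: the claimed count must equal the number of cited reviews.
        if p.count != len(valid_ids):
            problems.append(f"{p.label}: count {p.count} != {len(valid_ids)} cited reviews")

        # Hallucination guard: the quote must occur in at least one cited review.
        if not any(p.quote in reviews[i] for i in valid_ids):
            problems.append(f"{p.label}: quote not found in cited reviews")
    return problems


# Illustrative data only (invented for this sketch, not taken from the task).
reviews = [
    "登录总是失败",    # "login always fails"
    "闪退太频繁了",    # "crashes far too often"
    "又登录不上",      # "cannot log in again"
]
supported = PainPoint("login failure", 2, [0, 2], "登录总是失败")
invented = PainPoint("slow refund", 3, [1], "退款太慢")  # "refund is too slow"

print(check_summary(reviews, [supported]))
print(check_summary(reviews, [invented]))
```

A check like this only catches mechanical failures. Whether two differently worded complaints should have been merged, or whether a product suggestion is supported by the reviews, still needs a human or model judge.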

Task leaderboard

OpenAI Main: 89, 0% critique
Qwen Main: 85, 0% critique
Claude Main: 83, 0% critique
Gemini Main: 80, 0% critique
DeepSeek Main: 79, 33% critique
Grok Main: 77, 33% critique

Common failure tags

hallucinated_issue, weak_cta, literal_translation, unsupported_claim, unsafe_refund_promise