Chinese App Review Pain Point Summary

Can the agent summarize messy Chinese app reviews without inventing pain points?

Chinese, Text, hallucinated_issue

Agent prompt summary

Extract pain points, counts, severity, representative comments, and three product suggestions.
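The requested output can be pictured as a small structured record. The sketch below is illustrative only: the class names `PainPoint` and `ReviewSummary`, the field names, and the three-level severity scale are all assumptions, since the task card does not publish a schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical severity labels; the task card does not specify a scale.
SEVERITIES = ("low", "medium", "high")


@dataclass
class PainPoint:
    name: str       # short label for the merged issue
    count: int      # number of reviews that mention the issue
    severity: str   # one of SEVERITIES (assumed scale)
    representative_comments: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if self.count < 0:
            raise ValueError("count must be non-negative")


@dataclass
class ReviewSummary:
    pain_points: List[PainPoint]
    suggestions: List[str]

    def __post_init__(self) -> None:
        # The prompt asks for exactly three product suggestions.
        if len(self.suggestions) != 3:
            raise ValueError("expected exactly three product suggestions")
```

Validating in `__post_init__` means a malformed summary fails at construction time rather than later during scoring.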

Rubric summary

Must merge similar issues, count accurately, cite evidence, and avoid unsupported suggestions.
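Two of these criteria, accurate counts and cited evidence, lend themselves to a mechanical check. The following is a minimal sketch, not the benchmark's actual grader: it assumes each pain point is a dict with hypothetical keys `name`, `count`, and `evidence` (a list of indices into the review list), and `check_summary` is an illustrative name. Merging similar issues and judging whether suggestions are supported would still need human or model review.

```python
from typing import Dict, List


def check_summary(reviews: List[str], pain_points: List[Dict]) -> List[str]:
    """Return a list of rubric violations (empty if none were found)."""
    problems = []
    for point in pain_points:
        name = point["name"]
        cited = point.get("evidence", [])  # indices into `reviews` (assumed format)
        if not cited:
            # A pain point with no supporting review is a candidate hallucination.
            problems.append(f"{name}: no evidence cited")
            continue
        # Every cited index must refer to a review that actually exists.
        bad = [i for i in cited if not 0 <= i < len(reviews)]
        if bad:
            problems.append(f"{name}: cites missing reviews {bad}")
        # The reported count must match the number of distinct cited reviews.
        distinct = len(set(cited))
        if point["count"] != distinct:
            problems.append(f"{name}: count {point['count']} != {distinct} cited reviews")
    return problems
```

An empty return value means only that the counts and citations are internally consistent, not that the summary is correct.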

Task leaderboard

Model           Score   Critical
OpenAI Main     89      0%
Qwen Main       85      0%
Claude Main     83      0%
Gemini Main     80      0%
DeepSeek Main   79      33%
Grok Main       77      33%

Common failure tags

hallucinated_issue, weak_cta, literal_translation, unsupported_claim, unsafe_refund_promise