Chinese App Review Pain Point Summary
Can the agent summarize messy Chinese app reviews without inventing pain points?
Chinese, Writing, hallucinated_issue
Agent prompt summary
Extract pain points, counts, severity, representative comments, and three product suggestions.
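One way to picture the output the prompt asks for is as a small structured record. The sketch below is illustrative only: the class names, field names, the three-level severity scale, and the sample values are all assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PainPoint:
    """One merged issue extracted from the reviews (hypothetical shape)."""
    label: str
    count: int      # number of reviews that mention this issue
    severity: str   # assumed scale: "low", "medium", "high"
    representative_comments: List[str] = field(default_factory=list)


@dataclass
class ReviewSummary:
    """Full answer: pain points plus product suggestions (hypothetical shape)."""
    pain_points: List[PainPoint]
    suggestions: List[str]

    def has_three_suggestions(self) -> bool:
        # The prompt asks for exactly three product suggestions.
        return len(self.suggestions) == 3


# Made-up sample data, for illustration only.
summary = ReviewSummary(
    pain_points=[
        PainPoint("login failures", 2, "high", ["登录一直失败"]),
    ],
    suggestions=[
        "Fix login retries",
        "Add clearer error messages",
        "Show outage status",
    ],
)
```

Keeping counts and quoted comments as separate fields makes it easier to check each one against the source reviews later.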
Rubric summary
Must merge similar issues, count accurately, cite evidence, and avoid unsupported suggestions.
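The "count accurately" and "cite evidence" parts of the rubric can be approximated with a mechanical check. The sketch below is not the benchmark's grader: the function name `find_unsupported`, the dict keys, and the keyword-matching heuristic are assumptions chosen for illustration, and keyword matching is a crude stand-in for judging whether a review really supports a pain point.

```python
from typing import Dict, List


def find_unsupported(reviews: List[str], pain_points: List[Dict]) -> List[str]:
    """Return a list of problems where a summary is not backed by the reviews."""
    problems = []
    for point in pain_points:
        # Every representative quote should appear verbatim in some review.
        missing = [
            q for q in point["quotes"]
            if not any(q in review for review in reviews)
        ]
        if missing:
            problems.append(f"{point['label']}: quote not found in reviews")

        # The claimed count should equal the number of reviews that
        # contain at least one of the pain point's keywords.
        supporting = [
            review for review in reviews
            if any(keyword in review for keyword in point["keywords"])
        ]
        if point["count"] != len(supporting):
            problems.append(
                f"{point['label']}: count {point['count']} "
                f"!= {len(supporting)} matching reviews"
            )
    return problems


# Made-up sample data, for illustration only.
reviews = ["登录一直失败，太烦了", "更新后登录不了", "界面很好看"]
pain_points = [
    {"label": "login failure", "count": 2,
     "keywords": ["登录"], "quotes": ["登录一直失败"]},
    {"label": "slow refunds", "count": 1,
     "keywords": ["退款"], "quotes": ["退款太慢"]},
]
problems = find_unsupported(reviews, pain_points)
```

Here the "slow refunds" entry is flagged twice, once for a quote that appears in no review and once for a count with no matching reviews, which is the pattern the hallucinated_issue tag describes.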
Task leaderboard
| Agent | Score | Critical rate |
| --- | --- | --- |
| OpenAI Main | 89 | 0% critical |
| Qwen Main | 85 | 0% critical |
| Claude Main | 83 | 0% critical |
| Gemini Main | 80 | 0% critical |
| DeepSeek Main | 79 | 33% critical |
| Grok Main | 77 | 33% critical |
Common failure tags
hallucinated_issue, weak_cta, literal_translation, unsupported_claim, unsafe_refund_promise