Japanese Appointment Intent Classification
Can the agent classify short Japanese appointment messages into stable intent labels?
Japanese, Support, wrong_intent
Agent prompt summary
Classify messages as booking, cancellation, reschedule, pricing_question, or other.
Rubric summary
Must use only allowed labels and include short reasons.
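A minimal sketch of how the rubric's two requirements could be checked automatically. The label set comes from the agent prompt summary above. Everything else is an assumption for illustration: that the agent answers with a JSON object, the field names `intent` and `reason`, the `MAX_REASON_CHARS` cap, and the violation names `label_not_allowed` and `missing_reason` (`invalid_json` and `too_verbose` reuse tags listed on this page).

```python
import json

# Allowed labels, taken from the agent prompt summary.
ALLOWED_LABELS = {"booking", "cancellation", "reschedule", "pricing_question", "other"}

# Hypothetical cap on reason length; the rubric only says "short".
MAX_REASON_CHARS = 120


def check_response(raw: str) -> list[str]:
    """Return a list of rubric violations for one raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(data, dict):
        # Valid JSON, but not the object shape assumed here.
        return ["invalid_json"]

    problems = []

    # Rubric: must use only allowed labels.
    intent = data.get("intent")
    if not isinstance(intent, str) or intent not in ALLOWED_LABELS:
        problems.append("label_not_allowed")

    # Rubric: must include a short reason.
    reason = data.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        problems.append("missing_reason")
    elif len(reason) > MAX_REASON_CHARS:
        problems.append("too_verbose")

    return problems
```

An empty list means the response passes both checks. Note that this only validates the output format, not whether the chosen label is the correct intent for the message.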
Task leaderboard
| Model | Score | Critical rate |
| --- | --- | --- |
| Claude Main | 92 | 33% critical |
| Qwen Main | 81 | 33% critical |
| OpenAI Main | 79 | 33% critical |
| Gemini Main | 79 | 33% critical |
| DeepSeek Main | 79 | 0% critical |
| Grok Main | 73 | 33% critical |
Common failure tags
wrong_intent, missed_dependency, too_verbose, wrong_date_format, literal_translation, invalid_json