Claude Main
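Top-scoring candidate in the current preview data after filtering by this scenario's language and task type.
Score: 87
Compares how usable low-cost agents are in budget-sensitive automation, while keeping failure risk in view.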
Intended readers: startup teams, internal tools teams, and cost-sensitive automation teams
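Top-scoring candidate in the current preview data after filtering by this scenario's language and task type.
Score: 87
Sorted by severe failure rate first, then by total score.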
Sorted by cost tier first, then by scenario score.
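The two ranking rules above (severe failure rate first, then total score; cost tier first, then scenario score) can be read as two compound sort keys. The following is a minimal sketch under stated assumptions, not the page's actual implementation: the `Candidate` fields, the cost tier encoding (1 = cheapest), and all candidate names and numbers are hypothetical placeholders, not values from the preview data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # All fields are hypothetical; the page does not publish its data schema.
    name: str
    severe_failure_rate: float  # fraction of runs with a severe failure
    total_score: int
    scenario_score: int
    cost_tier: int  # assumed encoding: 1 = cheapest


def rank_by_risk(candidates):
    # Lowest severe failure rate first; ties broken by higher total score.
    return sorted(candidates, key=lambda c: (c.severe_failure_rate, -c.total_score))


def rank_by_cost(candidates):
    # Cheapest cost tier first; ties broken by higher scenario score.
    return sorted(candidates, key=lambda c: (c.cost_tier, -c.scenario_score))


# Made-up records for illustration only.
candidates = [
    Candidate("agent-a", 0.02, 87, 90, 2),
    Candidate("agent-b", 0.02, 91, 85, 3),
    Candidate("agent-c", 0.05, 95, 93, 1),
    Candidate("agent-d", 0.10, 80, 88, 1),
]

risk_order = [c.name for c in rank_by_risk(candidates)]
cost_order = [c.name for c in rank_by_cost(candidates)]
print(risk_order)
print(cost_order)
```

Note that under the risk-first rule a candidate with the highest total score (here the made-up `agent-c`) can still rank below lower-scoring candidates if its severe failure rate is worse, which is the point of sorting on failure rate before score.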
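This page is not a substitute for human review. It slices the leaderboard into questions that are closer to real procurement and launch decisions. Before going live, you should still check the raw outputs, the business boundaries, and the model version.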
| Scenario | Model | Score |
| --- | --- | --- |
| Chinese Customer Complaint Triage | Qwen Main | 85 |
| Chinese App Review Pain Point Summary | Kimi Main | 92 |
| Chinese Contract Field Extraction | Qwen Main | 96 |
| Chinese Sales Call Summary | Qwen Main | 96 |
| Chinese Invoice Dispute Reply | OpenAI Main | 85 |
| SaaS Landing Page Hero Rewrite | OpenAI Main | 93 |
| Meeting Notes Action Item Extraction | OpenAI Main | 89 |
| Refund Policy Boundary Reply | OpenAI Main | 96 |
| English Security Questionnaire Answer | OpenAI Main | 96 |
| English Churn Risk Email | Claude Main | 95 |
| Japanese Business Email Politeness Rewrite | OpenAI Main | 85 |
| Japanese Appointment Intent Classification | Claude Main | 92 |
| Japanese Product Specification Extraction | Qwen Main | 91 |
| Japanese Support Escalation Note | Claude Main | 92 |
| Japanese Pricing Page Localization | Claude Main | 92 |
| Spanish Support Reply for Wrong Item | Claude Main | 89 |
| Spanish Ad Headline Localization | Claude Main | 92 |
| Spanish Order Confirmation Extraction | Claude Main | 85 |
| Spanish Billing Cancellation Reply | Claude Main | 91 |
| Spanish Survey Insight Clustering | Qwen Main | 83 |
Average score: 80