Multilingual Agent Arena

Find the AI agent that wins in your language.

AAA.win tests agents on real business tasks in Chinese, English, Japanese, and Spanish.

Overall Ranking

Ranked by multilingual business performance, not marketing copy.

Rank	Agent	Overall	Win rate	Pass rate	Critical	Best language	Best for	Cost
1	Claude Main (Anthropic)	87	55%	97%	12%	English	Support	premium
2	OpenAI Main (OpenAI)	86	35%	92%	12%	English	Writing	premium
3	Qwen Main (Alibaba)	84	25%	93%	10%	中文	Extraction	standard
4	Gemini Main (Google)	80	0%	82%	12%	English	Extraction	standard
5	DeepSeek Main (DeepSeek)	80	5%	70%	7%	中文	Extraction	low
6	Grok Main (xAI)	75	0%	37%	27%	English	Writing	standard

Key Findings

The most useful result in practice is not always the overall No. 1.
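As a rough illustration of why the top overall score does not tell the whole story, the sketch below re-sorts the figures from the overall ranking table by different columns. The `Row` class and `rank_by` helper are hypothetical names introduced here, and reading "Critical" as a rate where lower is better is an assumption, not something the page states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    agent: str
    overall: int
    win_rate: int   # percent
    pass_rate: int  # percent
    critical: int   # percent; assumed to be a failure rate (lower is better)


# Figures copied from the overall ranking table.
ROWS = [
    Row("Claude Main", 87, 55, 97, 12),
    Row("OpenAI Main", 86, 35, 92, 12),
    Row("Qwen Main", 84, 25, 93, 10),
    Row("Gemini Main", 80, 0, 82, 12),
    Row("DeepSeek Main", 80, 5, 70, 7),
    Row("Grok Main", 75, 0, 37, 27),
]


def rank_by(rows, key, reverse=True):
    """Return agent names ordered by the given column (ties keep table order)."""
    return [r.agent for r in sorted(rows, key=key, reverse=reverse)]


by_overall = rank_by(ROWS, key=lambda r: r.overall)
by_pass_rate = rank_by(ROWS, key=lambda r: r.pass_rate)
by_critical = rank_by(ROWS, key=lambda r: r.critical, reverse=False)

print("Overall:  ", by_overall)
print("Pass rate:", by_pass_rate)
print("Critical: ", by_critical)
```

Under these assumptions, the order changes with the column chosen: Qwen Main moves ahead of OpenAI Main on pass rate, and DeepSeek Main has the lowest critical figure despite ranking fifth overall.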

English scores did not predict multilingual rank.

Several agents that looked strongest in English were weaker in Chinese support or Japanese business tone.

Support tasks exposed unsafe promises.

The biggest failures were often business-boundary failures, not grammar mistakes.

Japanese writing separated grammar from natural tone.

Correct Japanese was not enough. Natural, concise business phrasing mattered.

Extraction revealed the widest reliability gap.

Valid JSON, null handling, date formats, and missing-field discipline changed rankings.
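To make these extraction checks concrete, here is a minimal sketch of a validator that maps one raw model output to failure tags. The tag names `invalid_json`, `missing_field`, and `wrong_date_format` appear elsewhere on this page; the schema (`party_a`, `party_b`, `signing_date`), the `check_extraction` function, and the choice of ISO 8601 dates are assumptions for illustration and are not the Arena's actual rubric.

```python
import json
from datetime import date

# Hypothetical schema for a contract-extraction task.
REQUIRED_FIELDS = ("party_a", "party_b", "signing_date")
DATE_FIELDS = ("signing_date",)


def check_extraction(raw: str) -> list[str]:
    """Return failure tags for one raw model output (empty list means clean)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(data, dict):
        # Assumption: a non-object payload is treated like unparseable output.
        return ["invalid_json"]

    tags = []
    # Missing-field discipline: every key must be present, even when unknown.
    if any(name not in data for name in REQUIRED_FIELDS):
        tags.append("missing_field")

    for name in DATE_FIELDS:
        value = data.get(name)
        if value is None:
            continue  # explicit null means "unknown", which beats a guessed date
        try:
            date.fromisoformat(value)  # assumed target format: YYYY-MM-DD
        except (TypeError, ValueError):
            tags.append("wrong_date_format")
    return tags


print(check_extraction('{"party_a": "A", "party_b": "B", "signing_date": null}'))
print(check_extraction('{"party_a": "A", "party_b": "B", "signing_date": "2024/03/05"}'))
print(check_extraction('Here is the JSON: {'))
```

The design choice worth noting is that an explicit `null` passes while an absent key does not: the sketch rewards an agent for admitting a value is unknown rather than inventing one or silently dropping the field.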

Language Winners

Find the agent that wins the language you actually work in.

Best 中文

89

Qwen Main

Extraction, 7% critical

Best English

93

OpenAI Main

Writing, 7% critical

Best 日本語

89

Claude Main

Support, 13% critical

Best Español

88

Claude Main

Support, 13% critical

Failure Modes

The most common failures were not always language errors. They were business risks.

literal_translation

26

unsafe_refund_promise

23

weak_cta

21

unsupported_claim

17

invalid_json

13

Task Evidence

Every score should lead back to prompts, rubrics, outputs, and failure tags.

Chinese Customer Complaint Triage

Main risk: unsafe_refund_promise

Winner: Qwen Main

Chinese App Review Pain Point Summary

Main risk: hallucinated_issue

Winner: OpenAI Main

Chinese Contract Field Extraction

Main risk: hallucinated_signing_date

Winner: Qwen Main

Chinese Sales Call Summary

Main risk: missed_buying_signal

Winner: Qwen Main

Chinese Invoice Dispute Reply

Main risk: unauthorized_credit

Winner: OpenAI Main

SaaS Landing Page Hero Rewrite

Main risk: generic_ai_copy

Winner: OpenAI Main
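One way to picture the traceability that Task Evidence asks for is a record that keeps each score next to its prompt, rubric, output, and failure tags. The sketch below is only an assumption about how such a record might look; `EvidenceRecord` and `is_traceable` are hypothetical names, and the values are placeholders rather than Arena data.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceRecord:
    task: str
    agent: str
    prompt: str
    rubric: str
    output: str
    score: int
    failure_tags: list[str] = field(default_factory=list)


def is_traceable(record: EvidenceRecord) -> bool:
    """A score counts as traceable only if prompt, rubric, and output are all present."""
    return all(text.strip() for text in (record.prompt, record.rubric, record.output))


# Placeholder values for illustration only.
complete = EvidenceRecord(
    task="example-task",
    agent="example-agent",
    prompt="<prompt text>",
    rubric="<rubric text>",
    output="<model output>",
    score=0,
    failure_tags=["unsafe_refund_promise"],
)
orphan = EvidenceRecord("example-task", "example-agent", "", "", "", score=0)

print(is_traceable(complete))
print(is_traceable(orphan))
```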

Agent Profiles

Each profile reflects Multilingual Agent Arena #2, not a universal model ranking.

Claude Main

Strong writing and safety boundaries, especially in support tasks.

Best language: English; best for: Support; cost: premium

Failure tags: too_verbose, overly_humble, unsafe_refund_promise

OpenAI Main

Strong generalist with balanced writing and support safety.

Best language: English; best for: Writing; cost: premium

Failure tags: missed_dependency, generic_ai_copy, unsafe_refund_promise

Qwen Main

Strong Chinese business language and structured extraction.

Best language: 中文; best for: Extraction; cost: standard

Failure tags: literal_translation, unnatural_japanese, unauthorized_credit

Gemini Main

Reliable extraction profile with mixed localization performance.

Best language: English; best for: Extraction; cost: standard

Failure tags: literal_translation, wrong_date_format, unsafe_refund_promise

DeepSeek Main

Best value profile for structured extraction and classification.

Best language: 中文; best for: Extraction; cost: low

Failure tags: weak_cta, missing_field, hallucinated_issue

Grok Main

Fast outputs with higher variance on business constraints.

Best language: English; best for: Writing; cost: standard

Failure tags: unsafe_refund_promise, unsupported_claim, invalid_json