Agent comparison

OpenAI vs Claude

Compare two strong generalist agents across overall score, pass rate, critical failures, language strengths, and cost tier.

Use case: General writing, support, and high-quality multilingual workflows

Overall winner

Claude Main

Based on the current Arena #2 preview average score.

Lower risk

OpenAI Main

Ranked by critical-failure rate; this is not a universal safety guarantee.

Value candidate

Claude Main

Prioritizes cost tier, then score.

Metric               OpenAI Main   Claude Main
Overall              86            87
Pass rate            92%           97%
Critical failures    12%           12%
Format pass          100%          100%
Win rate             30%           55%
Cost tier            premium       premium
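
The three summary picks above each follow a simple ranking rule over the table's values. The sketch below shows one way those rules could be written. It is not the page's actual implementation: the Agent class, the function names, the COST_RANK ordering (only "premium" appears in the table), and the tie-breaking by list order are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical ordering of cost tiers, cheapest first.
# Only "premium" appears in the comparison; the other names are assumed.
COST_RANK = {"budget": 0, "standard": 1, "premium": 2}


@dataclass(frozen=True)
class Agent:
    name: str
    overall: int          # overall score
    critical_rate: float  # critical-failure rate as a fraction
    cost_tier: str


# Values copied from the comparison table.
AGENTS = [
    Agent("OpenAI Main", 86, 0.12, "premium"),
    Agent("Claude Main", 87, 0.12, "premium"),
]


def overall_winner(agents):
    """Highest overall score wins."""
    return max(agents, key=lambda a: a.overall)


def lower_risk(agents):
    """Lowest critical-failure rate wins.

    min() returns the first minimal item, so a tie falls back to
    list order. That tie-break is an assumption of this sketch.
    """
    return min(agents, key=lambda a: a.critical_rate)


def value_candidate(agents):
    """Cheapest cost tier first, then highest overall score."""
    return min(agents, key=lambda a: (COST_RANK[a.cost_tier], -a.overall))


if __name__ == "__main__":
    print("Overall winner: ", overall_winner(AGENTS).name)
    print("Lower risk:     ", lower_risk(AGENTS).name)
    print("Value candidate:", value_candidate(AGENTS).name)
```

Both agents share the same critical-failure rate and cost tier here, so the lower-risk pick depends entirely on the tie-break and the value pick is decided by score alone.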

OpenAI Main

Strong generalist with balanced writing and support safety.

86
missed_dependency, generic_ai_copy, unsafe_refund_promise

Claude Main

Strong writing and safety boundaries, especially in support tasks.

87
too_verbose, overly_humble, unsafe_refund_promise