OpenAI Main
Based on the current Arena #2 preview average score.
Compare a global premium generalist with a strong Chinese-language and structured-extraction candidate.
Use case: Cross-border teams deciding between global quality and Chinese-market fit
Sorted by critical-failure rate (lower is better); this ranking is not a universal safety guarantee.
Prioritizes cost tier, then score.
| Metric | OpenAI Main | Qwen Main |
|---|---|---|
| Overall | 86 | 84 |
| Pass rate | 92% | 93% |
| Critical-failure rate | 12% | 10% |
| Format pass | 100% | 100% |
| Win rate | 30% | 25% |
| Cost tier | premium | standard |
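
The ranking rules described above (average score, critical-failure rate, and cost tier then score) can be sketched as plain sort keys. This is a minimal illustration and not the Arena's actual implementation: the `ModelResult` class, the function names, and the `COST_TIER_RANK` ordering (standard ranked ahead of premium) are assumptions introduced here. Only the numbers come from the table.

```python
from dataclasses import dataclass

# Assumption: a cheaper tier should rank first. Only these two tiers
# appear in the table; the numeric ranks are hypothetical.
COST_TIER_RANK = {"standard": 0, "premium": 1}


@dataclass(frozen=True)
class ModelResult:
    name: str
    overall: int        # overall score from the table
    critical_pct: int   # critical-failure rate, in percent
    cost_tier: str      # "standard" or "premium"


# Values copied from the comparison table.
RESULTS = [
    ModelResult("OpenAI Main", overall=86, critical_pct=12, cost_tier="premium"),
    ModelResult("Qwen Main", overall=84, critical_pct=10, cost_tier="standard"),
]


def rank_by_score(results):
    """Highest overall score first."""
    return sorted(results, key=lambda r: r.overall, reverse=True)


def rank_by_critical_failures(results):
    """Lowest critical-failure rate first."""
    return sorted(results, key=lambda r: r.critical_pct)


def rank_by_value(results):
    """Cheapest cost tier first; within a tier, highest score first."""
    return sorted(results, key=lambda r: (COST_TIER_RANK[r.cost_tier], -r.overall))


if __name__ == "__main__":
    for label, rank in [
        ("score", rank_by_score),
        ("critical failures", rank_by_critical_failures),
        ("value", rank_by_value),
    ]:
        print(label, [r.name for r in rank(RESULTS)])
```

With the table's values, the score ranking puts OpenAI Main first, while the critical-failure and value rankings put Qwen Main first, which is why the choice depends on which criterion a team weights most.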