Evidence by task
Each task includes a summary, a rubric, the main risk, and the winner.
Chinese Customer Complaint Triage
Main risk: unsafe_refund_promise
Chinese App Review Pain Point Summary
Main risk: hallucinated_issue
Chinese Contract Field Extraction
Main risk: hallucinated_signing_date
Chinese Sales Call Summary
Main risk: missed_buying_signal
Chinese Invoice Dispute Reply
Main risk: unauthorized_credit
SaaS Landing Page Hero Rewrite
Main risk: generic_ai_copy
Meeting Notes Action Item Extraction
Main risk: discussion_as_action
Refund Policy Boundary Reply
Main risk: unsafe_refund_promise
English Security Questionnaire Answer
Main risk: unsupported_security_claim
English Churn Risk Email
Main risk: tone_deaf_retention
Japanese Business Email Politeness Rewrite
Main risk: unnatural_japanese
Japanese Appointment Intent Classification
Main risk: wrong_intent
Japanese Product Specification Extraction
Main risk: hallucinated_material
Japanese Support Escalation Note
Main risk: lost_escalation_context
Japanese Pricing Page Localization
Main risk: literal_pricing_copy
Spanish Support Reply for Wrong Item
Main risk: unsafe_refund_promise
Spanish Ad Headline Localization
Main risk: literal_translation
Spanish Order Confirmation Extraction
Main risk: wrong_date_format
Spanish Billing Cancellation Reply
Main risk: wrong_cancellation_policy
Spanish Survey Insight Clustering
Main risk: overmerged_feedback
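
The per-task record described at the top of this section (summary, rubric, main risk, winner) could be modelled roughly as follows. This is only a sketch: the class name `TaskEvidence`, its field names, and the helper `group_by_risk` are hypothetical and do not come from the report. Fields that this excerpt does not show are left as `None` rather than filled in. The sample entries are copied from the list above and show how tasks that share a risk label could be grouped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskEvidence:
    """One per-task record. All names here are hypothetical."""
    name: str
    main_risk: str
    summary: Optional[str] = None  # not shown in this excerpt
    rubric: Optional[str] = None   # not shown in this excerpt
    winner: Optional[str] = None   # not shown in this excerpt


def group_by_risk(tasks):
    """Map each main-risk label to the names of the tasks that carry it."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task.main_risk].append(task.name)
    return dict(groups)


# Entries copied from the list above; only name and main risk are known.
tasks = [
    TaskEvidence("Chinese Customer Complaint Triage", "unsafe_refund_promise"),
    TaskEvidence("Refund Policy Boundary Reply", "unsafe_refund_promise"),
    TaskEvidence("Spanish Support Reply for Wrong Item", "unsafe_refund_promise"),
    TaskEvidence("Chinese Contract Field Extraction", "hallucinated_signing_date"),
]

shared = group_by_risk(tasks)
for risk, names in shared.items():
    print(f"{risk}: {len(names)} task(s)")
```

Grouping by the risk label makes it easy to see that `unsafe_refund_promise` recurs across the Chinese, English, and Spanish task sets, while most other labels appear once.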