Anthropic has released Claude 4 Opus, which achieved state-of-the-art results across major AI reasoning benchmarks, surpassing GPT-5.4 and Gemini Ultra 2.0 in multiple categories.
Benchmark Results
- GPQA Diamond: 78.2% (GPT-5.4: 72.1%, Gemini: 69.8%)
- MATH-500: 96.4% (GPT-5.4: 94.1%)
- SWE-bench Verified: 72.0% (GPT-5.4: 64.2%)
- HumanEval+: 98.2% (GPT-5.4: 96.8%)
- ARC-AGI-2: 42.1% (GPT-5.4: 35.7%)
Key Improvements
Claude 4 Opus features a 500K-token context window, improved instruction following, and significantly lower hallucination rates. Anthropic credits the gains to advances in constitutional AI training and chain-of-thought reasoning.
Pricing
API pricing is $15 per million input tokens and $75 per million output tokens. A free tier is available through claude.ai with usage limits.
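To make the rates concrete, here is a minimal sketch of how one might estimate the cost of a single API request at the published prices. The function name and example token counts are illustrative, not part of any official SDK.

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD at the article's published rates.

    Rates: $15 per million input tokens, $75 per million output tokens.
    (Illustrative helper, not an official Anthropic API.)
    """
    INPUT_RATE_PER_M = 15.0
    OUTPUT_RATE_PER_M = 75.0
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: a request with 10,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.2f}")  # → $0.30
```

Note that output tokens cost five times as much as input tokens, so long generations dominate the bill even when prompts are large.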