Leaked benchmark results purportedly from OpenAI's upcoming GPT-5 model show performance matching or exceeding that of human experts on graduate-level mathematics, physics, and legal reasoning tasks, sparking intense debate about AI capabilities.
Benchmark Results
The leaked evaluation results, first reported by The Information and partially confirmed by OpenAI insiders, show dramatic improvements over GPT-4:
- GPQA Diamond (graduate-level science): 89.2% (human expert average: 81%)
- MATH benchmark (competition math): 96.4% (GPT-4: 76.6%)
- Multilingual legal reasoning: 91.3% across 12 languages
- Long-context coherence: maintains accuracy across contexts of 500,000+ tokens
Industry Reaction
AI researchers are divided between those who see the results as a clear step toward artificial general intelligence and skeptics who argue that benchmarks do not capture the full spectrum of human reasoning. OpenAI has declined to comment on the leak but is expected to announce GPT-5 at a May event.