Google DeepMind has released Gemini Ultra 2 and claims it is the first AI model to reach human-level performance on complex reasoning benchmarks.

Benchmark Results

Gemini Ultra 2 scored 92% on the ARC-AGI benchmark, 7 percentage points above the human baseline of 85%. It also set record scores on mathematical reasoning, code generation, and scientific analysis tasks.

The model uses a novel architecture combining transformer networks with symbolic reasoning modules.

Skepticism Remains

Some AI researchers caution that strong benchmark performance does not equate to general intelligence, noting that the model still struggles with certain common-sense reasoning tasks.