Anthropic's Claude 4 has scored 96.5% on the USMLE Step 1 medical licensing exam, the highest by any AI model and well above the average human score of 73%. The result raises questions about AI's role in healthcare.
Implications
While the score doesn't mean AI can practice medicine, it demonstrates remarkable medical knowledge and reasoning. Hospitals are evaluating Claude 4 for diagnostic assistance, research, and patient education roles.
Safety Focus
Anthropic emphasized that Claude 4 is designed to assist rather than replace physicians, with built-in guardrails that defer to human medical judgment and clearly communicate uncertainty.
- 96.5% USMLE Step 1 score (human average: 73%)
- Highest score by any AI model on the exam
- Hospitals evaluating for diagnostic assistance
- Built-in safety guardrails for medical use