Anthropic's Claude 4 has scored 96.5% on the USMLE Step 1 medical licensing exam, the highest result of any AI model and well above the 73% average for human test-takers. The result raises new questions about AI's role in healthcare.

Implications

While the score does not mean AI can practice medicine, it demonstrates remarkable medical knowledge and reasoning ability. Hospitals are evaluating Claude 4 for roles in diagnostic assistance, research, and patient education.

Safety Focus

Anthropic emphasized that Claude 4 is designed to assist physicians rather than replace them, with built-in guardrails that defer to human medical judgment and clearly communicate uncertainty.