A federal judge has agreed to temporarily suspend the Trump administration's plan to eliminate hundreds of jobs at the agency ...
MMLU-Pro holds steady at 85.0 and AIME 2025 improves slightly to 89.3, while GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from ...
US startup Anthropic on Monday announced the launch of its new generative artificial intelligence model, Claude Sonnet 4.5, which it says is the ...