Leonid Herasimau
AI·4 weeks ago

Anthropic Releases Next-Generation Hybrid Language Models — Claude Sonnet 4 and the “World’s Best Programming Model” Claude Opus 4

Both models can respond in two modes: with 'reasoning', which takes longer, or without it, 'practically instantly'.

  • Claude Opus 4 — 'the most powerful' model released by Anthropic, and also 'the world's best model for working with code,' the startup said. It 'excels' at programming and solving complex tasks, which will be useful for AI agent developers: especially if the model needs to work for 'several hours' and perform 'thousands of steps'.
  • On the SWE-Bench Verified and Terminal-Bench benchmarks, Claude Opus 4 scored 72.5% and 43.2% respectively, higher than o3 and GPT-4.1 from OpenAI and Gemini 2.5 Pro from Google.
  • Claude Sonnet 4 is inferior to Opus 4 in 'most areas,' but surpasses its predecessor Sonnet 3.7 and demonstrates 'advanced' performance when working on everyday tasks. Its score in tests on the SWE-Bench Verified benchmark is 72.7%.
  • GitHub will use Sonnet 4 as the basis for its new coding agent in GitHub Copilot.
Comparison of models on different benchmarks. Source: Anthropic
  • The new models are better than their predecessors at using several tools in parallel and at working with memory, follow user instructions more accurately, and can use different tools (e.g., internet search), including during 'reasoning'.
Claude Opus 4 plays Pokemon. Source: Anthropic
  • Subscribers to Pro, Max, Team, and Enterprise plans will get access to both models. Users of the free plan will get access to Claude Sonnet 4.
  • Both models have also been added to Amazon Bedrock, Vertex AI in Google Cloud, and Anthropic's own API. Input tokens cost $15 per million for Opus 4 and $3 per million for Sonnet 4; output tokens cost $75 and $15 per million respectively.
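
To make the per-token pricing above concrete, here is a minimal Python sketch that estimates the cost of a single workload under the quoted rates. The dictionary keys `opus-4` and `sonnet-4` are illustrative labels rather than official API model identifiers, and the workload size (200,000 input tokens, 50,000 output tokens) is a hypothetical example, not a figure from Anthropic.

```python
# Prices in USD per 1 million tokens, as quoted in the article.
# The keys are illustrative labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "opus-4": {"input": 15.0, "output": 75.0},
    "sonnet-4": {"input": 3.0, "output": 15.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    prices = PRICES_PER_MILLION[model]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, input_tokens=200_000, output_tokens=50_000)
        print(f"{model}: ${cost:.2f}")
```

For this assumed workload the estimate comes to $6.75 on Opus 4 and $1.35 on Sonnet 4. Because both the input and the output rates differ by a factor of five, the ratio between the two models stays at 5x regardless of the input/output mix.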