Claude Sonnet 4.6 Tested: Benchmark Results and Usage Guide
Anthropic has unveiled its latest large language model (LLM), Claude Sonnet 4.6. The new model arrives shortly after the February 5 launch of Claude Opus 4.6, and Anthropic describes it as its most advanced Sonnet model to date.
Key Features of Claude Sonnet 4.6
One of the standout features of this model is its 1 million token context window, which is currently in beta. Claude Sonnet 4.6 has also undergone internal safety evaluations, which Anthropic says show a low tendency toward hallucination and sycophancy. The company further emphasizes the model’s enhanced coding abilities, making it particularly appealing to developers.
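To give a concrete sense of how the long-context beta might be used, here is a minimal sketch with the Anthropic Python SDK. The model ID `claude-sonnet-4-6` and the `context-1m-2025-08-07` beta flag are assumptions carried over from earlier Sonnet releases, not confirmed identifiers for this model; check the official documentation for the exact values.

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Sketch only: the model ID and beta flag below are assumptions, not confirmed for this release.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",            # assumed model identifier
    max_tokens=1024,
    betas=["context-1m-2025-08-07"],      # long-context beta flag used by earlier Sonnet models
    messages=[
        {"role": "user", "content": "Summarize the attached project files."},
    ],
)
print(response.content[0].text)
```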
How to Access Claude Sonnet 4.6
Users can access Claude Sonnet 4.6 through various platforms:
- Available as the default model on claude.ai and Claude Cowork for both free and Pro subscribers.
- Accessible via the API and on major cloud platforms (see the sketch below).
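For cloud access, the same Python SDK ships dedicated clients such as `AnthropicBedrock` for Amazon Bedrock. The sketch below uses a hypothetical Bedrock model ID purely for illustration; the real identifier is not given in this article.

```python
from anthropic import AnthropicBedrock

# Uses your standard AWS credentials (environment variables, profile, or IAM role).
client = AnthropicBedrock(aws_region="us-east-1")

message = client.messages.create(
    model="anthropic.claude-sonnet-4-6-v1:0",  # hypothetical Bedrock model ID for illustration
    max_tokens=512,
    messages=[{"role": "user", "content": "Draft a short release note for our API changelog."}],
)
print(message.content[0].text)
```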
Free users will encounter usage limits that reset every five hours and vary with current demand. Those who need higher limits can move to a paid plan; Claude Sonnet 4.6 is offered at the same pricing as previous models. The Claude Pro plan costs $20 per month, or $17 per month with annual billing. API access is priced at $3 per million input tokens and $15 per million output tokens.
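At those rates, the per-request cost is easy to estimate. The snippet below simply does the arithmetic for an illustrative 50,000-token prompt with a 2,000-token reply; the token counts are made up for the example.

```python
# API rates quoted above: $3 per million input tokens, $15 per million output tokens.
INPUT_RATE = 3.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50,000-token prompt that produces a 2,000-token reply.
print(f"${estimate_cost(50_000, 2_000):.4f}")  # -> $0.1800
```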
Benchmark Performance of Claude Sonnet 4.6
According to Anthropic’s benchmark results, Claude Sonnet 4.6 excels at agentic financial analysis and office-related tasks, outperforming competitors such as Google’s Gemini 3 Pro and OpenAI’s GPT 5.2. Notably, it also surpassed Anthropic’s own Opus 4.6 on several of these tasks.
Benchmark Statistics
Here are some of the benchmark test scores for Claude Sonnet 4.6:
| Benchmark | Score |
|---|---|
| GPQA Diamond | 89.9% |
| ARC-AGI-2 | 58.3% |
| MMMLU | 89.3% |
| SWE-bench Verified | 79.6% |
| Humanity’s Last Exam (HLE), with tools | 49.0% |
| Humanity’s Last Exam (HLE), without tools | 33.2% |
In a noteworthy endorsement, AI-driven insurance company Pace reported that Claude Sonnet 4.6 achieved the highest score among all Claude models on its complex insurance computer use benchmark. This success is significant, as the Claude Opus models are typically viewed as the benchmark for complex reasoning tasks.
Conclusion
Overall, Claude Sonnet 4.6 not only surpasses several previous models in performance but is also more affordable. With accessible pricing of $3 per million input tokens and $15 per million output tokens, it presents a compelling option for users seeking advanced AI capabilities.