Simple Bench - Basic Reasoning

Rank  Model                   Score  Company
-     Human (avg)             92%    n/a
1st   Claude 3.5 Sonnet       27%    Anthropic
2nd   GPT-4 Turbo-Preview     26%    OpenAI
3rd   Claude 3 Opus           25%    Anthropic
4th   Llama 3 405B Turbo      22%    Meta
5th   Gemini 1.5 Pro (08-24)  21%    Google
6th   GPT-4 (06-23)           18%    OpenAI
7th   GPT-4o                  16%    OpenAI
8th   DeepSeek-V2             15%    DeepSeek
9th   Mistral Large v2        13%    Mistral AI
10th  GPT-4o Mini             5%     OpenAI
temperature: 0.2, top-p: 0.95, last run: 14-08-24
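
For readers unfamiliar with the two run settings listed above, here is a minimal sketch of what temperature and top-p generally do to a model's next-token distribution. It is not the benchmark's own harness: the function name apply_temperature_top_p and the toy logits are hypothetical, chosen only for illustration, and real inference stacks may order or implement these steps differently.

```python
import math


def apply_temperature_top_p(logits, temperature=0.2, top_p=0.95):
    """Return token probabilities after temperature scaling and top-p filtering.

    Hypothetical helper for illustration; not taken from the benchmark.
    """
    # Temperature scaling: dividing logits by a value below 1 sharpens
    # the distribution, so the most likely token dominates more strongly.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p (nucleus) filtering: keep the smallest set of highest-probability
    # tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; everything else gets probability 0.
    kept_mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / kept_mass
    return filtered


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # made-up values, not model output
    print(apply_temperature_top_p(toy_logits, temperature=0.2, top_p=0.95))
    print(apply_temperature_top_p(toy_logits, temperature=1.0, top_p=0.95))
```

With these toy logits, a temperature of 0.2 concentrates almost all probability on the top token, so the top-p cutoff keeps only that token. At temperature 1.0 all three tokens survive the cutoff. In general, a low temperature such as the 0.2 used here makes sampled outputs close to deterministic.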