Rank | Model | Score | Company |
---|---|---|---|
- | Human (avg) | 92% | n/a |
1st | Claude 3.5 Sonnet | 27% | Anthropic |
2nd | GPT-4 Turbo-Preview | 26% | OpenAI |
3rd | Claude 3 Opus | 25% | Anthropic |
4th | Llama 3 405B Turbo | 22% | Meta |
5th | Gemini 1.5 Pro (08-24) | 21% | Google |
6th | GPT-4 (06-23) | 18% | OpenAI |
7th | GPT-4o | 16% | OpenAI |
8th | DeepSeek-V2 | 15% | DeepSeek |
9th | Mistral Large v2 | 13% | Mistral AI |
10th | GPT-4o Mini | 5% | OpenAI |