CNM posted in "Gemini 2.0, nailed it!"
LM Arena
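| Model | Overall | Overall w/ Style Control | Hard Prompts | Hard Prompts w/ Style Control | Coding | Math | Creative Writing | Instruction Following | Longer Query | Multi-Turn |
|---|---|---|---|---|---|---|---|---|---|---|
| gemini-exp-1206 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| chatgpt-4o-latest-20241120 | 1 | 1 | 3 | 4 | 1 | 5 | 1 | 2 | 1 | 1 |
| gemini-2.0-flash-exp | 3 | 3 | 2 | 2 | 3 | 1 | 2 | 2 | 1 | 1 |
| o1-preview | 4 | 3 | 2 | 1 | 1 | 1 | 4 | 2 | 3 | 3 |
| o1-mini | 5 | 7 | 3 | 4 | 1 | 1 | 16 | 5 | 4 | 5 |
| gemini-1.5-pro-002 | 5 | 6 | 6 | 7 | 7 | 5 | 4 | 5 | 5... | |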